WO2021164750A1 - Method and apparatus for convolutional layer quantization - Google Patents

Method and apparatus for convolutional layer quantization

Info

Publication number
WO2021164750A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
neural network
convolutional neural
weight value
probability
Prior art date
Application number
PCT/CN2021/076983
Other languages
French (fr)
Chinese (zh)
Inventor
Kai HAN (韩凯)
Zhaohui YANG (杨朝晖)
Yunhe WANG (王云鹤)
Chunjing XU (许春景)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021164750A1 publication Critical patent/WO2021164750A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a method and device for quantization of a convolutional layer.
  • A deep convolutional neural network has millions or even tens of millions of parameters after training, for example, the weight parameters and bias parameters included in the convolutional neural network model parameters, as well as the feature map parameters of each convolutional layer. Model parameters and feature map parameters are typically stored as 32-bit values. Because of the large number of parameters and the large amount of data, the entire convolution calculation process consumes a large amount of storage and computing resources.
  • The development of deep convolutional neural networks is moving in the direction of "deeper, larger, and more complex". In terms of model size alone, a deep convolutional neural network often cannot be ported to mobile phones or embedded chips at all; even if it is delivered through network transmission, the high bandwidth occupancy often becomes a difficult engineering problem.
  • The mainstream solution for reducing the complexity of a convolutional neural network without reducing its accuracy is to quantize the parameters of the convolutional neural network.
  • The current quantization method uses a straight-through estimator (STE) to approximate the gradient of the network parameters. This approximation is inaccurate, which affects the update accuracy of the network parameters.
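  • For illustration only, the following is a minimal sketch of straight-through-estimator quantization (hypothetical PyTorch-style code; the integer rounding scheme is an illustrative assumption, not the method of this application), showing where the gradient approximation occurs:

```python
import torch

class STEQuantize(torch.autograd.Function):
    """Round weights in the forward pass; treat rounding as identity in the backward pass."""

    @staticmethod
    def forward(ctx, w):
        return torch.round(w)  # non-differentiable quantization step

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged,
        # ignoring that d(round)/dw is zero almost everywhere. This mismatch
        # between the true and approximated gradient is the inaccuracy noted above.
        return grad_output

w = torch.randn(4, requires_grad=True)
STEQuantize.apply(w).sum().backward()
print(w.grad)  # all ones: the rounding step was simply "passed through"
```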
  • this application provides a convolutional layer quantization method, the method includes:
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values, where each of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the image data is processed by the first convolutional neural network to obtain a detection result and a target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the processing of the image data through the first convolutional neural network includes performing multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforward processes include a first feedforward process and a second feedforward process, where the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
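  • As an illustrative sketch of this condition (assumed Python; the softmax form of the preset function is introduced later in this summary, and a preset value of 0 with a geometric decay are illustrative assumptions): as the temperature coefficient approaches the preset value across successive feedforward processes, one of the N probability values approaches 1:

```python
import numpy as np

def probability_values(hidden, tau):
    """Map N hidden variables to N probability values with temperature tau."""
    z = hidden / tau
    z -= z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

hidden = np.array([0.3, 1.2, -0.5])   # N = 3 hidden variables for one weight value
for tau in [1.0, 0.5, 0.1, 0.01]:     # the second temperature is closer to the preset value 0
    print(tau, probability_values(hidden, tau).round(3))
# As tau -> 0, the output approaches the one-hot distribution [0, 1, 0].
```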
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
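  • A minimal sketch of the BN operation described above (assumed Python; the epsilon term and per-channel reduction axes are standard BN conventions rather than details stated in this application):

```python
import numpy as np

def bn_operation(x, eps=1e-5):
    """Normalize a conv output of shape (batch, channels, H, W) per channel."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)  # the "first mean value"
    std = x.std(axis=(0, 2, 3), keepdims=True)    # the "first standard deviation"
    return (x - mean) / (std + eps)

features = np.random.randn(8, 16, 28, 28)  # example output feature of the target convolutional layer
normalized = bn_operation(features)
print(round(normalized.mean(), 4), round(normalized.std(), 4))  # roughly 0 and 1
```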
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to updated N probability values, and the method further includes:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
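  • Putting the two formulas above together, a minimal sketch of the training-time quantization expected value and the final hard quantization (assumed Python; the variable names and example values are illustrative):

```python
import numpy as np

def soft_weight(hidden, candidates, tau):
    """Return W_q = sum_i P_i * V_i, with P = softmax(hidden / tau)."""
    e = np.exp((hidden - hidden.max()) / tau)  # max-shift cancels after normalization
    p = e / e.sum()                            # P_i: one probability value per candidate
    return (p * candidates).sum(), p

candidates = np.array([-1.0, 0.0, 1.0])  # N = 3 candidate quantized values V_i
hidden = np.array([0.1, -0.4, 1.3])      # hidden variables for one weight value

w_q, p = soft_weight(hidden, candidates, tau=0.5)
target_quantized = candidates[p.argmax()]  # candidate with the largest probability value
print(w_q, p.round(3), target_quantized)
```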
  • the present application provides a method for quantizing a convolutional layer, the method including:
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values, where each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the first convolutional neural network is fed forward, and the weight value is iteratively updated according to the target loss function until the target loss meets a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; feeding forward the first convolutional neural network includes performing multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • the preset function includes a first temperature coefficient
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to updated N probability values, and the method further includes:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the present application provides a convolutional layer quantization device, the device includes:
  • the acquisition module is used to acquire image data, annotated values, a first convolutional neural network and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value.
  • the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module is used to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, so as to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module is specifically used for:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the training module is specifically used to perform multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforward processes include a first feedforward process and a second feedforward process, where the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, the updated weight value corresponds to updated N probability values, and the weight value quantization module is also used for:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the present application provides a convolutional layer quantization device, the device includes:
  • An acquisition module, used to acquire a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, and the weight value corresponds to N probability values;
  • each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module is used to feed forward the first convolutional neural network, and to iteratively update the weight value according to the target loss function until the target loss meets a preset condition, so as to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module is specifically used for:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, and the preset function includes a temperature coefficient,
  • the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the training module is specifically used to perform multiple feedforward processes on the first convolutional neural network:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • the preset function includes a first temperature coefficient
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to a first mean value and a first standard deviation of the output feature of the target convolutional layer.
  • M fourth convolutional neural networks are obtained, where each of the M fourth convolutional neural networks includes an updated weight value, the updated weight value corresponds to updated N probability values, and the weight value quantization module is also used for:
  • the preset function is, for example, a softmax over the N hidden variables with the temperature coefficient τ: P_i = exp(w_i / τ) / ∑_{j=1}^{N} exp(w_j / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, w_i is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient
  • the weight value is calculated based on the following method: W_q = ∑_{i=1}^{N} P_i · V_i, where W_q is the weight value, V_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • An embodiment of the present application provides a neural network structure search device, which may include a memory, a processor, and a bus system, where the memory is used to store programs, and the processor is used to execute the programs in the memory to perform the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The embodiments of the present application provide a computer-readable storage medium in which a computer program is stored; when it runs on a computer, the computer executes the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The embodiments of the present application provide a computer program that, when run on a computer, causes the computer to execute the method of the above first aspect and any optional method thereof, or of the above second aspect and any optional method thereof.
  • The present application provides a chip system that includes a processor, configured to support an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • The chip system may consist of a chip, or may include a chip and other discrete devices.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • The method includes acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • The first convolutional neural network includes a target convolutional layer; the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value corresponds to the candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
  • The image data is processed by the first convolutional neural network to obtain a detection result and a target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, so as to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to updated N probability values.
  • Weight quantization is performed on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
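  • As a consolidated illustration of the above flow, the following is a minimal training-loop sketch (assumed PyTorch-style code; the dummy loss, optimizer, learning rate, and temperature decay are illustrative assumptions, not the exact procedure of this application):

```python
import torch
import torch.nn.functional as F

candidates = torch.tensor([-1.0, 0.0, 1.0])      # N candidate quantized values
hidden = torch.randn(64, 3, requires_grad=True)  # N hidden variables per weight value
optimizer = torch.optim.SGD([hidden], lr=0.1)

def expected_weights(tau):
    p = F.softmax(hidden / tau, dim=-1)          # N probability values per weight value
    return (p * candidates).sum(dim=-1)          # quantization expected value, differentiable

def dummy_target_loss(w):
    # Placeholder for the target loss computed from image data and annotated values.
    return ((w - 0.5) ** 2).mean()

tau = 1.0
for step in range(100):                          # multiple feedforward processes
    loss = dummy_target_loss(expected_weights(tau))
    optimizer.zero_grad()
    loss.backward()                              # gradients flow through the expectation; no STE needed
    optimizer.step()
    tau *= 0.95                                  # move the temperature coefficient toward the preset value

final = candidates[F.softmax(hidden / tau, dim=-1).argmax(dim=-1)]  # hard quantization
print(final[:8])
```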
  • Figure 1 is a schematic diagram of the structure of the main framework of artificial intelligence;
  • Figure 2 is a schematic diagram of a scenario of this application.
  • Figure 3 is a schematic diagram of a scenario of this application.
  • Figure 4 is a system architecture provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of a convolutional neural network provided by an embodiment of the application.
  • FIG. 6 is a schematic diagram of a convolutional neural network provided by an embodiment of the application.
  • FIG. 7 is a hardware structure of a chip provided by an embodiment of the application.
  • FIG. 8 is a schematic flowchart of a method for quantizing a convolutional layer provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of the structure of a convolutional layer in training in an embodiment of the application.
  • FIG. 10 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of this application.
  • FIG. 11 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of this application.
  • FIG. 12 is a schematic flowchart of a method for quantizing a convolutional layer provided by an embodiment of this application.
  • FIG. 13 is a schematic diagram of a structure of a convolutional layer quantization device provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • the "IT value chain” from the underlying infrastructure of human intelligence, information (providing and processing technology realization) to the industrial ecological process of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • Smart chips: hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs.
  • Basic platforms: distributed computing frameworks, networks, and related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • Sensors communicate with the outside world to obtain data, and these data are provided for calculation to the smart chips in the distributed computing system provided by the basic platform.
  • Data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence.
  • The data involve graphics, images, voice, and text, as well as Internet-of-Things data from traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • Some general capabilities can be formed based on the results of the data processing, such as algorithms or general systems, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, smart medical care, autonomous driving, safe city, and so on.
  • the embodiments of this application are mainly applied in fields such as driving assistance, automatic driving, and mobile phone terminals.
  • Application scenario 1: ADAS/ADS visual perception system
  • Application scenario 2: mobile phone beauty function
  • In this scenario, the mask and key points of the human body are detected through the neural network provided by the embodiments of the present application, and the corresponding parts of the human body can be zoomed in and out, such as slimming the waist and enlarging the hips, so as to output a beautified picture.
  • Application scenario 3 Image classification scenario:
  • After obtaining the image to be classified, the object recognition device adopts the object recognition method of this application to obtain the category of the object in the image to be classified;
  • the image to be classified can then be classified according to the category of the object in it.
  • For photographers, many photos are taken every day, including photos of animals, people, and plants.
  • Photos can be quickly classified according to their content, for example into photos containing animals, photos containing people, and photos containing plants.
  • The manual classification method is relatively inefficient, and people are prone to fatigue when dealing with the same thing for a long time, in which case the classification results will have large errors.
  • After the object recognition device obtains the image of a product, it uses the object recognition method of the present application to obtain the category of the product in the image, and then classifies the product according to that category. For the wide variety of commodities in large shopping malls or supermarkets, the object recognition method of the present application can quickly complete the classification of commodities, reducing time and labor costs.
  • the neural network model trained in the embodiment of the present application can realize the above-mentioned functions.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs.
  • The output of the arithmetic unit can be: h_{W,b}(x) = f(W^T x) = f(∑_{s=1}^{n} W_s x_s + b), where s = 1, 2, …, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
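  • A minimal sketch of such a neural unit (assumed Python; the example weights, inputs, and bias are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x, w, b):
    """Compute f(sum_s W_s * x_s + b), with the sigmoid as the activation function f."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs x_s
w = np.array([0.2, 0.4, -0.1])   # weights W_s
print(neural_unit(x, w, b=0.3))  # output signal, usable as input to the next layer
```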
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) can be understood as a neural network with many hidden layers; there is no special metric for "many" here. The multi-layer neural networks and deep neural networks we often speak of are essentially the same thing. Dividing a DNN by the positions of different layers, the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer. Although a DNN looks complicated, it is not complicated as far as the work of each layer is concerned.
  • For example, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as W_{24}^3.
  • The superscript 3 represents the layer number of the coefficient W, and the subscript corresponds to the output index 2 of the third layer and the input index 4 of the second layer.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as W_{jk}^L. Note that the input layer has no W parameter.
  • more hidden layers make the network more capable of portraying complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be seen as a filter, and the convolution process can be seen as using a trainable filter to convolve with an input image or convolution feature map.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In a convolutional layer, a neuron can be connected to only some of the neurons in adjacent layers.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernel. Sharing weights can be understood as meaning that the way image information is extracted has nothing to do with location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, which means that image information learned in one part can also be used in another part, so the same learned image information can be used for all positions on the image. In the same convolutional layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels there are, the richer the image information reflected by the convolution operation.
  • the convolution kernel can be initialized in the form of a matrix of random size.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • Convolutional neural networks can use the backpropagation (BP) algorithm to modify the parameters in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward passing the input signal to the output causes an error loss, and the parameters in the initial super-resolution model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation process dominated by the error loss, and aims to obtain the optimal parameters of the super-resolution model, such as a weight matrix.
  • Recurrent neural networks (RNN, Recurrent Neural Networks) can process sequence data of any length.
  • The training of an RNN is the same as the training of a traditional CNN or DNN.
  • A neural network can use the back propagation (BP) algorithm to modify the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward passing the input signal to the output causes an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • The back-propagation algorithm is a back-propagation process dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrix.
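  • A minimal numeric sketch of this error-loss-driven update (plain Python; the toy model y = w * x, the learning rate, and the target are illustrative assumptions):

```python
# Minimal backpropagation on a single linear unit y = w * x with squared error loss.
w, x, target, lr = 0.5, 2.0, 3.0, 0.1
for step in range(20):
    y = w * x                    # forward pass of the input signal
    loss = 0.5 * (y - target) ** 2
    grad_w = (y - target) * x    # error loss backpropagated to the parameter w
    w -= lr * grad_w             # update so that the error loss converges
print(w, w * x)                  # w approaches 1.5, and the output approaches the target 3.0
```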
  • an embodiment of the present application provides a system architecture 100.
  • the data collection device 160 is used to collect training data.
  • The training data includes the image or image block and the category of the object; the training data is stored in the database 130, and the training device 120 trains based on the training data maintained in the database 130 to obtain a CNN feature extraction model (the feature extraction model here is the model trained in the training phase described above, and may be a neural network for feature extraction, etc.).
  • The CNN feature extraction model can be used to implement the neural network provided by the embodiment of this application: after relevant preprocessing, the image or image block to be recognized is input into the CNN feature extraction model to obtain information such as 2D, 3D, mask, and key points of the object of interest in the image or image block to be recognized.
  • The CNN feature extraction model in the embodiment of the present application may specifically be a convolutional neural network (CNN).
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • The training device 120 does not necessarily train the CNN feature extraction model entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
  • the target model/rule trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 4.
  • The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, and may also be a server or a cloud.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • The input data in this embodiment of the present application may include an image to be recognized, an image block, or a picture.
  • The execution device 120 may call data, codes, and the like in the data storage system 150 for corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing in the data storage system 150.
  • Finally, the I/O interface 112 returns the processing result, such as the 2D, 3D, mask, key point and other information of the object of interest in the image, image block, or picture obtained above, to the client device 140 to provide it to the user.
  • the client device 140 may be a planning control unit in an automatic driving system or a beauty algorithm module in a mobile phone terminal.
  • The training device 120 can generate corresponding target models/rules based on different training data for different goals or different tasks, and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks, so as to provide the user with the desired result.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • In another case, the client device 140 can automatically send input data to the I/O interface 112. If the user's authorization is required for the client device 140 to automatically send the input data, the user can set the corresponding permission in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • The client device 140 can also be used as a data collection terminal to collect, as new sample data, the input data input to the I/O interface 112 and the output result output from the I/O interface 112, and store them in the database 130, as shown in the figure.
  • Alternatively, the I/O interface 112 can directly store, as new sample data, the input data input to the I/O interface 112 and the output result output from the I/O interface 112 into the database 130, as shown in the figure.
  • FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation.
  • For example, in FIG. 4, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • As shown in FIG. 4, a CNN feature extraction model is obtained through training by the training device 120.
  • The CNN feature extraction model may be a convolutional neural network in this embodiment of the application, or may be a neural network that will be introduced in the following embodiment.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 5.
  • A convolutional neural network is a deep neural network with a convolutional structure; it is a deep learning architecture.
  • The deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230.
  • the input layer 210 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 220 and the subsequent neural network layer 230 for processing, and the processing result of the image can be obtained.
  • The convolutional layer/pooling layer 220 may include layers 221 to 226. For example: in one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer; in another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, which is usually pre-defined. In the process of performing a convolution operation on an image, the weight matrix is usually processed along the horizontal direction of the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride), so as to complete the work of extracting specific features from the image.
  • The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features from the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • The multiple weight matrices have the same size (rows × columns), so the convolution feature maps extracted by the multiple weight matrices of the same size also have the same size, and the multiple extracted convolution feature maps of the same size are merged to form the output of the convolution operation.
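  • A minimal sketch of this multi-kernel convolution and the resulting depth dimension (assumed Python; the kernel count, sizes, and stride are illustrative):

```python
import numpy as np

def conv2d(image, kernels, stride=1):
    """Valid convolution of an (H, W, C) image with K weight matrices of shape (kh, kw, C)."""
    kh, kw, _ = kernels[0].shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w, len(kernels)))   # K stacked feature maps
    for k, kernel in enumerate(kernels):           # one output channel per weight matrix
        for i in range(out_h):
            for j in range(out_w):
                patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
                out[i, j, k] = (patch * kernel).sum()
    return out

image = np.random.randn(28, 28, 3)                      # input image with depth 3
kernels = [np.random.randn(5, 5, 3) for _ in range(8)]  # 8 weight matrices of the same size
print(conv2d(image, kernels).shape)                     # (24, 24, 8): depth set by the kernel count
```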
  • The weight values in these weight matrices need to be obtained through a large amount of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
  • The initial convolutional layer (such as 221) often extracts more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the subsequent convolutional layers (such as 226) become more and more complex, such as high-level semantic features, and features with higher semantics are more suitable for the problem to be solved.
  • A convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • In the image processing process, the sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
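  • A minimal sketch of 2x2 average and max pooling (assumed Python; the window size is illustrative):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """Downsample an (H, W) image by a factor of 2 with max or average pooling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

image = np.arange(16.0).reshape(4, 4)
print(pool2x2(image, "max"))   # each output pixel is the maximum of a 2x2 sub-region
print(pool2x2(image, "mean"))  # each output pixel is the average of a 2x2 sub-region
```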
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, in order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate the output of one or a group of required classes. Therefore, the neural network layer 230 can include multiple hidden layers (231, 232 to 23n as shown in FIG. 5) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained by pre-training based on the relevant training data of specific task types; for example, the task types can include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240.
  • The output layer 240 has a loss function similar to categorical cross entropy, which is specifically used to calculate the prediction error.
  • a convolutional neural network (CNN) 200 may include an input layer 110, a convolutional layer/pooling layer 120 (the pooling layer is optional), and a neural network layer 130.
  • Compared with FIG. 5, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 6 are parallel, and the separately extracted features are all input to the neural network layer 130 for processing.
  • It should be noted that the convolutional neural networks shown in FIG. 5 and FIG. 6 are only examples of two possible convolutional neural networks used in the image processing method of the embodiments of this application. In specific applications, the convolutional neural network used in the image processing method of the embodiments of this application can also exist in the form of other network models.
  • The structure of the convolutional neural network obtained by the neural network structure search method of the embodiment of the present application may be as shown in the convolutional neural network structures in FIG. 5 and FIG. 6.
  • FIG. 7 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50.
  • the chip can be set in the execution device 110 as shown in FIG. 4 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 4 to complete the training work of the training device 120 and output the target model/rule.
  • the algorithms of each layer in the convolutional neural network as shown in FIG. 5 and FIG. 6 can all be implemented in the chip as shown in FIG. 7.
  • The neural network processor (NPU) 50 is mounted as a coprocessor on a main central processing unit (central processing unit, CPU) (host CPU), and the main CPU distributes tasks.
  • the core part of the NPU is the arithmetic circuit 503.
  • the controller 504 controls the arithmetic circuit 503 to extract data from the memory (weight memory or input memory) and perform calculations.
  • The arithmetic circuit 503 includes multiple processing units (process engine, PE). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
  • The arithmetic circuit fetches the data corresponding to a weight matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit.
  • The arithmetic circuit fetches the data of an input matrix A from the input memory 501, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator 508.
  • the vector calculation unit 507 can perform further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • In some implementations, the vector calculation unit 507 can be used for network calculations in non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 507 can store the processed output vector to the unified buffer 506.
  • the vector calculation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 507 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 503, for example for use in a subsequent layer in a neural network.
  • the unified memory 506 is used to store input data and output data.
  • The storage unit access controller 505 (direct memory access controller, DMAC) transfers the input data in the external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
  • the bus interface unit (BIU) 510 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through the bus.
  • An instruction fetch buffer 509 connected to the controller 504 is used to store instructions used by the controller 504;
  • The controller 504 is used to call the instructions cached in the instruction fetch memory 509 to control the working process of the computing accelerator.
  • the input data here in this application is a picture
  • the output data is information such as 2D, 3D, Mask, and key points of the object of interest in the picture.
  • the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 are all on-chip (On-Chip) memories.
  • the external memory is a memory external to the NPU.
  • The external memory can be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or another readable and writable memory.
  • The execution device 110 in FIG. 4 introduced above can execute each step of the image processing method of the embodiment of the present application.
  • The CNN models shown in FIG. 5 and FIG. 6 and the chip shown in FIG. 7 can also be used to perform each step of the image processing method of the embodiment of the present application.
  • The image processing method of the embodiment of the present application is described in detail below with reference to the accompanying drawings.
  • the embodiment of the present application provides a system architecture.
  • The system architecture includes two local devices, an execution device, and a data storage system, where the local devices are connected to the execution device through a communication network.
  • the execution device can be implemented by one or more servers.
  • the execution device can be used in conjunction with other computing devices, such as data storage devices, routers, and load balancers.
  • the execution device can be arranged on one physical site or distributed across multiple physical sites.
  • the execution device may use the data in the data storage system, or call the program code in the data storage system, to implement the convolutional layer quantization method of the embodiment of the present application.
  • each local device can represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car, or another type of cellular phone, media consumption device, wearable device, set-top box, or game console.
  • Each user's local device can interact with the execution device through any communication mechanism/communication standard communication network.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination of them.
  • the local devices obtain the relevant parameters of the target neural network from the execution device, deploy the target neural network on the local devices, and use the target neural network for image classification, image processing, or the like.
  • the target neural network can be directly deployed on the execution device, and the execution device obtains the image to be processed from the local devices, and classifies or performs other types of image processing on the image to be processed according to the target neural network.
  • the foregoing execution device may also be referred to as a cloud device. At this time, the execution device is generally deployed in the cloud.
  • FIG. 8 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application.
  • the convolutional layer quantization method provided by this application includes:
  • obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the training device can obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value represents the probability that the weight value takes the corresponding candidate quantized value,
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values.
  • a first convolutional neural network and N candidate quantized values {v_1, v_2, ..., v_N} can be obtained.
  • the first convolutional neural network includes multiple convolutional layers, and the target convolutional layer is one of the multiple convolutional layers.
  • the weight matrix W corresponding to the target convolutional layer can include multiple weight values. Suppose the weight values are to be quantized into the N candidate quantized values {v_1, v_2, ..., v_N}; the probabilities that a target weight value takes the N candidate quantized values are:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the preset function satisfies the following condition: when the feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Taking the above probability as an example, in the iterative training process, the closer τ is to 0, the closer one of the N probability values will be to 1.
  • the expected quantization value determined according to the N probability values and the N candidate quantization values may be used as the weight value to perform a convolution operation with the input feature, and the weight value is calculated as follows:
  • W_q = Σ_{i=1}^{N} v_i · P_i
  • W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
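  • To make this calculation concrete, the following is a minimal NumPy sketch of the soft quantization step, assuming the softmax form of the preset function described in this application; the variable names (hidden_vars, candidates, tau) are illustrative and not taken from the patent.

```python
import numpy as np

def soft_quantize(hidden_vars, candidates, tau):
    """Map the N hidden variables of one weight to its quantization expected value.

    hidden_vars: shape (N,), one latent variable W_pi per candidate value
    candidates:  shape (N,), the candidate quantized values v_i
    tau:         temperature coefficient of the preset (softmax) function
    """
    # P_i = exp(W_pi / tau) / sum_j exp(W_pj / tau), computed stably
    logits = hidden_vars / tau
    logits = logits - logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    # W_q = sum_i v_i * P_i: the expectation used as the weight in the forward pass
    w_q = float((candidates * probs).sum())
    return w_q, probs

candidates = np.array([-1.0, 0.0, 1.0])   # e.g. a ternary candidate set
hidden = np.array([0.2, 0.1, 0.9])
for tau in (1.0, 0.1, 0.01):              # smaller tau -> probs approach one-hot
    w_q, probs = soft_quantize(hidden, candidates, tau)
    print(tau, np.round(probs, 3), round(w_q, 3))
```

  • Running the loop with decreasing tau shows the behavior described above: the probability vector approaches one-hot and W_q approaches a single candidate value.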
  • the weight value will be used to perform a convolution calculation with the input feature to obtain the output feature y_q;
  • the parameter to be trained in the existing quantization method is W
  • the parameter to be trained in the embodiment of the present application is W_pi.
  • the weight value quantization process in the embodiment of the present application is a mapping from W_pi to W_q.
  • the mapping process is differentiable, which solves the problem that the mapping from the weight value to be trained to the quantized value in the traditional quantization process is not differentiable.
  • the gradient of W_q can be directly obtained through the back-propagation algorithm, and the parameter W_pi can then be trained.
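  • As a quick illustration that this mapping admits exact gradients (so no straight-through estimator is needed), the sketch below compares the analytic derivative of W_q with respect to each hidden variable against a finite-difference estimate; this check is illustrative and not part of the patent.

```python
import numpy as np

def soft_quantize(hidden, v, tau):
    probs = np.exp(hidden / tau)
    probs = probs / probs.sum()
    return float((v * probs).sum()), probs

v = np.array([-1.0, 0.0, 1.0])
h = np.array([0.2, 0.1, 0.9])
tau = 0.5

w_q, p = soft_quantize(h, v, tau)
# Differentiating W_q = sum_i v_i * softmax(h / tau)_i gives
# dW_q/dh_i = (P_i / tau) * (v_i - W_q)
analytic = p / tau * (v - w_q)

eps = 1e-6  # finite-difference estimate of the same derivative
numeric = np.array([
    (soft_quantize(h + eps * np.eye(3)[i], v, tau)[0] - w_q) / eps
    for i in range(3)
])
print(np.allclose(analytic, numeric, atol=1e-4))  # True: the gradient is exact
```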
  • process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value meets the preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the training device may process the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes updated weight values, and the updated weight values correspond to the updated N probability values.
  • the first convolutional neural network may be fed forward, and the weight value may be iteratively updated according to the target loss function, until the target loss satisfies a preset condition, and the second convolutional neural network is obtained.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the N hidden variables may be updated based on the loss function, and then the weight value may be updated.
  • the value of the temperature coefficient can be updated to make the temperature coefficient close to the preset value.
  • the temperature coefficient ⁇ can be gradually attenuated from a larger value (pre-set) to close to 0, so that N The probability value P i tends to 0 or 1, so that the candidate quantization value corresponding to P i close to 1 is used as the value to be quantized into the weight value.
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the value in {v_1, v_2, ..., v_N} corresponding to the maximum probability value can be used as the quantized weight value, namely W_d = v_k, where k = argmax_i P_i.
  • W_d can be used to perform a convolution calculation with the input feature to get the output feature y_d
  • each weight value in the weight matrix can be processed in the above-mentioned manner, and weight quantization is performed on the updated weight values to obtain the third convolutional neural network.
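  • A minimal sketch of this hard-quantization step follows. Because the softmax is monotonic, the candidate with the largest probability is also the candidate with the largest hidden variable; the names (hidden, candidates) are illustrative.

```python
import numpy as np

def hard_quantize(hidden, candidates):
    """Collapse each weight to the candidate with the largest probability value.

    hidden:     shape (num_weights, N), latent variables W_pi for each weight
    candidates: shape (N,), the candidate quantized values v_i
    """
    idx = hidden.argmax(axis=-1)   # argmax of P_i equals argmax of W_pi
    return candidates[idx]         # the target quantization value W_d per weight

candidates = np.array([-1.0, 0.0, 1.0])
hidden = np.array([[0.2, 0.1, 0.9],    # largest probability on v_3 = 1.0
                   [2.0, 0.3, 0.1]])   # largest probability on v_1 = -1.0
print(hard_quantize(hidden, candidates))  # [ 1. -1.]
```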
  • FIG. 9 is a schematic diagram of the structure of a convolutional layer in training in an embodiment of the application. As shown in FIG. 9, by updating the value of the hidden variable, the probability value and the weight value are updated. The weight value is used to perform a convolution operation with the input feature to obtain the output feature.
  • FIG. 10 is a schematic diagram of the structure of the convolutional layer in an application in an embodiment of the application.
  • the quantized weight value obtained through training can be used to perform the convolution with the input feature.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature. That is, during training, the BN layer performs the BN operation based on the mean and standard deviation of the output features of the convolutional layer in the current feedforward pass.
  • after the weight values are iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network in the M fourth convolutional neural networks includes updated weight values; weight quantization is performed on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks.
  • each fifth convolutional neural network in the M fifth convolutional neural networks is fed forward to obtain M output features, and the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and second standard deviation of the M output features. That is, during training, the convolutional neural network obtained after each parameter update can be quantized, and the BN statistics are derived from the output features of these quantized networks.
  • it should be noted that the BN operation also needs to be based on the affine coefficients obtained during training. For details of how to perform the BN operation, reference can be made to the prior art, which is not repeated here.
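  • The recalibration just described can be sketched as follows: each of the M updated networks is quantized, a feedforward pass is run on each, and the statistics of the M output features replace the single-pass statistics. Here quantized_forwards is a hypothetical stand-in for the feedforward functions of the M fifth convolutional neural networks, and per-channel statistics are collapsed to scalars for brevity.

```python
import numpy as np

def recalibrated_bn_stats(quantized_forwards, x):
    """Derive the second mean and second standard deviation from M output features."""
    outputs = np.stack([f(x) for f in quantized_forwards])  # the M output features
    return outputs.mean(), outputs.std()

def bn(feature, mean, std, gamma, beta, eps=1e-5):
    # BN operation using the recalibrated statistics and the trained affine
    # coefficients gamma (scale) and beta (shift)
    return gamma * (feature - mean) / (std + eps) + beta

# Toy usage: three stand-in "quantized networks" that scale their input differently
forwards = [lambda x, s=s: s * x for s in (0.9, 1.0, 1.1)]
mean, std = recalibrated_bn_stats(forwards, np.array([1.0, 2.0, 3.0]))
print(bn(np.array([1.0, 2.0, 3.0]), mean, std, gamma=1.0, beta=0.0))
```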
  • FIG. 11 is a schematic diagram of the structure of a convolutional layer in an application in an embodiment of the application. As shown in FIG. 11, the mean, standard deviation, and affine coefficients obtained through training can be used to perform the BN operation on the input feature to obtain the output feature.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • the method includes acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer; the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the image data is processed through the first convolutional neural network to obtain the detection result and the target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value meets a preset condition, and a second convolutional neural network is obtained.
  • the second convolutional neural network includes updated weight values, and the updated weight values correspond to the updated N probability values; weight quantization is performed on the updated weight values to obtain a third convolutional neural network.
  • the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • FIG. 12 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application.
  • the convolutional layer quantization method provided by this application includes:
  • the first convolutional neural network includes a target convolutional layer
  • the target convolutional layer includes a weight value
  • the weight value corresponds to the N probability values
  • Each probability value in the N probability values corresponds to a candidate quantized value
  • each probability value represents the probability of the weight value corresponding to the candidate quantized value
  • the weight value is the quantization expected value determined according to the N probability values and the N candidate quantization values.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value may be updated by updating the N hidden variables according to a target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is performed after the first feedforward process
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network
  • the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values
  • the updated weight values correspond to the updated N probability values
  • the updated weight values included in each fourth convolutional neural network may also be quantized to obtain M fifth convolutional neural networks; each fifth convolutional neural network in the M fifth convolutional neural networks is fed forward to obtain M output features, and the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • An embodiment of the present application provides a method for quantizing a convolutional layer.
  • the method includes obtaining a first convolutional neural network and N candidate quantized values; the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, each probability value indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values.
  • the first convolutional neural network is fed forward, and the weight value is iteratively updated according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network; the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • FIG. 13 is a schematic structural diagram of a convolutional layer quantization apparatus 1300 according to an embodiment of the application.
  • the convolutional layer quantization apparatus 1300 may be a server, and the convolutional layer quantization apparatus 1300 includes:
  • the obtaining module 1301 is configured to obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value,
  • the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, and each probability value represents the probability of the weight value corresponding to the candidate quantized value, and
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
  • the training module 1302 is configured to process the image data through the first convolutional neural network to obtain the detection result and the target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value ,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value in the N probability values corresponds to a hidden variable
  • each probability value is calculated based on the corresponding hidden variable
  • the training module 1302 is specifically configured to:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, and each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values
  • the updated weight values correspond to the updated N probability values
  • the weight value quantization module 1303 is further configured to: quantize the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feed forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiment of the present application provides a convolutional layer quantization device 1300.
  • the acquisition module 1301 acquires image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module 1302 processes the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively updates the weight value according to the target loss function until the difference between the detection result and the annotated value meets the preset condition, to obtain the second convolutional neural network,
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • in this way, the expectation of the candidate quantized values is used as the weight value, and the probability distribution over the quantized values is learned.
  • the quantization process is differentiable, so there is no need to use the STE to approximate the gradients of the network parameters, which improves the update accuracy of the network parameters.
  • the convolutional layer quantization apparatus 1300 may further include:
  • the obtaining module 1301 is configured to obtain a first convolutional neural network and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • the training module 1302 is used to feed forward the first convolutional neural network and iteratively update the weight value according to the target loss function until the target loss meets a preset condition, to obtain a second convolutional neural network;
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • the weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value ,
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; the training module is specifically configured to:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the weight value quantization module is further configured to quantize the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks:
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiment of the present application provides a convolutional layer quantization device 1300.
  • the acquisition module 1301 acquires a first convolutional neural network and N candidate quantization values.
  • the first convolutional neural network includes a target convolutional layer.
  • the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value indicates the probability that the weight value takes the corresponding candidate quantized value.
  • the weight value is the quantization expected value determined according to the N probability values and the N candidate quantization values; the training module 1302 feeds forward the first convolutional neural network and, according to the target loss function,
  • iteratively updates the weight value until the target loss satisfies a preset condition, and a second convolutional neural network is obtained.
  • the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values.
  • the weight value quantization module 1303 performs weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value
  • the target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
  • FIG. 14 is a schematic structural diagram of the training device provided in an embodiment of the present application.
  • the training device is used to implement the function of the convolutional layer quantization device in the embodiment corresponding to FIG. 13.
  • the training device 1400 is implemented by one or more servers, and the training device 1400 may vary greatly with configuration or performance. It may include one or more central processing units (CPU) 1414 (for example, one or more processors), memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices).
  • the memory 1432 and the storage medium 1430 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device. Furthermore, the central processing unit 1414 may be configured to communicate with the storage medium 1430, and execute a series of instruction operations in the storage medium 1430 on the training device 1400.
  • the training device 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input and output interfaces 1458; or, one or more operating systems 1441, such as Windows ServerTM, Mac OS XTM , UnixTM, LinuxTM, FreeBSDTM and so on.
  • the central processing unit 1414 is configured to execute the data processing method executed by the convolutional layer quantization device in the embodiment corresponding to FIG. 12.
  • the central processing unit 1414 can obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values.
  • the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value in the N probability values corresponds to a candidate quantized value, and each probability value represents the probability that the weight value takes the corresponding candidate quantized value,
  • the weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
  • the image data is processed through the first convolutional neural network to obtain the detection result and the target loss, and the weight value is iteratively updated according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • weight quantization is performed on the updated weight value to obtain a third convolutional neural network; the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables
  • each probability value of the N probability values corresponds to a hidden variable
  • the central processing unit 1414 may execute:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes: performing weight value quantization on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feeding forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to execute the steps performed by the aforementioned training device.
  • the embodiments of the present application also provide a computer-readable storage medium storing a program for signal processing; when the program runs on a computer, the computer is caused to execute the following steps:
  • obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
  • feed forward the first convolutional neural network, and iteratively update the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
  • perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
  • the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; iteratively updating the weight value according to the target loss function includes:
  • the weight value is updated by updating the N hidden variables according to the target loss function.
  • each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function
  • the preset function includes a temperature coefficient
  • the preset function satisfies the following conditions:
  • the multiple feedforwards include a first feedforward process and a second feedforward process
  • the second feedforward process is performed after the first feedforward process.
  • when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient
  • when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient
  • the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  • the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer and is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of that output feature.
  • after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained; each fourth convolutional neural network of the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes: performing weight value quantization on the updated weight values included in each fourth convolutional neural network to obtain M fifth convolutional neural networks, and feeding forward each fifth convolutional neural network in the M fifth convolutional neural networks to obtain M output features, where the second BN layer is used to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  • the preset function is the following function:
  • P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  • the weight value is calculated based on the following method:
  • W_q = Σ_{i=1}^{N} v_i · P_i, where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  • the execution device, training device, or terminal device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins, or circuits.
  • the processing unit can execute the computer-executable instructions stored in the storage unit to make the chip in the execution device execute the data processing method described in the foregoing embodiment, or to make the chip in the training device execute the data processing method described in the foregoing embodiment.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • Fig. 15 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • the Host CPU assigns tasks.
  • the core part of the NPU is the arithmetic circuit 1503.
  • the arithmetic circuit 1503 is controlled by the controller 1504 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 1503 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. The arithmetic circuit 1503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 1502 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the data of matrix A from the input memory 1501, performs a matrix operation with matrix B, and stores the partial result or final result of the matrix in the accumulator 1508.
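  • Functionally, this accumulation corresponds to an ordinary tiled matrix multiply in which partial results are summed in an accumulator. The plain-Python sketch below only illustrates that behavior and is not a model of the actual hardware.

```python
import numpy as np

def matmul_with_accumulator(A, B, tile=2):
    """Compute A @ B by accumulating partial results, one tile of K at a time."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))                # plays the role of accumulator 1508
    for k0 in range(0, K, tile):          # stream successive tiles of A and B
        acc += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]   # partial result
    return acc

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(matmul_with_accumulator(A, B), A @ B))  # True
```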
  • the unified memory 1506 is used to store input data and output data.
  • the weight data is transferred to the weight memory 1502 directly through the direct memory access controller (DMAC) 1505.
  • the input data is also transferred to the unified memory 1506 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 1510, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1509.
  • the bus interface unit 1510 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1509 to obtain instructions from the external memory, and is also used for the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1506 or to transfer the weight data to the weight memory 1502 or to transfer the input data to the input memory 1501.
  • the vector calculation unit 1507 includes multiple arithmetic processing units and, if necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for the calculation of non-convolutional/non-fully-connected layers in the neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
  • the vector calculation unit 1507 can store the processed output vector to the unified memory 1506.
  • the vector calculation unit 1507 may apply a linear function or a nonlinear function to the output of the arithmetic circuit 1503, for example, to perform linear interpolation on the feature plane extracted by the convolutional layer, or to apply a nonlinear function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example for use in subsequent layers in a neural network.
  • the instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504;
  • the unified memory 1506, the input memory 1501, the weight memory 1502, and the fetch memory 1509 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physically separate.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus necessary general hardware.
  • it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on.
  • generally, all functions completed by a computer program can be easily implemented with corresponding hardware.
  • moreover, the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits.
  • software program implementation is a better implementation in more cases.
  • the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, training device, or network device, etc.) execute the methods of the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website site, computer, training device, or data center to another website site, computer, training device, or data center through wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

Provided is a method for convolutional layer quantization, applied to the field of artificial intelligence, comprising: obtaining image data, annotated values, a first convolutional neural network, and N candidate quantization values, the first convolutional neural network comprising a target convolutional layer, the target convolutional layer comprising weight values, the weight values corresponding to N probability values, each of the N probability values corresponding to a candidate quantization value, the weight values being quantization expected values determined according to the N probability values and the N candidate quantization values; processing the image data by means of the first convolutional neural network to obtain a second convolutional neural network, the second convolutional neural network comprising the updated weight values; performing weight quantization on the updated weight values to obtain a third convolutional neural network. The method can improve the update accuracy of network parameters.

Description

Method and apparatus for convolutional layer quantization
This application claims priority to Chinese Patent Application No. 202010109185.5, filed with the China National Intellectual Property Administration on February 21, 2020 and entitled "Method and apparatus for convolutional layer quantization", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a convolutional layer quantization method and apparatus.
Background
A deep convolutional neural network has hundreds or even tens of millions of parameters after training, for example, the weight parameters and bias parameters included in the model parameters of the convolutional neural network, as well as the feature map parameters of each convolutional layer. Moreover, the model parameters and feature map parameters are stored on a 32-bit basis. Because of the large number of parameters and the large amount of data, the entire convolution calculation process consumes a large amount of storage and computing resources. Deep convolutional neural networks are developing in a "deeper, larger, and more complex" direction; in terms of model size, a deep convolutional neural network cannot be ported to a mobile phone or an embedded chip at all, and even if it is to be transmitted over a network, the high bandwidth occupancy often becomes a difficult problem for engineering implementation.
At present, solutions for reducing the complexity of a convolutional neural network without reducing its accuracy are mainly implemented by quantizing the parameters of the convolutional neural network. However, current quantization methods use a straight-through estimator (STE) to approximate the gradients of the network parameters; this gradient is inaccurate, which in turn affects the update accuracy of the network parameters.
Summary of the Invention
In a first aspect, this application provides a convolutional layer quantization method, and the method includes:
acquiring image data, annotated values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each probability value of the N probability values corresponds to a candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expected value determined according to the N probability values and the N candidate quantized values;
processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantization value corresponding to the updated weight value, and the target quantization value is the candidate quantized value corresponding to the largest probability value among the updated N probability values.
Optionally, in a design of the first aspect, the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, in a design of the first aspect, each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: when the feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1; the processing the image data through the first convolutional neural network includes:
performing multiple feedforward processes on the image data through the first convolutional neural network, where the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is performed after the first feedforward process; when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, where the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, in a design of the first aspect, the first convolutional neural network further includes a first batch normalization (BN) layer; the first BN layer is connected to the target convolutional layer, and the first BN layer is used to perform a BN operation on the output feature of the target convolutional layer according to the first mean value and the first standard deviation of the output feature of the target convolutional layer.
Optionally, in a design of the first aspect, after the weight value is iteratively updated according to the target loss function, M fourth convolutional neural networks are obtained, each fourth convolutional neural network in the M fourth convolutional neural networks includes updated weight values, the updated weight values correspond to the updated N probability values, and the method further includes:
对第四卷积神经网络包括的更新后的权重值进行权重值量化,得到M个第五卷积神经网络;Perform weight value quantization on the updated weight value included in the fourth convolutional neural network to obtain M fifth convolutional neural networks;
对所述M个所述第五神经网络中的每个第五卷积神经网络进行前馈,得到M个输出特征,所述第二BN层用于根据所述M个输出特征的第二均值和第二标准差对所述第三卷积神经网络包括的更新后的目标卷积层的输出特征进行BN运算。Feed forward each fifth convolutional neural network in the M fifth neural networks to obtain M output features, and the second BN layer is used to obtain the second mean value of the M output features And the second standard deviation to perform a BN operation on the updated output feature of the target convolutional layer included in the third convolutional neural network.
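One possible reading of this statistics reset, sketched below under explicit assumptions: the M fifth networks are fed the same input batch, the second mean and second standard deviation are pooled per channel across all M outputs, and the helper name is invented for illustration; the text does not commit to these details:

```python
import torch

def second_bn_statistics(conv_outputs):
    # conv_outputs: list of M tensors of shape [B, C, H, W], one per fifth
    # network, all produced from the same input batch.
    stacked = torch.stack(conv_outputs)           # [M, B, C, H, W]
    second_mean = stacked.mean(dim=(0, 1, 3, 4))  # per-channel second mean
    second_std = stacked.std(dim=(0, 1, 3, 4))    # per-channel second std
    return second_mean, second_std
```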
Optionally, in one design of the first aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the first aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value. A sketch combining the two formulas above follows.
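Read together, the two formulas give a differentiable forward pass for the soft weight: the hidden variables produce probabilities through the temperature softmax, and the stored weight is the expectation over candidate values. A minimal PyTorch sketch (the shapes, candidate values, and temperature are illustrative assumptions):

```python
import torch

v = torch.tensor([-1.0, 0.0, 1.0])           # N = 3 candidate quantized values
w_p = torch.randn(8, 3, requires_grad=True)  # hidden variables: 8 weights x N
tau = 1.0                                    # temperature coefficient

p = torch.softmax(w_p / tau, dim=-1)  # P_i = exp(W_pi/tau) / sum_j exp(W_pj/tau)
w_q = (p * v).sum(dim=-1)             # W_q = sum_i P_i * v_i, the expectation

# w_q is an ordinary differentiable function of w_p, so the loss gradient
# reaches the hidden variables without a straight-through estimator.
w_q.sum().backward()
print(w_p.grad.shape)  # torch.Size([8, 3])
```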
In a second aspect, this application provides a convolutional layer quantization method, the method including:
obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
feeding forward the first convolutional neural network and iteratively updating the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
performing weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values (this hardening step is sketched below).
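The final weight quantization named here is a plain argmax over the learned distribution. A small sketch with illustrative tensors:

```python
import torch

v = torch.tensor([-1.0, 0.0, 1.0])  # candidate quantized values (illustrative)
p_updated = torch.softmax(torch.randn(8, 3), dim=-1)  # stand-in for updated probabilities

idx = p_updated.argmax(dim=-1)  # index of the largest updated probability per weight
w_target = v[idx]               # target quantized value stored in the third network
print(w_target)
```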
Optionally, in one design of the second aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the second aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the feeding forward the first convolutional neural network includes:
performing multiple feedforward passes on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the second aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the second aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values, and the method further includes:
performing weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks;
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the second aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the second aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a third aspect, this application provides a convolutional layer quantization apparatus, the apparatus including:
an obtaining module, configured to obtain image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
a training module, configured to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to the target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values.
Optionally, in one design of the third aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the third aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the training module is specifically configured to:
perform multiple feedforward passes on the image data through the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the third aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the third aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values; the weight value quantization module is further configured to:
perform weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks;
feed forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the third aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the third aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a fourth aspect, this application provides a convolutional layer quantization apparatus, the apparatus including:
an obtaining module, configured to obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values;
a training module, configured to feed forward the first convolutional neural network and iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, the second convolutional neural network including an updated weight value that corresponds to updated N probability values;
a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, the third convolutional neural network including a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values.
Optionally, in one design of the fourth aspect, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, in one design of the fourth aspect, each of the N probability values is obtained by mapping its corresponding hidden variable through a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the training module is specifically configured to:
perform multiple feedforward passes on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, the second feedforward process taking place after the first feedforward process; during the first feedforward process the preset function includes a first temperature coefficient, and during the second feedforward process the preset function includes a second temperature coefficient, the absolute difference between the second temperature coefficient and the preset value being smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, in one design of the fourth aspect, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer, the first BN layer being used to perform a BN operation on the output features of the target convolutional layer according to a first mean and a first standard deviation of those output features.
Optionally, in one design of the fourth aspect, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks including an updated weight value that corresponds to updated N probability values; the weight value quantization module is further configured to:
perform weight value quantization on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and feed forward each of the M fifth convolutional neural networks to obtain M output features, where a second BN layer is used to perform the BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, in one design of the fourth aspect, the preset function is the following function:

$P_i = \dfrac{\exp(W_{pi}/\tau)}{\sum_{j=1}^{N}\exp(W_{pj}/\tau)}$

where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, in one design of the fourth aspect, the weight value is calculated as follows:

$W_q = \sum_{i=1}^{N} P_i v_i$

where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
In a fifth aspect, an embodiment of this application provides a neural network structure search apparatus, which may include a memory, a processor, and a bus system, where the memory is configured to store a program and the processor is configured to execute the program in the memory, so as to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In a seventh aspect, an embodiment of this application provides a computer program that, when run on a computer, causes the computer to perform the first aspect and any optional method thereof, or the second aspect and any optional method thereof.
In an eighth aspect, this application provides a chip system including a processor, configured to support an execution device or a training device in implementing the functions involved in the foregoing aspects, for example, sending or processing the data and/or information involved in the foregoing methods. In one possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
An embodiment of this application provides a convolutional layer quantization method. The method includes: obtaining image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is the quantization expectation determined from the N probability values and the N candidate quantized values; processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network that includes an updated weight value corresponding to updated N probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the updated N probability values. In this way, the expectation over the candidate quantized values is used as the weight value, and the probability distribution over the quantized values is learned directly. Because this quantization process is differentiable, there is no need to approximate the derivatives of the network parameters with a straight-through estimator (STE), which improves the precision of the parameter updates (the gradient contrast is sketched below).
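To make the closing claim concrete, here is a hedged, illustrative comparison (not the patent's implementation): hard rounding has a derivative of zero almost everywhere, which is why STE-based training must approximate it, whereas the expectation-based weight is smooth in the hidden variables:

```python
import torch

x = torch.randn(4, requires_grad=True)

# Hard rounding: the gradient is zero almost everywhere, so training such a
# quantizer normally relies on a straight-through estimator (STE).
g_hard, = torch.autograd.grad(torch.round(x).sum(), x)
print(g_hard)  # tensor([0., 0., 0., 0.])

# Expectation over candidate values: smooth in the hidden variables,
# so the gradient is exact and no STE approximation is needed.
v = torch.tensor([-1.0, 0.0, 1.0])
w_p = torch.randn(4, 3, requires_grad=True)
w_q = (torch.softmax(w_p, dim=-1) * v).sum(dim=-1)
g_soft, = torch.autograd.grad(w_q.sum(), w_p)
print(g_soft)  # generally nonzero
```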
Description of the drawings
Figure 1 is a schematic structural diagram of the main framework of artificial intelligence;
Figure 2 is a schematic diagram of an application scenario of this application;
Figure 3 is a schematic diagram of an application scenario of this application;
Figure 4 is a system architecture provided by an embodiment of this application;
Figure 5 is a schematic diagram of a convolutional neural network provided by an embodiment of this application;
Figure 6 is a schematic diagram of a convolutional neural network provided by an embodiment of this application;
Figure 7 is a hardware structure of a chip provided by an embodiment of this application;
Figure 8 is a schematic flowchart of a convolutional layer quantization method provided by an example of this application;
Figure 9 is a schematic structural diagram of a convolutional layer during training in an embodiment of this application;
Figure 10 is a schematic structural diagram of a convolutional layer in application in an embodiment of this application;
Figure 11 is a schematic structural diagram of a convolutional layer in application in an embodiment of this application;
Figure 12 is a schematic flowchart of a convolutional layer quantization method provided by an example of this application;
Figure 13 is a schematic structural diagram of a convolutional layer quantization apparatus provided by an embodiment of this application;
Figure 14 is a schematic structural diagram of a training device provided by an embodiment of this application;
Figure 15 is a schematic structural diagram of a chip provided by an embodiment of this application.
Detailed description
The embodiments of the present invention are described below with reference to the drawings of the embodiments. The terms used in the implementation sections of the present invention are only intended to explain specific embodiments and are not intended to limit the present invention.
The embodiments of this application are described below with reference to the drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application remain applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device including a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
First, the overall workflow of an artificial intelligence system is described with reference to Figure 1, which shows a schematic structural diagram of the main framework of artificial intelligence. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The intelligent information chain reflects the series of processes from data acquisition to processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; along this process, data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The IT value chain, spanning from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing it) up to the industrial ecology of the system, reflects the value that artificial intelligence brings to the information technology industry.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a base platform. Communication with the outside is through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes distributed computing frameworks, networks, and related platform guarantees and support, and can include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to smart chips in the distributed computing system offered by the base platform for computation.
(2) Data
Data at the layer above the infrastructure represents the data sources of the artificial intelligence field. The data involves graphics, images, speech, and text, as well as Internet-of-Things data from traditional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and similar methods.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, and training on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities can further be formed based on the results, for example an algorithm or a general-purpose system, such as translation, text analysis, computer-vision processing, speech recognition, image recognition, and so on.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. Application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, safe cities, and the like.
The embodiments of this application are mainly applied in fields such as driving assistance, autonomous driving, and mobile phone terminals.
Several application scenarios are introduced below:
Application scenario 1: ADAS/ADS visual perception system
As shown in Figure 2, ADAS and ADS require real-time detection of multiple types of 2D targets, including dynamic obstacles (Pedestrian, Cyclist, Tricycle, Car, Truck, Bus), static obstacles (TrafficCone, TrafficStick, FireHydrant, Motocycle, Bicycle), and traffic signs (TrafficSign, GuideSign, Billboard, TrafficLight_Red/TrafficLight_Yellow/TrafficLight_Green/TrafficLight_Black, RoadSign). In addition, to accurately determine the region a dynamic obstacle occupies in 3D space, a 3D estimate of the dynamic obstacle must also be produced and a 3D box output. To fuse with lidar data, the mask of the dynamic obstacle is needed so that the laser point cloud hitting the obstacle can be filtered out; for precise parking, the four key points of a parking space must be detected simultaneously; for composition-based localization, the key points of static targets must be detected. This is a semantic segmentation problem: the camera of the autonomous vehicle captures a road image, and the image must be segmented into road surface, roadbed, vehicles, pedestrians, and other objects so that the vehicle stays in the correct area. For safety-critical autonomous driving, the image must be understood in real time, so a convolutional neural network that can run semantic segmentation in real time is essential.
Application scenario 2: mobile phone beauty function
As shown in Figure 3, in a mobile phone, the mask and key points of the human body are detected by the neural network provided in the embodiments of this application, and the corresponding parts of the body can be enlarged or reduced, for example waist-slimming and hip-shaping operations, to output a beautified image.
Application scenario 3: image classification
After obtaining an image to be classified, an object recognition apparatus uses the object recognition method of this application to obtain the category of the object in the image, and the image can then be classified according to that object category. Photographers take many photos every day, of animals, of people, of plants. Using the method of this application, photos can be quickly sorted by content into photos containing animals, photos containing people, and photos containing plants.
When the number of images is very large, manual classification is inefficient, and a person handling the same task for a long time easily becomes fatigued, so the classification results would then contain large errors.
Application scenario 4: commodity classification
After the object recognition apparatus obtains an image of a commodity, it uses the object recognition method of this application to obtain the category of the commodity in the image, and then classifies the commodity by category. For the wide variety of commodities in large shopping malls or supermarkets, the object recognition method of this application can complete classification quickly, reducing time and labor costs.
Application scenario 5: face verification at entrance gates
This is an image similarity comparison problem. At the gates of high-speed rail stations, airports, and similar entrances, when a passenger performs face authentication, a camera captures a face image, a convolutional neural network extracts features, and similarity is computed against the image features of the identity document stored in the system; if the similarity is high, verification succeeds. Feature extraction by the convolutional neural network is the most time-consuming step, so fast face verification requires an efficient convolutional neural network for feature extraction.
Application scenario 6: simultaneous interpretation by a translator
This is a speech recognition and machine translation problem. Convolutional neural networks are also a common recognition model for speech recognition and machine translation. In scenarios requiring simultaneous interpretation, real-time speech recognition and translation must be achieved, and an efficient convolutional neural network gives the translator a better experience.
The neural network model trained in the embodiments of this application can realize the above functions.
Since the embodiments of this application involve extensive use of neural networks, for ease of understanding, the related terms and concepts such as neural networks involved in the embodiments are introduced first.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinear characteristics into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of this activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract features of that local receptive field; the local receptive field may be a region composed of several neural units (a toy example follows).
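As a toy numeric illustration of a single neural unit (the values and the choice of sigmoid are arbitrary):

```python
import torch

xs = torch.tensor([0.5, -1.2, 3.0])  # inputs x_s, s = 1..n (illustrative)
ws = torch.tensor([0.1, 0.4, -0.2])  # weights W_s
b = 0.3                              # bias of the neural unit

h = torch.sigmoid((ws * xs).sum() + b)  # f(sum_s W_s * x_s + b), with f = sigmoid
print(h)
```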
(2) Deep neural network
A deep neural network (DNN) can be understood as a neural network with many hidden layers; "many" has no particular threshold, and the multi-layer neural networks commonly spoken of are essentially the same thing as deep neural networks. Divided by the position of the layers, the network inside a DNN falls into three categories: the input layer, the hidden layers, and the output layer. Generally the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is necessarily connected to any neuron of the (i+1)-th layer. Although a DNN looks complicated, the work of each individual layer is not complicated; it is simply the following linear relationship expression:

$\vec{y} = \alpha(W\vec{x} + \vec{b})$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has many layers, there are correspondingly many coefficient matrices W and offset vectors $\vec{b}$. How, then, are these parameters defined in a DNN? First consider the definition of the coefficient W. Taking a three-layer DNN as an example, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$: the superscript 3 is the number of the layer in which the coefficient W is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W^{L}_{jk}$. Note that the input layer has no W parameters. In a deep neural network, more hidden layers make the network better able to portray complex situations in the real world. In theory, a model with more parameters has higher complexity and larger "capacity", meaning it can complete more complex learning tasks (see the sketch below).
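A minimal sketch of one DNN layer and of the $W^{L}_{jk}$ indexing convention (sizes are arbitrary and sigmoid stands in for α):

```python
import torch

x = torch.randn(4)     # input vector (layer L-1 has 4 neurons)
W = torch.randn(2, 4)  # weight matrix of layer L: W[j-1, k-1] stores W^L_jk
b = torch.randn(2)     # offset vector of layer L

y = torch.sigmoid(W @ x + b)  # y = alpha(W x + b), with alpha = sigmoid

w_3_24 = W[1, 3]  # e.g. the coefficient from the 4th neuron of the previous
                  # layer to the 2nd neuron of this layer
```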
(3) A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers. The feature extractor may be regarded as a filter, and the convolution process may be regarded as convolving a trainable filter with an input image or a convolutional feature map. A convolutional layer is the layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons of the neighboring layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as the way image information is extracted being independent of position. The principle implied here is that the statistics of one part of an image are the same as those of the other parts, which means that image information learned in one part can also be used in another part; the same learned image information can therefore be used for all positions on the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information; generally, the greater the number of kernels, the richer the image information reflected by the convolution operation.
A convolution kernel may be initialized as a matrix of random size, and during the training of the convolutional neural network the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting (weight sharing is sketched below).
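A small sketch of weight sharing: one 3×3 kernel is slid over the whole input, so the parameter count does not grow with image size (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 28, 28)  # one single-channel input feature map
kernel = torch.randn(1, 1, 3, 3)   # a single trainable 3x3 convolution kernel

# The same nine weights are applied at every spatial position, so the number
# of parameters is independent of the image size.
feature_map = F.conv2d(image, kernel, padding=1)
print(feature_map.shape)  # torch.Size([1, 1, 28, 28])
```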
(4) Back-propagation algorithm
A convolutional neural network can use the error back propagation (BP) algorithm to correct the parameter values in the initial super-resolution model during training, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal through to the output produces an error loss, and the parameters of the initial super-resolution model are updated by back-propagating the error-loss information, making the error loss converge. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the super-resolution model, for example the weight matrix.
(5) Recurrent neural networks (RNN) are used to process sequence data. In a traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although such an ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the preceding words are generally needed, because the words of a sentence are not independent of one another. An RNN is called a recurrent neural network because the current output of a sequence also depends on the earlier outputs. Concretely, the network memorizes previous information and applies it to the computation of the current output; that is, the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN.
Why is a recurrent neural network needed when convolutional neural networks already exist? The reason is simple: a convolutional neural network presupposes that elements are independent of one another, and that inputs and outputs are independent too, like cats and dogs. But in the real world many elements are interconnected, for example stock prices changing over time, or someone saying: "I like traveling, and my favorite place is Yunnan; I will definitely go when I have the chance." To fill in the blank, humans all know the answer is "Yunnan", because humans infer from the context. But how can a machine be made to do this? RNNs arose for exactly this: they are designed to give machines memory like humans have. The output of an RNN therefore needs to depend on the current input information and on historical memory information (a minimal recurrent cell is sketched below).
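A minimal Elman-style recurrence illustrating how the hidden state carries historical memory into the current output (the sizes and the tanh nonlinearity are illustrative, not any particular library cell):

```python
import torch

W_x = torch.randn(16, 8)   # input-to-hidden weights (illustrative sizes)
W_h = torch.randn(16, 16)  # hidden-to-hidden weights: the memory path
b = torch.randn(16)

h = torch.zeros(16)            # initial hidden state
for x_t in torch.randn(5, 8):  # a length-5 input sequence
    # The current state depends on the current input and the remembered state.
    h = torch.tanh(W_x @ x_t + W_h @ h + b)
```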
(6) Loss function
In the process of training a deep neural network, because we want the output of the deep neural network to be as close as possible to the value we actually wish to predict, we can compare the current predicted value of the network with the truly desired target value and then update the weight vector of each layer of the network according to the difference between the two (of course, there is usually an initialization process before the first update, namely pre-configuring parameters for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted to make it predict lower, and adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. For this, "how to compare the difference between the predicted value and the target value" must be defined in advance, which leads to the loss function (also called objective function): an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes the process of reducing this loss as much as possible (a toy example follows).
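A toy squared-error example of measuring the prediction-target difference (the document does not commit to a particular loss function):

```python
import torch

pred = torch.tensor([2.5, 0.1])    # current network predictions (illustrative)
target = torch.tensor([3.0, 0.0])  # truly desired target values

loss = ((pred - target) ** 2).mean()  # larger loss means larger difference
print(loss)  # tensor(0.1300)
```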
(7) Back-propagation algorithm
A neural network can use the error back propagation (BP) algorithm to correct the parameter values in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal through to the output produces an error loss, and the parameters of the initial neural network model are updated by back-propagating the error-loss information, making the error loss converge. The back-propagation algorithm is a back-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, for example the weight matrix.
The system architecture provided by the embodiments of this application is described below.
Referring to Figure 4, an embodiment of this application provides a system architecture 100. As shown in the system architecture 100, a data collection device 160 is used to collect training data; in this embodiment the training data includes images or image blocks of objects and the categories of the objects. The training data is stored in a database 130, and a training device 120 trains a CNN feature extraction model based on the training data maintained in the database 130 (note: the feature extraction model here is the model obtained in the training phase described earlier, and may be a neural network for feature extraction, etc.). Embodiment 1 below describes in more detail how the training device 120 obtains the CNN feature extraction model from the training data. The CNN feature extraction model can be used to implement the neural network provided in the embodiments of this application: after related preprocessing, an image or image block to be recognized is input into the CNN feature extraction model, and the 2D, 3D, mask, key-point, and other information of the object of interest in the image or image block is obtained. The CNN feature extraction model in this embodiment may specifically be a convolutional neural network (CNN). It should be noted that in practical applications, the training data maintained in the database 130 does not necessarily all come from the data collection device 160 and may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the CNN feature extraction model entirely on the training data maintained by the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.
根据训练设备120训练得到的目标模型/规则可以应用于不同的系统或设备中,如应用于图4所示的执行设备110,所述执行设备110可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)AR/虚拟现实(virtual reality,VR),车载终端等,还可以是服务器或者云端等。在图4中,执行设备110配置输入/输出(input/output,I/O)接口112,用于与外部设备进行数据交互,用户可以通过客户设备140向I/O接口112输入数据,所述输入数据在本申请实施例中可以包括:待识别图像或者图像块或者图片。The target model/rule trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 4. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, or a notebook. Computers, augmented reality (AR) AR/virtual reality (VR), vehicle-mounted terminals, etc., can also be servers or clouds. In FIG. 4, the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices. The user can input data to the I/O interface 112 through the client device 140. The input data in the embodiment of the present application may include: an image to be recognized or an image block or a picture.
在执行设备120对输入数据进行预处理,或者在执行设备120的计算模块111执行计算等相关的处理(比如进行本申请中神经网络的功能实现)过程中,执行设备120可以调用数据存储系统150中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统150中。When the execution device 120 preprocesses the input data, or when the calculation module 111 of the execution device 120 executes calculations and other related processing (such as performing the function realization of the neural network in this application), the execution device 120 may call the data storage system 150 The data, codes, etc. are used for corresponding processing, and the data, instructions, etc. obtained by corresponding processing can also be stored in the data storage system 150.
最后,I/O接口112将处理结果,如上述得到的图像或图像块或者图片中感兴趣物体的2D、3D、Mask、关键点等信息返回给客户设备140,从而提供给用户。Finally, the I/O interface 112 returns the processing result, such as the 2D, 3D, Mask, key points and other information of the image or image block obtained above or the object of interest in the picture, to the client device 140 to provide it to the user.
可选地,客户设备140,可以是自动驾驶系统中的规划控制单元、手机终端中的美颜算法模块。Optionally, the client device 140 may be a planning control unit in an automatic driving system or a beauty algorithm module in a mobile phone terminal.
值得说明的是,训练设备120可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则,该相应的目标模型/规则即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。It is worth noting that the training device 120 can generate corresponding target models/rules based on different training data for different goals or different tasks, and the corresponding target models/rules can be used to achieve the above goals or complete the above tasks. , So as to provide users with the desired results.
在图4中所示情况下,用户可以手动给定输入数据,该手动给定可以通过I/O接口112提供的界面进行操作。另一种情况下,客户设备140可以自动地向I/O接口112发送输入数据,如果要求客户设备140自动发送输入数据需要获得用户的授权,则用户可以在客户设备140中设置相应权限。用户可以在客户设备140查看执行设备110输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备140也可以作为数据采集端,采集如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果作为新的样本数据,并存入数据库130。当然,也可以不经过客户设备140进行采集,而是由I/O接口112直接将如图所示输入I/O接口112的输入数据及输出I/O接口112的输出结果,作为新的样本数据存入数据库130。In the case shown in FIG. 4, the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send the input data and the user's authorization is required, the user can set the corresponding authority in the client device 140. The user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action. The client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data, and store it in the database 130 as shown in the figure. Of course, it is also possible not to collect through the client device 140, but the I/O interface 112 directly uses the input data input to the I/O interface 112 and the output result of the output I/O interface 112 as a new sample as shown in the figure. The data is stored in the database 130.
值得注意的是,图4仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图4中,数据存储系统150相对执行设备110是外部存储器,在其它情况下,也可以将数据存储系统150置于执行设备110中。It is worth noting that FIG. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG. 4, the data The storage system 150 is an external memory relative to the execution device 110. In other cases, the data storage system 150 may also be placed in the execution device 110.
如图4所示,根据训练设备120训练得到CNN特征提取模型,该CNN特征提取模型在本申请实施例中可以是CNN卷积神经网络也可以是下面实施例即将介绍的的神经网络。As shown in FIG. 4, a CNN feature extraction model is obtained through training according to the training device 120. The CNN feature extraction model may be a CNN convolutional neural network in this embodiment of the application, or may be a neural network that will be introduced in the following embodiment.
Since a CNN is a very common neural network, the structure of a CNN is described in detail below with reference to FIG. 5. As described in the introduction to the basic concepts above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture. A deep learning architecture refers to performing learning at multiple levels of abstraction through machine learning algorithms. As a deep learning architecture, a CNN is a feed-forward artificial neural network in which each neuron can respond to the image input into it.
The structure of the neural network specifically used in the image processing method of this embodiment of the application may be as shown in FIG. 5. In FIG. 5, a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (where the pooling layer is optional), and a neural network layer 230. The input layer 210 can obtain an image to be processed and pass it to the convolutional layer/pooling layer 220 and the subsequent neural network layer 230 for processing, so as to obtain a processing result of the image. The internal layer structure of the CNN 200 in FIG. 5 is described in detail below.
Convolutional layer/pooling layer 220:
Convolutional layer:
As shown in FIG. 5, the convolutional layer/pooling layer 220 may include, for example, layers 221 to 226. In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer. In another implementation, layers 221 and 222 are convolutional layers, layer 223 is a pooling layer, layers 224 and 225 are convolutional layers, and layer 226 is a pooling layer. That is, the output of a convolutional layer can serve as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
The following uses the convolutional layer 221 as an example to describe the internal working principle of one convolutional layer.
The convolutional layer 221 can include many convolution operators. A convolution operator is also called a kernel, and its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator can essentially be a weight matrix, which is usually predefined. In the process of performing a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, depending on the value of the stride), so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolutional output with a single depth dimension. In most cases, however, a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), that is, multiple matrices of the same shape, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolutional image, where this dimension can be understood as being determined by the "multiple" mentioned above. Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The multiple weight matrices have the same size (rows × columns), so the convolutional feature maps extracted by them also have the same size, and the extracted convolutional feature maps of the same size are then combined to form the output of the convolution operation.
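For illustration only, the following sketch (assuming PyTorch; the shapes are hypothetical) shows how multiple weight matrices of the same size, each spanning the full input depth, produce the stacked depth dimension described above.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 32, 32)   # input image: depth 3, 32 x 32 pixels

    # 16 kernels, each 5 x 5 and spanning the full input depth of 3;
    # stride controls how many pixels the kernel moves at each step.
    conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1)

    y = conv(x)                     # the 16 outputs are stacked along the depth dimension
    print(y.shape)                  # torch.Size([1, 16, 28, 28])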
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training. The weight matrices formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network 200 to make correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, an initial convolutional layer (for example, layer 221) often extracts more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by later convolutional layers (for example, layer 226) become more and more complex, for example, high-level semantic features; features with higher-level semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. For the layers 221 to 226 illustrated by 220 in FIG. 5, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. In image processing, the sole purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator, used to sample the input image to obtain an image of a smaller size. The average pooling operator computes the average of the pixel values within a specific range of the image as the result of average pooling, and the maximum pooling operator takes the pixel with the largest value within a specific range as the result of maximum pooling. In addition, just as the size of the weight matrix in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input into the pooling layer, and each pixel in the output image represents the average or maximum value of the corresponding sub-region of the input image.
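A minimal sketch of the pooling behavior described above (assuming PyTorch; sizes are illustrative): each output pixel is the maximum, or the average, of the corresponding sub-region of the input.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 16, 28, 28)
    max_pool = nn.MaxPool2d(kernel_size=2)   # each output pixel = max of a 2 x 2 sub-region
    avg_pool = nn.AvgPool2d(kernel_size=2)   # each output pixel = average of a 2 x 2 sub-region
    print(max_pool(x).shape)                 # torch.Size([1, 16, 14, 14]): spatial size halved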
Neural network layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is still not able to output the required output information. As described above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters introduced by the input image. However, to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate one output, or a group of outputs, of the required number of classes. Therefore, the neural network layer 230 can include multiple hidden layers (layers 231, 232 to 23n shown in FIG. 5) and an output layer 240. The parameters contained in the multiple hidden layers can be obtained through pre-training on training data related to a specific task type; for example, the task type can include image recognition, image classification, image super-resolution reconstruction, and so on.
After the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy, which is specifically used to compute the prediction error. Once the forward propagation of the entire convolutional neural network 200 (in FIG. 5, propagation in the direction from 210 to 240) is completed, back propagation (in FIG. 5, propagation in the direction from 240 to 210) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
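Putting the pieces together, the following hypothetical sketch (assuming PyTorch; the layer sizes are invented for illustration and are not the architecture of this application) mirrors the input layer 210 → convolutional/pooling layers 220 → hidden layers 230 → output layer 240 structure, with a cross-entropy-style loss at the output.

    import torch
    import torch.nn as nn

    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(       # convolutional/pooling layers (220)
                nn.Conv2d(3, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(     # hidden layers and output layer (230/240)
                nn.Flatten(), nn.Linear(32 * 5 * 5, 64), nn.ReLU(),
                nn.Linear(64, num_classes),
            )

        def forward(self, x):                    # forward propagation: 210 -> 240
            return self.classifier(self.features(x))

    model = SmallCNN()
    loss_fn = nn.CrossEntropyLoss()              # cross-entropy-style loss at the output layer
    out = model(torch.randn(2, 3, 32, 32))
    print(out.shape)                             # torch.Size([2, 10])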
The structure of the neural network specifically used in the image processing method of this embodiment of the application may also be as shown in FIG. 6. In FIG. 6, a convolutional neural network (CNN) 200 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130. Compared with FIG. 5, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 120 in FIG. 6 are arranged in parallel, and the separately extracted features are all input into the neural network layer 130 for processing.
It should be noted that the convolutional neural networks shown in FIG. 5 and FIG. 6 are merely two examples of possible convolutional neural networks for the image processing method of this embodiment of the application. In specific applications, the convolutional neural network used in the image processing method of this embodiment of the application may also exist in the form of other network models.
In addition, the structure of the convolutional neural network obtained by the neural network structure search method of this embodiment of the application may be as shown by the convolutional neural network structures in FIG. 5 and FIG. 6.
FIG. 7 shows a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor 50. The chip can be provided in the execution device 110 shown in FIG. 4 to complete the computation work of the computation module 111, or in the training device 120 shown in FIG. 4 to complete the training work of the training device 120 and output the target model/rule. The algorithms of all the layers in the convolutional neural networks shown in FIG. 5 and FIG. 6 can be implemented in the chip shown in FIG. 7.
The neural network processor (NPU) 50 is mounted as a coprocessor onto a host central processing unit (host CPU), and the host CPU assigns tasks to it. The core part of the NPU is an arithmetic circuit 503; a controller 504 controls the arithmetic circuit 503 to fetch data from a memory (a weight memory or an input memory) and perform computation.
In some implementations, the arithmetic circuit 503 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 503 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 501, performs matrix operations on it with matrix B, and stores partial or final results of the resulting matrix in an accumulator 508.
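The multiply-accumulate pattern described above can be sketched in plain Python as follows (an informal model of the data flow only, not of the NPU's actual microarchitecture; the sizes are hypothetical).

    def matmul_accumulate(A, B):
        rows, inner, cols = len(A), len(B), len(B[0])
        C = [[0.0] * cols for _ in range(rows)]   # plays the role of the accumulator
        for i in range(rows):
            for j in range(cols):
                for k in range(inner):
                    C[i][j] += A[i][k] * B[k][j]  # partial results accumulate into C
        return C

    A = [[1, 2], [3, 4]]                          # input matrix A
    B = [[5, 6], [7, 8]]                          # weight matrix B
    print(matmul_accumulate(A, B))                # [[19.0, 22.0], [43.0, 50.0]]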
The vector computation unit 507 can further process the output of the arithmetic circuit, for example, through vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. For example, the vector computation unit 507 can be used for network computation in the non-convolutional/non-FC layers of the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector computation unit 507 can store a processed output vector into a unified buffer 506. For example, the vector computation unit 507 may apply a nonlinear function to the output of the arithmetic circuit 503, for example, to a vector of accumulated values, to generate activation values. In some implementations, the vector computation unit 507 generates normalized values, combined values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 503, for example, for use in a subsequent layer of the neural network.
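For example, the post-processing performed by the vector computation unit might look like the following sketch (assuming PyTorch; purely illustrative): a nonlinear function is applied to a vector of accumulated values to generate activation values.

    import torch

    acc = torch.tensor([19.0, -22.0, 43.0, -50.0])  # vector of accumulated values
    act = torch.relu(acc)                            # nonlinear function -> activation values
    print(act)                                       # tensor([19., 0., 43., 0.])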
The unified memory 506 is used to store input data and output data.
A direct memory access controller (DMAC) 505 transfers the input data in an external memory to the input memory 501 and/or the unified memory 506, stores the weight data in the external memory into the weight memory 502, and stores the data in the unified memory 506 into the external memory.
A bus interface unit (BIU) 510 is used to implement interaction among the host CPU, the DMAC, and an instruction fetch buffer 509 through a bus.
The instruction fetch buffer 509, connected to the controller 504, is used to store instructions used by the controller 504.
The controller 504 is used to invoke the instructions cached in the instruction fetch buffer 509, so as to control the working process of the computation accelerator.
Optionally, in this application, the input data here is a picture, and the output data is information such as 2D, 3D, Mask, and key points of an object of interest in the picture.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch buffer 509 are all on-chip memories, and the external memory is a memory external to the NPU. The external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
The execution device 110 in FIG. 4 described above can perform the steps of the image processing method of the embodiments of this application; the CNN models shown in FIG. 5 and FIG. 6 and the chip shown in FIG. 7 can also be used to perform the steps of the image processing method of the embodiments of this application. The image processing methods of the embodiments of this application are described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides a system architecture. The system architecture includes local devices, an execution device, and a data storage system, where the local devices are connected to the execution device through a communication network.
The execution device may be implemented by one or more servers. Optionally, the execution device may work in cooperation with other computing devices, such as data storage devices, routers, and load balancers. The execution device may be arranged on one physical site or distributed across multiple physical sites. The execution device may use the data in the data storage system, or invoke the program code in the data storage system, to implement the neural network structure search method of the embodiments of this application.
Users can operate their respective user devices (for example, the local devices) to interact with the execution device. Each local device can be any computing device, such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or the like, or any combination thereof.
In one implementation, the local devices obtain the relevant parameters of a target neural network from the execution device, deploy the target neural network on the local devices, and use the target neural network for image classification, image processing, or the like.
In another implementation, the target neural network may be deployed directly on the execution device. The execution device obtains the image to be processed from the local devices, and classifies the image to be processed or performs other types of image processing on it according to the target neural network.
The foregoing execution device may also be referred to as a cloud device; in that case, the execution device is generally deployed in the cloud.
First, the method provided by this application is described from the training side. The method shown in FIG. 8 may be performed by a convolutional layer quantization apparatus, and the convolutional layer quantization apparatus may be a computer, a server, or the like. Referring to FIG. 8, FIG. 8 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application. As shown in FIG. 8, the convolutional layer quantization method provided by this application includes the following steps.
801. Obtain image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
In this embodiment of the application, the training device can obtain the image data, the label value, the first convolutional neural network, and the N candidate quantized values, where the first convolutional neural network includes the target convolutional layer, the target convolutional layer includes the weight value, the weight value corresponds to the N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
In this embodiment of the application, a first convolutional neural network and N candidate quantized values {v_1, v_2, …, v_N} can be obtained. The first convolutional neural network includes multiple convolutional layers, and the target convolutional layer is one of the multiple convolutional layers. The weight matrix W corresponding to the target convolutional layer can include multiple weight values. Suppose the weight values are to be quantized into the N candidate quantized values {v_1, v_2, …, v_N}; the probabilities that a target weight value takes each of the N candidate quantized values are:
P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), i = 1, …, N
where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient. The preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Taking the above probability as an example, during iterative training, the closer τ is to 0, the closer one of the N probability values is to 1.
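Assuming the softmax form reconstructed above, the following short sketch (PyTorch; the hidden-variable values are invented) illustrates the stated behavior: as τ approaches 0, one of the N probability values approaches 1.

    import torch

    W_p = torch.tensor([0.5, 1.5])           # hidden variables for N = 2 candidate values
    for tau in (1.0, 0.1, 0.01):
        P = torch.softmax(W_p / tau, dim=0)
        print(tau, P)                        # one probability tends to 1 as tau -> 0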
During training, the quantization expected value determined according to the N probability values and the N candidate quantized values is used as the weight value for the convolution operation with the input feature. This weight value is calculated as follows:
W_q = Σ_{i=1}^{N} P_i · v_i
where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
This weight value is used to perform a convolution computation with the input feature to obtain the output feature y_q:
y_q = W_q ⊗ x, where ⊗ denotes the convolution operation and x denotes the input feature.
Taking a binary neural network as an example, the parameter to be trained in existing quantization methods is W, whereas the parameter trained in this embodiment of the application is W_pi. The quantization process of the traditional method is W_q = sign(W); this process is non-differentiable at zero and is therefore difficult to train, so a straight-through estimator (STE) is used to approximately compute the gradients of the network parameters. Such gradients are inaccurate, which in turn degrades the update accuracy of the network parameters. The weight value quantization process in this embodiment of the application is a mapping from W_pi to W_q, and this mapping is differentiable, which solves the problem that the mapping from the weight value to be trained to the quantized value is non-differentiable in the traditional quantization process.
With the quantization method in this embodiment of the application, the derivative of W_q can be obtained directly through the back-propagation algorithm, and the parameters W_pi can then be trained.
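The following sketch (assuming PyTorch and the softmax mapping above; shapes and values are hypothetical) shows why no STE is needed: the expected quantized weight W_q is a differentiable function of the hidden variables W_p, so gradients flow to W_p through ordinary back-propagation.

    import torch

    v = torch.tensor([-1.0, 1.0])                  # N candidate quantized values {v_1, ..., v_N}
    W_p = torch.randn(16, 2, requires_grad=True)   # one hidden variable per candidate, per weight
    tau = 1.0                                      # temperature coefficient

    P = torch.softmax(W_p / tau, dim=-1)           # probability of each candidate value
    W_q = (P * v).sum(dim=-1)                      # quantization expected value: sum_i P_i * v_i

    loss = W_q.pow(2).mean()                       # any loss built on top of W_q
    loss.backward()                                # gradients reach W_p directly, without an STE
    print(W_p.grad.shape)                          # torch.Size([16, 2])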
802. Process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to a target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
In this embodiment of the application, after obtaining the image data, the label value, the first convolutional neural network, and the N candidate quantized values, the training device can process the image data through the first convolutional neural network to obtain the detection result and the target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the label value satisfies the preset condition, to obtain the second convolutional neural network, where the second convolutional neural network includes the updated weight value, and the updated weight value corresponds to the N updated probability values.
In this embodiment of the application, feedforward can be performed on the first convolutional neural network, and the weight value can be iteratively updated according to the target loss function until the target loss satisfies a preset condition, to obtain the second convolutional neural network, where the second convolutional neural network includes the updated weight value, and the updated weight value corresponds to the N updated probability values.
In this embodiment of the application, during training, the N hidden variables can be updated based on the loss function, thereby updating the weight value. Moreover, during training, the value of the temperature coefficient can be updated so that it approaches the preset value. For example, the temperature coefficient τ can be gradually decayed from a larger (preset) value to a value close to 0, so that each of the N probability values P_i tends toward 0 or 1; the candidate quantized value whose P_i is close to 1 is then the value into which the weight value is to be quantized.
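A hypothetical annealing schedule for the temperature coefficient might look as follows (illustrative values only; the application does not prescribe a particular schedule):

    # Decay tau from a larger preset value toward 0 over training,
    # pushing each probability P_i toward 0 or 1.
    tau_start, tau_end, num_epochs = 10.0, 0.01, 100
    for epoch in range(num_epochs):
        tau = tau_start * (tau_end / tau_start) ** (epoch / (num_epochs - 1))
        # ... feedforward with P = softmax(W_p / tau), compute loss, update W_p ...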
803. Perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
In this embodiment of the application, the value in {v_1, v_2, …, v_N} corresponding to the maximum probability value can be used as the quantized weight value, that is:
W_d = Σ_i v_i · 1(P_i = max(P_1, …, P_N));
W_d can be used to perform a convolution computation with the input feature to obtain the output feature y_d:
y_d = W_d ⊗ x
In this embodiment of the application, each weight value in the weight matrix can be processed in the above manner, and weight quantization is performed on the updated weight values to obtain the third convolutional neural network.
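A minimal sketch of this final quantization step (assuming PyTorch; the trained probabilities are invented for illustration): each weight is replaced by the candidate quantized value with the largest probability.

    import torch

    v = torch.tensor([-1.0, 1.0])                 # candidate quantized values
    P = torch.tensor([[0.9, 0.1],                 # trained probabilities per weight
                      [0.2, 0.8]])
    W_d = v[P.argmax(dim=-1)]                     # candidate with the largest probability
    print(W_d)                                    # tensor([-1.,  1.])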
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of a convolutional layer during training in an embodiment of the application. As shown in FIG. 9, updating the values of the hidden variables updates the probability values, which in turn updates the weight value; the weight value is used in a convolution operation with the input feature to obtain the output feature.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of a convolutional layer in application in an embodiment of the application. As shown in FIG. 10, the quantized weight value obtained through training can be used in a convolution operation with the input feature to obtain the output feature.
In this embodiment of the application, the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer. That is, during training, the BN layer performs the BN operation based on the mean and standard deviation of the output feature of the convolutional layer in the current feedforward pass.
In this embodiment of the application, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, where each of the M fourth convolutional neural networks includes an updated weight value. Weight value quantization is performed on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks. Feedforward is performed on each of the M fifth convolutional neural networks to obtain M output features, and the second BN layer is configured to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features. That is, during training, the convolutional neural network obtained after each parameter update can be quantized, and in application, the BN layer performs the BN operation on the input feature based on the mean and standard deviation of the output features of these quantized networks. It should be noted that the BN operation also needs to be based on the affine coefficients obtained during training. For details about how to perform the BN operation, refer to descriptions in the prior art; they are not repeated here.
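An informal sketch of the deployment-time BN computation described above (assuming PyTorch; M, the feature shapes, and the affine coefficients are hypothetical): the second mean and second standard deviation are aggregated over the M output features and then applied together with the affine coefficients learned in training.

    import torch

    outputs = [torch.randn(64, 16) for _ in range(8)]    # M = 8 output features (illustrative)
    stacked = torch.cat(outputs, dim=0)
    mean = stacked.mean(dim=0)                           # second mean
    std = stacked.std(dim=0)                             # second standard deviation
    gamma, beta = torch.ones(16), torch.zeros(16)        # affine coefficients from training

    def bn_apply(x, eps=1e-5):
        return gamma * (x - mean) / (std + eps) + beta   # BN operation at deployment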
Referring to FIG. 11, FIG. 11 is a schematic structural diagram of a convolutional layer in application in an embodiment of the application. As shown in FIG. 11, the mean, standard deviation, and affine coefficients obtained through training can be used to perform a BN operation on the input feature to obtain the output feature.
An embodiment of the present application provides a convolutional layer quantization method. The method includes: obtaining image data, a label value, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values; processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to a target loss function until the difference between the detection result and the label value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values. In the above manner, the expectation over the candidate quantized values is used as the weight value, and the probability distribution of the quantized values is learned. Because this quantization process is differentiable, there is no need to use an STE to approximately compute the derivatives of the network parameters, which improves the update accuracy of the network parameters.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of a convolutional layer quantization method provided by an embodiment of this application. As shown in FIG. 12, the convolutional layer quantization method provided by this application includes the following steps.
1201. Obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values.
1202. Perform feedforward on the first convolutional neural network, and iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
1203. Perform weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value may be updated by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, where the preset function includes a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1. Multiple feedforward passes may be performed on the first convolutional neural network, where the multiple feedforward passes include a first feedforward process and a second feedforward process, and the second feedforward process follows the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed on the first convolutional neural network, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer, where the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
Optionally, M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, where each of the M fourth convolutional neural networks includes an updated weight value, and the updated weight value corresponds to N updated probability values. Weight value quantization may further be performed on the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; feedforward is performed on each of the M fifth convolutional neural networks to obtain M output features, and the second BN layer is configured to perform a BN operation on the output feature of the updated target convolutional layer included in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
Optionally, the preset function is the following function:
P_i = exp(W_pi / τ) / Σ_{j=1}^{N} exp(W_pj / τ), i = 1, …, N
where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
Optionally, the weight value is calculated as follows:
W_q = Σ_{i=1}^{N} P_i · v_i
where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
An embodiment of the present application provides a convolutional layer quantization method. The method includes: obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value, each probability value represents the probability that the weight value is the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values; performing feedforward on the first convolutional neural network, and iteratively updating the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values; and performing weight quantization on the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values. In the above manner, the expectation over the candidate quantized values is used as the weight value, and the probability distribution of the quantized values is learned. Because this quantization process is differentiable, there is no need to use an STE to approximately compute the derivatives of the network parameters, which improves the update accuracy of the network parameters.
在图1至图12所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图13,图13为本申请实施例提供的卷积层量化装置1300的一种结构示意图,卷积层量化装置1300可以是服务器,卷积层量化装置1300包括:On the basis of the embodiments corresponding to FIG. 1 to FIG. 12, in order to better implement the above solutions of the embodiments of the present application, related equipment for implementing the above solutions is also provided below. For details, refer to FIG. 13, which is a schematic structural diagram of a convolutional layer quantization apparatus 1300 according to an embodiment of the application. The convolutional layer quantization apparatus 1300 may be a server, and the convolutional layer quantization apparatus 1300 includes:
获取模块1301,用于获取图像数据、标注值、第一卷积神经网络以及N个候选量化值,所述第一卷积神经网络包括目标卷积层,所述目标卷积层包括权重值,所述权重值对应于N个概率值,所述N个概率值中的每个概率值对应一个候选量化值,每个概率值表示所述权重值为对应的候选量化值的概率大小,所述权重值为根据所述N个概率值和所述N个候选量化值确定的量化期望值;The obtaining module 1301 is configured to obtain image data, annotated values, a first convolutional neural network, and N candidate quantized values. The first convolutional neural network includes a target convolutional layer, and the target convolutional layer includes a weight value, The weight value corresponds to N probability values, each of the N probability values corresponds to a candidate quantized value, and each probability value represents the probability of the weight value corresponding to the candidate quantized value, and The weight value is a quantization expected value determined according to the N probability values and the N candidate quantization values;
训练模块1302,用于通过所述第一卷积神经网络对所述图像数据进行处理,得到检测结果和目标损失,根据目标损失函数迭代更新所述权重值,直到所述检测结果和所述标注值之间的差异满足预设条件,得到第二卷积神经网络,所述第二卷积神经网络包括更新后的权重值,所述更新后的权重值对应于更新后的N个概率值;The training module 1302 is configured to process the image data through the first convolutional neural network to obtain the detection result and target loss, and iteratively update the weight value according to the target loss function until the detection result and the label The difference between the values satisfies a preset condition, and a second convolutional neural network is obtained. The second convolutional neural network includes an updated weight value, and the updated weight value corresponds to the updated N probability values;
权重值量化模块1303,用于对所述更新后的权重值进行权重量化,得到第三卷积神经网络,所述第三卷积神经网络包括与所述更新后的权重值对应的目标量化值,所述目标量化值为所述更新后的N个概率值中最大的概率值对应的候选量化值。The weight value quantization module 1303 is configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, and the third convolutional neural network includes a target quantization value corresponding to the updated weight value , The target quantization value is the candidate quantization value corresponding to the largest probability value among the updated N probability values.
可选地,所述权重值对应于N个隐藏变量,所述N个概率值中的每个概率值对应一个隐藏变量,每个概率值为基于对应的隐藏变量计算得到的,所述训练模块1302,具体用于:Optionally, the weight value corresponds to N hidden variables, each probability value in the N probability values corresponds to a hidden variable, and each probability value is calculated based on the corresponding hidden variable, and the training module 1302, specifically used for:
通过根据目标损失函数更新所述N个隐藏变量来更新所述权重值。The weight value is updated by updating the N hidden variables according to the target loss function.
可选地,所述N个概率值中的每个概率值为通过将对应的隐藏变量基于预设函数映射得到的,所述预设函数包括温度系数,所述预设函数满足如下条件:在进行所述第一卷积神经网络的前馈时,所述温度系数与预设值的差值绝对值越小,所述N个概率值中的一个 概率值与1的差值绝对值越小,所述训练模块1302,具体用于:Optionally, each of the N probability values is obtained by mapping a corresponding hidden variable based on a preset function, the preset function includes a temperature coefficient, and the preset function satisfies the following conditions: When performing the feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and the preset value, the smaller the absolute value of the difference between one of the N probability values and 1 , The training module 1302 is specifically used for:
通过所述第一卷积神经网络对所述图像数据进行多次前馈处理,其中,所述多次前馈包括第一前馈过程和第二前馈过程,所述第二前馈过程在所述第一前馈过程之后,在对所述第一卷积神经网络进行第一前馈过程时,所述预设函数包括第一温度系数,在对所述第一卷积神经网络进行第二前馈过程时,所述预设函数包括第二温度系数,所述第二温度系数与预设值的差值绝对值小于所述第一温度系数与预设值的差值绝对值。Perform multiple feedforward processing on the image data through the first convolutional neural network, where the multiple feedforwards include a first feedforward process and a second feedforward process, and the second feedforward process is After the first feedforward process, when the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient, and the first convolutional neural network is performed on the first feedforward process. In the two feedforward process, the preset function includes a second temperature coefficient, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 is further configured to:
quantize the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feed forward each of the M fifth convolutional neural networks to obtain M output features. The second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
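A sketch of the batch-statistics recalibration this passage describes, assuming each of the M fifth convolutional neural networks contributes one output-feature array for the target convolutional layer; the second mean and second standard deviation are then taken over all M arrays:

```python
import numpy as np

def second_bn_statistics(output_features):
    """output_features: list of M arrays, one per fifth-network feedforward."""
    stacked = np.concatenate([f.reshape(-1) for f in output_features])
    return stacked.mean(), stacked.std()   # second mean, second standard deviation

def bn_operation(feature, mean, std, eps=1e-5):
    """BN operation applied to the updated target convolutional layer's output."""
    return (feature - mean) / (std + eps)
```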
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
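A quick numeric check of this function, with assumed hidden-variable values, shows the stated property: as the temperature coefficient τ approaches the preset value (0 here), one of the N probability values approaches 1:

```python
import numpy as np

def preset_function(w_p, tau):
    e = np.exp(w_p / tau)
    return e / e.sum()            # probability value P_i per candidate

w_p = np.array([0.5, 1.0, 0.2])   # assumed hidden variables W_p
print(preset_function(w_p, 1.0))  # soft: roughly [0.30, 0.49, 0.22]
print(preset_function(w_p, 0.05)) # sharp: the middle entry is nearly 1
```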
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
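For example, with three assumed candidate quantized values and probability values, the weight value held during training lies between the candidates rather than on one of them:

```python
import numpy as np

candidates = np.array([-1.0, 0.0, 1.0])  # v_i
probs = np.array([0.2, 0.3, 0.5])        # P_i from the preset function
w_q = float((probs * candidates).sum())  # 0.3, the quantization expected value
```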
An embodiment of this application provides a convolutional layer quantization apparatus 1300. The obtaining module 1301 obtains image data, annotation values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values. The training module 1302 processes the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updates the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, yielding a second convolutional neural network that includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 quantizes the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the N updated probability values. In this way, the expectation over the candidate quantized values serves as the weight value, and the probability distribution over the quantized values is learned. Because this quantization process is differentiable, the derivatives of the network parameters need not be approximated with a straight-through estimator (STE), which improves the accuracy of parameter updates.
In an embodiment of this application, the convolutional layer quantization apparatus 1300 may further include:
The obtaining module 1301 is configured to obtain a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values.
The training module 1302 is configured to feed forward the first convolutional neural network and to iteratively update the weight value according to the target loss function until the target loss satisfies a preset condition, yielding a second convolutional neural network. The second convolutional neural network includes an updated weight value, and the updated weight value corresponds to N updated probability values.
The weight value quantization module 1303 is configured to quantize the updated weight value to obtain a third convolutional neural network. The third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable. The training module is specifically configured to:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1. The training module is specifically configured to:
perform multiple feedforward passes on the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values. The weight value quantization module is further configured to:
quantize the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and feed forward each of the M fifth convolutional neural networks to obtain M output features. The second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
An embodiment of this application provides a convolutional layer quantization apparatus 1300. The obtaining module 1301 obtains a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values. The training module 1302 feeds forward the first convolutional neural network and iteratively updates the weight value according to the target loss function until the target loss satisfies a preset condition, yielding a second convolutional neural network that includes an updated weight value corresponding to N updated probability values. The weight value quantization module 1303 quantizes the updated weight value to obtain a third convolutional neural network that includes a target quantized value corresponding to the updated weight value, the target quantized value being the candidate quantized value corresponding to the largest of the N updated probability values. In this way, the expectation over the candidate quantized values serves as the weight value, and the probability distribution over the quantized values is learned. Because this quantization process is differentiable, the derivatives of the network parameters need not be approximated with a straight-through estimator (STE), which improves the accuracy of parameter updates.
An embodiment of this application further provides a training device. Refer to FIG. 14, which is a schematic structural diagram of a training device according to an embodiment of this application. The training device described in the embodiment corresponding to FIG. 13 may be deployed on the training device 1400 to implement the functions of the convolutional layer quantization apparatus in that embodiment. Specifically, the training device 1400 is implemented by one or more servers and may vary considerably depending on configuration or performance. It may include one or more central processing units (CPU) 1414 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (for example, one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may provide temporary or persistent storage. The program stored in the storage medium 1430 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the training device. Furthermore, the central processing unit 1414 may be configured to communicate with the storage medium 1430 and execute, on the training device 1400, the series of instruction operations in the storage medium 1430.
The training device 1400 may further include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, and one or more input/output interfaces 1458; and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment of the application, the central processing unit 1414 is configured to perform the data processing method performed by the convolutional layer quantization apparatus in the embodiment corresponding to FIG. 12.
Specifically, the central processing unit 1414 may obtain image data, annotation values, a first convolutional neural network, and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
process the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively update the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value corresponding to N updated probability values; and
quantize the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, and each of the N probability values corresponds to one hidden variable. The central processing unit 1414 may:
update the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1. The central processing unit 1414 may:
perform multiple feedforward passes on the image data through the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values, and the method further includes:
quantizing the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
An embodiment of this application further provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the steps performed by the foregoing training device.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores a program for signal processing. When the program runs on a computer, the computer is caused to perform the following steps:
obtaining a first convolutional neural network and N candidate quantized values, where the first convolutional neural network includes a target convolutional layer, the target convolutional layer includes a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes that candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
feeding forward the first convolutional neural network, and iteratively updating the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, where the second convolutional neural network includes an updated weight value corresponding to N updated probability values; and
quantizing the updated weight value to obtain a third convolutional neural network, where the third convolutional neural network includes a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
Optionally, the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, and each probability value is computed from its corresponding hidden variable; the iteratively updating the weight value according to the target loss function includes:
updating the weight value by updating the N hidden variables according to the target loss function.
Optionally, each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function. The preset function includes a temperature coefficient and satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; the feeding forward the first convolutional neural network includes:
performing multiple feedforward passes on the first convolutional neural network, where the multiple passes include a first feedforward process and a second feedforward process, and the second feedforward process occurs after the first feedforward process. When the first feedforward process is performed on the first convolutional neural network, the preset function includes a first temperature coefficient; when the second feedforward process is performed, the preset function includes a second temperature coefficient, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
Optionally, the first convolutional neural network further includes a first batch normalization (BN) layer connected to the target convolutional layer. The first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
Optionally, iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of which includes an updated weight value corresponding to N updated probability values, and the method further includes:
quantizing the updated weight values included in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
feeding forward each of the M fifth convolutional neural networks to obtain M output features, where the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer included in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
Optionally, the preset function is:

$$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
Optionally, the weight value is computed as:

$$W_q = \sum_{i=1}^{N} P_i v_i$$

where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
The execution device, training device, or terminal device provided in the embodiments of this application may specifically be a chip. The chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that a chip in the execution device performs the data processing method described in the foregoing embodiments, or so that a chip in the training device performs the data processing method described in the foregoing embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; alternatively, the storage unit may be a storage unit outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, refer to FIG. 15, which is a schematic structural diagram of a chip according to an embodiment of this application. The chip may be embodied as a neural-network processing unit NPU 1500. The NPU 1500 is mounted to a host CPU as a coprocessor, and the host CPU assigns tasks. The core part of the NPU is the arithmetic circuit 1503; the controller 1504 controls the arithmetic circuit 1503 to extract matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 1503 internally includes multiple processing engines (PE). In some implementations, the arithmetic circuit 1503 is a two-dimensional systolic array. It may alternatively be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1503 is a general-purpose matrix processor.
For example, suppose there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1502 and caches it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1501, performs the matrix operation with matrix B, and stores partial or final results of the matrix in the accumulator 1508.
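Functionally, what the PE array computes is an ordinary matrix product whose partial results accumulate; the following plain Python sketch mirrors that data flow (it models the arithmetic only, not the systolic timing or the memory hierarchy):

```python
import numpy as np

def matmul_with_accumulator(A, B):
    rows, inner = A.shape
    _, cols = B.shape
    C = np.zeros((rows, cols))            # accumulator 1508 contents
    for k in range(inner):                # one slice of A against one row of B
        C += np.outer(A[:, k], B[k, :])   # partial results accumulate step by step
    return C                              # final result, equal to A @ B
```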
The unified memory 1506 is configured to store input data and output data. Weight data is transferred to the weight memory 1502 directly through the direct memory access controller (DMAC) 1505. Input data is also transferred to the unified memory 1506 through the DMAC.
BIU stands for Bus Interface Unit, that is, the bus interface unit 1510, which is used for interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 (BIU) is used by the instruction fetch buffer 1509 to obtain instructions from external memory, and by the storage unit access controller 1505 to obtain the original data of the input matrix A or the weight matrix B from external memory.
The DMAC is mainly configured to transfer input data in the external memory DDR to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes multiple arithmetic processing units and, when necessary, further processes the output of the arithmetic circuit, for example, by vector multiplication, vector addition, exponential operations, logarithmic operations, and magnitude comparison. It is mainly used for computation of non-convolutional/non-fully-connected layers in a neural network, such as batch normalization, pixel-level summation, and upsampling of feature planes.
In some implementations, the vector calculation unit 1507 can store a processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or nonlinear function to the output of the arithmetic circuit 1503, for example, performing linear interpolation on a feature plane extracted by a convolutional layer, or, for another example, applying it to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1503, for example, for use in a subsequent layer of the neural network.
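As an illustration of this post-processing role (the normalization and activation choices here are assumptions, not the chip's fixed behavior), a sketch of the kind of operation the vector calculation unit might apply to the arithmetic circuit's output:

```python
import numpy as np

def vector_postprocess(conv_out, mean, std, eps=1e-5):
    normalized = (conv_out - mean) / (std + eps)  # batch normalization
    return np.maximum(normalized, 0.0)            # nonlinear activation (ReLU assumed)
```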
The instruction fetch buffer 1509 connected to the controller 1504 is configured to store instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of the foregoing programs.
In addition, it should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Moreover, in the drawings of the apparatus embodiments provided in this application, the connections between modules indicate that they have communication connections, which may specifically be implemented as one or more communication buses or signal lines.
From the description of the foregoing implementations, a person skilled in the art can clearly understand that this application may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to achieve the same function may be diverse, for example, analog circuits, digital circuits, or dedicated circuits. For this application, however, a software implementation is the preferable implementation in most cases. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a training device, a network device, or the like) to perform the methods described in the embodiments of this application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, they may be implemented wholly or partly in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (for example, by coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can store, or a data storage device, such as a training device or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive (SSD)), or the like.

Claims (29)

  1. A convolutional quantization method, wherein the method comprises:
    obtaining image data, annotation values, a first convolutional neural network, and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
    processing the image data through the first convolutional neural network to obtain a detection result and a target loss, and iteratively updating the weight value according to the target loss function until the difference between the detection result and the annotation value satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    quantizing the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  2. The method according to claim 1, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, each probability value is computed from its corresponding hidden variable, and the iteratively updating the weight value according to the target loss function comprises:
    updating the weight value by updating the N hidden variables according to the target loss function.
  3. The method according to claim 2, wherein each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; and the processing the image data through the first convolutional neural network comprises:
    performing multiple feedforward passes on the image data through the first convolutional neural network, wherein the multiple passes comprise a first feedforward process and a second feedforward process, the second feedforward process occurs after the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process of the first convolutional neural network and a second temperature coefficient during the second feedforward process, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
  4. The method according to any one of claims 1 to 3, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
  5. The method according to claim 4, wherein iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of the M fourth convolutional neural networks comprises an updated weight value corresponding to N updated probability values, and the method further comprises:
    quantizing the updated weight values comprised in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
    feeding forward each of the M fifth convolutional neural networks to obtain M output features, wherein the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer comprised in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  6. The method according to any one of claims 1 to 5, wherein the preset function is:

    $$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

    where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
  7. The method according to any one of claims 1 to 6, wherein the weight value is computed as:

    $$W_q = \sum_{i=1}^{N} P_i v_i$$

    where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
  8. A convolutional layer quantization method, wherein the method comprises:
    obtaining a first convolutional neural network and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and indicates the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined from the N probability values and the N candidate quantized values;
    feeding forward the first convolutional neural network, and iteratively updating the weight value according to the target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    quantizing the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  9. The method according to claim 8, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable, each probability value is computed from its corresponding hidden variable, and the iteratively updating the weight value according to the target loss function comprises:
    updating the weight value by updating the N hidden variables according to the target loss function.
  10. The method according to claim 9, wherein each of the N probability values is obtained by mapping the corresponding hidden variable through a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: when feedforward of the first convolutional neural network is performed, the smaller the absolute difference between the temperature coefficient and a preset value, the smaller the absolute difference between one of the N probability values and 1; and the feeding forward the first convolutional neural network comprises:
    performing multiple feedforward passes on the first convolutional neural network, wherein the multiple passes comprise a first feedforward process and a second feedforward process, the second feedforward process occurs after the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process of the first convolutional neural network and a second temperature coefficient during the second feedforward process, and the absolute difference between the second temperature coefficient and the preset value is smaller than the absolute difference between the first temperature coefficient and the preset value.
  11. The method according to any one of claims 8 to 10, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on the output features of the target convolutional layer according to the first mean and the first standard deviation of those output features.
  12. The method according to claim 11, wherein iteratively updating the weight value according to the target loss function yields M fourth convolutional neural networks, each of the M fourth convolutional neural networks comprises an updated weight value corresponding to N updated probability values, and the method further comprises:
    quantizing the updated weight values comprised in the fourth convolutional neural networks to obtain M fifth convolutional neural networks; and
    feeding forward each of the M fifth convolutional neural networks to obtain M output features, wherein the second BN layer is configured to perform a BN operation on the output features of the updated target convolutional layer comprised in the third convolutional neural network according to the second mean and the second standard deviation of the M output features.
  13. The method according to any one of claims 8 to 12, wherein the preset function is:

    $$P_i = \frac{\exp(W_{p_i}/\tau)}{\sum_{j=1}^{N} \exp(W_{p_j}/\tau)}$$

    where $P_i$ is the probability value corresponding to the i-th candidate quantized value, $W_{p_i}$ is the hidden variable corresponding to the i-th candidate quantized value, and $\tau$ is the temperature coefficient.
  14. The method according to any one of claims 8 to 13, wherein the weight value is computed as:

    $$W_q = \sum_{i=1}^{N} P_i v_i$$

    where $W_q$ is the weight value, $v_i$ is the i-th candidate quantized value, and $P_i$ is the probability value corresponding to the i-th candidate quantized value.
  15. A convolutional layer quantization apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire image data, an annotated value, a first convolutional neural network, and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
    a training module, configured to process the image data through the first convolutional neural network to obtain a detection result and a target loss, and to iteratively update the weight value according to a target loss function until a difference between the detection result and the annotated value satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
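(Illustrative note, not part of the claims: the hard quantization step performed by the weight value quantization module, selecting the candidate with the largest updated probability; a minimal sketch with assumed names.)

```python
import numpy as np

def quantize_weight(updated_probs, candidates):
    """Target quantized value = candidate with the largest probability (claim 15 sketch)."""
    return float(candidates[int(np.argmax(updated_probs))])

print(quantize_weight(np.array([0.1, 0.2, 0.7]),
                      np.array([-1.0, 0.0, 1.0])))  # -> 1.0
```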
  16. The apparatus according to claim 15, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable and is calculated based on the corresponding hidden variable, and the training module is specifically configured to:
    update the weight value by updating the N hidden variables according to the target loss function.
  17. The apparatus according to claim 16, wherein each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; and the training module is specifically configured to:
    perform multiple feedforward passes on the image data through the first convolutional neural network, wherein the multiple feedforward passes comprise a first feedforward process and a second feedforward process, the second feedforward process follows the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process and a second temperature coefficient during the second feedforward process, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
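(Illustrative note, not part of the claims: a training-loop skeleton showing the temperature schedule of claim 17, where each later feedforward process uses a temperature closer to the preset value, here assumed to be 0, so the probabilities gradually harden toward one-hot. Names and the decay schedule are assumptions; the apparatus itself updates the hidden variables by backpropagating the target loss.)

```python
import numpy as np

def anneal_temperature(step, tau_start=10.0, decay=0.5, preset=0.0):
    """Each later feedforward process uses a temperature closer to the preset value."""
    return preset + (tau_start - preset) * (decay ** step)

hidden = np.array([0.2, 1.0, 0.5])       # hidden variables W_p for one weight
candidates = np.array([-1.0, 0.0, 1.0])  # candidate quantized values v_i
for step in range(4):                    # successive feedforward processes
    tau = anneal_temperature(step)
    exp = np.exp(hidden / tau - (hidden / tau).max())
    probs = exp / exp.sum()              # probabilities sharpen as tau drops
    w_q = float(probs @ candidates)      # expected-value weight used in this pass
    # ... run the feedforward with w_q, compute the target loss, and update
    # `hidden` by gradient descent (omitted in this sketch) ...
    print(step, round(tau, 3), probs.round(3), round(w_q, 3))
```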
  18. The apparatus according to any one of claims 15 to 17, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer, the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on an output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
  19. The apparatus according to claim 18, wherein M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks comprises an updated weight value, the updated weight value corresponds to N updated probability values, and the weight value quantization module is further configured to:
    perform weight value quantization on the updated weight value comprised in each fourth convolutional neural network, to obtain M fifth convolutional neural networks; and
    perform feedforward on each of the M fifth convolutional neural networks, to obtain M output features, wherein a second BN layer is configured to perform a BN operation on an output feature of the updated target convolutional layer comprised in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
  20. The apparatus according to any one of claims 15 to 19, wherein the preset function is the following function:
    $$P_i = \frac{\exp\left(W_{pi}/\tau\right)}{\sum_{j=1}^{N}\exp\left(W_{pj}/\tau\right)}$$
    where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  21. The apparatus according to any one of claims 15 to 20, wherein the weight value is calculated in the following manner:
    $$W_q = \sum_{i=1}^{N} P_i \, v_i$$
    where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  22. A convolutional layer quantization apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire a first convolutional neural network and N candidate quantized values, wherein the first convolutional neural network comprises a target convolutional layer, the target convolutional layer comprises a weight value, the weight value corresponds to N probability values, each of the N probability values corresponds to one candidate quantized value and represents the probability that the weight value takes the corresponding candidate quantized value, and the weight value is a quantization expected value determined according to the N probability values and the N candidate quantized values;
    a training module, configured to perform feedforward on the first convolutional neural network and to iteratively update the weight value according to a target loss function until the target loss satisfies a preset condition, to obtain a second convolutional neural network, wherein the second convolutional neural network comprises an updated weight value, and the updated weight value corresponds to N updated probability values; and
    a weight value quantization module, configured to perform weight quantization on the updated weight value to obtain a third convolutional neural network, wherein the third convolutional neural network comprises a target quantized value corresponding to the updated weight value, and the target quantized value is the candidate quantized value corresponding to the largest of the N updated probability values.
  23. The apparatus according to claim 22, wherein the weight value corresponds to N hidden variables, each of the N probability values corresponds to one hidden variable and is calculated based on the corresponding hidden variable, and the training module is specifically configured to:
    update the weight value by updating the N hidden variables according to the target loss function.
  24. The apparatus according to claim 23, wherein each of the N probability values is obtained by mapping the corresponding hidden variable based on a preset function, the preset function comprises a temperature coefficient, and the preset function satisfies the following condition: during feedforward of the first convolutional neural network, the smaller the absolute value of the difference between the temperature coefficient and a preset value, the smaller the absolute value of the difference between one of the N probability values and 1; and the training module is specifically configured to:
    perform multiple feedforward passes on the first convolutional neural network, wherein the multiple feedforward passes comprise a first feedforward process and a second feedforward process, the second feedforward process follows the first feedforward process, the preset function comprises a first temperature coefficient during the first feedforward process and a second temperature coefficient during the second feedforward process, and the absolute value of the difference between the second temperature coefficient and the preset value is smaller than the absolute value of the difference between the first temperature coefficient and the preset value.
  25. The apparatus according to any one of claims 22 to 24, wherein the first convolutional neural network further comprises a first batch normalization (BN) layer, the first BN layer is connected to the target convolutional layer, and the first BN layer is configured to perform a BN operation on an output feature of the target convolutional layer according to a first mean and a first standard deviation of the output feature of the target convolutional layer.
  26. The apparatus according to claim 25, wherein M fourth convolutional neural networks are obtained after the weight value is iteratively updated according to the target loss function, each of the M fourth convolutional neural networks comprises an updated weight value, the updated weight value corresponds to N updated probability values, and the weight value quantization module is further configured to:
    perform weight value quantization on the updated weight value comprised in each fourth convolutional neural network, to obtain M fifth convolutional neural networks; and
    perform feedforward on each of the M fifth convolutional neural networks, to obtain M output features, wherein a second BN layer is configured to perform a BN operation on an output feature of the updated target convolutional layer comprised in the third convolutional neural network according to a second mean and a second standard deviation of the M output features.
  27. The apparatus according to any one of claims 22 to 26, wherein the preset function is the following function:
    $$P_i = \frac{\exp\left(W_{pi}/\tau\right)}{\sum_{j=1}^{N}\exp\left(W_{pj}/\tau\right)}$$
    where P_i is the probability value corresponding to the i-th candidate quantized value, W_pi is the hidden variable corresponding to the i-th candidate quantized value, and τ is the temperature coefficient.
  28. The apparatus according to any one of claims 22 to 27, wherein the weight value is calculated in the following manner:
    $$W_q = \sum_{i=1}^{N} P_i \, v_i$$
    where W_q is the weight value, v_i is the i-th candidate quantized value, and P_i is the probability value corresponding to the i-th candidate quantized value.
  29. A computer-readable storage medium storing a computer program, wherein the computer program comprises instructions for executing the convolutional layer quantization method according to any one of claims 1 to 14.
PCT/CN2021/076983 2020-02-21 2021-02-20 Method and apparatus for convolutional layer quantization WO2021164750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010109185.5A CN111368972B (en) 2020-02-21 2020-02-21 Convolutional layer quantization method and device
CN202010109185.5 2020-02-21

Publications (1)

Publication Number Publication Date
WO2021164750A1 true WO2021164750A1 (en) 2021-08-26

Family

ID=71208314

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/076983 WO2021164750A1 (en) 2020-02-21 2021-02-20 Method and apparatus for convolutional layer quantization

Country Status (2)

Country Link
CN (1) CN111368972B (en)
WO (1) WO2021164750A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368972B (en) * 2020-02-21 2023-11-10 华为技术有限公司 Convolutional layer quantization method and device
CN112257858B (en) * 2020-09-21 2024-06-14 华为技术有限公司 Model compression method and device
WO2022190195A1 (en) 2021-03-09 2022-09-15 日本電気株式会社 Information processing system, encoding device, decoding device, model learning device, information processing method, encoding method, decoding method, model learning method, and program storage medium
TWI764628B (en) * 2021-03-18 2022-05-11 英業達股份有限公司 Classification system and method of information in image
CN112949599B (en) * 2021-04-07 2022-01-14 青岛民航凯亚系统集成有限公司 Candidate content pushing method based on big data
CN113570033B (en) * 2021-06-18 2023-04-07 北京百度网讯科技有限公司 Neural network processing unit, neural network processing method and device
CN116681110B (en) * 2022-10-24 2024-05-14 荣耀终端有限公司 Extremum algorithm configuration method, electronic device, program product and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3427195B1 (en) * 2016-03-11 2024-05-01 Telecom Italia S.p.A. Convolutional neural networks, particularly for image analysis
US10936913B2 (en) * 2018-03-20 2021-03-02 The Regents Of The University Of Michigan Automatic filter pruning technique for convolutional neural networks
KR20190125141A (en) * 2018-04-27 2019-11-06 삼성전자주식회사 Method and apparatus for quantizing parameters of neural network
CN110598839A (en) * 2018-06-12 2019-12-20 华为技术有限公司 Convolutional neural network system and method for quantizing convolutional neural network
CN110688502B (en) * 2019-09-09 2022-12-27 重庆邮电大学 Image retrieval method and storage medium based on depth hash and quantization
CN110610166B (en) * 2019-09-18 2022-06-07 北京猎户星空科技有限公司 Text region detection model training method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388779A (en) * 2017-08-03 2019-02-26 珠海全志科技股份有限公司 Neural network weight quantization method and neural network weight quantization device
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN108805265A (en) * 2018-05-21 2018-11-13 Oppo广东移动通信有限公司 Neural network model processing method and apparatus, image processing method, mobile terminal
US20190385059A1 (en) * 2018-05-23 2019-12-19 Tusimple, Inc. Method and Apparatus for Training Neural Network and Computer Server
CN110222821A (en) * 2019-05-30 2019-09-10 浙江大学 Convolutional neural network low-bit-width quantization method based on weight distribution
CN111368972A (en) * 2020-02-21 2020-07-03 华为技术有限公司 Convolutional layer quantization method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471398A (en) * 2022-08-31 2022-12-13 北京科技大学 Image super-resolution method, system, terminal device and storage medium
CN115471398B (en) * 2022-08-31 2023-08-15 北京科技大学 Image super-resolution method, system, terminal equipment and storage medium
CN116739050A (en) * 2022-09-30 2023-09-12 荣耀终端有限公司 Cross-layer equalization optimization method, device and storage medium
CN116739050B (en) * 2022-09-30 2024-06-07 荣耀终端有限公司 Cross-layer equalization optimization method, device and storage medium

Also Published As

Publication number Publication date
CN111368972A (en) 2020-07-03
CN111368972B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
WO2021164750A1 (en) Method and apparatus for convolutional layer quantization
WO2020221200A1 (en) Neural network construction method, image processing method and devices
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021190451A1 (en) Method and apparatus for training image processing model
WO2021043112A1 (en) Image classification method and apparatus
US20220165045A1 (en) Object recognition method and apparatus
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
WO2021164751A1 (en) Perception network architecture search method and device
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
WO2021218517A1 (en) Method for acquiring neural network model, and image processing method and apparatus
WO2021238366A1 (en) Neural network construction method and apparatus
WO2021155792A1 (en) Processing apparatus, method and storage medium
WO2022052601A1 (en) Neural network model training method, and image processing method and device
WO2021057056A1 (en) Neural architecture search method, image processing method and device, and storage medium
WO2022001805A1 (en) Neural network distillation method and device
WO2021244249A1 (en) Classifier training method, system and device, and data processing method, system and device
WO2021008206A1 (en) Neural architecture search method, and image processing method and device
WO2021129668A1 (en) Neural network training method and device
WO2022111617A1 (en) Model training method and apparatus
WO2021136058A1 (en) Video processing method and device
CN113191241A (en) Model training method and related equipment
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
WO2024160215A1 (en) Data processing method and apparatus
WO2023125628A1 (en) Neural network model optimization method and apparatus, and computing device
WO2022179599A1 (en) Perceptual network and data processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21756654

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21756654

Country of ref document: EP

Kind code of ref document: A1