CN110874627A - Data processing method, data processing apparatus, and computer readable medium


Info

Publication number
CN110874627A
CN110874627A
Authority
CN
China
Prior art keywords
target
data
parameter set
parameter
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811034336.4A
Other languages
Chinese (zh)
Other versions
CN110874627B (en)
Inventor
程捷
罗龙强
郭青海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201811034336.4A
Publication of CN110874627A
Application granted
Publication of CN110874627B
Legal status: Active (Current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/94 Hardware or software architectures specially adapted for image or video understanding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data processing method, a data processing apparatus, and a computer readable medium. The method includes: inputting data to be processed into a target network, where the target network is a neural network comprising at least one convolutional layer, a parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, the parameters contained in the first parameter set and the second parameter set are quantized parameters corresponding to different quantization coefficients, and the target network is used for performing target processing on input data; and performing the target processing on the data to be processed through the target network to obtain an output result. The method can greatly improve the precision of the target processing and save computation time.

Description

Data processing method, data processing apparatus, and computer readable medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, and a computer readable medium.
Background
Advances in big data technology and artificial intelligence technology have driven revolutionary changes in data processing. People not only demand high accuracy from data processing but, on the basis of accuracy, have also extended their requirements to real-time performance, low power consumption, intelligence, and the like.
From the storage point of view, existing neural networks such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN) store floating-point data. A DNN generally requires tens to hundreds of megabytes of storage resources, which makes it difficult to migrate a DNN to a terminal device such as a mobile phone. From the calculation perspective, a DNN needs to perform a large number of operations such as multiplications and additions; in application scenarios with high real-time requirements, calculating with floating-point data makes the real-time requirements difficult to meet. For example, in an autonomous driving scenario, multiple networks are required to perform calculations simultaneously. From the hardware design perspective, the existing DNN can only run on a Central Processing Unit (CPU) operating on floating-point data. When a Field-Programmable Gate Array (FPGA), which consumes less power and operates faster, is used to implement a DNN algorithm, the floating-point operations must be converted into fixed-point numbers with lower storage cost in view of constraints such as limited hardware resources. At present, quantizing the floating-point data in a neural network into integer data is a principal means of increasing the operation speed of the neural network and reducing the storage space it occupies, and has become an important research direction.
In a conventional quantization method for a neural network, a quantization coefficient is determined after the weights of each layer are statistically analyzed as a whole, so that each layer corresponds to one quantization scheme. However, this quantization method has low precision and cannot meet the requirements of application scenarios demanding high accuracy.
Disclosure of Invention
The application provides a data processing method, a data processing device and a computer readable medium, which can improve the quantization precision and reduce the hardware overhead.
In a first aspect, the present application provides a data processing method, including:
inputting data to be processed into a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
and performing the target processing on the data to be processed through the target network to obtain an output result.
The execution body of the present application is a data processing apparatus, which may be a mobile phone, a notebook computer, a desktop computer, a tablet computer, a wearable device, a server, or the like. The target network may be a neural network currently stored by the data processing apparatus, that is, a preset neural network; a neural network acquired by the data processing apparatus from another device, for example, a cloud server; or a neural network obtained by the data processing apparatus by quantizing a reference network, where the reference network is a trained neural network used for performing the target processing on input data. The target processing may be any of various kinds of processing, such as target detection, image segmentation, target recognition, target classification, and target tracking. Optionally, the data included in the first parameter set and the second parameter set in the target network are both integer data. It can be understood that since each parameter included in the convolutional layers of the target network is integer data, the amount of computation in the convolution or dot product operations of those layers is greatly reduced. In addition, different parameter sets in the same convolutional layer may correspond to different quantization coefficients. That is, the same convolutional layer can quantize its convolution kernels (weights) and biases with multiple quantization coefficients, which effectively improves the quantization precision.
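To make the granularity of this scheme concrete, the following is a minimal NumPy sketch, not taken from the patent itself: two parameter sets of the same convolutional layer, each the combination of one convolution kernel and its bias, are quantized to int8 with different quantization coefficients. All names, shapes, and values are illustrative assumptions.

```python
import numpy as np

def quantize_param_set(kernel, bias, scale):
    """Quantize one parameter set (a convolution kernel plus its bias) to int8."""
    q_kernel = np.clip(np.round(kernel * scale), -128, 127).astype(np.int8)
    q_bias = np.clip(np.round(bias * scale), -128, 127).astype(np.int8)
    return q_kernel, q_bias

# Two parameter sets of the same convolutional layer, used for calculating
# different points of the output feature map (hypothetical shapes/values).
kernel1 = (np.random.randn(3, 3, 16) * 0.05).astype(np.float32)
kernel2 = (np.random.randn(3, 3, 16) * 0.80).astype(np.float32)
bias1, bias2 = np.float32(0.01), np.float32(0.30)

# Each set gets its own quantization coefficient, here chosen so that the
# set's own value range maps onto the int8 range.
scale1 = 127.0 / float(np.max(np.abs(kernel1)))
scale2 = 127.0 / float(np.max(np.abs(kernel2)))

q_set1 = quantize_param_set(kernel1, bias1, scale1)
q_set2 = quantize_param_set(kernel2, bias2, scale2)
```

Because the two sets have very different value ranges, a single per-layer coefficient would waste most of the int8 range on one of them; per-set coefficients avoid that loss.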
In the embodiment of the present application, the target processing is performed on the data to be processed using a neural network that takes a combination of a convolution kernel and a bias as the quantization unit, which can greatly improve the precision of the target processing and save computation time.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolutional layers and have the same corresponding quantization coefficient.
In practical applications, the target network may correspond to F quantization coefficients, and any parameter set included in the target network corresponds to one of the F quantization coefficients. That is, each parameter set included in the target network is quantized using one of the F quantization coefficients, and each quantization coefficient corresponds to one quantization mode, so the target network corresponds to only F quantization modes. In the case of quantizing the parameter sets with amplifiers, the data processing apparatus can perform the quantization operation with only F amplifiers. In the case of quantizing the parameter sets with a shifter, the shifter needs only F shift settings to quantize all the parameter sets. This can greatly reduce the workload of quantization and reduce the hardware overhead.
In this implementation, parameter sets in different convolutional layers may correspond to the same quantization coefficient, which not only improves the precision of target processing, but also reduces the overhead of hardware.
In an optional implementation manner, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolutional layer of the reference network, where any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in the feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, where M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, and a quantization coefficient corresponding to the parameter set in each class is determined, so that the reference network is quantized; it is possible to improve the accuracy of quantization and reduce the workload of quantization operation.
In an optional implementation manner, before the inputting the data to be processed into the target network, the method further includes:
acquiring N parameter sets contained in at least one convolutional layer of the reference network, where N is greater than or equal to 2;
dividing the N parameter sets into M classes, where M is greater than or equal to 2;
determining M quantization coefficients corresponding to the M classes, the M quantization coefficients corresponding to the M classes one-to-one;
and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
In practical applications, before performing the target processing on the data to be processed, the data processing apparatus may quantize a current neural network (the reference network) to obtain the target network, and then process the data to be processed using the target network. In this implementation, by classifying the N parameter sets, parameter sets whose appropriate quantization coefficients are closest can be grouped into the same class. That is, parameter sets sharing the same quantization mode are grouped into the same class, and the parameter sets in the same class are quantized with the same quantization coefficient.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, and a quantization coefficient corresponding to the parameter set in each class is determined, so that the reference network is quantized; it is possible to improve the accuracy of quantization and reduce the workload of quantization operation.
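As a reading aid, the quantization flow just described might be organized as in the following Python sketch. It is an outline under stated assumptions: the helpers `feature_vector`, `cluster`, `coeff_from_center`, and `quantize_param_set` are illustrative names, with possible realizations sketched elsewhere in this description.

```python
def quantize_reference_network(param_sets, M):
    """Quantize the N parameter sets of a reference network with M shared coefficients.

    param_sets: list of (kernel, bias) pairs read from the conv layers, N >= 2.
    """
    # Describe each parameter set by a feature vector and cluster the
    # N vectors into M classes (e.g. with k-means).
    features = [feature_vector(kernel, bias) for kernel, bias in param_sets]
    labels, centers = cluster(features, M)
    # Determine one quantization coefficient per class from its center point.
    coeffs = [coeff_from_center(center) for center in centers]
    # Quantize every parameter set with the coefficient of the class it belongs to.
    return [quantize_param_set(kernel, bias, coeffs[labels[n]])
            for n, (kernel, bias) in enumerate(param_sets)]
```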
In an alternative implementation, the determining M quantization coefficients corresponding to the M classes includes:
determining M central points corresponding to the M classes, wherein the M central points correspond to the M classes one by one;
determining the M quantization coefficients corresponding to the M classes one to one according to the M central points respectively.
In the implementation mode, the quantization coefficients corresponding to various types can be accurately determined by determining the central points corresponding to various types, and the implementation is simple.
In an optional implementation manner, the quantizing the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network includes:
under the condition that the N parameter sets are quantized by using a shifter, sorting the N parameter sets according to a target sequence, wherein the target sequence is used for reducing the shifting times required by the shifter when the N parameter sets are quantized;
quantizing the N sets of parameters in sequence in the target order based on the M quantized coefficients.
In this implementation, the shift operation of the shifter can be reduced by quantizing the N parameter sets in sequence according to the target order, which is simple to implement.
In an optional implementation manner, before the sorting the N parameter sets according to the target order, the method further includes:
determining M shift factors in one-to-one correspondence with the M quantization coefficients;
determining shifting factors respectively corresponding to the N parameter sets; the shift factor corresponding to the fourth parameter set is the number of times of shift required by the shifter to quantize the fourth parameter set;
and determining the sequence for quantizing the N parameter sets according to the shifting factors respectively corresponding to the N parameter sets to obtain the target sequence.
In the implementation mode, the sequence of quantizing the N parameter sets is determined according to the shifting factors respectively corresponding to the N parameter sets, so that the shifting times of the shifter are reduced, and the implementation is simple.
In an optional implementation manner, the dividing the N parameter sets into M classes includes:
determining N groups of parameters corresponding to the N parameter sets, where the N groups of parameters correspond to the N parameter sets one to one, a target group of parameters in the N groups of parameters includes at least one of the maximum value, minimum value, mean, and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
adopting a clustering algorithm to divide the N groups of parameters into the M classes;
and determining the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In the implementation mode, each parameter set is classified through one or more parameters of the maximum value, the minimum value, the mean value, the median and the like corresponding to each parameter set, the calculation is simple, and a good classification effect can be obtained.
In an optional implementation manner, the dividing the N sets of parameters into M classes by using a clustering algorithm includes:
determining N vectors corresponding to the N groups of parameters one by one;
and adopting a clustering algorithm to divide the N vectors into the M types.
Optionally, a k-means algorithm is used to divide the N vectors into the M classes. In practical applications, any of a variety of heuristic algorithms may be used to classify the N groups of vectors, which is not limited in this application.
In the implementation mode, N vectors corresponding to N groups of parameters are classified by using a clustering algorithm, so that the classification efficiency is high.
In an optional implementation manner, the parameters included in the first parameter set and the second parameter set are both integer data, and the performing the target processing on the data to be processed by the target network to obtain an output result includes:
calculating convolution or dot product of convolution kernels contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data are data input by a convolution layer to which the first parameter set belongs;
calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
performing inverse quantization on the second data to obtain third data, wherein the third data are floating point type data;
and storing the third data into a buffer area, wherein the third data is used for calculating the output result.
The data processing device performs convolution operation or dot product operation on input data by using the quantized integer data of the convolutional layer, thereby reducing the operation amount of the convolutional layer. In addition, after the data processing device performs inverse quantization on the data output by the convolutional layer, the data is stored in a buffer area so as to facilitate subsequent processing; the accuracy of target processing can be improved.
In this implementation, the quantized parameter set is used to perform convolution operation or dot product operation and inverse quantization on the data output by the convolution layer; the calculation amount of the convolution layer can be reduced and the accuracy of the target processing can be improved.
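A minimal sketch of this computation path for a single output point follows (illustrative only, not the patented implementation). It assumes the input activations were quantized with a scale `s_x`, the kernel with a scale `s_w`, and the bias pre-quantized with `s_w * s_x` so it can be added directly in the integer domain.

```python
import numpy as np

def conv_point_int8(q_kernel, q_bias, q_patch, s_w, s_x):
    """Compute one feature-map point from an int8 parameter set, then dequantize."""
    # Dot product of the integer kernel and the integer input patch (first data);
    # accumulate in int32 so products of int8 values cannot overflow.
    first = int(np.tensordot(q_kernel.astype(np.int32),
                             q_patch.astype(np.int32), axes=3))
    # Sum of the bias and the first data (second data), still integer.
    second = first + int(q_bias)
    # Inverse quantization to floating point (third data), which would then
    # be stored in the buffer for use in computing the output result.
    return np.float32(second / (s_w * s_x))

# Hypothetical usage with a 3x3x16 kernel and a matching input patch.
rng = np.random.default_rng(0)
q_kernel = rng.integers(-128, 128, size=(3, 3, 16), dtype=np.int8)
q_patch = rng.integers(-128, 128, size=(3, 3, 16), dtype=np.int8)
print(conv_point_int8(q_kernel, np.int32(17), q_patch, s_w=25.4, s_x=12.7))
```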
In a second aspect, the present application provides a data processing apparatus comprising:
an input unit for inputting data to be processed to a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
and the computing unit is used for carrying out the target processing on the data to be processed through the target network to obtain an output result.
In the embodiment of the present application, the target processing is performed on the data to be processed using a neural network that takes a combination of a convolution kernel and a bias as the quantization unit, which can greatly improve the precision of the target processing and save computation time.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolutional layers and have the same corresponding quantization coefficient.
In this implementation, parameter sets in different convolutional layers may correspond to the same quantization coefficient, which not only improves the precision of target processing, but also reduces the overhead of hardware.
In an optional implementation manner, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolutional layer of the reference network, where any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in a feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, where M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, and a quantization coefficient corresponding to the parameter set in each class is determined, so that the reference network is quantized; it is possible to improve the accuracy of quantization and reduce the workload of quantization operation.
In an optional implementation, the apparatus further comprises:
the acquisition unit is used for acquiring N parameter sets contained in at least one convolutional layer of the reference network, where N is greater than or equal to 2;
the clustering unit is used for dividing the N parameter sets into M classes, where M is greater than or equal to 2;
a determining unit configured to determine M quantization coefficients corresponding to the M classes, the M quantization coefficients corresponding to the M classes one-to-one;
a quantizing unit, configured to quantize the N parameter sets according to the M quantizing coefficients, respectively, to obtain the target network, where a quantizing coefficient corresponding to a fourth parameter set in the N parameter sets is a quantizing coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, and a quantization coefficient corresponding to the parameter sets in each class is determined, so that the reference network is quantized; this can improve the quantization precision and reduce the workload of the quantization operation.
In an optional implementation manner, the determining unit is specifically configured to determine M central points corresponding to the M classes, where the M central points correspond to the M classes one to one; determining the M quantization coefficients corresponding to the M classes one to one according to the M central points respectively.
In the implementation mode, the quantization coefficients corresponding to various types can be accurately determined by determining the central points corresponding to various types, and the implementation is simple.
In an optional implementation, the apparatus further comprises:
a sorting unit, configured to, when the N parameter sets are quantized by using a shifter, sort the N parameter sets according to a target order, where the target order is used to reduce the number of shifts required by the shifter when the N parameter sets are quantized;
the quantization unit is specifically configured to quantize the N parameter sets in sequence according to the target order according to the M quantization coefficients.
In this implementation, the shift operation of the shifter can be reduced by quantizing the N parameter sets in sequence according to the target order, which is simple to implement.
In an alternative implementation, the determining unit is specifically configured to determine M shift factors that correspond to the M quantization coefficients in a one-to-one manner; determining shifting factors respectively corresponding to the N parameter sets; the shift factor corresponding to the fourth parameter set is the number of times of shift required by the shifter to quantize the fourth parameter set; and determining the sequence for quantizing the N parameter sets according to the shifting factors respectively corresponding to the N parameter sets to obtain the target sequence.
In the implementation mode, the sequence of quantizing the N parameter sets is determined according to the shifting factors respectively corresponding to the N parameter sets, so that the shifting times of the shifter are reduced, and the implementation is simple.
In an optional implementation manner, the determining unit is specifically configured to determine N groups of parameters corresponding to the N parameter sets, where the N groups of parameters correspond to the N parameter sets one to one, a target group of parameters in the N groups of parameters includes at least one of the maximum value, minimum value, mean, and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
the clustering unit is specifically configured to divide the N sets of parameters into the M classes by using a clustering algorithm; and determining the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In the implementation mode, each parameter set is classified through one or more parameters of the maximum value, the minimum value, the mean value, the median and the like corresponding to each parameter set, the calculation is simple, and a good classification effect can be obtained.
In an optional implementation manner, the determining unit is specifically configured to determine N vectors corresponding to the N sets of parameters one to one;
the clustering unit is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
In the implementation mode, N vectors corresponding to N groups of parameters are classified by using a clustering algorithm, so that the classification efficiency is high.
In an optional implementation, the apparatus further comprises:
the calculation unit is used for calculating convolution or dot product of convolution kernels contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data are data input by a convolution layer to which the first parameter set belongs; calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
the inverse quantization unit is used for performing inverse quantization on the second data to obtain third data, and the third data are floating point type data;
and the storage unit is used for storing the third data into a buffer area, and the third data is used for calculating the output result.
In this implementation, the quantized parameter set is used to perform convolution operation or dot product operation and inverse quantization on the data output by the convolution layer; the calculation amount of the convolution layer can be reduced and the accuracy of the target processing can be improved.
In a third aspect, an embodiment of the present invention provides another data processing apparatus, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is used to store a computer program, and the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method according to the first aspect and any implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the first aspect and the method of any implementation manner of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
FIG. 1 is a flowchart of a data processing method provided by the present application;
FIG. 2 is a schematic diagram of amplifiers implementing a quantization operation according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a quantization order of parameter sets according to an embodiment of the present application;
FIG. 4 is a comparison diagram of a neural network before quantization and a neural network after quantization according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for training a neural network according to an embodiment of the present application;
FIG. 6 is a flowchart of a processing method based on a quantized neural network according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an image recognition method according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a method for calculating convolutional layers in a neural network according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
In consideration of hardware overhead, the present application proposes a quantization scheme that performs quantization in units of combinations of a convolution kernel and an offset, and seeks such a scheme under the condition that hardware constraints are satisfied.
In practical applications, the input data of a neural network often needs to undergo convolution operations to obtain a result. Convolution operations are generally performed with 32-bit floating-point numbers (float32). In practice, however, computing with floating-point numbers is slow and wastes storage space. Therefore, in actual hardware implementations, 32-bit floating-point operations are generally converted into operations on 8-bit integer data (Int8), while ensuring that the error relative to the floating-point result is small or meets the application requirements. The operation of converting 32-bit floating-point numbers into 8-bit integer data is a quantization operation. The quantization operation needs to be implemented with the hardware overhead in mind. The problem of hardware overhead when the data processing apparatus performs quantization operations is described below.
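As a simple illustration of such a quantization operation (a generic sketch, not taken from the patent; the choice of scale is one common convention):

```python
import numpy as np

def quantize_to_int8(x, scale):
    """Map float32 values to int8 by scaling, rounding, and clipping."""
    return np.clip(np.round(x * scale), -128, 127).astype(np.int8)

def dequantize(q, scale):
    """Approximately recover the float32 values."""
    return q.astype(np.float32) / scale

w = np.random.randn(64).astype(np.float32)
scale = 127.0 / float(np.max(np.abs(w)))   # map the value range onto int8
q = quantize_to_int8(w, scale)
max_err = float(np.max(np.abs(dequantize(q, scale) - w)))   # small rounding error
```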
To understand the problem of hardware overhead, take running a Convolutional Neural Network (CNN) as an example. The data processing device can store the weights in ReRAM cells: converting the applied input voltage into a current implements the multiplication operation, and the addition operation is realized by superposing currents, thereby realizing fast matrix operations. Because ReRAM uses analog circuits, the voltages and currents, including signal sampling, have a limited range; maintaining the high-precision calculation of the original network is impossible, and the parameters of the original network need to be quantized into low-bit data (such as the int8 type) for calculation. In an analog circuit, the quantization operation can be realized by configuring an amplifier. In a digital circuit, the quantization work is performed by a shifter through shift operations. One quantization scheme corresponds to one amplifier setting. From the hardware point of view, it is desirable to reduce the number of amplifiers required as much as possible, or to reduce the number of shifts of the shifter as much as possible. This means the quantization scheme needs to satisfy the following conditions: (1) from the quantization point of view, a good combination of quantization schemes is needed, so that the loss after quantization is small; (2) from the hardware implementation point of view, the quantization scheme should change as little as possible, so that the shift-setting operations on the hardware, or the number of amplifiers, can be reduced.
In a conventional neural network quantization method, a quantization coefficient is determined by statistically analyzing the weights of each convolutional layer as a whole, and each convolutional layer corresponds to one quantization scheme. However, in this quantization method all parameters in a convolutional layer share the same quantization coefficient, and the differences between the multiple parameter sets (combinations of convolution kernels and biases) contained in the same convolutional layer are not considered, which results in a large loss of quantization precision. That is, different parameter sets in the same convolutional layer may need to be quantized with different quantization coefficients so that the quantization precision of the layer can be guaranteed. On the other hand, if there are too many parameter sets, implementing one quantization scheme per parameter set is not feasible in hardware, because each quantization scheme depends on a specific hardware setting. Therefore, the problem of balancing precision loss against hardware implementation must be solved.
The present application provides an efficient quantization method that improves quantization precision while keeping hardware overhead low. Specifically, in the quantization step, to ensure that the accuracy of the neural network is not affected, different quantization schemes need to be set for different characteristics in the data, and quantization is then realized through hardware settings. The parameter sets may first be grouped according to a certain rule. For example, the parameter sets in the convolutional layers can be grouped (classified) by methods such as halving or clustering, with a limited number of group types, thereby reducing the total number of quantization schemes and reducing hardware overhead to a certain extent. In addition, grouping allows the quantization coefficient required by each parameter set to be determined in a targeted manner, which improves the quantization precision. It will be appreciated that the weights and biases of the same convolutional layer may employ multiple quantization schemes, and the weights and biases in different convolutional layers may employ the same quantization scheme.
A data processing method (quantization method) provided by the present application is specifically described below. Fig. 1 is a flowchart of a data processing method provided in the present application, and as shown in fig. 1, the method may include:
101. the data processing device obtains N parameter sets contained in at least one convolutional layer of the reference network.
N is greater than or equal to 2; that is, N is an integer of at least 2. The data processing device can be a mobile phone, a notebook computer, a desktop computer, a server, and the like. The N parameter sets may be all the parameter sets included in the reference network. In practical applications, the data processing device reads all the parameter sets in each convolutional layer to obtain the N parameter sets. The reference network is a trained neural network for performing target processing on input data. For example, the reference network is a face recognition network and the target processing is face recognition.
102. The data processing device divides the N parameter sets into M classes.
M is greater than or equal to 2; that is, M is an integer of at least 2.
In an optional implementation manner, the dividing the N parameter sets into M classes includes:
determining N groups of parameters corresponding to the N parameter sets, where the N groups of parameters correspond to the N parameter sets one to one, a target group of parameters in the N groups of parameters includes at least one of the maximum value, minimum value, mean, and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
dividing the N groups of parameters into the M classes by adopting a clustering algorithm;
and determining the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
Each of the N groups of parameters contains the same types and the same number of parameters. For example, each group of parameters includes the maximum value, minimum value, mean, and median corresponding to its parameter set. The embodiment of the present application does not limit the types and number of parameters included in each group of parameters. That is, the vector corresponding to a parameter set may also be constructed from parameters other than the maximum value, minimum value, mean, and median corresponding to the parameter set. It will be appreciated that the vector corresponding to each parameter set may be determined in a number of ways.
Dividing the N groups of parameters into M classes using a clustering algorithm may be done by determining N vectors corresponding one to one to the N groups of parameters, and dividing the N vectors into the M classes using the clustering algorithm. For example, if the first group of parameters is {12.036, 17.273, 15.691, 14.258}, the vector corresponding to the first group of parameters is [12.036, 17.273, 15.691, 14.258]. In practical applications, the data processing device may cluster the N vectors using a classification algorithm such as the k-means algorithm, and thereby divide the N vectors into M classes. The purpose of the k-means algorithm is to divide n points (which may be observations or instances of a sample) into k clusters, such that each point belongs to the cluster corresponding to the mean closest to it (i.e., the cluster center); this serves as the clustering criterion. Optionally, the data processing apparatus clusters the N vectors using an annealing algorithm, an expectation-maximization algorithm, or the like. The cluster center of each class may serve as the center point of that class. All vectors are clustered according to a certain distance (such as the Euclidean distance) using a clustering method (such as k-means).
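A hedged sketch of this classification step (assumed helper names; scikit-learn's KMeans stands in for the k-means clustering, and the four statistics match the example above):

```python
import numpy as np
from sklearn.cluster import KMeans

def feature_vector(kernel, bias):
    """Build the vector for one parameter set from simple statistics."""
    params = np.append(kernel.ravel(), bias)
    return np.array([params.max(), params.min(),
                     params.mean(), np.median(params)])

def cluster(features, M):
    """Divide the N feature vectors into M classes with k-means (Euclidean distance)."""
    vectors = np.stack(features)
    km = KMeans(n_clusters=M, n_init=10).fit(vectors)
    return km.labels_, km.cluster_centers_   # class label per set, M center points
```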
103. The data processing device determines M quantization coefficients corresponding to the M classes.
The M quantization coefficients correspond one-to-one to the M classes.
In an optional implementation manner, the determining M quantization coefficients corresponding to the M classes includes:
determining M central points corresponding to the M classes, wherein the M central points correspond to the M classes one by one;
the M quantization coefficients corresponding to the M classes one-to-one are determined based on the M center points, respectively.
Optionally, the central point corresponding to one class is a clustering center corresponding to each parameter set belonging to the class.
It will be appreciated that each parameter set corresponds to a vector, which may be composed of attributes such as the mean, maximum, and minimum. Any one of the M center points corresponds to a vector. The center point corresponding to a class is the geometric center of all vectors corresponding to that class, that is, of the vectors corresponding to the parameter sets contained in the class. In other words, the geometric mean vector of the vectors within each class is computed and used as the geometric center point (center point) of the class. For example, if a class corresponds to two vectors {a, b, c} and {d, e, f}, the geometric center point corresponding to the class is {(a + d)/2, (b + e)/2, (c + f)/2}. In practical applications, the mean term of the center point corresponding to each class (the median or another statistic may be used instead as needed) may be used as the quantization multiple of the class.
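Continuing the sketch above, the coefficient of each class might then be read off its center point as follows. This follows the text in taking the mean term of the center point as the quantization multiple; the vector layout [max, min, mean, median] matches `feature_vector` above and is an assumption of the sketch.

```python
def coeff_from_center(center):
    """Derive the quantization coefficient of a class from its center point.

    center = [max, min, mean, median]; the mean term is used as the
    quantization multiple (the median could be substituted if needed).
    """
    return float(center[2])
```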
104. And quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, wherein a quantization coefficient corresponding to a fourth parameter set of the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set of the M classes.
The quantized coefficient corresponding to one parameter set is the quantized coefficient corresponding to the class to which the parameter set belongs.
The hardware implementation of the quantization operation may be a shift operation of a shifter or an amplification operation of an amplifier. In practical applications, the same shifter may be used to quantize multiple parameter sets. When the data processing apparatus uses the shifter to quantize parameter sets belonging to different classes, the numbers of shifts required by the shifter differ. Moreover, when the parameter sets are quantized in different orders, the total number of shifts required for the shifter to complete the quantization operation also differs. A quantization method that can reduce the number of shifts of the shifter is described below. Specifically, the quantizing the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network includes:
when quantizing the N parameter sets using a shifter, sorting the N parameter sets according to a target order that reduces a number of shifts required by the shifter when quantizing the N parameter sets;
and sequentially quantizing the N parameter sets in the target order based on the M quantization coefficients.
Optionally, the data processing apparatus quantizes the parameter sets included in the M classes in sequence. For example, the data processing apparatus quantizes the parameter sets included in each class sequentially from the first class to the Mth class. In this way, only M amplifiers are needed, each quantizing the parameter sets contained in one class. Optionally, a shifter is used to quantize the parameter sets included in each class in turn, and after the quantization of the parameter sets included in one class is completed, the shifter is adjusted to the state suitable for quantizing the parameter sets included in the next class. Fig. 2 is a schematic diagram of amplifiers implementing a quantization operation according to an embodiment of the present application. As shown in fig. 2, each amplifier quantizes the parameter sets included in one group (class), each amplifier is connected to a sampler, and each sampler samples the signal from its amplifier to obtain the quantized data. Quantization by means of an amplifier and a sampler is a commonly used technique in the art and is not described in detail here. Each weight group corresponds to one class. In practical applications, one amplifier and one sampler may be used to quantize the parameter sets belonging to the same class.
Optionally, the data processing apparatus quantizes the parameter sets of each convolutional layer included in the reference network in sequence. Before quantizing the parameter sets included in each convolutional layer in turn using a shifter, the order in which the parameter sets are quantized must be determined. The embodiment of the present application provides a method for determining the quantization order of the parameter sets included in each convolutional layer, specifically as follows: before the sorting of the N parameter sets according to the target order, the method further includes:
determining M shift factors in one-to-one correspondence with the M quantization coefficients;
determining shift factors corresponding to the N parameter sets respectively; the shift factor corresponding to the fourth parameter set is the number of shifts required by the shifter to quantize the fourth parameter set;
and determining the sequence of quantizing the N parameter sets according to the shifting factors respectively corresponding to the N parameter sets to obtain the target sequence.
One shift factor corresponds to the number of shifts of the shifter. The shift factor corresponding to a parameter set is the shift factor corresponding to the quantization coefficient corresponding to that parameter set. Determining a shift factor from a quantization coefficient is a common technique in the art and is not described in detail here. It is to be understood that the data processing apparatus may determine the number of shifts of the shifter based on the quantization coefficients. The shift factor corresponding to a parameter set is the number of shifts the shifter requires to quantize that parameter set. In practical applications, the data processing device determines the order of quantizing the N parameter sets according to the shift factors corresponding to the parameter sets, with the goal of minimizing the total number of shifts of the shifter. Fig. 3 is a schematic diagram of a quantization order of parameter sets according to an embodiment of the present application. As shown in fig. 3, the N parameter sets are divided into four classes A, B, C, and D, whose shift counts (shift factors) are i, j, k, and l, respectively. If the shifter shifts right i, j, k, and l times to quantize these four classes of parameter sets, the quantization order of the parameter sets in the first layer (Layer 1) is ABCD and the quantization order in the second layer is DCBA; that is, the shifter shifts left 0, k, j, and i times.
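The ordering heuristic can be sketched as follows (illustrative only): classes are visited in ascending shift order in one layer and descending order in the next, so the shifter ends each layer in the state closest to where the next layer begins, as in the ABCD / DCBA example of fig. 3.

```python
def plan_shift_order(layers, shift_factor):
    """Order the parameter-set classes of each conv layer to reduce total shifts.

    layers: one list of class labels per convolutional layer.
    shift_factor: maps a class label to its required shift count.
    """
    order = []
    for n, layer in enumerate(layers):
        order.append(sorted(layer, key=lambda c: shift_factor[c],
                            reverse=(n % 2 == 1)))   # alternate the direction
    return order

# Example matching fig. 3, assuming shift factors i < j < k < l for A..D.
factors = {"A": 1, "B": 2, "C": 3, "D": 4}
print(plan_shift_order([["A", "B", "C", "D"], ["A", "B", "C", "D"]], factors))
# [['A', 'B', 'C', 'D'], ['D', 'C', 'B', 'A']]
```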
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, and a quantization coefficient corresponding to the parameter set in each class is determined, so that the reference network is quantized; the quantization accuracy can be improved and the hardware overhead can be reduced.
In order to illustrate the difference between the neural network before quantization and the neural network after quantization, a neural network implementing an image recognition function is taken as an example. Fig. 4 is a comparison diagram of a neural network before quantization and a neural network after quantization according to an embodiment of the present application. As shown in fig. 4, the neural network includes an input unit, a calculation processing unit, and an output unit. The input units of both the pre-quantization and post-quantization networks obtain pictures through a camera and input the image data into the neural network; the output units of both networks obtain, according to the activation function, the probability that the picture belongs to a certain category, and determine the category of the picture. The difference is that when the calculation processing unit of the pre-quantization network performs calculations, the parameters (weights and biases) of each convolutional layer are floating-point data (float32), whereas in the post-quantization network they are integer data (Int8). The calculation processing unit is the calculation unit mentioned below. Because the quantized weights and biases in the neural network are integer data, both the occupied storage space and the amount of computation are reduced.
The foregoing embodiment quantizes a trained neural network. That is, before a neural network is quantized, the neural network is trained. A method of training a neural network is described below. Fig. 5 is a flowchart of a method for training a neural network according to an embodiment of the present application, and as shown in fig. 5, the method may include:
501. The data processing device inputs the image data into the neural network.
The image data may be image data to be subjected to target processing. For example, the image data is face image data, and the target processing is face recognition.
502. The neural network propagates forward.
The neural network derives the final output layer by layer from the input feature vectors, and the output results are used to solve classification or regression problems. In practice, the neural network performs this layer-by-layer derivation using a forward propagation algorithm. Forward propagation is a commonly used technique in the art and is not described in detail here.
503. The neural network obtains an output result.
The output result may be a classification result of the image data.
504. The neural network adjusts the weights and biases according to the comparison between the output result and the labeling result of the image data.
The data processing device may store the labeling result of each image data.
505. Training stops after the neural network converges.
For example, the neural network is used to perform face recognition on an input face image, and the training is stopped when the accuracy of the face recognition reaches a certain value. After the training is stopped, the current neural network is the network obtained by training, namely the trained neural network.
In the embodiment, the weight and the bias of the neural network can be quickly adjusted through a forward propagation algorithm, and the implementation is simple.
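A minimal training-loop sketch corresponding to steps 501 to 505 (an assumption-laden outline in Python: `net.forward`, `net.update`, and the accuracy threshold are hypothetical interfaces, not the patent's):

```python
def train_reference_network(net, dataset, accuracy_target=0.95):
    """Train the reference network until it converges (steps 501-505)."""
    while True:
        correct = 0
        for image, label in dataset:
            output = net.forward(image)       # 501-503: input and forward pass
            net.update(output, label)         # 504: adjust weights and biases
            correct += int(output.argmax() == label)
        if correct / len(dataset) >= accuracy_target:
            break                             # 505: stop once converged
    return net
```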
The method for quantizing the neural network is described above, and the application of the quantized neural network is described below. Fig. 6 is a flowchart of a processing method based on a quantized neural network according to an embodiment of the present application, and as shown in fig. 6, the method may include:
601. inputting data to be processed into a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are a set of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data.
The target network may be a neural network currently stored in the data processing apparatus, that is, a preset neural network; a neural network acquired by the data processing apparatus from another device, such as a cloud server; or a neural network obtained by the data processing apparatus by quantizing a reference network, where the reference network is a trained neural network used for performing the target processing on input data. The target processing may be any of various kinds of processing, such as target detection, image segmentation, target recognition, target classification, and target tracking. Optionally, the data included in the first parameter set and the second parameter set in the target network are both integer data. It can be understood that since each parameter included in the convolutional layers of the target network is integer data, the amount of computation in the convolution or dot product operations of those layers is greatly reduced. In addition, different parameter sets in the same convolutional layer may correspond to different quantization coefficients. That is, the same convolutional layer can quantize its convolution kernels (weights) and biases with multiple quantization coefficients, which effectively improves the quantization precision.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolutional layers and have the same corresponding quantization coefficients.
In practical applications, the target network may correspond to F quantization coefficients, and any parameter set included in the target network corresponds to one of the F quantization coefficients. That is, each parameter set included in the target network is quantized using one of the F quantization coefficients, and each quantization coefficient corresponds to one quantization mode, so the target network corresponds to only F quantization modes. In the case of quantizing the parameter sets with amplifiers, the data processing apparatus can perform the quantization operation with only F amplifiers. In the case of quantizing the parameter sets with a shifter, the shifter needs only F shift settings to quantize all the parameter sets. This can greatly reduce the workload of quantization and reduce the hardware overhead.
In this implementation, parameter sets in different convolutional layers may correspond to the same quantization coefficient, which not only improves the precision of target processing, but also reduces the overhead of hardware.
In an optional implementation manner, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, where any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in the feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, where M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where the quantization coefficient corresponding to a fourth parameter set of the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes. The quantization method is detailed in the foregoing embodiments and is not repeated here.
In this implementation, the N parameter sets contained in a reference network are divided into M classes, and the quantization coefficient corresponding to the parameter sets in each class is determined, so that the reference network is quantized; this can improve the accuracy of quantization and reduce the workload of the quantization operation.
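For illustration, the following Python sketch shows one way to realize this clustering-based quantization. It is a minimal sketch under stated assumptions, not the implementation mandated by this application: int8 targets, k-means clustering, and a single maximum-absolute-value feature per parameter set are all choices made for the example, and identifiers such as `quantize_reference_network` are hypothetical.

```python
# A minimal sketch of clustering-based quantization: N parameter sets are
# divided into M classes and one quantization coefficient is derived per
# class. int8, k-means, and the max-abs feature are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def quantize_reference_network(param_sets, m, bits=8):
    """param_sets: N dicts {'kernel': float ndarray, 'bias': float}, N >= 2."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for int8

    # One summary feature per parameter set (kernel weights plus bias);
    # the application also permits min, mean, median, or vectors of these.
    feats = np.array([[np.max(np.abs(np.append(ps['kernel'].ravel(),
                                               ps['bias'])))]
                      for ps in param_sets])

    # Divide the N parameter sets into M classes.
    km = KMeans(n_clusters=m, n_init=10).fit(feats)

    # One quantization coefficient per class, derived from its center point.
    scales = qmax / np.maximum(km.cluster_centers_.ravel(), 1e-12)

    quantized = []
    for ps, label in zip(param_sets, km.labels_):
        s = scales[label]
        quantized.append({
            'kernel': np.clip(np.round(ps['kernel'] * s),
                              -qmax, qmax).astype(np.int8),
            'bias': int(np.round(ps['bias'] * s)),
            'scale': float(s),  # kept so layer outputs can be dequantized
        })
    return quantized
```

Because every parameter set in a class shares one scale derived from the class center, a set with small values is not forced onto the coarse scale of the largest set in the network, which is the motivation for clustering rather than using a single network-wide coefficient.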
602. Performing the target processing on the data to be processed through the target network to obtain an output result.
In an optional implementation manner, the parameters included in the first parameter set and the second parameter set are both integer data, and the performing the target processing on the data to be processed through the target network to obtain the output result includes:
calculating convolution or dot product of convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is data input by a convolution layer to which the first parameter set belongs;
calculating the sum of the bias included in the first parameter set and the first data to obtain second data;
performing inverse quantization on the second data to obtain third data, wherein the third data are floating point type data;
and storing the third data into a buffer, wherein the third data is used for calculating the output result.
The data processing apparatus performs convolution or dot-product operations on the input data using the quantized integer data of the convolutional layer, thereby reducing the amount of computation in that layer. In addition, after the data processing apparatus performs inverse quantization on the data output by the convolutional layer, the data is stored in a buffer for subsequent processing, which can improve the accuracy of the target processing.
In this implementation, the quantized parameter set is used to perform convolution operation or dot product operation and inverse quantization on the data output by the convolution layer; the calculation amount of the convolution layer can be reduced and the accuracy of the target processing can be improved.
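As a hedged illustration of the steps above, the sketch below computes one feature-map point from one quantized parameter set and dequantizes the result before it is written to the buffer. `patch` stands for the intermediate data aligned with the kernel, the parameter-set dictionary comes from the quantization sketch earlier, and dividing by the stored scale is an assumed form of the inverse quantization.

```python
# One feature-map point: convolution/dot product (first data), add the
# quantized bias (second data), inverse-quantize to float (third data),
# and store the result in the buffer for subsequent layers.
import numpy as np

def conv_point(patch, qset, buffer):
    first = (qset['kernel'].astype(np.int32) * patch).sum()  # convolution / dot product
    second = first + qset['bias']                            # add the quantized bias
    third = float(second) / qset['scale']                    # inverse quantization -> float
    buffer.append(third)                                     # used to compute the output result
    return third
```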
In the embodiment of the present application, the target processing is performed on the data to be processed using a neural network that takes the combination of convolution kernels and a bias as the quantization unit, so that the precision of the target processing can be greatly improved and computation time can be saved.
An embodiment of using a quantized neural network for recognition, such as face recognition, is described below. Fig. 7 shows a method for recognizing a picture according to an embodiment of the present application; the method includes:
701. The data processing apparatus inputs the picture into the target network.
The target network is a quantized neural network that can be used for picture recognition; it reads in the pixel values of each point of the picture. In practical applications, a reference network for picture recognition can be obtained through training, and the target network can then be obtained by quantizing the reference network.
702. The data processing apparatus calculates, through the target network, the probability that the picture belongs to each category, and outputs the probability values.
The probability values indicate the probability that the picture belongs to each category. During calculation, the convolutional layers of the target network process the input data using the quantized convolution kernels and biases, which can greatly reduce the computational complexity.
703. The data processing apparatus determines the category of the picture according to the probability values.
Optionally, the data processing apparatus determines the category with the highest probability value as the category of the picture.
In the embodiment of the present application, pictures are classified through the quantized neural network, which can greatly reduce the computational complexity.
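A short illustrative driver for this flow follows. `target_network` is a hypothetical callable that returns one probability per category; it is not an interface defined by this application.

```python
# Illustrative driver for the Fig. 7 flow: input the picture (701),
# obtain per-category probabilities (702), take the highest one (703).
import numpy as np

def classify_picture(pixels, target_network, categories):
    probs = target_network(pixels)      # 702: probability of each category
    best = int(np.argmax(probs))        # 703: category with the highest probability
    return categories[best], float(probs[best])
```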
The foregoing embodiments do not describe in detail how the quantized neural network processes the data read in by each convolutional layer, so a processing procedure for the convolutional layers is provided below. Owing to the layer-by-layer inference characteristic of deep neural networks, data is generally computed from the first layer onward until the last layer has been inferred. Fig. 8 shows a method for calculating the convolutional layers in a neural network according to an embodiment of the present disclosure; as shown in Fig. 8, the method may include:
801. The i-th convolutional layer reads in the feature map data Di.
The feature map data Di is data to be processed in the ith convolution layer, that is, feature map data output from the previous layer or input data to be processed. First set i to 1 and increase i by 1 after each cycle until the last layer.
802. Read in the sequence number aij of the j-th convolution kernel of the i-th layer.
803. Read in the corresponding shift factor bij according to the sequence number aij.
The data processing apparatus may store the shift factors corresponding to the respective sequence numbers.
804. Read in the j-th convolution kernel Dij of the i-th layer according to the sequence number aij.
The data in the convolution kernel is quantized integer data.
805. A convolution Ei or dot product Ei of the convolution kernel Dij and the feature map data Di is calculated.
806. Apply the bias to Ei and shift Ei according to the shift factor bij.
The bias of each layer in the neural network is quantized integer data.
807. Write the calculation result D(i+1) into a buffer.
The calculation result is the result obtained by the convolution calculation or dot product calculation.
808. Judge whether the calculation of the current layer is finished.
If so, proceed to the next layer; otherwise, continue the calculation with the remaining convolution kernels of this layer, that is, execute 802 (read in the sequence number of the (j+1)-th convolution kernel of the i-th layer).
809. Judge whether all layers have been calculated.
If so, stop the calculation; otherwise, calculate the next layer, that is, execute 801 (the (i+1)-th convolutional layer reads in the feature map data).
810. Stop the calculation.
In the embodiment of the present application, the convolutional layer performs convolution or dot-product operations using the quantized convolution kernels and biases, which can greatly reduce the computational complexity and improve calculation efficiency.
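The following sketch mirrors the Fig. 8 loop under stated assumptions: each layer is represented as a flat list of (sequence number, integer kernel, integer bias) triples, the table mapping sequence numbers aij to shift factors bij is a plain dictionary, and the inverse scaling of step 806 is taken to be a right shift. None of these representation choices are fixed by the application.

```python
# A sketch of the Fig. 8 layer-by-layer loop. Shapes and the shift
# direction are assumptions made for illustration.
import numpy as np

def run_quantized_network(layers, shift_table, feature_map):
    d = np.asarray(feature_map, dtype=np.int64)            # 801: read in Di
    for layer in layers:                                   # outer loop over layers i
        out = []
        for seq, kernel, bias in layer:                    # inner loop over kernels j
            b = shift_table[seq]                           # 802-803: aij -> bij
            e = int((kernel.astype(np.int64) * d).sum())   # 805: convolution / dot product Ei
            e = (e + bias) >> b                            # 806: apply bias, then shift by bij
            out.append(e)
        d = np.asarray(out, dtype=np.int64)                # 807: write D(i+1) to the buffer
    return d                                               # 809-810: all layers calculated
```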
Fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 9, the apparatus includes:
an input unit 901 for inputting data to be processed to a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
a calculating unit 902, configured to perform the target processing on the to-be-processed data through the target network to obtain an output result.
The specific implementation is the same as that of Fig. 6 and is not described in detail here.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolutional layers and have the same corresponding quantization coefficients.
In this implementation, parameter sets in different convolutional layers may correspond to the same quantization coefficient, which not only improves the precision of target processing, but also reduces the overhead of hardware.
In an optional implementation manner, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, where any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in a feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, where M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where the quantization coefficient corresponding to a fourth parameter set of the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, and the quantization coefficient corresponding to the parameter sets in each class is determined, so that the reference network is quantized; this can improve the accuracy of quantization and reduce the workload of the quantization operation.
In an optional implementation manner, the apparatus further includes:
an obtaining unit 903, configured to obtain N parameter sets included in at least one convolution layer of the reference network, where N is greater than or equal to 2;
a clustering unit 904, configured to divide the N parameter sets into M classes, where M is greater than or equal to 2;
a determining unit 905 configured to determine M quantization coefficients corresponding to the M classes, where the M quantization coefficients correspond to the M classes one-to-one;
a quantizing unit 906, configured to quantize the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set of the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set of the M classes.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, and the quantization coefficient corresponding to the parameter sets in each class is determined, so that the reference network is quantized; this can improve the quantization precision and reduce the workload of the quantization operation.
In an optional implementation manner, the determining unit 905 is specifically configured to determine M center points corresponding to the M classes, where the M center points correspond to the M classes one to one, and to determine, based respectively on the M center points, the M quantization coefficients corresponding one-to-one to the M classes.
In this implementation, the quantization coefficient corresponding to each class can be accurately determined by determining the center point corresponding to that class, and the implementation is simple.
In an optional implementation manner, the apparatus further includes:
a sorting unit 907 configured to, when the N parameter sets are quantized by the shifter, sort the N parameter sets in a target order, where the target order is used to reduce the number of shifts required by the shifter when the N parameter sets are quantized;
specifically, the quantization unit 906 is configured to sequentially quantize the N parameter sets in the target order based on the M quantization coefficients.
In this implementation, the shift operation of the shifter can be reduced by quantizing the N parameter sets in sequence according to the target order, which is simple to implement.
In an optional implementation manner, the determining unit 905 is specifically configured to determine M shift factors corresponding one-to-one to the M quantization coefficients; determine the shift factors respectively corresponding to the N parameter sets, where the shift factor corresponding to the fourth parameter set is the number of shifts required by the shifter to quantize the fourth parameter set; and determine the order in which the N parameter sets are quantized according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
In this implementation, the order in which the N parameter sets are quantized is determined according to the shift factors respectively corresponding to the N parameter sets, so that the number of shifts performed by the shifter is reduced; this is simple to implement.
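One plausible reading of the target order, sketched below, is to process the parameter sets in ascending order of their shift factors so that the shifter only performs the incremental shifts between consecutive factors. The incremental strategy is an assumption made for illustration; the application only requires an order that reduces the number of shifts.

```python
# Sketch of a target order: sort parameter sets by their shift factors so
# each set only needs the extra shifts beyond the previous set's factor.
def shift_schedule(shift_factors):
    """shift_factors[k] = shifts needed to quantize parameter set k."""
    order = sorted(range(len(shift_factors)), key=lambda k: shift_factors[k])
    total, prev = 0, 0
    for k in order:                        # quantize sets in the target order
        total += shift_factors[k] - prev   # only the incremental shifts are new
        prev = shift_factors[k]
    return order, total

# e.g. factors [3, 1, 3, 2] -> order [1, 3, 0, 2] with 3 total shifts,
# versus 3 + 1 + 3 + 2 = 9 shifts if each set restarts from zero.
```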
In an optional implementation manner, the determining unit 905 is specifically configured to determine N groups of parameters corresponding to the N parameter sets, where the N groups of parameters correspond to the N parameter sets one to one, a target group of parameters in the N groups of parameters includes at least one of a maximum value, a minimum value, a mean value, and a median corresponding to a target parameter set, and the target parameter set is the parameter set, among the N parameter sets, corresponding to the target group of parameters;
the clustering unit 904 is specifically configured to divide the N groups of parameters into the M classes by using a clustering algorithm, and to determine the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, where the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In this implementation, each parameter set is classified according to one or more of its corresponding maximum value, minimum value, mean value, and median; the calculation is simple, and a good classification effect can be obtained.
In an optional implementation manner, the determining unit 905 is specifically configured to determine N vectors corresponding one-to-one to the N groups of parameters;
the clustering unit 904 is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
In this implementation, the N vectors corresponding to the N groups of parameters are classified by a clustering algorithm, so the classification efficiency is high.
In an optional implementation manner, the calculating unit 902 is configured to calculate a convolution or a dot product between a convolution kernel included in the first parameter set and intermediate data to obtain first data, where the intermediate data is data input to the convolution layer to which the first parameter set belongs, and to calculate the sum of the bias included in the first parameter set and the first data to obtain second data; the apparatus further includes:
an inverse quantization unit 908, configured to perform inverse quantization on the second data to obtain third data, where the third data is floating point data;
a storage unit 909, configured to store the third data into a buffer, where the third data is used for calculating the output result.
In this implementation, the quantized parameter set is used to perform convolution operation or dot product operation and inverse quantization on the data output by the convolution layer; the calculation amount of the convolution layer can be reduced and the accuracy of the target processing can be improved.
Fig. 10 is a hardware structure diagram of a data processing apparatus according to an embodiment of the present invention. The data processing apparatus includes a Central Processing Unit (CPU), a Neural Network Processing Unit (NPU), and an external memory. The NPU is mounted as a coprocessor on the main CPU (host CPU), and tasks are allocated by the host CPU. The core portion of the NPU is an arithmetic circuit 1003, and a controller 1004 controls the arithmetic circuit 1003 to extract matrix data from memory and perform multiplication.
In some implementations, the arithmetic circuit 1003 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit 1003 is a two-dimensional systolic array. The arithmetic circuit 1003 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1003 is a general-purpose matrix processor.
For example, assume that there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1002 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 1001, performs a matrix operation with matrix B, and stores partial or final results of the resulting matrix in the accumulator 1008.
The unified memory 1006 is used for storing input data and output data. The weight data is transferred directly into the weight memory 1002 through a Direct Memory Access Controller (DMAC) 1005. The input data is also carried into the unified memory 1006 by the DMAC.
A Bus Interface Unit (BIU) 1010 is used for the interaction of the AXI bus with the DMAC and the instruction fetch buffer 1009; it enables the instruction fetch memory 1009 to fetch instructions from the external memory, and also enables the memory unit access controller 1005 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1006 or to transfer weight data into the weight memory 1002 or to transfer input data into the input memory 1001.
The vector calculation unit 1007 includes a plurality of operation processing units and, when necessary, further processes the output of the arithmetic circuit, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. It is mainly used for the non-convolution/FC layer computation in the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 1007 can store a processed output vector to the unified memory 1006. For example, the vector calculation unit 1007 may apply a non-linear function to the output of the arithmetic circuit 1003, such as to a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1007 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 1003, for example for use in subsequent layers of the neural network.
An instruction fetch buffer 1009 connected to the controller 1004 stores instructions used by the controller 1004.
The unified memory 1006, the input memory 1001, the weight memory 1002, and the instruction fetch memory 1009 are all on-chip memories; the external memory is private to the NPU hardware architecture. The CPU can implement the functions of the input unit 901, the obtaining unit 903, the clustering unit 904, the determining unit 905, and the sorting unit 907. The NPU can implement the function of the calculating unit 902. The data processing apparatus also has an amplifier or shifter (not shown in Fig. 10) for implementing the functions of the quantizing unit 906 and the inverse quantization unit 908, and the function of the storage unit 909 is implemented by the unified memory 1006.
In an embodiment of the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements: inputting data to be processed into a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data; and performing the target processing on the data to be processed through the target network to obtain an output result.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (22)

1. A data processing method, comprising:
inputting data to be processed into a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
and performing the target processing on the data to be processed through the target network to obtain an output result.
2. The method of claim 1, wherein the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolutional layers and have the same quantization coefficients.
3. The method of claim 2, wherein quantizing the reference network to obtain the target network comprises: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in the feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, wherein M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
4. The method of claim 1, wherein prior to inputting the data to be processed into the target network, the method further comprises:
acquiring N parameter sets contained in at least one convolution layer of a reference network, wherein N is greater than or equal to 2;
dividing the N parameter sets into M classes, wherein M is greater than or equal to 2;
determining M quantization coefficients corresponding to the M classes, the M quantization coefficients corresponding to the M classes one-to-one;
and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
5. The method of claim 3 or 4, wherein said determining the M quantization coefficients corresponding to the M classes comprises:
determining M central points corresponding to the M classes, wherein the M central points correspond to the M classes one by one;
determining the M quantization coefficients corresponding to the M classes one to one according to the M central points respectively.
6. The method according to claim 3 or 4, wherein the quantizing the N parameter sets according to the M quantization coefficients respectively to obtain the target network comprises:
under the condition that the N parameter sets are quantized by using a shifter, sorting the N parameter sets according to a target sequence, wherein the target sequence is used for reducing the shifting times required by the shifter when the N parameter sets are quantized;
and quantizing the N parameter sets in sequence in the target order based on the M quantization coefficients.
7. The method of claim 6, wherein prior to said sorting the N sets of parameters in a target order, the method further comprises:
determining M shift factors in one-to-one correspondence with the M quantization coefficients;
determining shift factors respectively corresponding to the N parameter sets; the shift factor corresponding to the fourth parameter set is the number of shifts required by the shifter to quantize the fourth parameter set;
and determining the sequence for quantizing the N parameter sets according to the shifting factors respectively corresponding to the N parameter sets to obtain the target sequence.
8. The method of claim 3 or 4, wherein the dividing the N parameter sets into M classes comprises:
determining N groups of parameters corresponding to the N parameter sets, wherein the N groups of parameters correspond to the N parameter sets one by one, a target group of parameters in the N groups of parameters comprises at least one of a maximum value, a minimum value, a mean value, and a median corresponding to a target parameter set, and the target parameter set is the parameter set, in the N parameter sets, corresponding to the target group of parameters;
adopting a clustering algorithm to divide the N groups of parameters into the M classes;
and determining the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
9. The method of claim 8, wherein the adopting a clustering algorithm to divide the N groups of parameters into the M classes comprises:
determining N vectors corresponding to the N groups of parameters one by one;
and adopting a clustering algorithm to divide the N vectors into the M classes.
10. The method according to any one of claims 1 to 4, wherein the parameters included in the first parameter set and the second parameter set are integer data, and the performing the target processing on the data to be processed through the target network to obtain the output result includes:
calculating convolution or dot product of convolution kernels contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data are data input by a convolution layer to which the first parameter set belongs;
calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
performing inverse quantization on the second data to obtain third data, wherein the third data are floating point type data;
and storing the third data into a buffer area, wherein the third data is used for calculating the output result.
11. A data processing apparatus, comprising:
an input unit for inputting data to be processed to a target network; the target network is a neural network comprising at least one convolutional layer, one parameter set is a set of convolution kernels and offsets used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are sets of convolution kernels and offsets used for calculating different points of the feature map in the same convolutional layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
and the computing unit is used for carrying out the target processing on the data to be processed through the target network to obtain an output result.
12. The apparatus of claim 11, wherein the target network is a neural network derived from a quantized reference network, the reference network is a neural network derived from training for performing the target processing on the input data, and a third parameter set in the target network belongs to a different convolutional layer and has the same quantized coefficients as the first parameter set.
13. The apparatus of claim 12, wherein the means for quantizing the reference network to obtain the target network comprises: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any parameter set in the N parameter sets is a set of convolution kernels and biases used for calculating one point in a feature map, and N is greater than or equal to 2; dividing the N parameter sets into M classes, wherein M is greater than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
14. The apparatus of claim 11, further comprising:
the acquisition unit is used for acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein N is more than or equal to 2;
the clustering unit is used for dividing the N parameter sets into M classes, wherein M is greater than or equal to 2;
a determining unit configured to determine M quantization coefficients corresponding to the M classes, the M quantization coefficients corresponding to the M classes one-to-one;
a quantizing unit, configured to quantize the N parameter sets according to the M quantizing coefficients, respectively, to obtain the target network, where a quantizing coefficient corresponding to a fourth parameter set in the N parameter sets is a quantizing coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
15. The apparatus of claim 14,
the determining unit is specifically configured to determine M center points corresponding to the M classes, where the M center points correspond to the M classes one to one; determining the M quantization coefficients corresponding to the M classes one to one according to the M central points respectively.
16. The apparatus of claim 14, further comprising:
a sorting unit, configured to, when the N parameter sets are quantized by using a shifter, sort the N parameter sets according to a target order, where the target order is used to reduce the number of shifts required by the shifter when the N parameter sets are quantized;
the quantization unit is specifically configured to quantize the N parameter sets in sequence in the target order based on the M quantization coefficients.
17. The apparatus of claim 16,
the determining unit is specifically configured to determine M shift factors that correspond to the M quantization coefficients one to one; determine shift factors respectively corresponding to the N parameter sets, wherein the shift factor corresponding to the fourth parameter set is the number of shifts required by the shifter to quantize the fourth parameter set; and determine the order for quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
18. The apparatus of claim 14,
the determining unit is specifically configured to determine N groups of parameters corresponding to the N parameter sets, wherein the N groups of parameters correspond to the N parameter sets one to one, a target group of parameters in the N groups of parameters comprises at least one of a maximum value, a minimum value, a mean value, and a median corresponding to a target parameter set, and the target parameter set is the parameter set, in the N parameter sets, corresponding to the target group of parameters;
the clustering unit is specifically configured to divide the N sets of parameters into the M classes by using a clustering algorithm; and determining the M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
19. The apparatus of claim 18,
the determining unit is specifically configured to determine N vectors that correspond to the N sets of parameters one to one;
the clustering unit is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
20. The apparatus according to any one of claims 11 to 14,
the calculation unit is configured to calculate convolution or a dot product between a convolution kernel included in the first parameter set and intermediate data to obtain first data, where the intermediate data is data input by a convolution layer to which the first parameter set belongs; calculating the sum of the bias contained in the first parameter set and the first data to obtain second data; the device further comprises:
the inverse quantization unit is used for performing inverse quantization on the second data to obtain third data, and the third data are floating point type data;
and the storage unit is used for storing the third data into a buffer area, and the third data is used for calculating the output result.
21. A data processing apparatus comprising a processor and a memory, the processor and memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any one of claims 1 to 10.
22. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-10.
CN201811034336.4A 2018-09-04 2018-09-04 Data processing method, data processing device and computer readable medium Active CN110874627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811034336.4A CN110874627B (en) 2018-09-04 2018-09-04 Data processing method, data processing device and computer readable medium

Publications (2)

Publication Number Publication Date
CN110874627A true CN110874627A (en) 2020-03-10
CN110874627B CN110874627B (en) 2024-06-28

Family

ID=69716124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811034336.4A Active CN110874627B (en) 2018-09-04 2018-09-04 Data processing method, data processing device and computer readable medium

Country Status (1)

Country Link
CN (1) CN110874627B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
WO2018120740A1 (en) * 2016-12-29 2018-07-05 深圳光启合众科技有限公司 Picture classification method, device and robot
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 A kind of convolutional neural networks weight parameter quantifies training method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAXIANG WU等: "《Quantized Convolutional Neural Networks for Mobile Devices》", 《IEEE XPLORE》, pages 1 - 3 *
MOTAZ AL-HAMI等: "《Toward a Stable Quantized Convolutional Neural Networks:An Embedded Perspective》", 《RESEARCHGATE》, pages 1 - 7 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780513A (en) * 2020-06-10 2021-12-10 杭州海康威视数字技术股份有限公司 Network model quantification and inference method and device, electronic equipment and storage medium
CN113780513B (en) * 2020-06-10 2024-05-03 杭州海康威视数字技术股份有限公司 Network model quantization and reasoning method and device, electronic equipment and storage medium
WO2022001364A1 (en) * 2020-06-30 2022-01-06 华为技术有限公司 Method for extracting data features, and related apparatus
CN113554149A (en) * 2021-06-18 2021-10-26 北京百度网讯科技有限公司 Neural network processing unit NPU, neural network processing method and device
WO2023231794A1 (en) * 2022-05-30 2023-12-07 华为技术有限公司 Neural network parameter quantification method and apparatus

Also Published As

Publication number Publication date
CN110874627B (en) 2024-06-28

Similar Documents

Publication Publication Date Title
US11727276B2 (en) Processing method and accelerating device
CN110874627B (en) Data processing method, data processing device and computer readable medium
CN112418392A (en) Neural network construction method and device
CN108701250A (en) Data fixed point method and apparatus
CN112200295B (en) Ordering method, operation method, device and equipment of sparse convolutional neural network
WO2023231794A1 (en) Neural network parameter quantification method and apparatus
CN113326930A (en) Data processing method, neural network training method, related device and equipment
WO2022028323A1 (en) Classification model training method, hyper-parameter searching method, and device
WO2022088063A1 (en) Method and apparatus for quantizing neural network model, and method and apparatus for processing data
CN113240079A (en) Model training method and device
CN112926570A (en) Adaptive bit network quantization method, system and image processing method
CN115601692A (en) Data processing method, training method and device of neural network model
CN111079753A (en) License plate recognition method and device based on deep learning and big data combination
CN111382839B (en) Method and device for pruning neural network
CN114677548A (en) Neural network image classification system and method based on resistive random access memory
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
CN112085175B (en) Data processing method and device based on neural network calculation
WO2021081854A1 (en) Convolution operation circuit and convolution operation method
WO2022227024A1 (en) Operational method and apparatus for neural network model and training method and apparatus for neural network model
Chin et al. A high-performance adaptive quantization approach for edge CNN applications
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
CN116362301A (en) Model quantization method and related equipment
Al Maashri et al. Hardware acceleration for neuromorphic vision algorithms
CN115292033A (en) Model operation method and device, storage medium and electronic equipment
CN113065638A (en) Neural network compression method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant