CN110874627B - Data processing method, data processing device and computer readable medium


Info

Publication number
CN110874627B
Authority
CN
China
Prior art keywords: target, data, parameter set, parameter sets, parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811034336.4A
Other languages
Chinese (zh)
Other versions
CN110874627A (en)
Inventor
程捷
罗龙强
郭青海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN201811034336.4A
Publication of CN110874627A
Application granted
Publication of CN110874627B
Legal status: Active
Anticipated expiration


Abstract

The application discloses a data processing method, a data processing apparatus and a computer readable medium. The method comprises: inputting data to be processed into a target network, where the target network is a neural network comprising at least one convolution layer and is used for performing target processing on input data; one parameter set is the set of one convolution kernel and its bias used for calculating one point in a feature map; a first parameter set and a second parameter set in the target network are the convolution kernels and biases used for calculating different points of the feature map in the same convolution layer, and the parameters they contain are quantized parameters corresponding to different quantization coefficients; and performing the target processing on the data to be processed through the target network to obtain an output result. This can greatly improve the accuracy of the target processing and save computation time.

Description

Data processing method, data processing device and computer readable medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, a data processing apparatus, and a computer readable medium.
Background
Advances in big data technology and artificial intelligence technology have driven revolutionary changes in data processing. High-precision requirements have been put forward for data processing, and requirements such as real-time operation, low power consumption and intelligence have been added on top of precision.
From a storage perspective, existing deep neural networks (Deep Neural Network, DNN) and convolutional neural networks (Convolutional Neural Network, CNN) store their parameters as floating-point data. A DNN generally requires tens to hundreds of megabytes of storage, which makes it difficult to deploy a DNN on a terminal device such as a mobile phone. From a computation perspective, a DNN needs to perform a large number of multiplications, additions and other operations; in application scenarios with high real-time requirements, computing with floating-point data can hardly meet those requirements. For example, in an autonomous driving scenario, multiple networks need to perform calculations simultaneously. From a hardware design perspective, existing DNNs can only run on central processing units (Central Processing Unit, CPU) that operate on floating-point data. When implementing a DNN algorithm on a smaller, faster Field-Programmable Gate Array (FPGA), floating-point operations must be converted into fixed-point numbers with lower storage cost, taking hardware resource constraints into account. Currently, quantizing the floating-point data in a neural network into integer data is the main means of increasing the operation speed of the neural network and reducing the storage space it occupies, and has become an important research direction.
The conventional quantization method for a neural network determines quantization coefficients from statistics of each layer's weights, so that each layer corresponds to one quantization scheme. However, this quantization method has low precision and cannot meet the requirements of application scenarios demanding higher precision.
Disclosure of Invention
The application provides a data processing method, a data processing device and a computer readable medium, which can improve the quantization precision and reduce the hardware cost.
In a first aspect, the present application provides a data processing method, the method comprising:
Inputting data to be processed into a target network, where the target network is a neural network comprising at least one convolution layer and is used for performing target processing on input data; one parameter set is the set of one convolution kernel and its bias used for calculating one point in a feature map; a first parameter set and a second parameter set in the target network are the convolution kernels and biases used for calculating different points of the feature map in the same convolution layer; and the parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients;
and performing the target processing on the data to be processed through the target network to obtain an output result.
The execution body of the present application is a data processing apparatus, which may be a mobile phone, a notebook computer, a desktop computer, a tablet computer, a wearable device, a server, or the like. The target network may be a neural network currently stored by the data processing apparatus, i.e. a preset neural network; it may also be a neural network acquired from another device, such as a cloud server; it may also be a neural network obtained by quantizing a reference network, where the reference network is a trained neural network used for performing the target processing on input data. The target processing may be any of various kinds of processing, such as target detection, image segmentation, target recognition, target classification and target tracking. Optionally, the data contained in the first parameter set and the second parameter set in the target network are integer data. It can be understood that when all parameters contained in the convolution layers of the target network are integer data, the amount of computation for convolution or dot-product calculation in those layers can be greatly reduced. In addition, different parameter sets in the same convolution layer may correspond to different quantization coefficients. That is, the same convolution layer can quantize its convolution kernels (weights) and biases with several different quantization coefficients, which can effectively improve quantization precision.
In the embodiments of the present application, the target processing is performed on the data to be processed by a neural network that takes the combination of a convolution kernel and its bias as the quantization unit, which can greatly improve the accuracy of the target processing and save computation time.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolution layers and correspond to the same quantization coefficient.
In practical applications, the target network may correspond to F quantization coefficients, and any parameter set contained in the target network corresponds to one of the F quantization coefficients. That is, each parameter set contained in the target network is quantized using one of the F quantization coefficients. Each quantization coefficient corresponds to one quantization mode, so the target network corresponds to only F quantization modes. When parameter sets are quantized with amplifiers, the data processing apparatus can complete the quantization operation with F amplifiers. When parameter sets are quantized with a shifter, the shifter needs only F shift settings to quantize the parameter sets. This can greatly reduce the workload of quantization and reduce the hardware overhead.
In this implementation, parameter sets in different convolution layers can correspond to the same quantization coefficient, which can not only improve the accuracy of the target processing but also reduce hardware cost.
In an alternative implementation, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is the set of one convolution kernel and its bias used for calculating one point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets respectively according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces the workload of the quantization operation.
In an alternative implementation, before the data to be processed is input into the target network, the method further includes:
acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein N is more than or equal to 2;
dividing the N parameter sets into M classes, wherein M is more than or equal to 2;
determining M quantization coefficients corresponding to the M classes, wherein the M quantization coefficients are in one-to-one correspondence with the M classes;
And respectively quantizing the N parameter sets according to the M quantized coefficients to obtain the target network, wherein the quantized coefficient corresponding to a fourth parameter set in the N parameter sets is a quantized coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
In practical applications, before performing the target processing on the data to be processed, the data processing apparatus may quantize the current neural network (the reference network) to obtain the target network, and then process the data to be processed with the target network. In this implementation, classifying the N parameter sets allows parameter sets whose appropriate quantization coefficients are close to be grouped into the same class. That is, parameter sets that suit the same quantization mode are placed in the same class, and the parameter sets in one class are quantized with the same quantization coefficient.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces the workload of the quantization operation.
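As an illustrative aside, the four steps above can be sketched end to end in Python. This is a sketch under stated assumptions rather than the patent's reference implementation: the NumPy data layout, the use of scikit-learn's KMeans, the max/min/mean/median feature vector and the symmetric mapping into the int8 range are all choices made for exposition.

```python
# Hypothetical end-to-end sketch of the quantization pipeline described above.
import numpy as np
from sklearn.cluster import KMeans

def quantize_network(conv_layers, M):
    """conv_layers: list of (kernels, biases) per convolution layer, where
    kernels[i] and biases[i] form one parameter set (one output point)."""
    # Acquire the N parameter sets contained in the convolution layers.
    param_sets = [(k, b) for kernels, biases in conv_layers
                  for k, b in zip(kernels, biases)]

    # Divide the N parameter sets into M classes via per-set statistics.
    feats = np.array([[k.max(), k.min(), k.mean(), np.median(k)]
                      for k, _ in param_sets])
    classes = KMeans(n_clusters=M, n_init=10).fit_predict(feats)

    # Determine one quantization coefficient per class from its center point.
    coeffs = []
    for c in range(M):
        center = feats[classes == c].mean(axis=0)
        bound = max(abs(center[0]), abs(center[1]), 1e-8)  # |max|, |min| terms
        coeffs.append(127.0 / bound)                       # int8 range assumed

    # Quantize each parameter set with the coefficient of its class.
    def q(x, s):
        return np.clip(np.round(x * s), -128, 127).astype(np.int8)
    return [(q(k, coeffs[c]), q(b, coeffs[c]), coeffs[c])
            for (k, b), c in zip(param_sets, classes)]
```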
In an optional implementation manner, the determining M quantization coefficients corresponding to the M classes includes:
M center points corresponding to the M classes are determined, and the M center points are in one-to-one correspondence with the M classes;
and respectively determining M quantization coefficients corresponding to the M classes one by one according to the M center points.
In this implementation, by determining the center point corresponding to each class, the quantization coefficient corresponding to each class can be accurately determined; the implementation is simple.
In an optional implementation manner, the quantizing the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network includes:
in the case of quantizing the N parameter sets with a shifter, sorting the N parameter sets in a target order for reducing the number of shifts required by the shifter when quantizing the N parameter sets;
and quantizing the N parameter sets sequentially in the target order according to the M quantization coefficients.
In this implementation, by sequentially quantizing the N parameter sets in the target order, the shift operation of the shifter can be reduced, which is simple to implement.
In an alternative implementation, before the sorting the N parameter sets in the target order, the method further includes:
determining M shift factors corresponding one-to-one to the M quantization coefficients;
determining the shift factors respectively corresponding to the N parameter sets, where the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform to quantize the fourth parameter set;
and determining the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
In this implementation, the order of quantizing the N parameter sets is determined according to the shift factors respectively corresponding to the N parameter sets, which reduces the number of shifts of the shifter; the implementation is simple.
In an alternative implementation, the classifying the N parameter sets into M classes includes:
Determining N groups of parameters corresponding to the N parameter sets, wherein the N groups of parameters are in one-to-one correspondence with the N parameter sets, a target group of parameters in the N groups of parameters comprises at least one of the maximum value, minimum value, mean value and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
Dividing the N groups of parameters into M classes by adopting a clustering algorithm;
And determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In this implementation, the parameter sets are classified using one or more of the maximum value, minimum value, mean value and median corresponding to each parameter set; the calculation is simple, and a good classification effect can be obtained.
In an alternative implementation, the classifying the N sets of parameters into M classes using a clustering algorithm includes:
Determining N vectors corresponding to the N groups of parameters one by one;
and adopting a clustering algorithm to divide the N vectors into M classes.
Optionally, the N vectors are divided into the M classes using a k-means algorithm. In practical applications, various heuristic algorithms can be used to classify the N vectors; the present application is not limited in this respect.
In this implementation, the N vectors corresponding to the N groups of parameters are classified with a clustering algorithm, so the classification efficiency is high.
In an optional implementation manner, the parameters included in the first parameter set and the second parameter set are integer data, and the performing the target processing on the data to be processed through the target network to obtain an output result includes:
calculating the convolution or dot product of a convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is the data input to the convolution layer to which the first parameter set belongs;
calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
Performing inverse quantization on the second data to obtain third data, wherein the third data is floating point data;
And storing the third data into a buffer area, wherein the third data is used for calculating the output result.
The data processing apparatus performs the convolution or dot-product operation on the input data using the quantized integer data of the convolution layer, which can reduce the amount of computation of the convolution layer. In addition, the data processing apparatus dequantizes the data output by the convolution layer and then stores it in the buffer to facilitate subsequent processing; this can improve the accuracy of the target processing.
In this implementation, the convolution or dot-product operation is performed with the quantized parameter sets, and the data output by the convolution layer is dequantized; this can reduce the amount of computation of the convolution layer and improve the accuracy of the target processing.
In a second aspect, the present application provides a data processing apparatus comprising:
An input unit for inputting data to be processed into a target network, where the target network is a neural network comprising at least one convolution layer and is used for performing target processing on input data; one parameter set is the set of one convolution kernel and its bias used for calculating one point in a feature map; a first parameter set and a second parameter set in the target network are the convolution kernels and biases used for calculating different points of the feature map in the same convolution layer; and the parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients;
and the computing unit is used for carrying out the target processing on the data to be processed through the target network to obtain an output result.
In the embodiments of the present application, the target processing is performed on the data to be processed by a neural network that takes the combination of a convolution kernel and its bias as the quantization unit, which can greatly improve the accuracy of the target processing and save computation time.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolution layers and correspond to the same quantization coefficient.
In this implementation, parameter sets in different convolution layers can correspond to the same quantization coefficient, which can not only improve the accuracy of the target processing but also reduce hardware cost.
In an alternative implementation, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is the set of one convolution kernel and its bias used for calculating one point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets respectively according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces the workload of the quantization operation.
In an alternative implementation, the apparatus further includes:
the acquisition unit is used for acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein N is more than or equal to 2;
The clustering unit is used for dividing the N parameter sets into M classes, wherein M is more than or equal to 2;
The determining unit is used for determining M quantization coefficients corresponding to the M classes, and the M quantization coefficients are in one-to-one correspondence with the M classes;
and the quantization unit is used for respectively quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to the fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces the workload of the quantization operation.
In an optional implementation manner, the determining unit is specifically configured to determine M center points corresponding to the M classes, where the M center points are in one-to-one correspondence with the M classes; and respectively determining M quantization coefficients corresponding to the M classes one by one according to the M center points.
In this implementation, by determining the center point corresponding to each class, the quantization coefficient corresponding to each class can be accurately determined; the implementation is simple.
In an alternative implementation, the apparatus further includes:
A sorting unit, configured to sort the N parameter sets according to a target order when the N parameter sets are quantized by using a shifter, where the target order is used to reduce the number of shifts required by the shifter when the N parameter sets are quantized;
The quantization unit is specifically configured to quantize the N parameter sets sequentially in the target order according to the M quantization coefficients.
In this implementation, by sequentially quantizing the N parameter sets in the target order, the shift operation of the shifter can be reduced, which is simple to implement.
In an optional implementation manner, the determining unit is specifically configured to determine M shift factors corresponding one-to-one to the M quantization coefficients, and to determine the shift factors respectively corresponding to the N parameter sets, where the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform to quantize the fourth parameter set; and to determine the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
In this implementation, the order of quantizing the N parameter sets is determined according to the shift factors respectively corresponding to the N parameter sets, which reduces the number of shifts of the shifter; the implementation is simple.
In an optional implementation manner, the determining unit is specifically configured to determine N groups of parameters corresponding to the N parameter sets, where the N groups of parameters are in one-to-one correspondence with the N parameter sets, a target group of parameters in the N groups of parameters comprises at least one of the maximum value, minimum value, mean value and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
The clustering unit is specifically configured to divide the N groups of parameters into the M classes by using a clustering algorithm; and determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In this implementation, the parameter sets are classified using one or more of the maximum value, minimum value, mean value and median corresponding to each parameter set; the calculation is simple, and a good classification effect can be obtained.
In an optional implementation manner, the determining unit is specifically configured to determine N vectors corresponding to the N sets of parameters one to one;
The clustering unit is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
In this implementation, the N vectors corresponding to the N groups of parameters are classified with a clustering algorithm, so the classification efficiency is high.
In an alternative implementation, the apparatus further includes:
The calculation unit is used for calculating the convolution or dot product of a convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is the data input to the convolution layer to which the first parameter set belongs; and for calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
The inverse quantization unit is used for inversely quantizing the second data to obtain third data, wherein the third data is floating point data;
and the storage unit is used for storing the third data into a buffer area, and the third data is used for calculating the output result.
In this implementation, the convolution or dot-product operation is performed with the quantized parameter sets, and the data output by the convolution layer is dequantized; this can reduce the amount of computation of the convolution layer and improve the accuracy of the target processing.
In a third aspect, an embodiment of the present invention provides another data processing apparatus, including a processor and a memory, where the processor and the memory are connected to each other, where the memory is configured to store a computer program, where the computer program includes program instructions, and where the processor is configured to invoke the program instructions to perform the method according to the first aspect and any implementation manner of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of the first aspect and any implementation of the first aspect.
Drawings
To describe the embodiments of the present application or the technical solutions in the background more clearly, the following briefly introduces the drawings used in the embodiments of the present application or the background.
FIG. 1 is a flowchart of a data processing method provided by the present application;
FIG. 2 is a schematic diagram of amplifiers implementing a quantization operation according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a quantization order of parameter sets according to an embodiment of the present application;
FIG. 4 is a comparison diagram of a neural network before quantization and a neural network after quantization according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for training a neural network according to an embodiment of the present application;
FIG. 6 is a flowchart of a processing method based on a quantized neural network according to an embodiment of the present application;
FIG. 7 is a flowchart of a picture recognition method according to an embodiment of the present application;
FIG. 8 is a flowchart of a calculation method for a convolution layer in a neural network according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
The present application provides a quantization scheme that takes the combination of a convolution kernel and its bias as the quantization unit, while taking hardware cost into account: the scheme is sought under the constraint of satisfying the hardware limitations.
In practical applications, the input data of a neural network usually needs to pass through convolutions to obtain a result. Convolution operations are typically performed with 32-bit floating-point numbers (float32). However, computing with floating-point numbers is slow and wastes storage space. Therefore, in actual hardware implementations, 32-bit floating-point operations are generally converted into operations on 8-bit integer data (int8), while ensuring that the error relative to the floating-point result stays small or at least meets the application's requirements. The operation that converts 32-bit floating-point numbers into 8-bit integer data is the quantization operation. Implementing quantization operations requires consideration of hardware overhead, which is described below.
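As a small illustration of this conversion (a sketch assuming a symmetric mapping into the int8 range; the coefficient choice is one common option, not text from the patent):

```python
# Minimal symmetric float32 <-> int8 quantization round trip, for illustration.
import numpy as np

def quantize(x: np.ndarray, coeff: float) -> np.ndarray:
    """Scale float32 values by the quantization coefficient, round to int8."""
    return np.clip(np.round(x * coeff), -128, 127).astype(np.int8)

def dequantize(q: np.ndarray, coeff: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 data."""
    return q.astype(np.float32) / coeff

w = np.random.randn(3, 3).astype(np.float32)   # e.g. one convolution kernel
coeff = 127.0 / np.abs(w).max()                # one way to pick a coefficient
error = np.abs(dequantize(quantize(w, coeff), coeff) - w).max()
```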
To understand the hardware overhead problem, take running a convolutional neural network (Convolutional Neural Network, CNN) as an example. The data processing apparatus can store the weights in ReRAM cells; converting the applied external voltage into a current implements the multiplication operation, and addition is realized by superposing currents, which yields fast matrix operations. Since ReRAM uses analog circuits, the voltage and current carry signals within a limited range, so the high-precision calculation of the original network cannot be reproduced, and the parameters of the original network need to be quantized into low-bit data (such as the int8 type) for calculation. In an analog circuit, the quantization operation can be implemented by configuring an amplifier; in a digital circuit, quantization is implemented by a shifter through shift operations. One quantization scheme corresponds to one amplifier setting. From a hardware point of view, it is desirable to reduce the number of required amplifiers, or the number of shifts of the shifter, as much as possible. This means the quantization scheme must meet the following conditions: (1) from a quantization perspective, a good combination of quantization schemes is needed so that the loss after quantization is small; (2) from a hardware implementation perspective, the quantization scheme should change as little as possible, which reduces the shift settings on the hardware or the number of amplifiers.
In the conventional quantization method for a neural network, the quantization coefficient is determined from statistics of each convolution layer's weights, so each convolution layer corresponds to one quantization scheme. In this method, every parameter in a convolution layer shares the same quantization coefficient, and the differences among the multiple parameter sets (combinations of a convolution kernel and its bias) contained in the same layer are ignored, which causes a large loss of quantization precision. That is, different parameter sets in the same convolution layer may need to be quantized with different quantization coefficients to guarantee the quantization precision of the layer. On the other hand, if too many parameter sets lead to too many quantization schemes, the scheme becomes infeasible in hardware, because each quantization scheme depends on a specific hardware setting. Therefore, the problem of balancing precision loss against hardware implementation needs to be solved.
Starting from the goals of low hardware overhead and improved quantization precision, the present application provides an efficient quantization method. In the quantization step, to ensure that the accuracy of the neural network is not affected, different quantization schemes need to be set for different characteristics in the data, and the quantization is then realized through hardware settings. The parameter sets may first be grouped according to a certain rule. For example, the parameter sets in the convolution layers can be grouped (classified) by methods such as equal partitioning or clustering, and the number of groups is limited, which reduces the total number of quantization schemes and thereby reduces hardware overhead to a certain extent. In addition, grouping makes it possible to determine the quantization coefficient required by each parameter set in a targeted manner, which improves quantization precision. It can be understood that the weights and biases of the same convolution layer may use multiple quantization schemes, and the weights and biases in different convolution layers may use the same quantization scheme.
The following specifically describes a data processing method (quantization method) provided by the present application. FIG. 1 is a flowchart of a data processing method according to the present application, as shown in FIG. 1, the method may include:
101. the data processing apparatus obtains N parameter sets contained in at least one convolutional layer of the reference network.
N is greater than or equal to 2, that is, N is an integer not less than 2. The data processing apparatus may be a mobile phone, a notebook computer, a desktop computer, a server, or the like. The N parameter sets may be all the parameter sets contained in the reference network. In practical applications, the data processing apparatus reads all the parameter sets in each convolution layer to obtain the N parameter sets. The reference network is a trained neural network used for performing target processing on input data. For example, the reference network is a face recognition network, and the target processing is face recognition.
102. The data processing device classifies the N parameter sets into M classes.
M is greater than or equal to 2, that is, M is an integer not less than 2.
In an alternative implementation, the classifying the N parameter sets into M classes includes:
Determining N groups of parameters corresponding to the N parameter sets, wherein the N groups of parameters are in one-to-one correspondence with the N parameter sets, a target group of parameters in the N groups of parameters comprises at least one of the maximum value, minimum value, mean value and median corresponding to a target parameter set, and the target parameter set is the parameter set among the N parameter sets that corresponds to the target group of parameters;
dividing the N groups of parameters into M types by adopting a clustering algorithm;
And determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
The N groups of parameters all contain the same parameter types and the same number of parameters. For example, each group of parameters includes the maximum value, minimum value, mean value and median corresponding to its parameter set. The embodiments of the present application do not limit the parameter types or the number of parameters contained in each group. That is, a vector corresponding to a parameter set may also be constructed using parameters other than the maximum, minimum, mean and median of the parameter set. It can be understood that the vector corresponding to each group of parameters may be determined in a variety of ways.
Dividing the N groups of parameters into M classes with a clustering algorithm may be done by determining N vectors corresponding one-to-one to the N groups of parameters, and then dividing the N vectors into M classes with the clustering algorithm. For example, if the first group of parameters is {12.036; 17.273; 15.691; 14.258}, the vector corresponding to the first group is [12.036; 17.273; 15.691; 14.258]. In practical applications, the data processing apparatus may cluster the N vectors with a classification algorithm such as the k-means algorithm (also called the k-average algorithm) to divide them into M classes. The purpose of the k-means algorithm is to partition n points (each of which may be one observation or one sample instance) into k clusters, such that each point belongs to the cluster corresponding to its nearest mean (the cluster center); this is the clustering criterion. Optionally, the data processing apparatus clusters the N vectors with an annealing algorithm, an expectation-maximization algorithm or another algorithm. The cluster center of each class may serve as the center point of that class. All vectors are clustered according to some distance (such as the Euclidean distance) using a clustering method (such as k-means).
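For concreteness, a from-scratch sketch of the k-means loop just described follows; the initialization and stopping rule are assumptions made for illustration, and in practice a library implementation would serve equally well.

```python
# Illustrative k-means: assign each vector to its nearest cluster center
# (Euclidean distance), then move each center to the mean of its members.
import numpy as np

def kmeans(vectors: np.ndarray, M: int, iters: int = 100, seed: int = 0):
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), size=M, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([vectors[labels == c].mean(axis=0)
                                if np.any(labels == c) else centers[c]
                                for c in range(M)])
        if np.allclose(new_centers, centers):   # converged
            break
        centers = new_centers
    return labels, centers
```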
103. The data processing device determines M quantization coefficients corresponding to the M classes.
The M quantization coefficients are in one-to-one correspondence with the M classes.
In an optional implementation manner, the determining M quantization coefficients corresponding to the M classes includes:
determining M center points corresponding to the M classes, wherein the M center points are in one-to-one correspondence with the M classes;
and respectively determining the M quantization coefficients corresponding to the M classes one by one according to the M center points.
Optionally, the center point corresponding to a class is the cluster center of the parameter sets belonging to that class.
It can be understood that each parameter set corresponds to a vector, which may consist of several attributes such as the mean, maximum and minimum values. Each of the M center points corresponds to a vector. The center point corresponding to one class is the geometric center point of all vectors corresponding to that class, i.e. of the vectors of the parameter sets contained in the class. That is, the geometric mean vector of the vectors within each class is calculated, and this geometric mean vector serves as the geometric center point (center point) of the class. For example, if a class corresponds to two vectors {a, b, c} and {d, e, f}, the geometric center point of this class is {(a+d)/2, (b+e)/2, (c+f)/2}. In practical applications, the mean term of the center point of each class (or the median term, as needed) can be used as the quantization multiple of the class.
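Continuing the illustration, the step from center points to quantization coefficients can be sketched as below; the [max, min, mean, median] vector layout and the use of the mean term as the quantization multiple follow the paragraph above, but the component index is an assumption:

```python
# Derive one quantization multiple per class from the class center points.
import numpy as np

MEAN_TERM = 2    # assumed position of the mean in [max, min, mean, median]

def multiples_from_centers(centers: np.ndarray) -> np.ndarray:
    """centers: (M, 4) array of class center points, e.g. from kmeans()."""
    return np.abs(centers[:, MEAN_TERM])

# e.g. a class whose two vectors are {a, b, c} and {d, e, f} has the
# center point {(a + d) / 2, (b + e) / 2, (c + f) / 2}.
centers = np.array([[0.9, -0.8, 0.05, 0.04],
                    [2.1, -1.9, 0.40, 0.35]])
multiples = multiples_from_centers(centers)
```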
104. And respectively quantizing the N parameter sets according to the M quantized coefficients to obtain the target network, wherein the quantized coefficient corresponding to a fourth parameter set in the N parameter sets is a quantized coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
The quantized coefficients corresponding to a parameter set are quantized coefficients corresponding to the class to which the parameter set belongs.
The hardware implementation of the quantization operation may be a shift operation of a shifter or an amplification operation of an amplifier. In practical applications, the same shifter may be used to quantize multiple parameter sets. When the data processing apparatus uses a shifter to quantize parameter sets belonging to different classes, the numbers of shifts required by the shifter differ; and when the apparatus quantizes the parameter sets in different orders, the total number of shifts the shifter needs to complete the quantization operation also differs. A quantization method that can reduce the number of shifts of the shifter is described below, as follows: the quantizing the N parameter sets respectively according to the M quantization coefficients to obtain the target network includes:
when the N parameter sets are quantized by a shifter, sorting the N parameter sets in a target order, wherein the target order is used for reducing the number of times the shifter needs to shift when quantizing the N parameter sets;
and quantizing the N parameter sets sequentially in the target order according to the M quantization coefficients.
Optionally, the data processing apparatus quantizes the parameter sets contained in the M classes in turn. For example, the data processing apparatus quantizes the parameter sets contained in each class sequentially, from the first class to the M-th class. In this way only M amplifiers are needed, each quantizing the parameter sets contained in one class. Optionally, a shifter is used to quantize the parameter sets contained in each class in turn; after the parameter sets contained in one class have been quantized, the shifter is adjusted to the state suitable for quantizing the parameter sets of the next class. Fig. 2 is a schematic diagram of amplifiers implementing the quantization operation according to an embodiment of the present application. As shown in fig. 2, each amplifier quantizes the parameter sets contained in one group (class), each amplifier is connected to a sampler, and each sampler samples the signal from its amplifier to obtain the quantized data. Implementing quantization with amplifiers and samplers is a common technique in the art and is not described in detail here. Each weight group corresponds to one class. In practical applications, one amplifier and one sampler may be used to quantize the parameter sets belonging to the same class.
Optionally, the data processing apparatus quantizes the parameter sets of each convolution layer contained in the reference network in turn. Before the parameter sets contained in each convolution layer are quantized in turn with a shifter, the order in which the parameter sets are quantized needs to be determined. An embodiment of the present application provides a method for determining the quantization order of the parameter sets contained in each convolution layer, as follows: before the sorting of the N parameter sets in the target order, the method further includes:
determining M shift factors corresponding one-to-one to the M quantization coefficients;
determining the shift factors respectively corresponding to the N parameter sets, where the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform to quantize the fourth parameter set;
and determining the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
One shift factor corresponds to the number of shifts of a shifter. The shift factor corresponding to a parameter set is the shift factor corresponding to the quantization coefficient of that parameter set. Determining the shift factor from a quantization coefficient is a common technique in the art and is not described in detail here. It can be understood that the data processing apparatus can determine the number of shifts of the shifter from the quantization coefficient; the shift factor corresponding to a parameter set is the number of shifts the shifter needs to quantize that parameter set. In practical applications, the data processing apparatus determines the order of quantizing the N parameter sets according to the shift factors corresponding to the parameter sets, with the goal of minimizing the total number of shifts of the shifter. Fig. 3 is a schematic diagram of a quantization order of parameter sets according to an embodiment of the present application. As shown in fig. 3, the N parameter sets are divided into four classes A, B, C and D, ordered by their numbers of shifts (shift factors). Assuming the shifter needs to shift right i, j, k and l times respectively to quantize the four classes of parameter sets, the quantization order of the parameter sets in the first layer (Layer 1) is A, B, C, D, and the quantization order in the second layer is D, C, B, A, i.e. shifting left 0, k, j and i times respectively.
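One way to realize this ordering is sketched below. The alternating ascending/descending order per layer, and the cost model in which moving the shifter from setting s1 to setting s2 costs |s2 - s1| shift operations, are assumptions chosen to match the Layer 1 / Layer 2 example; they are not the only possibility.

```python
# Illustrative: plan a quantization order that keeps consecutive shift
# settings close, so the shifter needs few reconfigurations overall.
def plan_quantization_order(layers, shift_factor):
    """layers: list of lists of parameter-set ids, one list per conv layer.
    shift_factor: dict mapping parameter-set id -> required shift count."""
    order, ascending = [], True
    for layer in layers:
        order.extend(sorted(layer, key=lambda p: shift_factor[p],
                            reverse=not ascending))
        ascending = not ascending    # e.g. A,B,C,D then D,C,B,A
    return order

def total_shifts(order, shift_factor, start=0):
    cost, current = 0, start
    for p in order:
        cost += abs(shift_factor[p] - current)   # assumed reconfiguration cost
        current = shift_factor[p]
    return cost
```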
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces hardware overhead.
The foregoing embodiments describe the data processing method of the neural network. To illustrate the difference between the neural network before quantization and the neural network after quantization, a neural network implementing a picture recognition function is taken as an example below. Fig. 4 is a comparison diagram of the neural network before quantization and the neural network after quantization according to an embodiment of the present application. As shown in fig. 4, the neural network includes an input unit, a calculation processing unit and an output unit. Both networks obtain pictures through a camera and feed the image data into the network, and the output units of both networks obtain, from an activation function, the probability value that the picture belongs to a certain class and determine the class of the picture. The difference is that, in the network before quantization, the parameters (weights and biases) of each convolution layer used by the calculation processing unit during calculation are floating-point data (float32), while in the network after quantization they are integer data (int8). The calculation processing unit corresponds to the calculation unit mentioned in the apparatus embodiments. Quantizing the weights and biases of the neural network into integer data reduces, on the one hand, the storage space they occupy and, on the other hand, the amount of computation.
The foregoing embodiment quantizes a trained neural network. That is, a neural network is trained before it is quantized. A method of training a neural network is described below. Fig. 5 is a flowchart of a method for training a neural network according to an embodiment of the present application; as shown in fig. 5, the method may include:
501. the data processing device inputs image data to the neural network.
The image data may be image data on which the target processing is to be performed. For example, the image data is face image data, and the target processing is face recognition.
502. The neural network propagates forward.
The neural network derives the final output layer by layer from the input feature vectors, and the classification or regression problem is solved using the output result. In practice, neural networks are evaluated layer by layer with the forward propagation algorithm. Forward propagation is a common technique in the art and is not described in detail here.
503. The neural network obtains an output result.
The output result may be a classification result of the image data.
504. The neural network adjusts the weights and biases according to the comparison between the output result and the labeling result of the image data.
The data processing apparatus may store the labeling result of each image data.
505. After the neural network converges, the training is stopped.
For example, if the neural network is used to recognize the face in an input face image, training stops after the face recognition accuracy reaches a certain value. After training stops, the current neural network is the trained network, i.e. the trained neural network.
In this embodiment, the weights and biases of the neural network can be quickly adjusted by the forward propagation algorithm, which is simple to implement.
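A compact sketch of steps 501 to 505 follows (illustrative; PyTorch, the SGD optimizer and the cross-entropy loss are assumptions, as the patent does not prescribe a framework):

```python
# Illustrative training loop: forward propagate, compare the output with the
# labels, adjust weights and biases, and stop once accuracy has converged.
import torch
import torch.nn as nn

def train(net: nn.Module, loader, epochs: int = 10, target_acc: float = 0.95):
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        correct = total = 0
        for images, labels in loader:        # 501: input image data
            out = net(images)                # 502/503: forward propagation
            loss = loss_fn(out, labels)      # 504: compare with labeling result
            opt.zero_grad()
            loss.backward()                  # adjust weights and biases
            opt.step()
            correct += (out.argmax(1) == labels).sum().item()
            total += labels.numel()
        if correct / total >= target_acc:    # 505: stop after convergence
            break
    return net
```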
The quantization method of the neural network is described above, and the application of the quantized neural network is described below. Fig. 6 is a flowchart of a processing method based on a quantized neural network according to an embodiment of the present application, where, as shown in fig. 6, the method may include:
601. Inputting data to be processed into a target network, where the target network is a neural network comprising at least one convolution layer and is used for performing target processing on input data; one parameter set is the set of one convolution kernel and its bias used for calculating one point in a feature map; a first parameter set and a second parameter set in the target network are the convolution kernels and biases used for calculating different points of the feature map in the same convolution layer; and the parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients.
The target network may be a neural network currently stored in the data processing apparatus, that is, a preset neural network; it may also be a neural network acquired from another device, such as a cloud server; it may also be a neural network obtained by quantizing a reference network, where the reference network is a trained neural network used for performing the target processing on input data. The target processing may be any of various kinds of processing, such as target detection, image segmentation, target recognition, target classification and target tracking. Optionally, the data contained in the first parameter set and the second parameter set in the target network are integer data. It can be understood that when every parameter contained in the convolution layers of the target network is integer data, the amount of computation for convolution or dot-product calculation can be greatly reduced. In addition, different parameter sets in the same convolution layer may correspond to different quantization coefficients; that is, the same convolution layer can quantize its convolution kernels (weights) and biases with several quantization coefficients, which can effectively improve quantization precision.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolution layers and correspond to the same quantization coefficient.
In practical applications, the target network may correspond to F quantization coefficients, and any parameter set contained in the target network corresponds to one of the F quantization coefficients. That is, each parameter set contained in the target network is quantized using one of the F quantization coefficients. Each quantization coefficient corresponds to one quantization mode, so the target network corresponds to only F quantization modes. When parameter sets are quantized with amplifiers, the data processing apparatus can complete the quantization operation with F amplifiers. When parameter sets are quantized with a shifter, the shifter needs only F shift settings to quantize the parameter sets. This can greatly reduce the workload of quantization and reduce the hardware overhead.
In this implementation, parameter sets in different convolution layers can correspond to the same quantization coefficient, which can not only improve the accuracy of the target processing but also reduce hardware cost.
In an alternative implementation, the method for quantizing the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is the set of one convolution kernel and its bias used for calculating one point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and quantizing the N parameter sets respectively according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is the quantization coefficient corresponding to a target class, and the target class is the class corresponding to the fourth parameter set among the M classes. The foregoing embodiments describe this quantization method in detail, and it is not repeated here.
In this implementation, the N parameter sets contained in the reference network are divided into M classes, the quantization coefficient corresponding to each class is determined, and the reference network is quantized accordingly; this improves quantization accuracy and reduces the workload of the quantization operation.
602. Performing the target processing on the data to be processed through the target network to obtain an output result.
In an optional implementation manner, the parameters included in the first parameter set and the second parameter set are integer data, and the performing the target processing on the data to be processed through the target network to obtain an output result includes:
calculating the convolution or dot product of a convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is the data input to the convolution layer to which the first parameter set belongs;
Calculating the sum of the bias included in the first parameter set and the first data to obtain second data;
performing inverse quantization on the second data to obtain third data, wherein the third data is floating point data;
and storing the third data into a buffer area, wherein the third data is used for calculating the output result.
The data processing device performs the convolution or dot product operation on the input data using the quantized integer data of the convolution layer, which reduces the amount of computation in the convolution layer. In addition, the data processing device dequantizes the data output by the convolution layer before storing it in the buffer to facilitate subsequent processing, which improves the accuracy of the target processing.
In this implementation, the convolution operation or dot product operation is performed by using the quantized parameter set, and the data output by the convolution layer is dequantized; the calculation amount of the convolution layer can be reduced, and the accuracy of target processing is improved.
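A simplified sketch of these four steps (a dot-product case; it assumes the layer input has already been quantized to integers, and all names are hypothetical):

```python
import numpy as np

def conv_point_forward(x_int, q_kernel, q_bias, scale, buffer):
    first = q_kernel.astype(np.int32) @ x_int.astype(np.int32)  # integer dot product
    second = first + q_bias                    # add the quantized bias
    third = second.astype(np.float32) / scale  # inverse quantization to float
    buffer.append(third)                       # cache for computing the output result
    return third

buf = []
y = conv_point_forward(np.array([1, 2, 3]), np.array([2, -1, 0]), 5, 128.0, buf)
print(y)  # one dequantized point of the feature map
```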
In the embodiment of the application, the target processing is performed on the data to be processed by a neural network that takes each combination of a convolution kernel and a bias as a quantization unit, which can greatly improve the accuracy of the target processing and save calculation time.
An embodiment of picture recognition (for example, face recognition) using a quantized neural network is described below. Fig. 7 is a diagram of a picture identification method according to an embodiment of the present application, and the method may include:
701. The data processing device inputs the picture to the target network.
The target network is a quantized neural network and can be used for picture identification. The target network may read the pixel value of each point of the picture. In practical application, a reference network for picture identification can be obtained through training, and a target network is obtained through quantifying the reference network.
702. The data processing device calculates the probability that the picture belongs to each category through the target network and outputs a probability value.
The probability value may indicate a probability that the picture belongs to each category. The convolution layer of the target network adopts the quantized convolution kernel and offset to process the input data during calculation, so that the calculation complexity can be greatly reduced.
703. The data processing device determines the category of the picture according to the probability value.
Optionally, the data processing device determines the category with the highest probability value as the category of the picture.
In the embodiment of the application, the quantized neural network is used for classifying the pictures, so that the computational complexity can be greatly reduced.
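For illustration, step 703 reduces to an argmax over the probability values (the numbers below are hypothetical):

```python
import numpy as np

probs = np.array([0.05, 0.72, 0.23])  # hypothetical per-category probabilities
category = int(np.argmax(probs))      # category with the highest probability value
print(category)                       # -> 1
```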
The foregoing embodiments do not describe how the quantized neural network processes the data read in by a convolution layer, so a convolution layer processing procedure is provided below. Because a deep neural network infers layer by layer, data is generally computed from the first layer until the last layer is reached. Fig. 8 shows a calculation method of a convolution layer in a neural network according to an embodiment of the present application; as shown in Fig. 8, the method may include:
801. the ith convolution layer reads in feature map data Di.
The feature map data Di is the data to be processed by the i-th convolution layer, that is, the feature map data output by the previous layer or the original input data to be processed. i is initialized to 1 and increases by 1 after each cycle until the last layer is reached.
802. The sequence number aij of the j-th convolution kernel of the i-th layer is read in.
803. The corresponding shift factor bij is read in according to the sequence number aij.
The data processing apparatus may store a shift factor corresponding to each sequence number in the memory.
804. The j-th convolution kernel Dij of the i-th layer is read in according to the sequence number aij.
The data in the convolution kernel is quantized integer data.
805. A convolution Ei or dot product Ei of the convolution kernel Dij and the feature map data Di is calculated.
806. Bias is added to Ei and Ei is shifted according to a shift factor.
The bias of each layer in the neural network is quantized integer data.
807. And writing the calculation result D (i+1) into the buffer area.
The calculation result D(i+1) is the result obtained after the convolution or dot product calculation, the bias addition, and the shift.
808. Judging whether the calculation of the layer is completed.
If yes, the next layer is entered; otherwise, the calculation continues with the remaining convolution kernels of the layer, that is, step 802 is performed (the sequence number of the (j+1)-th convolution kernel of the i-th layer is read in).
809. It is determined whether all layers have been calculated.
If yes, the calculation is stopped; otherwise, the next layer is calculated, that is, step 801 is performed (the (i+1)-th convolution layer reads in the feature map data).
810. The calculation is stopped.
In the embodiment of the application, the convolution layer carries out convolution operation or dot product operation by utilizing the quantized convolution kernel and offset, so that the calculation complexity can be greatly reduced, and the calculation efficiency can be improved.
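A compact sketch of this loop (a dot-product simplification with an assumed data layout, not the patented format):

```python
import numpy as np

def run_network(feature_map, layers):
    # layers: list of layers; each layer is a list of (q_kernel, q_bias, shift)
    d = feature_map.astype(np.int64)
    for layer in layers:                          # 801: read feature map D_i
        outputs = []
        for q_kernel, q_bias, shift in layer:     # 802-804: kernel and shift factor
            e = q_kernel.astype(np.int64) @ d     # 805: convolution/dot product E_i
            e = (e + q_bias) >> shift             # 806: add bias, rescale by shifting
            outputs.append(e)                     # 807: write D_(i+1) to the buffer
        d = np.array(outputs)                     # 808-809: proceed to the next layer
    return d                                      # 810: final result

layers = [[(np.array([3, -1]), 2, 2), (np.array([1, 1]), 0, 1)],
          [(np.array([2, 2]), 1, 3)]]
print(run_network(np.array([4, 5]), layers))
```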
Fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, as shown in fig. 9, where the apparatus includes:
an input unit 901 for inputting data to be processed to a target network; the target network is a neural network comprising at least one convolution layer, one parameter set is a convolution kernel and offset set used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are convolution kernels and offset sets used for calculating different points in the feature map in the same convolution layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
And a calculating unit 902, configured to perform the target processing on the data to be processed through the target network, so as to obtain an output result.
The specific implementation is the same as in Fig. 6 and is not described in detail here.
In an optional implementation manner, the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolution layers and correspond to the same quantization coefficient.
In the implementation manner, parameter sets in different convolution layers can correspond to the same quantization coefficient, so that not only can the accuracy of target processing be improved, but also the cost of hardware can be reduced.
In an alternative implementation, the method for quantifying the reference network to obtain the target network includes: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is a convolution kernel and offset set used for calculating a point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and respectively quantizing the N parameter sets according to the M quantized coefficients to obtain the target network, wherein the quantized coefficient corresponding to a fourth parameter set in the N parameter sets is a quantized coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, quantization coefficients corresponding to the parameter sets in each class are determined, and the reference network is quantized; the accuracy of quantization can be improved and the workload of quantization operation can be reduced.
In an alternative implementation, the apparatus further includes:
An obtaining unit 903, configured to obtain N parameter sets contained in at least one convolution layer of the reference network, where N is greater than or equal to 2;
A clustering unit 904, configured to divide the N parameter sets into M classes, where M is greater than or equal to 2;
a determining unit 905, configured to determine M quantization coefficients corresponding to the M classes, where the M quantization coefficients are in one-to-one correspondence with the M classes;
And a quantization unit 906, configured to quantize the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network, where a quantization coefficient corresponding to a fourth parameter set of the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set of the M classes.
In the implementation mode, N parameter sets contained in a reference network are divided into M classes, quantization coefficients corresponding to the parameter sets in each class are determined, and the reference network is quantized; this can improve the accuracy of quantization and reduce the workload of the quantization operation.
In an optional implementation manner, the determining unit 905 is specifically configured to determine M center points corresponding to the M classes, where the M center points are in one-to-one correspondence with the M classes; and respectively determining the M quantization coefficients corresponding to the M classes one by one according to the M center points.
In the implementation mode, through determining the center points corresponding to the various types, the quantization coefficients corresponding to the various types can be accurately determined, and the implementation is simple.
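As one hypothetical illustration (the mapping rule below is an assumption, not the patented formula), a class center point taken as a representative magnitude could be mapped to the quantization coefficient that fills the integer range:

```python
def coeffs_from_centers(centers, bits=8):
    # Each class center is treated as a representative magnitude; the
    # coefficient scales it to the largest representable integer.
    qmax = 2 ** (bits - 1) - 1
    return [qmax / c for c in centers]

print(coeffs_from_centers([0.1, 0.5, 2.0]))  # one coefficient per class
```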
In an alternative implementation, the apparatus further includes:
An ordering unit 907 configured to, when quantizing the N parameter sets by using a shifter, order the N parameter sets in a target order, where the target order is used to reduce the number of shifts required by the shifter when quantizing the N parameter sets;
The quantization unit 906 is specifically configured to sequentially quantize the N parameter sets according to the target order based on the M quantization coefficients.
In this implementation, by sequentially quantizing the N parameter sets in the target order, the shift operation of the shifter can be reduced, which is simple to implement.
In an optional implementation manner, the determining unit 905 is specifically configured to determine M shift factors in one-to-one correspondence with the M quantization coefficients; determine shift factors respectively corresponding to the N parameter sets, where the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform when quantizing the fourth parameter set; and determine the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
In the implementation manner, the order of quantizing the N parameter sets is determined according to the shift factors respectively corresponding to the N parameter sets, so that the shift times of the shifter are reduced, and the implementation is simple.
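A minimal sketch of the ordering idea (names hypothetical): sorting parameter sets so that equal shift factors are adjacent lets the shifter be reconfigured once per distinct factor rather than once per parameter set.

```python
def order_by_shift(num_sets, shift_factors):
    # Sort parameter-set indices so equal shift factors are adjacent.
    order = sorted(range(num_sets), key=lambda i: shift_factors[i])
    reconfigurations = len(set(shift_factors))
    return order, reconfigurations

order, n = order_by_shift(4, [3, 1, 3, 1])
print(order, n)  # -> [1, 3, 0, 2] 2: two shifter settings instead of four
```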
In an optional implementation manner, the determining unit 905 is specifically configured to determine N groups of parameters corresponding to the N parameter sets, where the N groups of parameters are in one-to-one correspondence with the N parameter sets, a target group of parameters in the N groups of parameters includes at least one of a maximum value, a minimum value, a mean value, and a median corresponding to a target parameter set, and the target parameter set is the parameter set, among the N parameter sets, that corresponds to the target group of parameters;
The clustering unit 904 is specifically configured to divide the N groups of parameters into the M classes by using a clustering algorithm; and determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
In the implementation mode, the parameter sets are classified through one or more parameters of the maximum value, the minimum value, the mean value, the median and the like corresponding to the parameter sets, the calculation is simple, and a good classification effect can be obtained.
In an optional implementation manner, the determining unit 905 is specifically configured to determine N vectors in one-to-one correspondence with the N groups of parameters;
The clustering unit 904 is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
In the implementation mode, N vectors corresponding to the N groups of parameters are classified by using a clustering algorithm, so that the classification efficiency is high.
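A sketch of this classification step, assuming scikit-learn provides the clustering algorithm (the statistics choice and names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def classify_param_sets(param_sets, M):
    # Describe each (kernel, bias) set by a statistics vector, then cluster.
    vectors = np.array([[k.max(), k.min(), k.mean(), np.median(k)]
                        for k, _ in param_sets])
    return KMeans(n_clusters=M, n_init=10).fit_predict(vectors)

sets = [(np.random.randn(3, 3) * s, 0.0) for s in (0.1, 0.1, 1.0, 1.0)]
print(classify_param_sets(sets, M=2))  # class label per parameter set
```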
In an optional implementation manner, the calculating unit 902 is configured to calculate a convolution or dot product of a convolution kernel included in the first parameter set and intermediate data, to obtain first data, where the intermediate data is data input by a convolution layer to which the first parameter set belongs; calculating the sum of the bias included in the first parameter set and the first data to obtain second data; the device further comprises:
an inverse quantization unit 908, configured to inverse quantize the second data to obtain third data, where the third data is floating point data;
a storage unit 909 for storing the third data in a buffer, the third data being used for calculating the output result.
In this implementation, the convolution operation or dot product operation is performed by using the quantized parameter set, and the data output by the convolution layer is dequantized; the calculation amount of the convolution layer can be reduced, and the accuracy of target processing is improved.
Fig. 10 is a hardware configuration diagram of a data processing apparatus according to an embodiment of the present invention; the apparatus includes a central processing unit (Central Processing Unit, CPU), a neural network processor (Neural Network Processing Unit, NPU), and an external memory. The neural network processor (NPU) 100 is mounted on a host CPU (Host CPU) as a coprocessor, and the host CPU allocates tasks to it. The core of the NPU is the arithmetic circuit 1003; the controller 1004 controls the arithmetic circuit 1003 to extract matrix data from memory and perform multiplication.
In some implementations, the arithmetic circuit 1003 internally includes a plurality of processing elements (PEs). In some implementations, the arithmetic circuit 1003 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1003 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1002 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1001 and performs a matrix operation with matrix B; the obtained partial or final result of the matrix is stored in the accumulator 1008.
The unified memory 1006 is used for storing input data and output data. The weight data is carried directly into the weight memory 1002 through the direct memory access controller (Direct Memory Access Controller, DMAC) 1005. The input data is also carried into the unified memory 1006 through the DMAC.
The bus interface unit (Bus Interface Unit, BIU) 1010 enables the AXI bus to interact with the DMAC and the instruction fetch buffer (Instruction Fetch Buffer) 1009. The bus interface unit 1010 is used by the instruction fetch memory 1009 to obtain instructions from the external memory, and is further used by the storage unit access controller 1005 to obtain the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1006, to transfer weight data to the weight memory 1002, or to transfer input data to the input memory 1001.
The vector calculation unit 1007 includes a plurality of operation processing units that perform further processing on the output of the arithmetic circuit when needed, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and size comparison. It is mainly used for non-convolution/fully connected (FC) layer calculations in the neural network, such as pooling, batch normalization, and local response normalization.
In some implementations, the vector calculation unit 1007 can store the processed output vector to the unified memory 1006. For example, the vector calculation unit 1007 may apply a nonlinear function to the output of the arithmetic circuit 1003, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 1007 generates normalized values, combined values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 1003, for example for use in subsequent layers of the neural network.
An instruction fetch memory (instruction fetch buffer) 1009 is connected to the controller 1004 and stores the instructions used by the controller 1004.
The unified memory 1006, the input memory 1001, the weight memory 1002, and the instruction fetch memory 1009 are all on-chip memories; the external memory is external to the NPU hardware architecture. The CPU may implement the functions of the input unit 901, the obtaining unit 903, the clustering unit 904, the determining unit 905, and the sorting unit 907. The NPU may implement the functions of the calculating unit 902. The data processing apparatus also has an amplifier or shifter (not shown in Fig. 10) for implementing the functions of the quantization unit 906 and the inverse quantization unit 908. The functions of the storage unit 909 are implemented by the unified memory 1006.
In an embodiment of the present invention, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements: inputting data to be processed into a target network; the target network is a neural network comprising at least one convolution layer, one parameter set is a convolution kernel and offset set used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are convolution kernels and offset sets used for calculating different points in the feature map in the same convolution layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data; and carrying out the target processing on the data to be processed through the target network to obtain an output result.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (20)

1. A method of data processing, comprising:
Inputting data to be processed into a target network; the target network is a neural network comprising at least one convolution layer, one parameter set is a convolution kernel and offset set used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are convolution kernels and offset sets used for calculating different points in the feature map in the same convolution layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
The target network is used for carrying out target processing on the data to be processed to obtain an output result, the data to be processed is image data, and the target processing is any one of target detection, image segmentation, target identification, target classification or target tracking;
The parameters contained in the first parameter set and the second parameter set are integer data, the target processing is performed on the data to be processed through the target network, and the obtaining of an output result includes:
Calculating convolution or dot product of a convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is data input by a convolution layer to which the first parameter set belongs;
calculating the sum of the bias contained in the first parameter set and the first data to obtain second data;
Performing inverse quantization on the second data to obtain third data, wherein the third data is floating point data;
And storing the third data into a buffer area, wherein the third data is used for calculating the output result.
2. The method according to claim 1, wherein the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training for performing the target processing on the input data, and a third parameter set in the target network belongs to a different convolution layer from the first parameter set and corresponds to the same quantization coefficient.
3. The method of claim 2, wherein quantizing the reference network to obtain the target network comprises: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is a convolution kernel and offset set used for calculating one point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and respectively quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
4. The method of claim 1, wherein prior to inputting the data to be processed into the target network, the method further comprises:
acquiring N parameter sets contained in at least one convolution layer of a reference network, wherein N is more than or equal to 2;
dividing the N parameter sets into M classes, wherein M is more than or equal to 2;
determining M quantization coefficients corresponding to the M classes, wherein the M quantization coefficients are in one-to-one correspondence with the M classes;
And respectively quantizing the N parameter sets according to the M quantized coefficients to obtain the target network, wherein the quantized coefficient corresponding to a fourth parameter set in the N parameter sets is a quantized coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
5. The method of claim 3 or 4, wherein determining M quantization coefficients corresponding to the M classes comprises:
M center points corresponding to the M classes are determined, and the M center points are in one-to-one correspondence with the M classes;
and respectively determining M quantization coefficients corresponding to the M classes one by one according to the M center points.
6. The method according to claim 3 or 4, wherein quantizing the N parameter sets according to the M quantization coefficients, respectively, to obtain the target network includes:
in the case of quantizing the N parameter sets with a shifter, sorting the N parameter sets in a target order for reducing the number of shifts required by the shifter when quantizing the N parameter sets;
And quantizing the N parameter sets according to the M quantization coefficients in sequence according to the target sequence.
7. The method of claim 6, wherein prior to said ordering of said N parameter sets in the target order, the method further comprises:
determining M shifting factors corresponding to the M quantized coefficients one by one;
Determining shift factors respectively corresponding to the N parameter sets, wherein the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform when quantizing the fourth parameter set;
And determining the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets to obtain the target order.
8. The method of claim 4, wherein the classifying the N parameter sets into M classes comprises:
Determining N groups of parameters corresponding to the N parameter sets, wherein the N groups of parameters are in one-to-one correspondence with the N parameter sets, a target group of parameters in the N groups of parameters comprises at least one of a maximum value, a minimum value, a mean value and a median corresponding to a target parameter set, and the target parameter set is contained in the N parameter sets and corresponds to the target group of parameters;
Dividing the N groups of parameters into M classes by adopting a clustering algorithm;
And determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
9. The method of claim 8, wherein the employing a clustering algorithm to divide the N sets of parameters into M classes comprises:
Determining N vectors corresponding to the N groups of parameters one by one;
and adopting a clustering algorithm to divide the N vectors into M classes.
10. A data processing apparatus, comprising:
An input unit for inputting data to be processed into a target network; the target network is a neural network comprising at least one convolution layer, one parameter set is a convolution kernel and offset set used for calculating one point in a feature map, a first parameter set and a second parameter set in the target network are convolution kernels and offset sets used for calculating different points in the feature map in the same convolution layer, parameters contained in the first parameter set and the second parameter set are quantized parameters and correspond to different quantization coefficients, and the target network is used for performing target processing on input data;
The computing unit is used for carrying out target processing on the data to be processed through the target network to obtain an output result, the data to be processed is image data, and the target processing is any one of target detection, image segmentation, target identification, target classification or target tracking;
The computing unit is used for computing convolution or dot product of a convolution kernel contained in the first parameter set and intermediate data to obtain first data, wherein the intermediate data is data input by a convolution layer to which the first parameter set belongs; calculating the sum of the bias contained in the first parameter set and the first data to obtain second data; the apparatus further comprises:
The inverse quantization unit is used for inversely quantizing the second data to obtain third data, wherein the third data is floating point data;
and the storage unit is used for storing the third data into a buffer area, and the third data is used for calculating the output result.
11. The apparatus of claim 10, wherein the target network is a neural network obtained by quantizing a reference network, the reference network is a neural network obtained by training and used for performing the target processing on the input data, and a third parameter set in the target network and the first parameter set belong to different convolution layers and correspond to the same quantization coefficient.
12. The apparatus of claim 11, wherein quantizing the reference network to obtain the target network comprises: acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein any one of the N parameter sets is a convolution kernel and offset set used for calculating a point in a feature map, and N is more than or equal to 2; dividing the N parameter sets into M classes, wherein M is more than or equal to 2; determining M quantization coefficients corresponding to the M classes; and respectively quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to a fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
13. The apparatus of claim 10, wherein the apparatus further comprises:
The acquisition unit is used for acquiring N parameter sets contained in at least one convolution layer of the reference network, wherein N is more than or equal to 2;
The clustering unit is used for dividing the N parameter sets into M classes, wherein M is more than or equal to 2;
The determining unit is used for determining M quantization coefficients corresponding to the M classes, and the M quantization coefficients are in one-to-one correspondence with the M classes;
and the quantization unit is used for respectively quantizing the N parameter sets according to the M quantization coefficients to obtain the target network, wherein the quantization coefficient corresponding to the fourth parameter set in the N parameter sets is a quantization coefficient corresponding to a target class, and the target class is a class corresponding to the fourth parameter set in the M classes.
14. The apparatus of claim 13, wherein
The determining unit is specifically configured to determine M center points corresponding to the M classes, where the M center points are in one-to-one correspondence with the M classes; and respectively determining M quantization coefficients corresponding to the M classes one by one according to the M center points.
15. The apparatus of claim 13, wherein the apparatus further comprises:
A sorting unit, configured to sort the N parameter sets according to a target order when the N parameter sets are quantized by using a shifter, where the target order is used to reduce the number of shifts required by the shifter when the N parameter sets are quantized;
The quantization unit is specifically configured to sequentially quantize the N parameter sets according to the target sequence according to the M quantization coefficients.
16. The apparatus of claim 15, wherein
The determining unit is specifically configured to determine M shift factors corresponding to the M quantization coefficients one to one; determine shift factors respectively corresponding to the N parameter sets, wherein the shift factor corresponding to the fourth parameter set is the number of shifts the shifter needs to perform when quantizing the fourth parameter set; and determine the order of quantizing the N parameter sets according to the shift factors respectively corresponding to the N parameter sets, to obtain the target order.
17. The apparatus of claim 13, wherein
The determining unit is specifically configured to determine N sets of parameters corresponding to the N parameter sets, where the N sets of parameters are in one-to-one correspondence with the N parameter sets, a target set of parameters in the N sets of parameters includes at least one of a maximum value, a minimum value, a mean value, and a median corresponding to the target parameter set, and the target parameter set is included in the N parameter sets and corresponds to the target set of parameters;
The clustering unit is specifically configured to divide the N groups of parameters into the M classes by using a clustering algorithm; and determining M classes corresponding to the N parameter sets according to the classification result of the N groups of parameters, wherein the class corresponding to the target group of parameters is the class corresponding to the target parameter set.
18. The apparatus of claim 17, wherein
The determining unit is specifically configured to determine N vectors corresponding to the N sets of parameters one by one;
The clustering unit is specifically configured to divide the N vectors into the M classes by using a clustering algorithm.
19. A data processing apparatus comprising a processor and a memory, the processor and the memory being interconnected, wherein the memory is adapted to store a computer program, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-9.
20. A computer readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-9.