CN110968832A - Data processing method and device - Google Patents
Data processing method and device
- Publication number
- CN110968832A (application number CN201811148307.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- weight
- data set
- address
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The application provides a data processing method and a data processing apparatus. The data processing apparatus comprises a data processing module configured to: obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data and the data in the first weight data set come from the same input channel; obtain a second weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; perform a multiplication operation using the first weight matrix and a first feature data set, where the data in the first feature data set come from the same input channel; perform a multiplication operation using the second weight matrix and the first feature data set; and determine a target data set according to the results of the multiplication operations. The technical scheme can reduce the number of accesses to the storage device.
Description
Technical Field
The present application relates to the field of information technology, and more particularly, to a method of processing data and a data processing apparatus.
Background
Convolutional neural networks (CNNs) are among the most widely used algorithms in deep learning and are applied in image classification, speech recognition, video understanding, face detection, and many other applications.
The core of convolutional neural network computation is the convolution operation. The amount of data that a convolution operation needs to process is typically large, so the storage and computing resources it requires are also large. It is increasingly difficult for current processors to satisfy the demands of convolution operations. In addition, with the development of mobile intelligent devices, these devices also need to perform convolution operations, yet mobile devices have limited computing and memory capabilities. Therefore, improving the efficiency of the convolution operation is an urgent problem to be solved.
Disclosure of Invention
The application provides a method for processing data and a data processing apparatus, which can reduce the number of accesses to a storage device.
In a first aspect, an embodiment of the present application provides a data processing apparatus, including: a data processing module, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquire a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; perform a first multiplication operation using the first weight matrix and a first feature data set, where the data in the first feature data set come from the same input channel; and perform a second multiplication operation using the second weight matrix and the first feature data set. The data processing apparatus further comprises a control module, configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
The target data set comprises the products between elements in the first feature data set and elements in the first weight matrix. From these products, partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix can further be obtained and output from the data processing apparatus, so that convolution results can be predicted with a small amount of calculation and at a fast rate. For example, assume the first weight matrix is a matrix of 3 rows and 3 columns and the second weight matrix is the first weight matrix rearranged by rows. After certain 3 rows and 3 columns of data in the first feature data set are input into the data processing module and multiplied by the first weight matrix and the second weight matrix respectively, the convolution result of that feature data with the first weight matrix, together with partial convolution sums of the 3-row, 3-column feature data at adjacent positions with the first weight matrix, can be obtained from the target data set. Because feature data at adjacent positions are often continuous, the data processing apparatus can predict convolution results using the convolution results and partial convolution sums in the target data set. For example, when the data processing apparatus performs object identification using the feature data according to the scheme provided in the present application, and the convolution results and partial convolution sums obtained in the target data set do not fall within the expected value range, the candidate can be excluded directly without performing subsequent calculation, thereby saving computation. After the data processing apparatus performs object identification according to the technical scheme provided in the present application, other functions can further be implemented using the identification result, for example commodity sorting and target monitoring.
In the above scheme, the data processing apparatus obtains a second weight matrix from the first weight matrix, where the second weight matrix is the first weight matrix rearranged by rows, and performs multiplication operations on the first feature data set using both the first weight matrix and the second weight matrix. The feature data can therefore be multiplexed when obtaining the partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix, which improves operation efficiency.
Specifically, in the prior art, the convolution of a feature matrix with a weight matrix is calculated by sliding the weight matrix over the feature matrix and multiplying the elements of the weight matrix with the corresponding feature data. Because the same feature data in a feature matrix is used in the multiplication operations of many positions of the sliding weight matrix, the feature data needs to be loaded many times in the actual operation. That is, multiple read operations must be performed on the memory in which the feature data is stored, so the same feature data is fetched multiple times. Referring to fig. 1, when calculating the Cartesian product of the feature data set and the weight data set, multiple convolution steps are required. When the first convolution is executed, the feature data a21 is acquired by reading the memory, so as to calculate the product of a21 and b21. When the convolution of the fourth step is calculated (the weight matrix slides from top to bottom and from left to right), the feature data a21 needs to be obtained again by reading the memory in order to calculate the product of a21 and b11. That is, multiple read operations must be performed on the memory that stores the feature data a21, which increases overhead. According to the technical scheme of the present application, the weight matrix is rearranged so that one load of a piece of feature data allows multiplication with more weight matrix elements, which reduces the number of times the feature data is loaded. In addition, multiplexing of the acquired feature data is achieved by calculating both the products between the feature data and the elements of the first weight matrix and the products between the feature data and the elements of the second weight matrix. In conclusion, the scheme improves operation efficiency.
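For illustration, the following Python sketch counts how many times each feature datum participates in the sliding-window multiplications of the prior-art scheme, and hence how many times it would be re-read from memory; the 5 × 5 feature map, 3 × 3 kernel, and stride 1 are assumptions matching fig. 1 rather than part of the embodiments:

```python
# Minimal sketch (assumption: 5x5 feature map, 3x3 kernel, stride 1, no padding).
# Counts how often each feature element participates in the sliding-window
# convolution, i.e. how often it would be re-read in the naive scheme.
H = W = 5          # feature map size
KH = KW = 3        # kernel size

reads = [[0] * W for _ in range(H)]
for out_r in range(H - KH + 1):          # output rows
    for out_c in range(W - KW + 1):      # output columns
        for kr in range(KH):
            for kc in range(KW):
                reads[out_r + kr][out_c + kc] += 1

for row in reads:
    print(row)
# Interior elements such as a22 are needed up to 9 times; reusing a loaded
# element against several (rearranged) weight rows reduces these repeated loads.
```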
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes an address processing module, where the address processing module is configured to: acquire the addresses of the weight data in the first weight matrix and the second weight matrix; and perform address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses of the feature data in the first feature data set. The determining of a target data set according to the operation result of the multiplication operation includes: the control module determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
According to this scheme, an address processing module is introduced, and the address processing module calculates the addresses for the products of the weight data in the first weight matrix and the second weight matrix with the feature data in the first feature data set. The Cartesian product of the feature data and the weight matrix and the convolution results can thereby be further obtained as the target data set, which extends the functions of the data processing apparatus.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing module is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; the address processing module is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
According to the scheme, the first weight matrix with n rows is rearranged according to the rows to obtain n weight matrices, and any two row vectors in the n row vectors in the same row in the n weight matrices are different, so that after multiplication operation is carried out on the feature data and the n weight matrices, a Cartesian product of the feature data and the first weight matrix is obtained, the multiplexing degree of the feature data is improved, and the operation efficiency is further improved.
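As one possible illustration of such a rearrangement (a hypothetical sketch; the application does not prescribe this particular construction), cyclically rotating the rows of the first weight matrix yields n weight matrices in which any two row vectors occupying the same row position are different:

```python
def row_rearrangements(w):
    """Return the n cyclic row rotations of an n-row weight matrix.
    Rotation is one possible rearrangement satisfying the property that,
    across the n matrices, the row vectors occupying the same row differ."""
    n = len(w)
    return [[w[(r + k) % n] for r in range(n)] for k in range(n)]

first_weight_matrix = [["b11", "b12", "b13"],
                       ["b21", "b22", "b23"],
                       ["b31", "b32", "b33"]]
matrices = row_rearrangements(first_weight_matrix)
# matrices[0] is the first weight matrix, matrices[1] a second weight matrix, etc.
for k, m in enumerate(matrices, start=1):
    print(f"weight matrix {k}:", m)
# In any given row position, the three matrices hold three different row vectors.
```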
With reference to the first aspect, in a possible implementation manner of the first aspect, the target data set includes a result matrix, where the result matrix is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The address processing module is further configured to determine a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size of the first feature matrix, a padding size, and a weight size, where the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
The scheme further refines a method for obtaining the target data address according to the address of the weight data and the address of the feature data, thereby improving the realizability of the data processing device for obtaining the convolution result through the Cartesian product.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes a compression module, configured to: acquire a second feature data set and remove the elements whose values are 0 from the second feature data set to obtain the first feature data set; acquire a second weight data set and remove the elements whose values are 0 from the second weight data set to obtain the first weight data set; and determine the address of each feature datum in the first feature data set and the address of each weight datum in the first weight data set.
According to this scheme, the feature data and the weight data are sparsified, that is, the elements whose values are 0 are removed from the feature data set and the weight data set, which reduces the amount of computation of the convolution operation and improves the operation efficiency of the data processing apparatus.
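A minimal Python sketch of this thinning step, assuming a flattened (one-dimensional) representation of the data set; the function name is illustrative only, and the example values match the absolute-address example given later in the description:

```python
def sparsify(data):
    """Remove zero elements and record the absolute address of each survivor.

    Returns (values, addresses), e.g. [5, 0, 0, 32, 0, 0, 0, 0, 23]
    -> ([5, 32, 23], [0, 3, 8]).
    """
    values, addresses = [], []
    for addr, v in enumerate(data):
        if v != 0:
            values.append(v)
            addresses.append(addr)
    return values, addresses

second_feature_set = [5, 0, 0, 32, 0, 0, 0, 0, 23]
first_feature_set, feature_addrs = sparsify(second_feature_set)
print(first_feature_set, feature_addrs)   # [5, 32, 23] [0, 3, 8]
```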
In a second aspect, an embodiment of the present application provides a data processing method, where the method includes: acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; performing a first multiplication operation by using the first weight matrix and the first feature data set; performing a second multiplication operation by using the second weight matrix and the first feature data set; and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; determining a target data set according to operation results of the first multiplication operation and the second multiplication operation, including: and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation result of the address operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
With reference to the second aspect, in a possible implementation manner of the second aspect, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the method further includes: and determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size corresponding to the first feature matrix, the filling size and the weight size, wherein the weight size is n rows and m columns, the filling size comprises a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring a second feature data set, and removing the elements whose values are 0 from the second feature data set to obtain the first feature data set; acquiring a second weight data set, and removing the elements whose values are 0 from the second weight data set to obtain the first weight data set; and determining the address of each feature datum in the first feature data set and the address of each weight datum in the first weight data set.
In a third aspect, the present application provides a data processing apparatus comprising a processor and a memory, the memory storing program code, the processor being configured to invoke the program code in the memory to perform a method of data processing as provided in the second aspect of the present application.
Drawings
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a data computation array provided by an embodiment of the present application.
Fig. 4 is a block diagram of a data calculation unit in a data calculation array according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a multiplication operation performed on a first feature data set according to an embodiment of the present application.
Fig. 6 is a schematic diagram of an address of a first feature data set and an address of a weight data set according to an embodiment of the present application.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.
Fig. 8 is a block diagram of a structure of an address calculation unit in an address calculation array according to an embodiment of the present application.
Fig. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
Fig. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
Fig. 11 is a schematic diagram of a weight matrix with 3 filters and performing sparsification according to an embodiment of the present application.
Fig. 12 is a schematic diagram of a weight matrix that is not subjected to thinning processing according to an embodiment of the present application.
Fig. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application.
Fig. 14 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple. In addition, in the embodiments of the present application, the words "first", "second", and the like do not limit the quantity or the execution order.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 1 shows a feature data set, which comprises a total of 5 × 5 feature data. Fig. 1 also shows a weight data set, which comprises a total of 3 × 3 weight data. The weight data set can be used as a convolution kernel to perform a convolution operation with the feature data set.
Fig. 1 also shows a schematic of a two-step operation with step size 1 in the process of performing a convolution operation on the feature data set with the weight data set. As shown in fig. 1, the 3 × 3 weight data in the weight data set need to be multiplied by 3 × 3 data in the feature data set, respectively, and the results of the multiplications are added to obtain the value of one element of the convolution result. Specifically, according to fig. 1, the convolution result c11 can be expressed as Equation 1.1 and the convolution result c12 can be expressed as Equation 1.2:
c11 = a11×b11 + a12×b12 + a13×b13 + a21×b21 + a22×b22 + a23×b23 + a31×b31 + a32×b32 + a33×b33        (Equation 1.1)
c12 = a12×b11 + a13×b12 + a14×b13 + a22×b21 + a23×b22 + a24×b23 + a32×b31 + a33×b32 + a34×b33        (Equation 1.2)
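The two equations can be checked with a short Python sketch using hypothetical numeric values for the feature data and weight data (the values themselves are not from the embodiment):

```python
# A hypothetical 5x5 feature data set a_ij and 3x3 weight data set b_ij.
a = [[5 * i + j + 1 for j in range(5)] for i in range(5)]
b = [[3 * i + j + 1 for j in range(3)] for i in range(3)]

def conv_at(r, c):
    """Convolution output at position (r+1, c+1): the 3x3 window of the
    feature data anchored at (r, c), multiplied element-wise with the
    weight data and summed, as in fig. 1."""
    return sum(a[r + i][c + j] * b[i][j] for i in range(3) for j in range(3))

# Equation 1.1: c11 = a11*b11 + a12*b12 + ... + a33*b33
c11 = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
# Equation 1.2: c12 = a12*b11 + a13*b12 + a14*b13 + ... + a34*b33
c12 = sum(a[i][j + 1] * b[i][j] for i in range(3) for j in range(3))

assert c11 == conv_at(0, 0) and c12 == conv_at(0, 1)
```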
After the two-step operation shown in fig. 1 is completed, the weight data set continues to slide to the right over the feature data set and the next operation is performed, until the entire feature data set has been traversed.
Assume set E1 = {a11, a12, a13, a21, a22, a23, a31, a32, a33} and set F1 = {b11, b12, b13, b21, b22, b23, b31, b32, b33}. Performing a Cartesian product operation on set E1 and set F1 yields a set G1, and set G1 may include the multiplication results shown in Table 1.
TABLE 1
a11×b11 | a11×b12 | a11×b13 | a11×b21 | a11×b22 | a11×b23 | a11×b31 | a11×b32 | a11×b33 |
a21×b11 | a21×b12 | a21×b13 | a21×b21 | a21×b22 | a21×b23 | a21×b31 | a21×b32 | a21×b33 |
a31×b11 | a31×b12 | a31×b13 | a31×b21 | a31×b22 | a31×b23 | a31×b31 | a31×b32 | a31×b33 |
a12×b11 | a12×b12 | a12×b13 | a12×b21 | a12×b22 | a12×b23 | a12×b31 | a12×b32 | a12×b33 |
a22×b11 | a22×b12 | a22×b13 | a22×b21 | a22×b22 | a22×b23 | a22×b31 | a22×b32 | a22×b33 |
a32×b11 | a32×b12 | a32×b13 | a32×b21 | a32×b22 | a32×b23 | a32×b31 | a32×b32 | a32×b33 |
a13×b11 | a13×b12 | a13×b13 | a13×b21 | a13×b22 | a13×b23 | a13×b31 | a13×b32 | a13×b33 |
a23×b11 | a23×b12 | a23×b13 | a23×b21 | a23×b22 | a23×b23 | a23×b31 | a23×b32 | a23×b33 |
a33×b11 | a33×b12 | a33×b13 | a33×b21 | a33×b22 | a33×b23 | a33×b31 | a33×b32 | a33×b33 |
As shown in Table 1, the Cartesian product of set E1 and set F1 includes all the multiplication results required to calculate c11: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a31×b31, a32×b32, a33×b33. The results of the Cartesian product operation on set E1 and set F1 also include some of the multiplication results required to calculate c12: a12×b11, a13×b12, a22×b21, a23×b22, a32×b31, a33×b32.
Assume set E2 = {a12, a13, a14, a22, a23, a24, a32, a33, a34}. Performing a Cartesian product operation on set E2 and set F1 yields a set G2, and set G2 may include the multiplication results shown in Table 2.
TABLE 2
a12×b11 | a12×b12 | a12×b13 | a12×b21 | a12×b22 | a12×b23 | a12×b31 | a12×b32 | a12×b33 |
a22×b11 | a22×b12 | a22×b13 | a22×b21 | a22×b22 | a22×b23 | a22×b31 | a22×b32 | a22×b33 |
a32×b11 | a32×b12 | a32×b13 | a32×b21 | a32×b22 | a32×b23 | a32×b31 | a32×b32 | a32×b33 |
a13×b11 | a13×b12 | a13×b13 | a13×b21 | a13×b22 | a13×b23 | a13×b31 | a13×b32 | a13×b33 |
a23×b11 | a23×b12 | a23×b13 | a23×b21 | a23×b22 | a23×b23 | a23×b31 | a23×b32 | a23×b33 |
a33×b11 | a33×b12 | a33×b13 | a33×b21 | a33×b22 | a33×b23 | a33×b31 | a33×b32 | a33×b33 |
a14×b11 | a14×b12 | a14×b13 | a14×b21 | a14×b22 | a14×b23 | a14×b31 | a14×b32 | a14×b33 |
a24×b11 | a24×b12 | a24×b13 | a24×b21 | a24×b22 | a24×b23 | a24×b31 | a24×b32 | a24×b33 |
a34×b11 | a34×b12 | a34×b13 | a34×b21 | a34×b22 | a34×b23 | a34×b31 | a34×b32 | a34×b33 |
As shown in Table 2, the Cartesian product of set E2 and set F1 includes some of the multiplication results required to calculate c12: a14×b13, a24×b23, a34×b33.
The multiplication results shown in Tables 1 and 2 that are not needed for calculating c11 and c12 may also be used in subsequent convolution operations.
As can be seen from the analysis of the above convolution and Cartesian product operations, a convolution can be decomposed into Cartesian product operations. The results obtained from one Cartesian product operation can be used in multiple steps of the convolution operation, and one step of the convolution operation is obtained by adding the appropriate Cartesian product results one or more times.
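This decomposition can be illustrated with the following Python sketch, in which hypothetical numeric values are assigned to the elements of set E1 and set F1; the Cartesian product set G1 of Table 1 contains all nine products needed for c11 and six of the nine products needed for c12:

```python
import itertools

# Hypothetical numeric values for a 5x5 feature map and a 3x3 weight kernel.
a = {(i, j): 10 * i + j for i in range(1, 6) for j in range(1, 6)}   # a[(i, j)] ~ a_ij
b = {(i, j): i + j / 10 for i in range(1, 4) for j in range(1, 4)}   # b[(i, j)] ~ b_ij

E1 = [(i, j) for i in range(1, 4) for j in range(1, 4)]   # indices of a11..a33
F1 = [(i, j) for i in range(1, 4) for j in range(1, 4)]   # indices of b11..b33

# Set G1: all |E1| x |F1| = 81 products of Table 1, keyed by (feature index, weight index).
G1 = {(ea, fb): a[ea] * b[fb] for ea, fb in itertools.product(E1, F1)}

# c11 uses the 9 products a_ij * b_ij ...
c11 = sum(G1[(p, p)] for p in F1)
# ... and 6 of the 9 products needed for c12 (a_i(j+1) * b_ij with j+1 <= 3) are also in G1.
c12_partial = sum(G1[((i, j + 1), (i, j))] for i in range(1, 4) for j in range(1, 3))
print(c11, c12_partial)
```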
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 200 shown in fig. 2 includes: a storage module 210, a data processing module 220, an address processing module 230, and a control module 240.
The storage module 210 is configured to store a first feature data set, an address of each feature data in the first feature data set, a first weight value set, and an address of each weight value in the first weight value set.
The data processing module 220 includes N data computation arrays. Each of the N data calculation arrays includes N × m data calculation units, where N is a positive integer greater than or equal to 2, and m is a positive integer greater than or equal to 2.
The address processing module 230 includes N address calculation arrays. Each of the N address calculation arrays includes N × m address calculation units.
Each data calculation array is configured to obtain n × m weight data from the storage module 210, and store the obtained weight data in n × m data calculation units of each data calculation array.
Each address calculation array is configured to obtain the addresses of n × m weight data from the storage module 210 and store the obtained addresses in the n × m address calculation units of that address calculation array. The addresses of the weight data stored in the N address calculation arrays are the addresses of the weight data stored in the N data calculation arrays. In other words, the N address calculation arrays correspond one-to-one to the N data calculation arrays, and each of the N address calculation arrays stores the addresses of the weight data stored in its corresponding data calculation array. For example, assume that the weight data stored in one of the N data calculation arrays are b11, b12, b13, b21, b22, b23, b31, b32, and b33; then the address calculation array corresponding to that data calculation array stores the address of b11, the address of b12, the address of b13, the address of b21, the address of b22, the address of b23, the address of b31, the address of b32, and the address of b33.
The N data calculation arrays multiply the first feature data set using the weight data they hold. During the operation on the first feature data set, the weight data stored in the N data calculation arrays remain unchanged.
Similarly, the N address calculation arrays perform address operations on the addresses of the first feature data set using the addresses of the weight data held by the N address calculation arrays, wherein the addresses of the weight data held by the N address calculation arrays remain unchanged during the address operations on the addresses of the first feature data set.
The control module 240 is configured to determine a target data set according to the results of the multiplication operations performed by the N data calculation arrays and the results of the address operations.
Therefore, the N data calculation arrays may determine, according to the multiplication result and the operation result of the address operation, an operation result of performing a convolution operation on the first feature data set by the weight data stored in the N data calculation arrays. In other words, in some embodiments, the target data set may be a set of data obtained by performing a convolution operation on the first feature data set by using the weight data stored in the N data calculation arrays.
The following describes, with reference to fig. 1 and fig. 3 to 5, how the N data calculation arrays operate on the first feature data set shown in fig. 1 using the weight data they store.
FIG. 3 is a schematic diagram of a data computation array provided by an embodiment of the present application. The data calculation array 300 shown in fig. 3 includes 9 data calculation units, which are a data calculation unit 311, a data calculation unit 312, a data calculation unit 313, a data calculation unit 321, a data calculation unit 322, a data calculation unit 323, a data calculation unit 331, a data calculation unit 332, and a data calculation unit 333.
It will be appreciated that the data computation array may include input-output cells (not shown) in addition to the data computation cells shown in FIG. 3. The input-output unit is used to acquire the data that needs to be input to the data computation array 300. The input/output unit is also used for inputting the data required to be output by the data calculation array 300 to the corresponding unit and/or module. For example, the input/output unit may obtain the weight data and the feature data from the storage module, and send the obtained weight data and feature data to the corresponding data calculation unit. The input and output unit is also used for acquiring the target data calculated by each data calculation unit and sending the target data to the storage module.
Optionally, in some embodiments, data transfer between the various compute units in the data compute array is unidirectional. Taking fig. 3 as an example, arrows for connecting the data computing units in fig. 3 may indicate a unidirectional transmission direction of data. Take the data calculation unit 311, the data calculation unit 312, and the data calculation unit 313 as an example. The data calculation unit 311 may transmit data (e.g., feature data) to the data calculation unit 312, and the data calculation unit 312 cannot transmit the data to the data calculation unit 311. The data calculation unit 312 may transmit data to the data calculation unit 313, and the data calculation unit 313 cannot transmit data to the data calculation unit 312.
Fig. 4 is a block diagram of a data calculation unit in a data calculation array according to an embodiment of the present application. As shown in fig. 4, the data calculation unit 400 may include a storage subunit 401 and a data calculation subunit 402. It will be appreciated that the data computation unit 400 may also include an input output subunit. The input and output subunit is used for acquiring the data required to be acquired by the data calculation unit and outputting the data required to be output by the data calculation unit.
Specifically, the data calculation array 300 shown in fig. 3 may obtain 3 × 3 weight data in the weight data set shown in fig. 1, and store the 3 × 3 weight data in the 3 × 3 data calculation units of the data calculation array 300, respectively.
Specifically, weight data b11 may be stored in the storage subunit of the data calculation unit 311, weight data b12 may be stored in the storage subunit of the data calculation unit 312, weight data b13 may be stored in the storage subunit of the data calculation unit 313, and so on. In this way, 3 × 3 weight data are stored in the data calculation array 300.
After 3 × 3 pieces of weight data are stored, the data calculation array 300 may slide the first feature data set in one direction, and multiply the first feature data set by using the weight data stored in the data calculation array 300. In the process of multiplying the first feature data set by the data calculation array 300, the weight data stored in the data calculation array 300 is not changed. In other words, during the multiplication operation of the first feature data by the data calculation array 300, the data calculation units in the data calculation array 300 do not delete the saved weight data. Accordingly, the data calculation unit will not read and store new weight data from the storage module.
The manner in which the first feature data set slides in one direction is illustrated in fig. 5. Fig. 5 is a schematic diagram of the process of multiplying the first feature data set according to an embodiment of the present application. As shown in fig. 5, the first feature data set may first be flipped 180 degrees: column 1 of the first feature data set becomes column 5 after flipping, column 2 becomes column 4 after flipping, and so on. It should be noted that in fig. 5 the first feature data set is first flipped 180 degrees and then slid to the right only for the convenience of describing the calculation of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 with the weight data b11, b21, b31, b12, b22, b32, b13, b23, and b33. In practical implementation, the first feature data set may be multiplied with the weight data stored in the data calculation array 300 by sliding to the right directly. The result of multiplying the first feature data set by sliding right directly has the same data values as the result of flipping it 180 degrees first and then sliding right in the manner shown in fig. 5; only the order of the resulting data differs.
The flipped first feature data set slides to the right in a single direction and is multiplied with the weight data stored in the data calculation array 300. Specifically, in the first operation, the feature data a11, a21, and a31 are multiplied by the weight data b11, b21, and b31, respectively. After the first operation, the flipped first feature data set slides to the right and the second operation is performed. In the second operation, the feature data a11, a21, and a31 are multiplied by the weight data b12, b22, and b32, respectively, and the feature data a12, a22, and a32 are multiplied by the weight data b11, b21, and b31, respectively. After the second operation, the flipped feature data set continues to slide to the right, the third operation is performed, and so on. In the above embodiment, the step size of each slide of the first feature data set is 1. Of course, in some other embodiments, the step size of each slide of the first feature data set may be a positive integer greater than 1.
Taking the first operation as an example, the data calculation unit 311 may obtain the feature data a11 from the first feature data set stored in the storage module 210 and store the acquired feature data a11 in the storage subunit of the data calculation unit 311. At this point, the storage subunit of the data calculation unit 311 holds the weight data b11 and the feature data a11. The data calculation subunit in the data calculation unit 311 multiplies the weight data b11 and the feature data a11 held in the storage subunit to obtain intermediate data k(11,11). The multiplication of the weight data b11 and the feature data a11 may be implemented by a multiplier in the data calculation subunit.
The data calculation unit 311 may also acquire the cache data r(11,11) held at the first target address, based on the target address determined by the address calculation unit corresponding to the data calculation unit 311. Specifically, the address calculation unit corresponding to the data calculation unit 311 may determine the first target address based on the address of the feature data a11 and the address of the weight data b11. The data calculation unit 311 may then acquire the current cache data r(11,11) held at the first target address. The manner in which the address calculation unit determines the first target address is described later. The data calculation subunit adds the intermediate data k(11,11) and the current cache data r(11,11) to obtain target data d(11,11). The addition of the intermediate data k(11,11) and the current cache data r(11,11) may be implemented by an adder in the data calculation subunit. The target data d(11,11) may be saved to the first target address. In other words, the current cache data r(11,11) held at the first target address is updated to the target data d(11,11).
Similarly, the data calculation unit 321 can determine, in the same manner, the product of the weight data b21 held by the data calculation unit 321 and the feature data a21 (hereinafter referred to as intermediate data k(21,21)). The target address determined by the address calculation unit corresponding to the data calculation unit 321 is also the first target address. The data calculation unit 321 adds the intermediate data k(21,21) to the current cache data stored at the first target address (which at this point has been updated to the target data d(11,11)) to obtain target data d(21,21). The target data d(21,21) may be saved to the first target address. In other words, the current cache data d(11,11) held at the first target address is updated to the target data d(21,21).
The data calculation unit 331 can determine, in the same manner, the product of the weight data b31 held by the data calculation unit 331 and the feature data a31 (hereinafter referred to as intermediate data k(31,31)). The target address determined by the address calculation unit corresponding to the data calculation unit 331 is also the first target address. The data calculation unit 331 adds the intermediate data k(31,31) to the current cache data stored at the first target address (which at this point has been updated to the target data d(21,21)) to obtain target data d(31,31). The target data d(31,31) may be saved to the first target address. In other words, the current cache data d(21,21) held at the first target address is updated to the target data d(31,31).
After the first operation, the target data stored at the first target address is a11×b11 + a21×b21 + a31×b31.
In a similar manner, the data computation array 300 may continue to operate on the first set of feature data using the weight data maintained by the data computation units in the data computation array 300.
After the third operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32. That is, in the third operation, the target address determined by the address calculation units corresponding to the data calculation unit 312, the data calculation unit 322, and the data calculation unit 332 is also the first target address. Therefore, after the third operation, the target data stored at the first target address is the sum of the data stored there after the first operation, a12×b12 determined by the data calculation unit 312, a22×b22 determined by the data calculation unit 322, and a32×b32 determined by the data calculation unit 332. After the fifth operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32 + a13×b13 + a23×b23 + a33×b33. That is, in the fifth operation, the target address determined by the address calculation units corresponding to the data calculation unit 313, the data calculation unit 323, and the data calculation unit 333 is also the first target address. Therefore, after the fifth operation, the target data stored at the first target address is the sum of the data stored there after the third operation, a13×b13 determined by the data calculation unit 313, a23×b23 determined by the data calculation unit 323, and a33×b33 determined by the data calculation unit 333.
Thus, after five operations, the data stored at the first target address is the convolution result c11 shown in Equation 1.1. Similarly, the convolution operation of the first feature data set and the weight data set can be completed using the multiplication results and the address operation results.
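The accumulation described above can be modelled in software with the following Python sketch. It is an illustrative simplification, not the hardware itself: it assumes a single 3 × 3 data calculation array holding hypothetical weight values, a 5 × 5 feature data set sliding past it with step size 1, and only the row-aligned products (the vertically shifted products are contributed by the additional row-rearranged weight matrices described elsewhere in this application). After the fifth operation, the cache entry at the first target address holds exactly c11 of Equation 1.1:

```python
# Illustrative software model of the accumulation (a sketch under assumptions,
# not the hardware): one 3x3 array holds the weights, the feature columns slide
# past it, and every product is added to the cache entry at its target address.
A = [[(i + 1) * 10 + (j + 1) for j in range(5)] for i in range(5)]   # hypothetical a_ij
B = [[(i + 1) + (j + 1) / 10 for j in range(3)] for i in range(3)]   # hypothetical b_ij
cache = {}                                    # target address -> accumulated target data

for step in range(1, 6):                      # the five operations discussed above
    for w_col in range(1, 4):                 # weight column held by the array
        f_col = step - w_col + 1              # feature column currently aligned with it
        out_col = f_col - w_col + 1           # output column this product feeds
        if 1 <= f_col <= 5 and 1 <= out_col <= 3:
            for r in range(1, 4):             # the three rows of the array
                addr = (1, out_col)           # (1, 1) is the "first target address"
                cache[addr] = cache.get(addr, 0.0) + A[r - 1][f_col - 1] * B[r - 1][w_col - 1]

c11 = sum(A[i][j] * B[i][j] for i in range(3) for j in range(3))
assert abs(cache[(1, 1)] - c11) < 1e-9        # holds after the fifth operation
# Entries such as cache[(1, 2)] hold partial convolution sums of adjacent positions.
```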
The following describes, with reference to fig. 1 and fig. 3 to 8, how the N address calculation arrays perform address operations on the addresses of the first feature data set shown in fig. 1 using the stored addresses of the weight data.
Fig. 6 is a schematic diagram of the addresses of the first feature data set and the addresses of the weight data set according to an embodiment of the present application. The addresses of the first feature data set shown in fig. 6 are the addresses of the first feature data set shown in fig. 1. Specifically, the address Adda11 is the address of the feature data a11, the address Adda12 is the address of the feature data a12, and so on. The addresses of the weight data set shown in fig. 6 are the addresses of the weight data set shown in fig. 1. Specifically, the address Addb11 is the address of the weight data b11, the address Addb12 is the address of the weight data b12, and so on.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application. The address calculation array 700 shown in fig. 7 includes 9 address calculation units in total: an address calculation unit 711, an address calculation unit 712, an address calculation unit 713, an address calculation unit 721, an address calculation unit 722, an address calculation unit 723, an address calculation unit 731, an address calculation unit 732, and an address calculation unit 733.
It will be appreciated that the address calculation array may include input-output units (not shown) in addition to the address calculation unit shown in fig. 7. The input-output unit is used to obtain the data that needs to be input into the address calculation array 700. The input/output unit is further configured to input data that needs to be output by the address calculation array 700 to a corresponding unit and/or module. For example, the input/output unit may obtain the address of the weight data and the address of the feature data from the storage module, and send the obtained address of the weight data and the address of the feature data to the corresponding address calculation unit. The input and output unit is further used for acquiring the target address calculated by each address calculation unit and sending the target address to the corresponding data calculation unit.
The N address calculation arrays correspond one-to-one to the N data calculation arrays. The one-to-one correspondence here means that each of the N data calculation arrays corresponds to one of the N address calculation arrays, and different data calculation arrays correspond to different address calculation arrays. For example, assume that N is equal to 3, the 3 data calculation arrays are data calculation array 1, data calculation array 2, and data calculation array 3, and the 3 address calculation arrays are address calculation array 1, address calculation array 2, and address calculation array 3. Data calculation array 1 corresponds to address calculation array 1, data calculation array 2 corresponds to address calculation array 2, and data calculation array 3 corresponds to address calculation array 3. The address calculation array corresponding to a data calculation array is used to calculate the target address of each piece of target data in that data calculation array. Further, the data calculation units in a data calculation array correspond one-to-one to the address calculation units in the corresponding address calculation array. Assuming that the data calculation array shown in fig. 3 corresponds to the address calculation array shown in fig. 7, the data calculation unit 311 corresponds to the address calculation unit 711, the data calculation unit 312 corresponds to the address calculation unit 712, the data calculation unit 313 corresponds to the address calculation unit 713, and so on. The address calculation unit is used to determine the address of the target data of the corresponding data calculation unit. Specifically, the first target address from which the data calculation unit 311 acquires the cache data r(11,11), as described above, is obtained by the address calculation unit 711 through an address operation.
Fig. 8 is a block diagram of a structure of an address calculation unit in an address calculation array according to an embodiment of the present application. As shown in fig. 8, the address calculation unit 800 may include a storage sub-unit 801 and an address calculation sub-unit 802. It is understood that the address calculation unit 800 may further include an input-output subunit. The input and output subunit is used for acquiring the data required to be acquired by the address calculation unit and outputting the data required to be output by the address calculation unit.
Specifically, the address calculation array 700 shown in fig. 7 may obtain the addresses of 3 × 3 weight data among the addresses of the weight data set shown in fig. 6 and store the addresses of the 3 × 3 weight data in the 3 × 3 address calculation units of the address calculation array 700, respectively.
Specifically, the address Addb11 may be stored in the storage subunit of the address calculation unit 711, the address Addb12 may be stored in the storage subunit of the address calculation unit 712, the address Addb13 may be stored in the storage subunit of the address calculation unit 713, and so on. In this way, the addresses of 3 × 3 weight data are stored in the address calculation array 700.
After storing the addresses of 3 × 3 pieces of weight data, the address calculation array 700 may perform unidirectional sliding on the address of the first feature data set, and perform address operation on the address of the first feature data set using the address of the weight data stored in the address calculation array 700. In the process of performing address operation on the address of the first feature data set by the address calculation array 700, the address of the weight data stored in the address calculation array 700 is not changed. In other words, during the address operation of the address calculation array 700 on the address of the first feature data, the address calculation unit in the address calculation array 700 does not delete the address of the saved weight data. Correspondingly, the address calculation unit will not read and store the address of the new weight data from the storage module.
The process in which the addresses of the first feature data set slide unidirectionally to the right for address calculation is similar to the process in which the first feature data set slides unidirectionally to the right for multiplication, and is therefore not described again here.
How the address calculation unit performs the address operation will be described below.
For convenience of description, hereinafter, the address of the weight acquired by the address calculation unit 800 is referred to as the address of the first weight, the address of the feature data acquired by the address calculation unit 800 is referred to as the address of the first feature data, and the address obtained by the address calculation unit 800 performing the address operation is referred to as the first target address.
The input/output subunit in the address calculation unit 800 may obtain, in addition to the address of the first feature data and the address of the first weight data from the storage module, the following information: the size of the input data corresponding to the first feature data set, the padding size, and the weight size, where the weight size is the size of the address calculation array to which the address calculation unit 800 belongs and the padding size is a preset size. In this example, the weight size is 3 × 3. The size of the input data corresponding to the first feature data set, the padding size, and the weight size may also be saved in the storage subunit 801 of the address calculation unit 800. The address calculation subunit 802 may determine the first target address according to the address of the first weight data, the address of the first feature data, the size of the input data corresponding to the first feature data set, the padding size, and the weight size.
Assuming that the input picture size is a rows and b columns and the convolution kernel size is n rows and m columns, the size of the output picture after convolution is (a−n+1) × (b−m+1). This raises two problems: first, the output picture shrinks after every convolution operation; second, pixels at the corners and edges of the original picture contribute to fewer outputs, so much of the information at the edges of the picture is lost.
To address these issues, the original picture may be padded (Padding) on the boundary to increase the size of the matrix before the convolution operation is performed. 0 is usually used as a padding value.
If the numbers of pixels extended horizontally and vertically are p and q respectively, the size of the padded original picture is (a+2p) × (b+2q). With the convolution kernel unchanged at n rows and m columns, the size of the output picture becomes (a+2p−n+1) × (b+2q−m+1). The numbers of pixels p and q extended in each direction are the padding sizes. For the output picture to keep the same size as the input picture, the horizontal padding size p is (n−1)/2 and the vertical padding size q is (m−1)/2.
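The padding relationship can be checked with a short sketch (hypothetical sizes): with p = (n−1)/2 and q = (m−1)/2 for odd n and m, the output picture keeps the size of the input picture:

```python
def output_size(a, b, n, m, p, q):
    """Output size of an (a x b) picture padded by (p, q) and convolved
    with an (n x m) kernel at stride 1."""
    return (a + 2 * p - n + 1, b + 2 * q - m + 1)

a, b, n, m = 5, 5, 3, 3
p, q = (n - 1) // 2, (m - 1) // 2                      # padding sizes from the text
assert output_size(a, b, n, m, p, q) == (a, b)         # picture size is preserved
assert output_size(a, b, n, m, 0, 0) == (a - n + 1, b - m + 1)   # no padding
```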
The address calculation subunit 802 may determine the target address according to the following formula:
result_cord = (input_cord / input_size_x − w_cord / kernel_size_x + padding_size_x) × input_size_y + (input_cord % input_size_y − w_cord % kernel_size_y + padding_size_y)        (Equation 1.3)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the abscissa of the size of the input data corresponding to the first feature data set, input_size_y denotes the ordinate of the size of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the abscissa of the weight size, kernel_size_y denotes the ordinate of the weight size, padding_size_x denotes the horizontal padding size, and padding_size_y denotes the vertical padding size.
The address of the feature data and the address of the weight data in Equation 1.3 are absolute addresses. An absolute address refers to the absolute position of the feature data/weight data in the corresponding feature data set/weight data set. Suppose the feature data set includes X feature data; then the absolute address of the x-th feature datum among the X feature data is x−1, where x is a positive integer greater than or equal to 1 and less than or equal to X. For example, if the feature data set is 5, 0, 0, 32, 0, 0, 0, 0, 23, the absolute addresses of the feature data 5, 32, and 23 are 0, 3, and 8, respectively. The absolute addresses listed above refer to the positions of the feature data within the feature data set and may be converted into addresses composed of an abscissa and an ordinate according to the shape of the feature matrix. Similarly, the absolute address of a weight datum may be converted into an address composed of an abscissa and an ordinate.
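A possible software rendering of Equation 1.3, under the assumptions that "/" is integer division, the addresses are absolute row-major positions, and the sizes correspond to the 5 × 5 feature data set and 3 × 3 weight data set of fig. 1; it shows that all products contributing to c11 map to the same target address, which is what allows the data calculation units to accumulate them there:

```python
def target_address(input_cord, w_cord,
                   input_size_x=5, input_size_y=5,
                   kernel_size_x=3, kernel_size_y=3,
                   padding_size_x=1, padding_size_y=1):
    """Equation 1.3 with '/' read as integer division and '%' as remainder;
    addresses are absolute (flat, row-major) positions in the data sets."""
    return ((input_cord // input_size_x - w_cord // kernel_size_x + padding_size_x)
            * input_size_y
            + (input_cord % input_size_y - w_cord % kernel_size_y + padding_size_y))

flat = lambda r, c, width: (r - 1) * width + (c - 1)   # a_rc / b_rc -> absolute address

# Every product contributing to c11 lands at the same target address.
addr_11 = target_address(flat(1, 1, 5), flat(1, 1, 3))   # a11 x b11
addr_22 = target_address(flat(2, 2, 5), flat(2, 2, 3))   # a22 x b22
addr_33 = target_address(flat(3, 3, 5), flat(3, 3, 3))   # a33 x b33
assert addr_11 == addr_22 == addr_33
```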
Optionally, in some embodiments, the address calculation subunit 802 may instead determine the target address according to the following formula:
result_cord = ((base_input + input_cord) / input_size_x − (base_w + w_cord) / kernel_size_x + padding_size_x) × input_size_y + ((base_input + input_cord) % input_size_y − (base_w + w_cord) % kernel_size_y + padding_size_y)        (Equation 1.4)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the abscissa of the size of the input data corresponding to the first feature data set, input_size_y denotes the ordinate of the size of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the abscissa of the weight size, kernel_size_y denotes the ordinate of the weight size, padding_size_x denotes the horizontal padding size, padding_size_y denotes the vertical padding size, base_input denotes the base address of the feature data, and base_w denotes the base address of the weight data.
The address of the feature data and the address of the weight data in formula 1.4 are relative addresses. A relative address refers to the position of the feature data/weight data in the corresponding feature data set/weight data set relative to the address of the first feature data/weight data. Assuming that the address of the first feature data in the feature data set is Y, the address of the yth feature data in the feature data set is Y + y - 1, where y is a positive integer greater than or equal to 1.
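Formula 1.4 differs from formula 1.3 only in that the base addresses are added before the division and remainder operations. A literal transcription, under the same integer-division assumption and with illustrative names, is sketched below.

```python
# Literal transcription of formula 1.4 (relative addresses plus base addresses).
def target_address_relative(base_input, input_cord, base_w, w_cord,
                            input_size_x, input_size_y,
                            kernel_size_x, kernel_size_y,
                            padding_size_x, padding_size_y):
    abs_input = base_input + input_cord
    abs_w = base_w + w_cord
    row_part = abs_input // input_size_x - abs_w // kernel_size_x + padding_size_x
    col_part = abs_input % input_size_y - abs_w % kernel_size_y + padding_size_y
    return row_part * input_size_y + col_part   # result_cord
```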
Optionally, in some embodiments, after determining the target address, the address calculation unit may directly send the target address to the corresponding data calculation unit. The data calculation unit may determine the cache data in the target address according to the target address.
Optionally, in other embodiments, after determining the target address, the address calculation unit may determine cache data in the target address, and then send the cache data and the target address together to the corresponding data calculation unit.
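To make the overall flow concrete, the following self-contained sketch (illustrative only, with row/column coordinates used in place of the linear addresses of formulas 1.3/1.4 and padding omitted) multiplies every non-zero feature datum with every non-zero weight datum and accumulates each product at its target address, then checks the result against a direct sliding-window convolution.

```python
import numpy as np

def direct_conv(A, B):
    # Reference: direct sliding-window convolution (no padding).
    H, W = A.shape
    n, m = B.shape
    out = np.zeros((H - n + 1, W - m + 1))
    for p in range(out.shape[0]):
        for q in range(out.shape[1]):
            out[p, q] = np.sum(A[p:p + n, q:q + m] * B)
    return out

def cartesian_conv(A, B):
    # Cartesian product of non-zero feature data and non-zero weight data,
    # with each product accumulated at the target address it belongs to.
    H, W = A.shape
    n, m = B.shape
    out = np.zeros((H - n + 1, W - m + 1))
    feats = [(r, c, A[r, c]) for r in range(H) for c in range(W) if A[r, c]]
    weights = [(i, j, B[i, j]) for i in range(n) for j in range(m) if B[i, j]]
    for r, c, a in feats:
        for i, j, b in weights:
            p, q = r - i, c - j              # target address (row, column)
            if 0 <= p < out.shape[0] and 0 <= q < out.shape[1]:
                out[p, q] += a * b           # accumulate at the target address
    return out

A = np.random.randint(0, 3, (5, 5))
B = np.random.randint(0, 3, (3, 3))
assert np.allclose(direct_conv(A, B), cartesian_conv(A, B))
```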
The above describes how a data calculation array performs multiplication operations and how an address calculation array performs address operations.
As described above, the data processing apparatus may include 2 or more data calculation arrays and corresponding address calculation arrays.
The weight data set shown in fig. 1 only includes 3 × 3 weight data, and only one weight data set is used for performing the convolution operation on the feature data set. Optionally, in other embodiments, two or more weight data sets may also be used for performing the convolution operation on the feature data set.
Optionally, in some embodiments, each of the N data calculation arrays may obtain and store a weight data set, and perform a multiplication operation on the first feature data set by using the stored weight data. Correspondingly, each address calculation array in the N address calculation arrays may obtain and store an address of corresponding weight data, and perform multiplication operation on the address of the first feature data set by using the address of the stored weight data.
If the number of weight data sets used for performing the convolution operation on the feature data set is greater than N, the N data calculation arrays may obtain N weight data sets at a time and use them to multiply the first feature data set. If fewer than N weight data sets remain to be obtained in a pass, all of the remaining weight data sets are obtained and used to multiply the first feature data set. For example, assume that N is 4 and the number of weight data sets is 9. In this case, the 4 data calculation arrays may first obtain the 1st to 4th weight data sets to multiply the first feature data set, then obtain the 5th to 8th weight data sets to multiply the first feature data set, and finally obtain the 9th weight data set to multiply the first feature data set. The manner in which the N address calculation arrays perform the address operations is similar, and the description is not repeated.
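The batching described above can be sketched as a simple driver loop; this is illustrative only, and the function and variable names are assumptions made here rather than terms of the application.

```python
# Illustrative driver loop: process the weight data sets in batches of N arrays.
def multiply_in_batches(weight_sets, feature_set, N, multiply):
    results = []
    for start in range(0, len(weight_sets), N):
        batch = weight_sets[start:start + N]   # the last batch may contain fewer than N sets
        for weights in batch:
            results.append(multiply(feature_set, weights))
    return results
```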
Optionally, in other embodiments, the weight data stored in different data calculation arrays of the N data calculation arrays may be the result of rearranging the same weight data in rows. For example, it is assumed that the N data calculation arrays include a first data calculation array and a second data calculation array, and that the N × m weight data stored in the second data calculation array is N × m weight data obtained by rearranging the N × m weight data stored in the first data calculation array in rows.
Fig. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
As shown in FIG. 9, the data calculation array 1 stores 3 × 3 weight data, where the first row of weight data is b11, b12 and b13; the second row of weight data is b21, b22 and b23; and the third row of weight data is b31, b32 and b33. The data calculation array 2 stores 3 × 3 weight data, where the first row of weight data is b31, b32 and b33; the second row of weight data is b11, b12 and b13; and the third row of weight data is b21, b22 and b23. It can be seen that the weight data stored in the data calculation array 2 is the result of rearranging, by rows, the weight data stored in the data calculation array 1. Correspondingly, the weight data stored in the data calculation array 1 may also be regarded as the result of rearranging, by rows, the weight data stored in the data calculation array 2. For convenience of description, weight data obtained after such a row rearrangement is referred to as rearranged weight data, and the weight data stored in the two data calculation arrays shown in fig. 9 are referred to as mutually rearranged weight data.
FIG. 9 shows the relationship between the weight data held by two data calculation arrays. Optionally, in some embodiments, the weight data stored in any two of three or more data calculation arrays are also mutually rearranged weight data. For example, the N data calculation arrays further include a data calculation array 3 as shown in fig. 10. The data calculation array 3 stores 3 × 3 weight data, where the first row of weight data is b21, b22 and b23; the second row of weight data is b31, b32 and b33; and the third row of weight data is b11, b12 and b13. Therefore, the weight data stored in the data calculation array 1 shown in fig. 9 and in the data calculation array 3 are mutually rearranged weight data, and the weight data stored in the data calculation array 2 and in the data calculation array 3 are also mutually rearranged weight data. In summary, if the value of N is greater than or equal to n and the weight data includes n rows in total, the weight data may be rearranged at most n-1 times, and the weight data stored in the 2nd to nth data calculation arrays among the N data calculation arrays are all weight data obtained by rearranging, by rows, the weight data stored in the 1st data calculation array, where any two of the row vectors located in the same row across the weight data stored in the n data calculation arrays are different. N is a positive integer greater than or equal to n. In this case, the first data calculation array and the second data calculation array are any two of the n data calculation arrays. In other words, the first row of weight data stored in each of the n data calculation arrays appears as the second row to the nth row of weight data in the remaining n-1 data calculation arrays, respectively.
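The row rearrangement described above amounts to cyclically rotating the rows of the first weight matrix. The following sketch (illustrative only) generates the n rearrangements and checks that no two arrays hold the same row at the same row position, matching the relationship shown in fig. 9 and fig. 10.

```python
import numpy as np

def row_rearrangements(w):
    # The original weight matrix plus its n-1 cyclic row rotations,
    # one per data calculation array (cf. fig. 9 and fig. 10).
    n = w.shape[0]
    return [np.roll(w, k, axis=0) for k in range(n)]

w1 = np.arange(1, 10).reshape(3, 3)         # stands in for b11 ... b33
arrays = row_rearrangements(w1)
n = w1.shape[0]
for j in range(n):                          # at every row position ...
    rows = [tuple(a[j]) for a in arrays]
    assert len(set(rows)) == n              # ... all n arrays hold different rows
```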
Optionally, in some embodiments, the data calculation array 2 and the data calculation array 3 may obtain 3 × 3 pieces of weight data as shown in fig. 1, and then perform data rearrangement to obtain rearranged weight data.
Optionally, in other embodiments, the storage module may store the rearranged weight data, and the data calculation array 2 and the data calculation array 3 directly obtain the rearranged weight data from the storage module.
It can be understood that, since the data calculation array corresponds to the address calculation array, the address of the weight data held by the second address calculation array corresponding to the second data calculation array is also the result of the rearrangement by rows of the addresses of the weight data held by the first address calculation array corresponding to the first data calculation array.
Similarly, if the value of N is greater than or equal to n, the weight data includes n rows in total, and the addresses of the weight data also include n rows. The addresses of the weight data can be rearranged at most n-1 times, and the addresses of the weight data stored in the 2nd to nth address calculation arrays among the N address calculation arrays are all addresses obtained by rearranging, by rows, the addresses of the weight data stored in the 1st address calculation array. N is a positive integer greater than or equal to n. In this case, the first address calculation array and the second address calculation array are any two of the n address calculation arrays. In other words, the addresses of the first row of weight data stored in each of the n address calculation arrays appear as the addresses of the second row to the nth row of weight data in the remaining n-1 address calculation arrays, respectively.
After the weight data and the corresponding weight data addresses are rearranged by rows, the feature data can be reused, which further reduces the number of accesses made by the data calculation arrays and the address calculation arrays to the storage module.
For example, in the process of performing the convolution operation on the feature data set shown in fig. 1 by using the weight data set shown in fig. 1, it is also necessary to determine the operation result shown in formula 1.5:
c21 = a21×b11 + a22×b12 + a23×b13 + a31×b21 + a32×b22 + a33×b23 + a41×b31 + a42×b32 + a43×b33    (formula 1.5)
If the weight data stored in the second data calculation array after the rearrangement is as shown in fig. 9, a partial result of formula 1.5 can be obtained after a single access to the storage module.
Specifically, when the data calculation array 2 shown in fig. 9 multiplies the feature data set by the stored weight data, the operation results a21×b11, a22×b12, a23×b13, a31×b21, a32×b22 and a33×b23 can be obtained. According to the operation rule described above, the sum of these 6 operation results is stored at the same target address.
It is assumed that the data processing apparatus includes only the data calculation array 1 and the data calculation array 2, and that the weight data held by the data calculation array 1 and the data calculation array 2 are as shown in fig. 9. In the process of multiplying the feature data set shown in fig. 1 by using the data calculation array 1 and the data calculation array 2, after multiplying the feature data in the first row to the third row of the feature data set, the data calculation array 1 and the data calculation array 2 may multiply the feature data in the third row to the fifth row of the feature data set. In other words, while traversing the feature data set for the multiplication operation, the step size of sliding downward may be 2. If the weight data were not rearranged (in other words, if the data processing apparatus had only the data calculation array 1 shown in FIG. 9), then to obtain, for example, a21×b11, a22×b12 and a23×b13, the data calculation array 1 would have to multiply the feature data of the second row to the fourth row after the multiplication of the feature data of the first row to the third row is completed. That multiplication requires the feature data of the second row to the third row of the feature data set to be acquired again. In other words, the second row to the third row of the feature data set would need to be read a second time to obtain a21×b11, a22×b12, a23×b13 and so on, which results in the same feature data being read multiple times.
Because the weight data is rearranged, the operation result obtained when the data calculation array 2 multiplies the feature data in the second row to the third row of the feature data set is equivalent to the operation result obtained when the data calculation array 1 slides downward with a step size of 1 and then multiplies the feature data in the second row to the third row. In other words, the feature data in the second row to the third row of the feature data set can be read once while achieving its multiplication by two weight data sets, so more partial Cartesian products can be obtained with a single reading of the feature data. Since the convolution operation in fact only needs partial Cartesian products of the feature data set and the weight data set, rearranging the weight data by rows, multiplying the feature data set by the original weight data and by the rearranged weight data respectively, and obtaining a target data set including the partial Cartesian products from the results reduces the number of accesses to the storage module and increases the data processing speed.
When the first weight matrix with n rows is rearranged n-1 times, and any two of the row vectors of the resulting n weight matrices that are located in the same row are different, then after the feature data set is multiplied with the n weight matrices, the Cartesian product of the feature data set and the first weight matrix can be obtained, and the convolution of the feature data set with the first weight matrix can further be obtained, with each feature data in the feature data set loaded into the data processing unit only once.
Fig. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
In the process of multiplying the first row to the third row of feature data of the feature data set shown in fig. 1 by the weight data shown in fig. 10, the operation results a31×b11, a32×b12 and a33×b13 can be obtained. After the multiplication of the feature data of the first row to the third row of the feature data set, the first data calculation array, the second data calculation array and the third data calculation array may multiply the feature data of the fourth row to the fifth row of the feature data set. In other words, while traversing the feature data set for the multiplication operation, the step size of sliding downward may be 3.
Assuming that three of the N data calculation arrays are, respectively, the data calculation array 1 and the data calculation array 2 shown in fig. 9 and the data calculation array 3 shown in fig. 10, these three data calculation arrays can complete the Cartesian product operation on the feature data set.
Take the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 as an example again. The three data calculation arrays may each perform, with the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33, the multiplication process shown in fig. 5. The operation results obtained when the three data calculation arrays use their respectively stored weight data to complete the multiplication of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 are shown in table 1.
In summary, if the weight data includes n rows, the weight data can be rearranged n-1 times at most. If the weight data is rearranged for one time, the step length of downward sliding can be 2 in the process of traversing the characteristic data set to carry out multiplication operation; if the weight data is rearranged twice, the step length of downward sliding can be 3 in the process of traversing the characteristic data set to carry out multiplication operation; if the weight data is rearranged for n-1 times, the step length of downward sliding can be n in the process of traversing the feature data set to perform multiplication operation.
Optionally, in some embodiments, the first feature data set is a feature data set obtained by performing thinning processing on a second feature data set, and the first weight data set is a weight data set obtained after thinning processing. The data processing apparatus 200 shown in fig. 2 may further comprise a compression module. The compression module is configured to acquire the second feature data set and perform thinning processing on it to obtain the first feature data set, where the second feature data set includes feature data corresponding to the input data. The compression module is further configured to acquire a second weight data set and perform thinning processing on it to obtain the first weight data set. The compression module is further configured to determine the address of each feature data in the first feature data set and the address of each weight data in the first weight data set. The compression module sends the first feature data set, the first weight data set, the address of each feature data in the first feature data set and the address of each weight data in the first weight data set to the storage module, and the storage module stores them. If the number of weight data remaining after thinning is less than n × m, the remaining positions are padded with 0.
The input data referred to in the embodiments of the present application may be any data on which multiplication, Cartesian product operations and/or convolution operations can be performed, for example image data or voice data. The input data is a general term for all data input to the data processing apparatus, and may consist of feature data. The feature data corresponding to the input data may be all of the data included in the input data or only part of the feature data of the input data. Taking image data as an example, assuming that the input data is a whole image, all data of the image are referred to as feature data. The second feature data set may include all feature data of the input data, or may be all or part of the feature data of the image after some processing. For example, the second feature data set may be the feature data of a partial image obtained by segmenting the image.
Assume that the second feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0, 0, 0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 5, 8. The first feature data set obtained after the thinning includes: 5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8. Assuming that the address of the first feature data in the second feature data set is 0, the address of the second feature data is 1, the address of the third feature data is 2, and the address of the nth feature data is n-1, the addresses (absolute addresses) of the first feature data set are: 0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35.
Assuming that the second weight data set includes 8, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 24, 54, 0, 0, 0, 0, 0, 0, 12, 0, 0, 22, 3, 45, 0, 0, 0, 0, 67, 44, 0, 0, 0, 0, 0, 0, 0, 0, 35, 65, 75, the thinned second weight data set includes: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75. It can be seen that the thinned second weight data set includes 14 weight data. Assume that each data calculation array includes 3 × 3 data calculation units. The number of weight data in the thinned second weight data set is therefore less than the number of data calculation units included in 2 data calculation arrays. Accordingly, the thinned second weight data set is finally padded with four 0s to obtain the first weight data set. The first weight data set corresponding to the second weight data set is therefore: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75, 0, 0, 0, 0. Assuming that the address of the first weight data in the second weight data set is 0, the address of the second weight data is 1, the address of the third weight data is 2, and the address of the nth weight data is n-1, the addresses (absolute addresses) of the first weight data set are: 0, 1, 6, 16, 17, 23, 26, 27, 28, 33, 34, 43, 44, 45.
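The thinning and zero-padding described in the two examples above can be sketched as follows. The helper name is illustrative only; the padding is applied only to the weight data, up to a multiple of the n × m data calculation units per array (pad_to = 9 for a 3 × 3 array, so 14 non-zero weights are padded with four 0s to fill two arrays).

```python
def thin(dense, pad_to=None):
    # Remove zeros, record the absolute address of every remaining datum,
    # and (for weight data) pad the value list with zeros up to a multiple
    # of pad_to, e.g. pad_to = 9 for a 3 x 3 data calculation array.
    values = [v for v in dense if v != 0]
    addresses = [i for i, v in enumerate(dense) if v != 0]
    if pad_to and len(values) % pad_to:
        values += [0] * (pad_to - len(values) % pad_to)
    return values, addresses
```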
In some embodiments, the first feature data set may also be a feature data set that has not been subjected to a thinning process. In other words, the first set of characteristic data may be equal to the second set of characteristic data.
The first feature data set in the above embodiment corresponds to a matrix, and correspondingly, the weight data for performing the convolution operation on the first feature data set also corresponds to a matrix. In other words, the convolution operation described in the above embodiment is a two-dimensional convolution operation.
The technical solution of the embodiment of the present application may also be applied to T-dimensional multiplication, cartesian product calculation, and/or convolution calculation (T is a positive integer greater than or equal to 3). In addition, a plurality of weight data sets for performing convolution operation on the first feature data set may be provided.
The following describes the technical solution of the present application by taking a three-dimensional convolution operation as an example.
If the input data corresponding to the first feature data set is color picture data, the first feature data set may be a three-dimensional tensor, and a three-dimensional convolution operation may be performed on the first feature data set.
The first set of feature data includes three subsets: a feature data subset 1, a feature data subset 2, and a feature data subset 3. The feature data of the three subsets correspond to the three input channels, red, green and blue, respectively. The feature data in each of the three subsets may correspond to a matrix.
It is assumed that three sets of weight data are used to perform a three-dimensional convolution operation on the first set of feature data. The set of weight data used to perform the convolution operation on the set of feature data may also be referred to as a Filter (Filter). Thus, the three sets of weight data may be referred to as filter 1, filter 2, and filter 3. Each of the three weight data sets includes three weight channels, which are channel 1, channel 2, and channel 3. The weight data included in each of the three weight channels may correspond to a matrix. The three weight channels correspond to the three feature data subsets one to one. For example, lane 1 corresponds to feature data subset 1, lane 2 corresponds to feature data subset 2, and lane 3 corresponds to feature data subset 3. The weight channel may perform convolution operations on the corresponding feature data subsets. Filter 1, filter 2, and filter 3 may each perform a three-dimensional convolution operation on the first feature data set. That is, the channel 1 of the filter 1 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 1 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 1 performs convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 2 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 2 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 2 performs convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 3 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 3 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 3 performs convolution operation on the feature data subset 3 of the first feature data set.
It can be seen that the process of performing the three-dimensional convolution operation on the first feature data set by each of the three filters can be decomposed into three two-dimensional convolution operation processes. The specific implementation of these three two-dimensional convolution operations is similar to that of the two-dimensional convolution operation in the above-described embodiment. Taking the example that the channel 1 performs convolution operation on the feature data subset 1, the channel 1 may be regarded as the weight data set shown in fig. 1, and the feature data subset 1 may be regarded as the feature data set shown in fig. 1. The process of performing convolution operation on the feature data subset by the channel 1 is the process of performing convolution operation on the feature data set by the weight data set as shown in fig. 1. As described above, the convolution process can be decomposed into multiplication and addition operations. Therefore, the data processing apparatus shown in fig. 2 can also perform a three-dimensional convolution operation. In the case where the input data corresponding to the feature data set is a three-dimensional tensor, the first feature data set referred to in the above embodiments may be regarded as a subset of feature data in the feature data set corresponding to the three-dimensional tensor. In the case of performing convolution operation on the feature data set by using a plurality of weight data sets, the first weight data set may be regarded as one weight data set of the plurality of weight data sets. In the case that the set of weight data also corresponds to a three-dimensional tensor, the first set of weight data can be considered as one channel of the set of weight data of the three-dimensional tensor.
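The decomposition of the three-dimensional convolution into per-channel two-dimensional convolutions can be sketched as below; conv2d stands for any two-dimensional convolution routine (for example the cartesian_conv sketch above), and the data layout assumed here is illustrative only.

```python
def conv3d_per_channel(feature_subsets, filters, conv2d):
    # feature_subsets: one 2-D feature matrix per input channel (e.g. R, G, B).
    # filters: for each filter, one 2-D weight matrix per input channel.
    # Each filter convolves every channel and the per-channel results are summed.
    outputs = []
    for filt in filters:
        acc = None
        for channel_weights, subset in zip(filt, feature_subsets):
            partial = conv2d(subset, channel_weights)
            acc = partial if acc is None else acc + partial
        outputs.append(acc)
    return outputs
```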
The process of the multidimensional convolution operation of three or more dimensions is similar to the three-dimensional convolution operation process, and the description is not repeated here.
Optionally, in a case where a plurality of weight data sets are used to perform convolution operation on the feature data set, the first weight data set may also be a weight data set obtained by performing thinning processing on the plurality of weight data sets. Specifically, the non-0 weight data included in the first weight data set is from one channel of the same weight data set or the same channel of different weight data sets.
The thinning process for the multiple weight data sets will be described below with reference to fig. 11.
Fig. 11 is a schematic diagram of the weight matrices of 3 filters and of the sparsification performed on them according to an embodiment of the present application. Each of the 3 filters shown in fig. 11 includes 3 weight channels, and each weight channel includes 3 × 3 weight data.
As shown in FIG. 11, the weight data of weight data set 1 is from the weight data in channel 1 of filter 1 and filter 2, and the weight data of weight data set 4 is from the weight data in channel 1 of filter 2 and filter 3. The weight data of the weight data set 2 comes from the weight data in the channel 2 of the filter 1 and the filter 2, and the weight data of the weight data set 5 comes from the weight data in the channel 2 of the filter 2 and the filter 3. The weight data of the weight data set 3 is from the weight data in the channel 3 of the filter 1 and the filter 2, and the weight data of the weight data set 6 is from the weight data in the channel 3 of the filter 2 and the filter 3.
As shown in fig. 11, weight data coming from the same channel of different weight data sets means that the weight data may belong to different filters, but the channel index within those filters is the same. For example, the weight data of weight data set 4 comes from the weight data in channel 1 of filter 2 and the weight data in channel 1 of filter 3.
For convenience of description, a weight data set obtained by thinning the weight data in the plurality of filters is hereinafter referred to as a thinned weight data set.
In some embodiments, the weight data included in the set of sparse weight data may come from the same filter. The operation process of the feature data multiplication by the sparse weight data set and the process of determining the convolution operation result of the sparse weight data set and the feature data according to the operation result of the multiplication are the same as those in the above embodiments, and thus, the description thereof is not repeated.
In some embodiments, the weight data included in the set of sparse weight data may come from different filters. The operation process of multiplying the feature data by the sparse weight data set is the same as that of the above embodiment, and thus, the description is not repeated here. In the case that the weight data included in the thinned weight data set can come from different filters, the process of determining the convolution operation result of the thinned weight data set and the feature data according to the operation result of the multiplication operation is not exactly the same as the above-described embodiment.
Specifically, it is assumed that the thinned weight data set includes weight data from P filters (P is a positive integer greater than or equal to 2). The thinned weight data set may be divided into P thinned weight data subsets, the pth thinned weight data subset of the P thinned weight data subsets comprising weight data from the pth filter of the P filters, p = 1, ..., P. Suppose the pth thinned weight data subset includes Num_p weight data, where Num_p is a positive integer greater than or equal to 1 and Num_p is less than n × m.
The N data calculation arrays are used to perform a Cartesian product operation on the thinned weight data set and the feature data set, so as to obtain the multiplication results required by the convolution operation of each filter with the feature data set; the corresponding multiplication results are then added to obtain the convolution operation result of each filter with the feature data set.
Take the weight data stored in the three data calculation arrays shown in fig. 9 and fig. 10 as an example again. Suppose the weight data shown in fig. 9 and fig. 10 are obtained by thinning the weight data of channel 1 of filter 1 and the weight data of channel 1 of filter 2 shown in fig. 12. The three data calculation arrays are used to perform a Cartesian product operation on {a11, a12, a13, a21, a22, a23, a31, a32, a33}, and the following operation results are obtained: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a11×b31, a12×b32, a13×b33. It can be seen that the sum of a11×b11, a12×b12, a13×b13, a21×b21, a22×b22 and a23×b23 is the operation result of the convolution of the weight data of channel 1 of filter 1 with {a11, a12, a13, a21, a22, a23, a31, a32, a33}, and the sum of a11×b31, a12×b32 and a13×b33 is the operation result of the convolution of the weight data of channel 1 of filter 2 with {a11, a12, a13, a21, a22, a23, a31, a32, a33}.
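When the thinned weight data set mixes weight data from P filters, the partial products only need to be accumulated separately per filter (and per target address). An illustrative sketch follows; the tuple layout of the inputs is assumed here for illustration only.

```python
from collections import defaultdict

def accumulate_per_filter(partial_products):
    # partial_products: iterable of (filter_id, target_address, value).
    # Products are summed per filter and per target address, so a single
    # Cartesian-product pass yields every filter's convolution result.
    acc = defaultdict(float)
    for filter_id, address, value in partial_products:
        acc[(filter_id, address)] += value
    return acc
```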
Further, the compression module may also perform sparsification on the target data set, and delete 0 in the target data set.
Through the technical scheme, the product of each feature data in the first feature data set and each weight data in the first weight data set can be obtained. After that, the convolution operation result of the first feature data set and the first weight data set can be obtained by adding the corresponding product results.
In addition, in the above embodiment, in the process of determining the convolution operation result of the first feature data set and the first weight value set according to the operation result of the cartesian product and the address operation result, each data calculation unit in the data calculation array adds the product of the weight value data and the feature data to the data stored in the target address determined by the corresponding address calculation unit, and writes the added data back to the target address. Thus, the final result held by the target address is the convolution operation result.
In other embodiments, each data calculation unit in the data calculation array may perform only a multiplication operation, that is, multiply the weight data by the feature data and store the multiplication result to the target address determined by the corresponding address calculation unit; the multiplication results are then obtained from the corresponding target addresses and added to obtain the corresponding convolution operation result. For example, the result of a11×b11 is stored at target address 1, the result of a21×b21 is stored at target address 2, the result of a31×b31 is stored at target address 3, the result of a12×b12 is stored at target address 4, the result of a22×b22 is stored at target address 5, the result of a32×b32 is stored at target address 6, the result of a13×b13 is stored at target address 7, the result of a23×b23 is stored at target address 8, and the result of a33×b33 is stored at target address 9. When calculating the convolution result, the data stored in target addresses 1 to 9 can be added to obtain c11 shown in formula 1.1.
In other embodiments, the memory module may include an addition unit. Each data calculation unit in the data calculation array can only perform multiplication operation, namely, data is multiplied by the characteristic data, the multiplication result is output to the storage module, and when the storage module stores the received data to the target address determined by the address calculation unit corresponding to the data calculation unit, the storage module firstly adds the received data and the data stored in the target address, and stores the added data to the target address. Thus, the final result held by the target address is the result of the convolution operation.
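The read-add-write behaviour of a storage module with an addition unit can be sketched as follows; this is a minimal illustration only, not a description of the actual hardware.

```python
class AccumulatingStore:
    # Storage module with an addition unit: writing to a target address adds
    # the incoming product to the data already stored at that address.
    def __init__(self):
        self.mem = {}

    def write(self, address, value):
        self.mem[address] = self.mem.get(address, 0) + value

    def read(self, address):
        return self.mem.get(address, 0)
```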
Fig. 13 is a schematic flow chart of a data processing method provided according to an embodiment of the present application. The method shown in fig. 13 may be performed by the data processing apparatus shown in fig. 2 or fig. 14.
1301, a first weight matrix in a first weight data set is obtained, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set are from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2.
1302, a second weight matrix is obtained, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows.
1303, a multiplication operation is performed on the first weight matrix and a first feature data set, where data in the first feature data set are from the same input channel.
And 1304, multiplying the first feature data set by the second weight matrix.
1305, a target data set is determined based on the operation result of the multiplication.
Specific implementation manners of the steps of the method shown in fig. 13 can be seen from the descriptions of fig. 2 to fig. 12, and thus, detailed descriptions thereof are omitted.
Optionally, in some embodiments, the method further includes: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; the determining a target data set according to the operation result of the multiplication operation includes: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the method further includes: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the method further includes: and determining a first target address according to the address of the weight data stored in each address calculation array, the address of a first feature data set, the size corresponding to the first feature matrix, a filling size and a weight size, wherein the weight size is n rows and m columns, and the filling size is the difference between the size of the first feature data set and the size of the result matrix. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the method further includes: acquiring a second characteristic data set, and removing elements with the median value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set, and determining the address of each weight value in the first weight value data set. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Fig. 14 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 1400 shown in fig. 14 includes: data processing module 1401 and control module 1404, data processing module 1401 includes N data calculation units, N is an integer greater than or equal to 2, wherein: a data processing module 1401, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented by n rows and m columns of weight data, and data in the first weight data set are from the same input channel, where n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; performing multiplication operation on the first weight matrix and a first characteristic data set, wherein data in the first characteristic data set come from the same input channel; performing multiplication operation by using the second weight matrix and the first characteristic data set; the control module 1404 is configured to determine a target data set according to an operation result of the multiplication.
Optionally, in some embodiments, the data processing apparatus 1400 further includes an address processing module 1402, where the address processing module 1402 includes N address calculation units, and the data calculation units and the address calculation units are in one-to-one correspondence, where: the address processing module 1402 is configured to: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; the control module 1404 is configured to determine, according to an operation result of the multiplication operation, a target data set including: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
Optionally, in some embodiments, the data processing module 1401 is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; the address processing module 1402 is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
Optionally, in some embodiments, the target data set includes a result matrix, where the result matrix is a result of performing a convolution operation on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the address processing module 1402 is further configured to determine a first target address according to an address of the weight data stored in each address calculation array, an address of the first feature data set, a size corresponding to the first feature matrix, a padding size, and a weight size, where the weight size is n rows and m columns, and the padding size is a difference between a size of the first feature data set and a size of the result matrix.
Optionally, in some embodiments, the data processing apparatus 1400 further comprises a compression module 1403 for: acquiring a second characteristic data set, and removing elements with the median value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set, and determining the address of each weight value in the first weight value data set.
The detailed functions and advantages of the modules in the data processing apparatus 1400 shown in fig. 14 can be referred to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
In the embodiment of the application, the terminal device or the network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. The hardware layer includes hardware such as a Central Processing Unit (CPU), a Memory Management Unit (MMU), and a memory (also referred to as a main memory). The operating system may be any one or more computer operating systems that implement business processing through processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. The application layer comprises applications such as a browser, an address list, word processing software, instant messaging software and the like. Furthermore, the embodiment of the present application does not particularly limit the specific structure of the execution main body of the method provided by the embodiment of the present application, as long as the communication can be performed according to the method provided by the embodiment of the present application by running the program recorded with the code of the method provided by the embodiment of the present application, for example, the execution main body of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module capable of calling the program and executing the program in the terminal device or the network device.
In addition, various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer-readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, etc.), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD), etc.), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), card, stick, or key drive, etc.). In addition, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instruction(s) and/or data.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (11)
1. A data processing apparatus, characterized in that the data processing apparatus comprises:
the data processing module is used for acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows;
performing a first multiplication operation using the first weight matrix and the first feature data set
Performing a second multiplication operation by using the second weight matrix and the first feature data set;
and the control module is used for determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
2. The data processing apparatus of claim 1, further comprising:
an address processing module to: acquiring addresses of weight data in the first weight matrix and the second weight matrix;
address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set;
the control module is configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation result of the address operation.
3. The data processing apparatus of claim 2,
the data processing module is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix in rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different;
the address processing module is further configured to:
acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
4. A data processing apparatus according to claim 2 or 3, wherein the target data set comprises a result matrix, the result matrix being a result of a convolution operation of the first feature data set with the first weight data set, the first feature data set being represented as a first feature matrix;
the address processing module is further configured to determine a first target address according to an address of weight data stored in each address calculation array, an address of a first feature data set, a size of the first feature matrix, a filling size, and a weight size, where the weight size is n rows and m columns, the filling size includes a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
5. The data processing apparatus according to any of claims 1 to 4, further comprising a compression module for: acquiring a second characteristic data set, and removing elements with a median value of 0 in the second characteristic data set to obtain the first characteristic data set;
acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set;
determining an address of each feature data in the first feature data set, and determining an address of each weight value in the first weight value data set.
6. A method of data processing, the method comprising:
acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows;
performing a first multiplication operation by using the first weight matrix and the first feature data set;
performing a second multiplication operation by using the second weight matrix and the first feature data set;
and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
7. The method of claim 6, further comprising:
acquiring addresses of weight data in the first weight matrix and the second weight matrix;
address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set;
determining a target data set according to operation results of the first multiplication operation and the second multiplication operation, including:
and determining a target data set according to the operation result of the first multiplication operation, the operation result of the second multiplication operation and the operation result of the address operation.
8. The method of claim 7, further comprising: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix in rows, and any two row vectors in the n row vectors in the same row of the first to nth weight matrixes are different;
acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
9. The method according to claim 7 or 8, wherein the target data set comprises a result matrix, the result matrix being a result of a convolution operation of the first feature data set with the first weight data set, the first feature data set being represented as a first feature matrix,
the method further comprises the following steps:
and determining a first target address according to the address of the weight data stored in each address calculation array, the address of a first feature data set, the size corresponding to the first feature matrix, a filling size and a weight size, wherein the weight size is n rows and m columns, the filling size comprises a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
10. The method according to any one of claims 6 to 9, further comprising:
acquiring a second characteristic data set, and removing elements with a median value of 0 in the second characteristic data set to obtain the first characteristic data set;
acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set;
determining an address of each feature data in the first feature data set, and determining an address of each weight value in the first weight value data set.
11. A data processing apparatus, characterized in that the data processing apparatus comprises:
a processor and a memory, the memory storing program code, the processor for invoking the program code in the memory to perform a method of data processing according to any one of claims 6 to 10.
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI799169B (en) * | 2021-05-19 | 2023-04-11 | 神盾股份有限公司 | Data processing method and circuit based on convolution computation |
- 2018-09-29: CN application CN201811148307.0A filed; granted as CN110968832B (status: Active)
- 2019-08-23: PCT application PCT/CN2019/102252 filed; published as WO2020063225A1 (status: Application Filing)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402905A (en) * | 2016-05-19 | 2017-11-28 | 北京旷视科技有限公司 | Computational methods and device based on neutral net |
CN106326985A (en) * | 2016-08-18 | 2017-01-11 | 北京旷视科技有限公司 | Neural network training method, neural network training device, data processing method and data processing device |
US20180096226A1 (en) * | 2016-10-04 | 2018-04-05 | Magic Leap, Inc. | Efficient data layouts for convolutional neural networks |
CN108122030A (en) * | 2016-11-30 | 2018-06-05 | 华为技术有限公司 | A kind of operation method of convolutional neural networks, device and server |
US20180165575A1 (en) * | 2016-12-08 | 2018-06-14 | Via Alliance Semiconductor Co., Ltd. | Neural network unit with mixed data and weight size computation capability |
CN107844827A (en) * | 2017-11-28 | 2018-03-27 | 北京地平线信息技术有限公司 | The method and apparatus for performing the computing of the convolutional layer in convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
WANG Zhichao; XU Ji; ZHANG Pengyuan; YAN Yonghong: "Structural Optimization and Accelerated Computation of Convolutional Neural Network Acoustic Models", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), No. 03 *
Also Published As
Publication number | Publication date |
---|---|
CN110968832B (en) | 2023-10-20 |
WO2020063225A1 (en) | 2020-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112840356B (en) | Operation accelerator, processing method and related equipment | |
US10909447B2 (en) | Transposing neural network matrices in hardware | |
CN107145939B (en) | Computer vision processing method and device of low-computing-capacity processing equipment | |
KR102315346B1 (en) | Performing Average Pooling in Hardware | |
CN111465924B (en) | System and method for converting matrix input into vectorized input for matrix processor | |
JP7007488B2 (en) | Hardware-based pooling system and method | |
KR20190066473A (en) | Method and apparatus for processing convolution operation in neural network | |
US11328395B2 (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
KR20200100190A (en) | Image Transformation for Machine Learning | |
CN112749666B (en) | Training and action recognition method of action recognition model and related device | |
KR20200081044A (en) | Method and apparatus for processing convolution operation of neural network | |
CN110109646B (en) | Data processing method, data processing device, multiplier-adder and storage medium | |
US12106222B2 (en) | Neural network training under memory restraint | |
WO2022041188A1 (en) | Accelerator for neural network, acceleration method and device, and computer storage medium | |
CN113918120A (en) | Computing device, neural network processing apparatus, chip, and method of processing data | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN111767243A (en) | Data processing method, related device and computer readable medium | |
US20170076211A1 (en) | Feature-converting device, feature-conversion method, learning device, and recording medium | |
CN118193914A (en) | LU decomposition method, device, equipment and storage medium for distributed platform | |
CN117851742A (en) | Data storage method, data processing method, data memory and data processor | |
CN110968832B (en) | Data processing method and device | |
CN116400884A (en) | Control method and device of multiplier-adder computer device and storage medium | |
CN116304677A (en) | Channel pruning method and device for model, computer equipment and storage medium | |
CN115424038A (en) | Multi-scale image processing method, system and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||