CN110968832B - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN110968832B
Authority
CN
China
Prior art keywords
data
weight
address
data set
matrix
Prior art date
Legal status
Active
Application number
CN201811148307.0A
Other languages
Chinese (zh)
Other versions
CN110968832A (en)
Inventor
梁晓峣
景乃锋
崔晓松
廖健行
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201811148307.0A
Priority to PCT/CN2019/102252 (WO2020063225A1)
Publication of CN110968832A
Application granted
Publication of CN110968832B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/153 Multidimensional correlation or convolution
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The application provides a data processing method and a data processing apparatus. The data processing apparatus includes a data processing module configured to: acquire a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the data in the first weight data set come from the same input channel; acquire a second weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; multiply the first weight matrix with a first feature data set, where the data in the first feature data set come from the same input channel; multiply the first feature data set using the second weight matrix; and determine a target data set according to the results of the multiplication operations. This technical scheme can reduce the number of times a storage device is accessed.

Description

Data processing method and device
Technical Field
The present application relates to the field of information technology, and more particularly, to a method of processing data and a data processing apparatus.
Background
Convolutional neural networks (CNN) are among the most widely used algorithms in deep learning, and are applied in a variety of scenarios such as image classification, speech recognition, video understanding, and face detection.
The core of convolutional neural network operations is the convolution operation. The amount of data that needs to be processed by convolution operations is typically large, so the storage and computing resources required for convolution operations are also large, and current processors are increasingly unable to meet the demands of convolution operations. In addition, with the development of mobile intelligent devices, mobile intelligent devices also need to perform convolution operations, but mobile devices have limited computing and storage capabilities. Therefore, how to improve the efficiency of the convolution operation is a problem to be solved.
Disclosure of Invention
The application provides a data processing method and a data processing apparatus, which can reduce the number of times a storage device is accessed.
In a first aspect, an embodiment of the present application provides a data processing apparatus, including a data processing module configured to: acquire a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquire a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; perform a first multiplication operation using the first weight matrix and a first feature data set; and perform a second multiplication operation on the first feature data set using the second weight matrix. The data processing apparatus further includes a control module configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
The target data set includes the products of the elements in the first feature data set and the elements in the first weight matrix. From these products, partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix can further be obtained and output from the data processing apparatus, so that the convolution result can be predicted with a smaller amount of computation and at a faster rate. For example, assume that the first weight matrix is a matrix of 3 rows and 3 columns and the second weight matrix is the first weight matrix rearranged by rows. When a certain block of 3 rows and 3 columns of data in the first feature data set is input to the data processing module and multiplied by the first weight matrix and the second weight matrix respectively, the convolution result of that feature data with the first weight matrix, as well as the partial convolution sums of the 3-row, 3-column feature data at adjacent positions with the first weight matrix, can be obtained from the target data set. Because feature data at adjacent positions often have continuity, the data processing apparatus can predict the convolution result using the convolution results and partial convolution sums in the target data set. For example, when the data processing apparatus performs object recognition using the feature data according to the scheme provided by the application, if the convolution results and partial convolution sums in the obtained target data set deviate from the expected value range, the subsequent calculation can be skipped directly, thereby saving computation. After the data processing apparatus implements object recognition according to the technical scheme provided by the application, the object recognition result can be further used to implement other functions, for example, sorting goods or monitoring targets.
In the above scheme, the data processing device obtains the second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows, and performs multiplication operation on the first weight matrix and the second weight matrix with the first feature data set, so that feature data can be multiplexed when a partial cartesian product and a partial convolution result of the first feature data set and the first weight matrix are obtained, thereby improving operation efficiency.
Specifically, when the convolution of a feature matrix and a weight matrix is calculated in the prior art, the convolution is implemented by sliding the weight matrix over the feature matrix and multiplying the weight matrix elements by the corresponding feature data. Since the feature data in the same feature matrix are often used in the multiplication operations after multiple slides of the weight matrix, the feature data need to be loaded multiple times in the actual operation. That is, multiple read operations need to be performed on the memory that holds the feature data, so that the same feature data are acquired multiple times. Referring to fig. 1, when calculating the Cartesian product of the feature data set and the weight data set, a multi-step convolution needs to be performed. In the first convolution, the feature data a21 is acquired by performing a read operation on the memory, and the product of a21 and b21 is calculated. When the fourth convolution is calculated (the weight matrix slides from left to right and then from top to bottom), the feature data a21 needs to be acquired again by performing another read operation on the memory, and the product of a21 and b11 is calculated. That is, multiple read operations need to be performed on the memory that stores the feature data a21, which increases overhead. According to the technical scheme provided by the application, by rearranging the weight matrix, the feature data can be loaded once and multiplied by more weight matrix elements, so the number of times the feature data are loaded is reduced. In addition, multiplexing of the acquired feature data is achieved by calculating the products between the feature data and the elements in the first weight matrix and the products between the feature data and the elements in the second weight matrix. In conclusion, the scheme improves the operation efficiency.
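As an illustration of this reuse idea (a minimal software sketch, not the hardware design claimed by the application), the following Python snippet loads one 3×3 feature block once and multiplies it against several row-rearranged copies of the same weight matrix, so that each loaded feature element meets more weight elements. The cyclic row rotation used here is just one possible rearrangement, and all variable names and values are illustrative.

```python
# Minimal sketch: one loaded feature block multiplied against row-rearranged
# copies of the same weight matrix, so feature data are read once but reused.

def row_rotations(weight):
    """Return the n row-rearranged versions of an n-row weight matrix
    (here: cyclic rotations, one possible rearrangement)."""
    n = len(weight)
    return [weight[i:] + weight[:i] for i in range(n)]

def products_for_block(feature_block, weight):
    """Multiply one loaded feature block against every row-rearranged weight
    matrix. Each feature element is read once but meets n different weight rows."""
    results = []
    for w in row_rotations(weight):
        results.append([[f * v for f, v in zip(frow, wrow)]
                        for frow, wrow in zip(feature_block, w)])
    return results

# Hypothetical 3x3 example values, for illustration only.
feature_block = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weight = [[1, 0, 2], [3, 1, 0], [0, 2, 1]]
partials = products_for_block(feature_block, weight)
print(len(partials))  # 3 rearranged matrices -> every feature row meets every weight row
```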
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes an address processing module, where the address processing module is configured to: acquire the addresses of the weight data in the first weight matrix and the second weight matrix; and perform address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set. That the data processing module is configured to determine a target data set according to the operation result of the multiplication operation includes: the control module is configured to determine the target data set according to the operation result of the multiplication operation and the operation result of the address operation.
According to this scheme, an address processing module is introduced, and the addresses of the products of the weight data in the first weight matrix and the second weight matrix and the feature data in the first feature data set are calculated by the address processing module, so that the Cartesian products of the feature data and the weight matrices and the convolution results can further be obtained as the target data set, thereby expanding the functions of the data processing apparatus.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing module is further configured to: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes, which are positioned in the same row, are different; the address processing module is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set.
According to the scheme, the first weight matrix with n rows is rearranged according to the rows to obtain n weight matrices, any two row vectors in n row vectors in the same row in the n weight matrices are different, so that after multiplication operation is carried out on the characteristic data and the n weight matrices, cartesian products of the characteristic data and the first weight matrix are obtained, the multiplexing degree of the characteristic data is improved, and the operation efficiency is further improved.
With reference to the first aspect, in a possible implementation manner of the first aspect, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The address processing module is further configured to determine a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size of the first feature matrix, the filling size, and the weight size, where the weight size is n rows and m columns, the filling size includes a lateral filling size and a longitudinal filling size, the lateral filling size is (n-1)/2, and the longitudinal filling size is (m-1)/2.
The scheme further refines the method for obtaining the target data address according to the address of the weight data and the address of the characteristic data, thereby improving the realizability of the data processing device for obtaining the convolution result through Cartesian products.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes a compression module, configured to: acquiring a second characteristic data set, and removing an element with a value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing an element with a value of 0 in the second weight data set to obtain the first weight data set; an address of each feature data in the first set of feature data is determined, and an address of each weight in the first set of weight data is determined.
According to this scheme, the feature data and the weight data are sparsified, that is, the elements with a value of 0 in the feature data set and the weight data set are removed, which reduces the amount of computation of the convolution operation and thereby improves the operation efficiency of the data processing apparatus.
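A minimal sketch of the sparsification step described above, assuming that an element's "address" is simply its absolute position in the flattened data set (as explained later in the description); the function name and the example values are illustrative only.

```python
# Sketch: drop zero elements and record the absolute address of each kept element.

def sparsify(data_set):
    """Return (non-zero values, their absolute addresses) for a flat data set."""
    values, addresses = [], []
    for addr, value in enumerate(data_set):
        if value != 0:
            values.append(value)
            addresses.append(addr)
    return values, addresses

second_feature_set = [5, 0, 0, 32, 0, 0, 0, 0, 23]   # example from the description
first_feature_set, feature_addrs = sparsify(second_feature_set)
print(first_feature_set, feature_addrs)               # [5, 32, 23] [0, 3, 8]
```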
In a second aspect, an embodiment of the present application provides a data processing method, including: acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is expressed as n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; performing a first multiplication operation by using the first weight matrix and the first characteristic data set; performing a second multiplication operation with the first characteristic data set by using the second weight matrix; and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring addresses of weight data in the first weight matrix and the second weight matrix; performing address operation by using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first characteristic data set; determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation, including: and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation result of the address operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes, which are positioned in the same row, are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set.
With reference to the second aspect, in a possible implementation manner of the second aspect, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed by the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the method further includes: and determining a first target address according to the address of the weight data stored in the array, the address of the first characteristic data set, the size corresponding to the first characteristic matrix, the filling size and the weight size, wherein the weight size is n rows and m columns, the filling size comprises a transverse filling size and a longitudinal filling size, the transverse filling size is (n-1)/2, and the longitudinal filling size is (m-1)/2.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring a second characteristic data set, and removing an element with a value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing an element with a value of 0 in the second weight data set to obtain the first weight data set; an address of each feature data in the first set of feature data is determined, and an address of each weight in the first set of weight data is determined.
In a third aspect, the present application provides a data processing apparatus comprising a processor and a memory, the memory storing program code, the processor being for invoking the program code in the memory to perform the method of data processing as provided in the second aspect of the application.
Drawings
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a data computing array according to an embodiment of the present application.
Fig. 4 is a block diagram of a data computing unit in a data computing array according to an embodiment of the present application.
Fig. 5 is a schematic diagram of multiplication operation on the first feature data set according to an embodiment of the present application.
Fig. 6 is a schematic diagram of an address of a first feature data set and an address of a weight data set according to an embodiment of the present application.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.
Fig. 8 is a block diagram of an address calculation unit in an address calculation array according to an embodiment of the present application.
Fig. 9 is a schematic diagram of weight data stored in two data computing arrays according to an embodiment of the present application.
Fig. 10 is a schematic diagram of weight data stored in a data computing array according to an embodiment of the present application.
Fig. 11 is a schematic diagram of weight matrices of 3 filters after sparsification according to an embodiment of the present application.
Fig. 12 is a schematic diagram of a weight matrix without sparsification according to an embodiment of the present application.
Fig. 13 is a schematic flow chart of a data processing method according to an embodiment of the present application.
Fig. 14 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical scheme of the application will be described below with reference to the accompanying drawings.
In the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may be singular or plural. In addition, in the embodiments of the present application, words such as "first" and "second" do not limit the quantity or the execution order.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 1 shows a feature data set including a total of 5×5 feature data. Fig. 1 also shows a weight data set including a total of 3×3 weight data. The weight data set may serve as a convolution kernel to be convolved with the feature data set.
Fig. 1 also shows two steps, with a step size of 1, of the convolution operation of the feature data set with the weight data set. As shown in fig. 1, the 3×3 weight data in the weight data set need to be multiplied by the corresponding 3×3 data in the feature data set, and the multiplication results are added to obtain one data value of the convolution result. Specifically, according to the illustration of fig. 1, the convolution result c11 can be expressed as equation 1.1, and the convolution result c12 can be expressed as equation 1.2:

c11 = a11×b11 + a12×b12 + a13×b13 + a21×b21 + a22×b22 + a23×b23 + a31×b31 + a32×b32 + a33×b33   (equation 1.1)

c12 = a12×b11 + a13×b12 + a14×b13 + a22×b21 + a23×b22 + a24×b23 + a32×b31 + a33×b32 + a34×b33   (equation 1.2)
After the two operations shown in fig. 1 are completed, the weight data set continues to slide to the right over the feature data set, and the next operation is performed, until the complete feature data set has been traversed.
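For reference, a plain sliding-window convolution of this prior-art form can be sketched as follows (stride 1, no padding, illustrative values standing in for the a_ij and b_ij of fig. 1); it makes visible that neighbouring windows re-read the same feature elements, which is the repeated-load problem discussed above.

```python
# Prior-art style sliding-window convolution (stride 1, no padding).

def conv2d(feature, kernel):
    fh, fw = len(feature), len(feature[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (fw - kw + 1) for _ in range(fh - kh + 1)]
    for i in range(fh - kh + 1):
        for j in range(fw - kw + 1):
            s = 0
            for u in range(kh):
                for v in range(kw):
                    # the same feature element is re-read by neighbouring windows
                    s += feature[i + u][j + v] * kernel[u][v]
            out[i][j] = s
    return out

# Hypothetical numbers standing in for the a_ij and b_ij of Fig. 1.
feature = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
print(conv2d(feature, kernel))   # 3x3 result; out[0][0] corresponds to c11, out[0][1] to c12
```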
Assume set E1 = {a11, a12, a13, a21, a22, a23, a31, a32, a33} and set F1 = {b11, b12, b13, b21, b22, b23, b31, b32, b33}. Performing a Cartesian product operation on set E1 and set F1 yields set G1, and set G1 may include the multiplication results shown in Table 1.
TABLE 1
a11×b11  a11×b12  a11×b13  a11×b21  a11×b22  a11×b23  a11×b31  a11×b32  a11×b33
a21×b11  a21×b12  a21×b13  a21×b21  a21×b22  a21×b23  a21×b31  a21×b32  a21×b33
a31×b11  a31×b12  a31×b13  a31×b21  a31×b22  a31×b23  a31×b31  a31×b32  a31×b33
a12×b11  a12×b12  a12×b13  a12×b21  a12×b22  a12×b23  a12×b31  a12×b32  a12×b33
a22×b11  a22×b12  a22×b13  a22×b21  a22×b22  a22×b23  a22×b31  a22×b32  a22×b33
a32×b11  a32×b12  a32×b13  a32×b21  a32×b22  a32×b23  a32×b31  a32×b32  a32×b33
a13×b11  a13×b12  a13×b13  a13×b21  a13×b22  a13×b23  a13×b31  a13×b32  a13×b33
a23×b11  a23×b12  a23×b13  a23×b21  a23×b22  a23×b23  a23×b31  a23×b32  a23×b33
a33×b11  a33×b12  a33×b13  a33×b21  a33×b22  a33×b23  a33×b31  a33×b32  a33×b33
As shown in Table 1, the result of the Cartesian product operation on set E1 and set F1 includes all the multiplication results needed to compute c11: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a31×b31, a32×b32, and a33×b33. The result of the Cartesian product operation on set E1 and set F1 also includes some of the multiplication results needed to compute c12: a12×b11, a13×b12, a22×b21, a23×b22, a32×b31, and a33×b32.
Assume set E2 = {a12, a13, a14, a22, a23, a24, a32, a33, a34}. Performing a Cartesian product operation on set E2 and set F1 yields set G2, and set G2 may include the multiplication results shown in Table 2.
TABLE 2
a12×b11  a12×b12  a12×b13  a12×b21  a12×b22  a12×b23  a12×b31  a12×b32  a12×b33
a22×b11  a22×b12  a22×b13  a22×b21  a22×b22  a22×b23  a22×b31  a22×b32  a22×b33
a32×b11  a32×b12  a32×b13  a32×b21  a32×b22  a32×b23  a32×b31  a32×b32  a32×b33
a13×b11  a13×b12  a13×b13  a13×b21  a13×b22  a13×b23  a13×b31  a13×b32  a13×b33
a23×b11  a23×b12  a23×b13  a23×b21  a23×b22  a23×b23  a23×b31  a23×b32  a23×b33
a33×b11  a33×b12  a33×b13  a33×b21  a33×b22  a33×b23  a33×b31  a33×b32  a33×b33
a14×b11  a14×b12  a14×b13  a14×b21  a14×b22  a14×b23  a14×b31  a14×b32  a14×b33
a24×b11  a24×b12  a24×b13  a24×b21  a24×b22  a24×b23  a24×b31  a24×b32  a24×b33
a34×b11  a34×b12  a34×b13  a34×b21  a34×b22  a34×b23  a34×b31  a34×b32  a34×b33
As shown in Table 2, the result of the Cartesian product operation on set E2 and set F1 includes some of the multiplication results needed to compute c12: a14×b13, a24×b23, and a34×b33.
The multiplication results in Tables 1 and 2 that are not needed for computing c11 and c12 may also be used in subsequent convolution operations.
As can be seen from the analysis of the convolution operation and the cartesian product operation process, the convolution operation can be decomposed into a cartesian product operation. The operation result obtained by one Cartesian product operation can be used for multi-step convolution operation. The one-step convolution operation result may be the addition of one or more Cartesian product operation results.
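The decomposition can be illustrated with a small sketch: the Cartesian product of the feature elements in E1 and the weight elements in F1 contains every product needed for c11, so c11 is recovered by summing the matching entries. The numeric values below are stand-ins, not data from the application.

```python
# Sketch: a convolution value recovered from Cartesian-product entries.
from itertools import product

E1 = {("a", i, j): i * 10 + j for i in (1, 2, 3) for j in (1, 2, 3)}   # stand-in values
F1 = {("b", i, j): i + j for i in (1, 2, 3) for j in (1, 2, 3)}        # stand-in values

# Cartesian product of E1 and F1: every feature element times every weight element.
G1 = {(ka, kb): va * vb for (ka, va), (kb, vb) in product(E1.items(), F1.items())}

# c11 = sum over matching positions a_ij * b_ij (equation 1.1)
c11 = sum(G1[(("a", i, j), ("b", i, j))] for i in (1, 2, 3) for j in (1, 2, 3))
print(len(G1), c11)   # 81 Cartesian-product entries; c11 is the sum of 9 of them
```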
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 200 shown in fig. 2 includes: a storage module 210, a data processing module 220, an address processing module 230, and a control module 240.
The storage module 210 is configured to store a first feature data set, an address of each feature data in the first feature data set, a first weight set, and an address of each weight in the first weight set.
The data processing module 220 includes N data compute arrays. Each of the N data computing arrays includes n×m data computing units, where N is a positive integer greater than or equal to 2, and m is a positive integer greater than or equal to 2.
The address processing module 230 includes N address compute arrays. Each of the N address calculation arrays includes n×m address calculation units.
Each data computing array is configured to acquire n×m weight data from the storage module 210 and store the acquired weight data in the n×m data computing units of that data computing array.
Each address calculation array is configured to acquire the addresses of n×m weight data from the storage module 210 and store the acquired addresses of the weight data in the n×m address calculation units of that address calculation array. The addresses of the weight data held by the N address calculation arrays are the addresses of the weight data held by the N data computing arrays. In other words, the N address calculation arrays are in one-to-one correspondence with the N data computing arrays, and each of the N address calculation arrays holds the addresses of the weight data held by its corresponding data computing array. For example, assume that the weight data held by one data computing array of the N data computing arrays are b11, b12, b13, b21, b22, b23, b31, b32, and b33. Then the addresses stored in the address calculation array corresponding to that data computing array are the address of b11, the address of b12, the address of b13, the address of b21, the address of b22, the address of b23, the address of b31, the address of b32, and the address of b33.
The N data computing arrays multiply the first set of feature data using the weight data held by the N data computing arrays. During the operation of the first characteristic data set, the weight data stored in the N data computing arrays are unchanged.
Similarly, the N address calculation arrays perform an address operation on the address of the first feature data set using the address of the weight data held by the N address calculation arrays, wherein the address of the weight data held by the N address calculation arrays remains unchanged during the address operation on the address of the first feature data set.
The control module 240 is configured to determine a target data set according to the N data computing arrays and the operation result of the multiplication operation and the operation result of the address operation.
Therefore, according to the multiplication results and the address operation results, the result of convolving the first feature data set with the weight data stored by the N data computing arrays can be determined. In other words, in some embodiments, the target data set may be a set of data obtained by convolving the first feature data set with the weight data stored by the N data computing arrays.
The following describes, with reference to fig. 1 and fig. 3 to 5, how the N data computing arrays operate on the first feature data set shown in fig. 1 using the saved weight data.
FIG. 3 is a schematic diagram of a data computing array according to an embodiment of the present application. The data computing array 300 as shown in fig. 3 includes 9 data computing units, namely, a data computing unit 311, a data computing unit 312, a data computing unit 313, a data computing unit 321, a data computing unit 322, a data computing unit 323, a data computing unit 331, a data computing unit 332, and a data computing unit 333, respectively.
It will be appreciated that the data computing array may include an input-output unit (not shown) in addition to the data computing unit shown in fig. 3. The input-output unit is used for acquiring data to be input to the data computing array 300. The input/output unit is further configured to input data to be output by the data computing array 300 to a corresponding unit and/or module. For example, the input-output unit may acquire the weight data and the feature data from the storage module, and send the acquired weight data and feature data to the corresponding data calculation unit. The input/output unit is also used for acquiring the target data calculated by each data calculation unit and sending the target data to the storage module.
Optionally, in some embodiments, data transfer between individual compute units in the data compute array is unidirectional. Taking fig. 3 as an example, arrows for connecting the respective data calculation units in fig. 3 may represent unidirectional transfer directions representing data. Take the data calculation unit 311, the data calculation unit 312, and the data calculation unit 313 as examples. The data calculation unit 311 may transmit data (e.g., feature data) to the data calculation unit 312, and the data calculation unit 312 cannot transmit data to the data calculation unit 311. The data calculation unit 312 may transmit data to the data calculation unit 313, and the data calculation unit 313 cannot transmit data to the data calculation unit 312.
Fig. 4 is a block diagram of a data computing unit in a data computing array according to an embodiment of the present application. As shown in fig. 4, the data calculation unit 400 may include a storage sub-unit 401 and a data calculation sub-unit 402. It is understood that the data calculation unit 400 may further include an input-output subunit. The input/output subunit is configured to acquire data that needs to be acquired by the data computing unit, and output data that needs to be output by the data computing unit.
Specifically, the data computing array 300 shown in fig. 3 may acquire 3×3 weight data in the weight data set shown in fig. 1, and store the 3×3 weight data into 3×3 data computing units of the data computing array 300, respectively.
Specifically, weight data b11 may be stored in the storage subunit of the data calculation unit 311, weight data b12 may be stored in the storage subunit of the data calculation unit 312, weight data b13 may be stored in the storage subunit of the data calculation unit 313, and so on. In this way, the 3×3 weight data are stored in the data computing array 300.
After storing 3×3 weight data, the data computing array 300 may slide the first feature data set in one direction, and multiply the first feature data set using the weight data stored by the data computing array 300. The weight data stored in the data computing array 300 does not change during the multiplication of the first feature data set by the data computing array 300. In other words, the data calculation unit in the data calculation array 300 does not delete the saved weight data in the process of multiplying the first feature data by the data calculation array 300. Correspondingly, the data calculation unit does not read and store new weight data from the storage module.
The way in which the first feature data set slides unidirectionally may be seen in fig. 5. Fig. 5 is a schematic diagram of the multiplication process performed on the first feature data set according to an embodiment of the present application. As shown in fig. 5, the first feature data set may first be flipped 180 degrees: column 1 of the first feature data set becomes column 5 after flipping, column 2 becomes column 4 after flipping, and so on. It should be noted that, in fig. 5, the first feature data set is flipped 180 degrees and then slid to the right merely for convenience of describing the operations between the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 and the weight data b11, b21, b31, b12, b22, b32, b13, b23, b33. In a practical implementation, the first feature data set may be directly multiplied by the weight data stored in the data computing array 300 while sliding to the right. The data values of the multiplication results obtained by sliding the first feature data set directly to the right and by first flipping it 180 degrees and then sliding it to the right in the manner shown in fig. 5 are the same; only the order in which the final data are obtained differs.
The flipped first feature data set slides rightward in one direction and is multiplied by the weight data stored in the data computing array 300. Specifically, in the first operation, feature data a11, a21, and a31 are multiplied by weight data b11, b21, and b31, respectively. After the first operation, the flipped first feature data set slides rightward, and the second operation is performed. In the second operation, feature data a11, a21, and a31 are multiplied by weight data b12, b22, and b32, respectively, and feature data a12, a22, and a32 are multiplied by weight data b11, b21, and b31, respectively. After the second operation, the flipped feature data set continues to slide rightward, the third operation is performed, and so on. In the above embodiment, the step size of each slide of the first feature data set is 1. Of course, in some other embodiments, the step size of each slide of the first feature data set may also be a positive integer greater than 1.
Taking the first operation as an example, the data calculation unit 311 may acquire feature data a11 from the first feature data set stored in the storage module 210, and store the acquired feature data a11 in the storage subunit of the data calculation unit 311. In this case, the storage subunit of the data calculation unit 311 holds weight data b11 and feature data a11. The data calculation subunit in the data calculation unit 311 multiplies the weight data b11 and the feature data a11 stored in the storage subunit to obtain intermediate data k(11,11). The multiplication of weight data b11 and feature data a11 may be implemented by a multiplier in the data calculation subunit.
The data calculation unit 311 may further acquire, according to the target address determined by the address calculation unit corresponding to the data calculation unit 311, the cache data r(11,11) stored at the first target address. Specifically, the address calculation unit corresponding to the data calculation unit 311 may determine the first target address based on the address of feature data a11 and the address of weight data b11. The data calculation unit 311 may acquire the current cache data r(11,11) stored at the first target address. The manner in which the address calculation unit determines the first target address will be described later. The data calculation subunit adds the intermediate data k(11,11) and the current cache data r(11,11) to obtain target data d(11,11). The addition of the intermediate data k(11,11) and the current cache data r(11,11) may be implemented by an adder in the data calculation subunit. The target data d(11,11) may be saved to the first target address. In other words, the current cache data r(11,11) stored at the first target address is updated to the target data d(11,11).
Similarly, the data calculation unit 321 may determine, in the same manner, the product of the weight data b21 held by the data calculation unit 321 and the feature data a21 (hereinafter referred to as intermediate data k(21,21)). The target address determined by the address calculation unit corresponding to the data calculation unit 321 is also the first target address. The data calculation unit 321 adds the intermediate data k(21,21) and the current cache data held at the first target address (at this time the current cache data has been updated to target data d(11,11)) to obtain target data d(21,21). The target data d(21,21) may be saved to the first target address. In other words, the current cache data d(11,11) stored at the first target address is updated to the target data d(21,21).
The data calculation unit 331 may determine, in the same manner, the product of the weight data b31 held by the data calculation unit 331 and the feature data a31 (hereinafter referred to as intermediate data k(31,31)). The target address determined by the address calculation unit corresponding to the data calculation unit 331 is also the first target address. The data calculation unit 331 adds the intermediate data k(31,31) and the current cache data held at the first target address (at this time the current cache data has been updated to target data d(21,21)) to obtain target data d(31,31). The target data d(31,31) may be saved to the first target address. In other words, the current cache data d(21,21) stored at the first target address is updated to the target data d(31,31).
After the first operation, the target data stored at the first target address is a11×b11 + a21×b21 + a31×b31.
In a similar manner, the data computing array 300 may continue to operate on the first set of feature data using the weight data held by the data computing units in the data computing array 300.
After the third operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32. That is, in the third operation, the target address determined by the address calculation units corresponding to the data calculation unit 312, the data calculation unit 322, and the data calculation unit 332 is also the first target address. Thus, after the third operation, the target data stored at the first target address is the sum of the data obtained after the first operation, a12×b12 determined by the data calculation unit 312, a22×b22 determined by the data calculation unit 322, and a32×b32 determined by the data calculation unit 332. After the fifth operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32 + a13×b13 + a23×b23 + a33×b33. That is, in the fifth operation, the target address determined by the address calculation units corresponding to the data calculation unit 313, the data calculation unit 323, and the data calculation unit 333 is also the first target address. Therefore, after the fifth operation, the target data stored at the first target address is the sum of the data obtained after the third operation, a13×b13 determined by the data calculation unit 313, a23×b23 determined by the data calculation unit 323, and a33×b33 determined by the data calculation unit 333.
In this way, after five operations, the data stored at the first target address is the convolution result c11 shown in equation 1.1. Similarly, the convolution operation of the first feature data set and the weight data set can be completed by using the multiplication results and the address operation results.
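The accumulate-at-target-address behaviour described in the preceding paragraphs can be sketched as follows; the dictionary standing in for the cache and the constant target address are illustrative simplifications, not the hardware structure.

```python
# Sketch: each (feature, weight) product is added to whatever is cached at its
# target address, so once all contributing products have arrived the address
# holds one convolution result.

def accumulate(buffer, target_addr, feature_value, weight_value):
    intermediate = feature_value * weight_value                        # multiplier in the data calc subunit
    buffer[target_addr] = buffer.get(target_addr, 0) + intermediate    # adder + write-back of target data
    return buffer[target_addr]

buffer = {}
first_target_addr = 0
# illustrative products that all map to the same target address (the c11 terms of equation 1.1)
for a, b in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9)]:
    accumulate(buffer, first_target_addr, a, b)
print(buffer[first_target_addr])   # 285 = sum of the nine products
```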
The following describes, with reference to fig. 1 and fig. 3 to 8, how the N address calculation arrays perform address operations on the addresses of the first feature data set shown in fig. 1 using the addresses of the saved weight data.
Fig. 6 is a schematic diagram of the addresses of the first feature data set and the addresses of the weight data set according to an embodiment of the present application. The addresses of the first feature data set shown in fig. 6 are the addresses of the first feature data set shown in fig. 1. Specifically, address Add_a11 is the address of feature data a11, address Add_a12 is the address of feature data a12, and so on. The addresses of the weight data set shown in fig. 6 are the addresses of the weight data set shown in fig. 1. Specifically, address Add_b11 is the address of weight data b11, address Add_b12 is the address of weight data b12, and so on.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application. The address calculation array 700 shown in fig. 7 includes 9 address calculation units, namely an address calculation unit 711, an address calculation unit 712, an address calculation unit 713, an address calculation unit 721, an address calculation unit 722, an address calculation unit 723, an address calculation unit 731, an address calculation unit 732, and an address calculation unit 733.
It will be appreciated that the address calculation array may include an input-output unit (not shown) in addition to the address calculation unit shown in fig. 7. The input-output unit is used for acquiring data to be input to the address calculation array 700. The input/output unit is further configured to input data that needs to be output by the address calculation array 700 to a corresponding unit and/or module. For example, the input-output unit may acquire the address of the weight data and the address of the feature data from the storage module, and send the acquired address of the weight data and the address of the feature data to the corresponding address calculation unit. The input/output unit is further configured to obtain the target address calculated by each address calculation unit, and send the target address to the corresponding data calculation unit.
The N address calculation arrays are in one-to-one correspondence with the N data computing arrays. The one-to-one correspondence here means that one data computing array of the N data computing arrays corresponds to one address calculation array of the N address calculation arrays, and different data computing arrays correspond to different address calculation arrays. For example, let N be equal to 3, the 3 data computing arrays be data computing array 1, data computing array 2, and data computing array 3, and the 3 address calculation arrays be address calculation array 1, address calculation array 2, and address calculation array 3. Data computing array 1 corresponds to address calculation array 1, data computing array 2 corresponds to address calculation array 2, and data computing array 3 corresponds to address calculation array 3. The address calculation array corresponding to a data computing array is used to calculate the target address of each piece of target data in that data computing array. Further, the data calculation units in a data computing array are also in one-to-one correspondence with the address calculation units in the corresponding address calculation array. Assuming that the data computing array shown in fig. 3 corresponds to the address calculation array shown in fig. 7, the data calculation unit 311 corresponds to the address calculation unit 711, the data calculation unit 312 corresponds to the address calculation unit 712, the data calculation unit 313 corresponds to the address calculation unit 713, and so on. An address calculation unit is used to determine the address of the target data of its corresponding data calculation unit. Specifically, the first target address from which the data calculation unit 311 acquires the cache data r(11,11), as described above, is obtained by the address calculation unit 711 performing an address operation.
Fig. 8 is a block diagram of an address calculation unit in an address calculation array according to an embodiment of the present application. As shown in fig. 8, the address calculation unit 800 may include a storage sub-unit 801 and an address calculation sub-unit 802. It is understood that the address calculation unit 800 may further include an input-output subunit. The input/output subunit is configured to acquire data that needs to be acquired by the address calculation unit, and output data that needs to be output by the address calculation unit.
Specifically, the address calculation array 700 shown in fig. 7 may acquire the addresses of 3×3 weight data from the addresses of the weight data set shown in fig. 6, and store the addresses of the 3×3 weight data in the 3×3 address calculation units of the address calculation array 700, respectively.
Specifically, address Add_b11 may be stored in the storage subunit of the address calculation unit 711, address Add_b12 may be stored in the storage subunit of the address calculation unit 712, address Add_b13 may be stored in the storage subunit of the address calculation unit 713, and so on. In this way, the address calculation array 700 stores the addresses of the 3×3 weight data.
After storing the addresses of the 3×3 weight data, the address calculation array 700 may slide the addresses of the first feature data set in one direction, and perform an address operation on the addresses of the first feature data set using the addresses of the weight data stored in the address calculation array 700. In the process of performing the address operation on the address of the first feature data set by the address calculation array 700, the address of the weight data stored in the address calculation array 700 is not changed. In other words, in the process of performing the address operation on the address of the first feature data by the address calculation array 700, the address calculation unit in the address calculation array 700 does not delete the address of the saved weight data. Accordingly, the address calculation unit will not read and store the address of the new weight data from the memory module.
The process of sliding the addresses of the first feature data set rightward in one direction and performing address operations is similar to the process of sliding the first feature data set rightward and performing multiplication operations, and therefore a detailed description is omitted.
The following describes how the address calculation unit performs an address operation.
For convenience of description, the address of the weight data acquired by the address calculation unit 800 is hereinafter referred to as the address of the first weight data, the address of the feature data acquired by the address calculation unit 800 is referred to as the address of the first feature data, and the address obtained by the address calculation unit 800 performing the address operation is referred to as the first target address.
In addition to the address of the first feature data and the address of the first weight data, the input/output subunit in the address calculation unit 800 may obtain the following information from the storage module: the size of the input data corresponding to the first feature data set, the filling size, and the weight size, where the weight size corresponds to the size of the address calculation array to which the address calculation unit 800 belongs, and the filling size is a preset size. In this example, the weight size is 3×3. The size of the input data corresponding to the first feature data set, the filling size, and the weight size may also be saved in the storage subunit 801 of the address calculation unit 800. The address calculation subunit 802 may determine the first target address according to the address of the first weight data, the address of the first feature data, the size of the input data corresponding to the first feature data set, the filling size, and the weight size.
Assuming that the size of the input picture is a rows and b columns and the size of the convolution kernel is n rows and m columns, the size of the output picture after convolution is (a-n+1) × (b-m+1). This brings two problems: first, the output picture shrinks after each convolution operation; second, the pixels in the corners and edge regions of the original picture contribute less to the output, so much of the information at the edges of the picture is lost.
To solve these problems, the original picture may be padded (Padding) on the boundaries before performing the convolution operation to increase the size of the matrix. Typically 0 is taken as the filling value.
Assume that the numbers of pixels extended in the horizontal and vertical directions are p and q respectively, so that the size of the original picture after filling is (a+2p) × (b+2q), while the convolution kernel size remains n rows and m columns; the output picture size is then (a+2p-n+1) × (b+2q-m+1). The numbers of pixels p and q extended in each direction are the filling sizes. If the output picture size is to remain unchanged (that is, equal to a × b), it can be derived that the lateral filling size p equals (n-1)/2 and the longitudinal filling size q equals (m-1)/2.
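A quick numeric check of this padding relation, assuming odd n and m so that (n-1)/2 and (m-1)/2 are integers; the input size used here is only an example.

```python
# Check: with padding p = (n - 1) / 2, q = (m - 1) / 2, the output keeps the input size.
a, b = 5, 5          # input picture size (rows x columns), illustrative
n, m = 3, 3          # convolution kernel size
p, q = (n - 1) // 2, (m - 1) // 2
out_rows = a + 2 * p - n + 1
out_cols = b + 2 * q - m + 1
print(p, q, out_rows, out_cols)   # 1 1 5 5 -> output size equals input size
```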
The address calculation subunit 802 may specifically determine the target address according to the following formula:

result_cord = (input_cord / input_size_x - w_cord / kernel_size_x + padding_size_x) × input_size_y + (input_cord % input_size_y - w_cord % kernel_size_y + padding_size_y)   (equation 1.3)

where % represents the remainder operation, result_cord represents the target address, input_cord represents the address of the feature data, input_size_x represents the abscissa of the size of the input data corresponding to the first feature data set, input_size_y represents the ordinate of the size of the input data corresponding to the first feature data set, w_cord represents the address of the weight data, kernel_size_x represents the abscissa of the weight size, kernel_size_y represents the ordinate of the weight size, padding_size_x represents the lateral filling size, and padding_size_y represents the longitudinal filling size.
The address of the feature data and the address of the weight data in equation 1.3 are absolute addresses. An absolute address refers to the absolute position of the feature data/weight data in the corresponding feature data set/weight data set. Assume that a feature data set includes X feature data; the absolute address of the x-th feature data among the X feature data is x-1, where X is a positive integer greater than 1, and x is a positive integer greater than or equal to 1 and less than or equal to X. For example, if a feature data set includes 5, 0, 0, 32, 0, 0, 0, 0, 23, the absolute addresses of the feature data 5, 32, and 23 are 0, 3, and 8, respectively. The absolute addresses listed above refer to the positions of the feature data in the feature data set, and can be converted into addresses consisting of an abscissa and an ordinate according to the specification of the feature matrix. Similarly, the absolute address of the weight data can be converted into an address consisting of an abscissa and an ordinate.
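A minimal sketch of equation 1.3, under the assumption that "/" denotes integer division and "%" the remainder, with addresses taken as the absolute row-major positions just described; the 5×5 feature set and 3×3 weight set of fig. 1 are used, so both padding sizes are (3-1)/2 = 1. Note that a11 with b11 and a21 with b21 map to the same target address, matching the accumulation into the first target address described earlier.

```python
# Sketch of equation 1.3 (integer-division assumption, absolute addresses).
def target_address(input_cord, w_cord,
                   input_size_x, input_size_y,
                   kernel_size_x, kernel_size_y,
                   padding_size_x, padding_size_y):
    return ((input_cord // input_size_x - w_cord // kernel_size_x + padding_size_x)
            * input_size_y
            + (input_cord % input_size_y - w_cord % kernel_size_y + padding_size_y))

# 5x5 feature set, 3x3 weight set, padding (3-1)/2 = 1 in both directions (as in Fig. 1).
args = dict(input_size_x=5, input_size_y=5, kernel_size_x=3, kernel_size_y=3,
            padding_size_x=1, padding_size_y=1)
print(target_address(0, 0, **args))   # a11 with b11
print(target_address(5, 3, **args))   # a21 with b21 -> same target address as above
print(target_address(1, 0, **args))   # a12 with b11 -> a different target address
```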
Optionally, in some embodiments, the address calculation subunit 801 may also determine the target address according to the following formula:
result_cord=((base_input+input_cord)/input_size x -(base_w+w_cord)/kernel_size x +padding_size x )×input_size y +((base_cord+input_cord)%input_size y -(base_w+w_cord)%kernel__size y +padding__size y ) (equation 1.4)
Wherein % represents the remainder operation, result_cord represents the target address, input_cord represents the address of the feature data, input_size_x represents the abscissa of the size of the input data corresponding to the first feature data set, input_size_y represents the ordinate of the size of the input data corresponding to the first feature data set, w_cord represents the address of the weight data, kernel_size_x represents the abscissa of the weight size, kernel_size_y represents the ordinate of the weight size, padding_size_x represents the lateral filling size, padding_size_y represents the longitudinal filling size, base_input represents the base address of the feature data, and base_w represents the base address of the weight data.
The address of the feature data and the address of the weight data in formula 1.4 are relative addresses. The relative address refers to the position of the feature data/weight data in the corresponding feature data set/weight data set relative to the address of the first feature data/weight data. Assuming that the address of the first feature data of the feature data set is Y, the address of the y-th feature data in the feature data set is Y+y-1, where Y and y are positive integers greater than or equal to 1.
Alternatively, in some embodiments, after determining the target address, the address calculation unit may directly send the target address to the corresponding data calculation unit. The data calculation unit may then obtain the data cached at the target address according to the target address.
Alternatively, in other embodiments, after determining the target address, the address calculation unit may obtain the data cached at the target address, and then send the cached data and the target address together to the corresponding data calculation unit.
The above describes how a data calculation array performs multiplication operations and how an address calculation array performs address operations.
As described above, the data processing apparatus may include 2 or more data calculation arrays and corresponding address calculation arrays.
The weight data set shown in fig. 1 includes only 3×3 weight data, and only one weight data set is used for convolution operation of the feature data set. Alternatively, in other embodiments, the number of weight data sets used to convolve the feature data set may be two or more.
Alternatively, in some embodiments, each of the N data calculation arrays may acquire and store one weight data set and multiply the first feature data set with the stored weight data. Correspondingly, each of the N address calculation arrays may acquire and store the addresses of the corresponding weight data, and perform address operations on the addresses of the first feature data set and the addresses of the stored weight data.
If the number of weight data sets used for the convolution operation on the feature data set is greater than N, the N data calculation arrays may acquire N weight data sets at a time to multiply with the first feature data set. If the number of weight data sets remaining to be acquired is smaller than N, all remaining weight data sets are acquired and multiplied with the first feature data set. Suppose N is 4 and the number of weight data sets is 9. In this case, the 4 data calculation arrays may first acquire the 1st to 4th weight data sets to multiply with the first feature data set, then acquire the 5th to 8th weight data sets to multiply with the first feature data set, and finally acquire the 9th weight data set to multiply with the first feature data set. The manner in which the N address calculation arrays perform address operations is similar and need not be described here.
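A minimal sketch of this scheduling, assuming the weight data sets are simply processed in groups of at most N (the helper name and the use of a plain Python list are illustrative, not part of the embodiment):

def schedule_weight_sets(weight_sets, N):
    # Yield successive groups of at most N weight data sets; each group is
    # multiplied with the first feature data set by the N data calculation arrays
    for start in range(0, len(weight_sets), N):
        yield weight_sets[start:start + N]

# With N = 4 and 9 weight data sets: sets 1-4, then 5-8, then 9
print(list(schedule_weight_sets(list(range(1, 10)), 4)))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]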
Alternatively, in other embodiments, the weight data stored in different data computing arrays of the N data computing arrays may be a result of row-wise rearrangement of the same weight data. For example, it is assumed that the N data calculation arrays include a first data calculation array and a second data calculation array, and that the n×m weight data held by the second data calculation array is n×m weight data obtained by rearranging the n×m weight data held by the first data calculation array in rows.
Fig. 9 is a schematic diagram of weight data stored in two data computing arrays according to an embodiment of the present application.
As shown in fig. 9, the data calculation array 1 holds 3×3 weight data: the first row of weight data is b11, b12 and b13; the second row is b21, b22 and b23; the third row is b31, b32 and b33. The data calculation array 2 holds 3×3 weight data: the first row is b31, b32 and b33; the second row is b11, b12 and b13; the third row is b21, b22 and b23. It can be seen that rearranging the rows of the weight data held by the data calculation array 1 yields the weight data held by the data calculation array 2; correspondingly, the weight data stored in the data calculation array 1 may be regarded as a row-wise rearrangement of the weight data stored in the data calculation array 2. For convenience of description, weight data obtained after such row-wise rearrangement is referred to below as rearranged weight data, and the weight data held by the two data calculation arrays shown in fig. 9 are referred to as mutually rearranged weight data.
The relationship of the weight data held by two data calculation arrays is shown in fig. 9. Optionally, in some embodiments, the weight data stored in any two of three or more data calculation arrays are also mutually rearranged weight data. For example, the N data calculation arrays further include a data calculation array 3 as shown in fig. 10, where the data calculation array 3 stores 3×3 weight data: the first row of weight data is b21, b22 and b23; the second row is b31, b32 and b33; the third row is b11, b12 and b13. As can be seen from fig. 9 and fig. 10, the weight data stored in the data calculation array 1 and the data calculation array 3 are mutually rearranged weight data, and the weight data stored in the data calculation array 2 and the data calculation array 3 are also mutually rearranged weight data. In summary, if the value of N is greater than or equal to n and the weight data includes n rows, the weight data may be rearranged at most n-1 times, and the weight data stored in the 2nd to nth data calculation arrays among the N data calculation arrays are all weight data obtained by rearranging, by rows, the weight data stored in the 1st data calculation array, where, among the n sets of weight data stored in the n data calculation arrays, any two row vectors located in the same row are different. N is a positive integer greater than or equal to n. In this case, the first data calculation array and the second data calculation array are any two of the n data calculation arrays. In other words, the first row of weight data stored in each of the n data calculation arrays is, respectively, the second row to the nth row of weight data in the remaining n-1 data calculation arrays.
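The row-wise rearrangement described above amounts to cyclic shifts of the rows of the original weight matrix. A short sketch, assuming the shift pattern shown in fig. 9 and fig. 10 (the helper name is illustrative):

def rearranged_weights(weight_rows):
    # Copy k places original row r at position (r + k) mod n, matching the
    # pattern of fig. 9 / fig. 10: any two copies differ in every row position
    n = len(weight_rows)
    return [[weight_rows[(r - k) % n] for r in range(n)] for k in range(n)]

b = [["b11", "b12", "b13"], ["b21", "b22", "b23"], ["b31", "b32", "b33"]]
for idx, mat in enumerate(rearranged_weights(b), start=1):
    print(f"data calculation array {idx}: {mat}")
# array 1: rows b1, b2, b3   array 2: rows b3, b1, b2   array 3: rows b2, b3, b1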
Alternatively, in some embodiments, the data computing array 2 and the data computing array 3 may acquire 3×3 weight data as shown in fig. 1, and then perform data rearrangement to obtain rearranged weight data.
Alternatively, in other embodiments, the storage module may store the rearranged weight data, and the data calculation array 2 and the data calculation array 3 may obtain the rearranged weight data directly from the storage module.
It will be appreciated that, since the data calculation array corresponds to the address calculation array, the address of the weight data held by the second address calculation array corresponding to the second data calculation array is also a row-wise rearranged result of the address of the weight data held by the first address calculation array corresponding to the first data calculation array.
Similarly, if the value of N is greater than or equal to n, the weight data includes n rows and the addresses of the weight data also include n rows. The addresses of the weight data may be rearranged at most n-1 times, and the addresses of the weight data stored in the 2nd to nth address calculation arrays among the N address calculation arrays are all addresses obtained by rearranging, by rows, the addresses of the weight data stored in the 1st address calculation array. N is a positive integer greater than or equal to n. In this case, the first address calculation array and the second address calculation array are any two of the n address calculation arrays. In other words, the addresses of the first row of weight data stored in each of the n address calculation arrays are, respectively, the addresses of the second row to the nth row of weight data in the remaining n-1 address calculation arrays.
After the weight data and the corresponding weight data addresses are rearranged by rows, the number of accesses of the data calculation array and the address calculation array to the storage module can be further reduced by reusing the feature data.
For example, in the process of performing the convolution operation on the feature data set shown in fig. 1 using the weight data set shown in fig. 1, it is also necessary to determine the operation result shown in formula 1.5:
c21 = a21×b11 + a22×b12 + a23×b13 + a31×b21 + a32×b22 + a33×b23 + a41×b31 + a42×b32 + a43×b33    (equation 1.5)
If the rearranged weight data stored in the second data calculation array is as shown in fig. 9, a partial result of formula 1.5 can be obtained with a single access to the storage module.
Specifically, when the data calculation array 2 shown in fig. 9 multiplies the feature data set by the saved weight data, the operation results of a21×b11, a22×b12, a23×b13, a31×b21, a32×b22 and a33×b23 can be obtained. According to the operation rule described above, the sum of these 6 operation results is stored to the same target address.
It is assumed that the data processing apparatus includes only the data calculation array 1 and the data calculation array 2, and that the weight data held by the data calculation array 1 and the data calculation array 2 is as shown in fig. 9. When the multiplication operation on the feature data set shown in fig. 1 is performed using the data calculation array 1 and the data calculation array 2, after the two arrays have multiplied the first to third rows of feature data of the feature data set, they may multiply the third to fifth rows of the feature data set. In other words, the step size of the downward slide may be 2 while traversing the feature data set for the multiplication operation. In the case where the weight data is not rearranged (in other words, the data processing apparatus has only the data calculation array 1 shown in fig. 9), if it is desired to obtain a21×b11, a22×b12, a23×b13 and so on, it is necessary to multiply the feature data of the second to fourth rows using the data calculation array 1 after completing the multiplication of the feature data of the first to third rows. This multiplication requires the feature data of the second to third rows of the feature data set to be acquired again. In other words, the second to third rows of the feature data set need to be read a second time to obtain a21×b11, a22×b12, a23×b13 and so on, which means that the same feature data needs to be read multiple times.
Since the weight data is rearranged, the operation result obtained when the data calculation array 2 multiplies the second and third rows of the feature data set is equivalent to the operation result obtained when the data calculation array 1 multiplies the second and third rows of feature data after sliding down with a step size of 1. In other words, the second and third rows of feature data of the feature data set can be multiplied with two weight data sets while being read only once. In this way, more partial Cartesian products can be obtained from one reading of the feature data. In practice, there are also methods that perform prediction using only a partial Cartesian product of a feature data set and a weight data set. Therefore, by rearranging the weight data by rows, multiplying the feature data set with both the original weight data and the rearranged weight data, and obtaining a target data set including the partial Cartesian product from the operation results, the number of accesses to the storage module can be reduced and the speed of data processing can be increased.
When the first weight matrix with n rows is rearranged (n-1) times such that, among the n resulting weight matrices, any two row vectors located in the same row are different, the Cartesian product of the feature data set and the first weight matrix can be obtained by multiplying the feature data set with the n weight matrices, and the convolution of the feature data set and the first weight matrix can then be obtained, while each feature data in the feature data set needs to be loaded into the data processing unit only once.
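This property can be checked numerically. The sketch below (illustrative NumPy code; none of the names come from the embodiment) multiplies blocks of n feature rows, each read once, with the n row-rearranged copies of a weight matrix, accumulates every partial product at the output row it belongs to, and confirms that the result equals a direct "valid" two-dimensional convolution:

import numpy as np

def direct_valid_conv(A, B):
    # Reference: direct "valid" 2-D convolution (correlation form used in this document)
    n, m = B.shape
    H, W = A.shape
    out = np.zeros((H - n + 1, W - m + 1))
    for i in range(H - n + 1):
        for j in range(W - m + 1):
            out[i, j] = np.sum(A[i:i + n, j:j + m] * B)
    return out

def conv_with_rearranged_weights(A, B):
    # Feed blocks of n feature rows (vertical stride n) to the n rearranged
    # weight copies; copy k holds original weight row (r - k) mod n at position r
    n, m = B.shape
    H, W = A.shape
    out = np.zeros((H - n + 1, W - m + 1))
    for t in range(0, H, n):            # each feature row is read only once
        block = A[t:t + n]
        for k in range(n):              # the n weight matrices (data calculation arrays)
            for r in range(block.shape[0]):
                w_row = (r - k) % n     # weight row held at position r of copy k
                i = t + r - w_row       # output row this partial product belongs to
                if 0 <= i <= H - n:
                    for j in range(W - m + 1):
                        out[i, j] += np.dot(block[r, j:j + m], B[w_row])
    return out

rng = np.random.default_rng(0)
A = rng.integers(0, 5, size=(6, 5)).astype(float)
B = rng.integers(0, 5, size=(3, 3)).astype(float)
assert np.allclose(direct_valid_conv(A, B), conv_with_rearranged_weights(A, B))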
Fig. 10 is a schematic diagram of weight data stored in a data computing array according to an embodiment of the present application.
In the process of multiplying the first to third rows of the feature data set shown in fig. 1 using the weight data shown in fig. 10, the operation results of a31×b11, a32×b12 and a33×b13 can be obtained. The first data calculation array, the second data calculation array and the third data calculation array may multiply the fourth to fifth rows of the feature data set after multiplying the first to third rows of the feature data set. In other words, the step size of the downward slide may be 3 while traversing the feature data set for the multiplication operation.
Assuming that three of the N data calculation arrays are the data calculation array 1, the data calculation array 2 and the data calculation array 3 shown in fig. 9 and fig. 10, these three data calculation arrays may perform a Cartesian product operation on the feature data set.
Take the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 as an example again. The three data calculation arrays may respectively perform the multiplication process shown in fig. 5 with the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33. The operation results obtained after the three data calculation arrays complete the multiplication of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 using the weight data stored in each array are shown in table 1.
In summary, if the weight data includes n rows, the weight data may be rearranged n-1 times at most. If the weight data is rearranged once, the step length of downward sliding in the process of traversing the characteristic data set to carry out multiplication operation can be 2; if the weight data are rearranged twice, the step length of sliding downwards can be 3 in the process of traversing the characteristic data set to carry out multiplication operation; if the weight data is rearranged n-1 times, the step size of sliding downwards in the process of traversing the characteristic data set to carry out multiplication operation can be n.
Optionally, in some embodiments, the first feature data set is a feature data set obtained by performing sparsification processing on a second feature data set, and the first weight data set is a weight data set obtained after sparsification processing. The data processing apparatus 200 shown in fig. 2 may further include a compression module. The compression module is configured to acquire the second feature data set and perform sparsification processing on it to obtain the first feature data set, where the second feature data set includes feature data corresponding to the input data. The compression module is also configured to acquire a second weight data set and perform sparsification processing on it to obtain the first weight data set. The compression module is further configured to determine the address of each feature data in the first feature data set and the address of each weight data in the first weight data set. The compression module sends the acquired first feature data set, first weight data set, address of each feature data in the first feature data set and address of each weight data in the first weight data set to the storage module, which stores them. If the number of weight data after the sparsification is less than n×m, the remaining positions are padded with 0.
The input data referred to in the embodiments of the present application may be any data on which multiplication, Cartesian product operations and/or convolution operations can be performed, for example image data or voice data. The input data is a general term for all data input to the data processing apparatus and may consist of feature data. The feature data corresponding to the input data may be all data included in the input data or part of the feature data of the input data. Taking image data as an example, assume that the input data is an entire image and all data of the image is referred to as feature data. The second feature data set may include all feature data of the input data, or may be all or part of the feature data of the image after some processing. For example, the second feature data may be feature data of a partial image obtained by dividing the image.
Assume the second feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0, 0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 5, 8. The first feature data set obtained after the sparsification includes: 5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8. Assuming that the address of the first feature data in the second feature data set is 0, the address of the second feature data is 1, the address of the third feature data is 2, and the address of the nth feature data is n-1, the addresses (absolute addresses) of the first feature data set are: 0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35.
Assume the second weight data set includes: 8, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 54, 0, 0, 0, 0, 0, 12, 0, 22, 3, 45, 0, 0, 0, 0, 67, 44, 0, 0, 0, 0, 0, 0, 0, 0, 35, 65, 75. The sparsified second weight data set includes: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75. It can be seen that the sparsified second weight data set includes 14 weight data. Assume that each data calculation array includes 3×3 data calculation units. The number of weight data in the sparsified second weight data set is less than the number of data calculation units included in 2 data calculation arrays. Therefore, four 0s are appended to the sparsified second weight data set to obtain the first weight data set. Thus, the first weight data set corresponding to the second weight data set is: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75, 0, 0, 0, 0. Assuming that the address of the first weight data in the second weight data set is 0, the address of the second weight data is 1, the address of the third weight data is 2, and the address of the nth weight data is n-1, the addresses (absolute addresses) of the first weight data set are: 0, 1, 6, 16, 17, 23, 26, 27, 28, 33, 34, 43, 44, 45.
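The sparsification and zero-padding steps described above can be sketched as follows (illustrative helper names; the data are the example values given above):

def sparsify(values):
    # Remove zeros and record the absolute address of each remaining element
    kept = [(addr, v) for addr, v in enumerate(values) if v != 0]
    return [v for _, v in kept], [addr for addr, _ in kept]

def pad_weights(data, units_per_array=9, num_arrays=2):
    # Append zeros so the sparsified weight data fills the data calculation arrays
    total = units_per_array * num_arrays
    return data + [0] * (total - len(data))

second_feature_set = [5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0,
                      0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 5, 8]
first_feature_set, feature_addresses = sparsify(second_feature_set)
print(first_feature_set)   # [5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8]
print(feature_addresses)   # [0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35]
print(pad_weights([8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75]))
# 14 non-zero weights followed by four appended 0s (fills 2 arrays of 3x3 units)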
In some embodiments, the first feature data set may also be a feature data set that has not undergone a thinning process. In other words, the first set of feature data may be equal to the second set of feature data.
The first feature data set in the above embodiment corresponds to a matrix, and correspondingly, the weight data for performing the convolution operation on the first feature data set also corresponds to a matrix. In other words, the convolution operation described in the above embodiment is a two-dimensional convolution operation.
The technical scheme of the embodiment of the application can also be applied to T-dimensional multiplication operation, cartesian product operation and/or convolution operation (T is a positive integer greater than or equal to 3). The number of the weight data sets to be used for the convolution operation of the first feature data set may be plural.
The technical scheme of the application is described below by taking three-dimensional convolution operation as an example.
If the input data corresponding to the first feature data set is color picture data, the first feature data set may be a three-dimensional tensor. A three-dimensional convolution operation may be performed on the first feature data set.
The first feature data set includes three subsets: feature data subset 1, feature data subset 2, and feature data subset 3. The characteristic data of the three subsets correspond to the three input channels red, green and blue, respectively. The feature data in each of the three subsets may correspond to a matrix.
Assume that three sets of weight data are used to perform a three-dimensional convolution operation on the first set of feature data. The set of weight data used to convolve the feature data set may also be referred to as a Filter (Filter). Thus, the three sets of weight data may be referred to as filter 1, filter 2, and filter 3. Each weight data set of the three weight data sets comprises three weight channels, namely a channel 1, a channel 2 and a channel 3. The weight data included in each of the three weight channels may correspond to a matrix. The three weight channels are in one-to-one correspondence with the three feature data subsets. For example, channel 1 corresponds to feature data subset 1, channel 2 corresponds to feature data subset 2, and channel 3 corresponds to feature data subset 3. The weight channel may perform a convolution operation on the corresponding feature data subset. The filters 1, 2 and 3 may perform a three-dimensional convolution operation on the first feature data set, respectively. That is, the channel 1 of the filter 1 performs a convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 1 performs a convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 1 performs a convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 2 carries out convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 2 carries out convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 2 carries out convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 3 performs a convolution operation on the subset of feature data 1 of the first feature data set, the channel 2 of the filter 3 performs a convolution operation on the subset of feature data 2 of the first feature data set, and the channel 3 of the filter 3 performs a convolution operation on the subset of feature data 3 of the first feature data set.
It follows that the process of three-dimensional convolution operation of the first feature data set by each of the three filters can be decomposed into three two-dimensional convolution operation processes. The specific implementation of these three two-dimensional convolution operations is similar to that of the above-described embodiment. Taking the example of the convolution operation of the channel 1 on the feature data subset 1, the channel 1 may be regarded as the weight data set shown in fig. 1, and the feature data subset 1 may be regarded as the feature data set shown in fig. 1. The process of convolving the feature data subset by the channel 1 is the process of convolving the feature data set by the weight data set as shown in fig. 1. As described above, the convolution operation process can be decomposed into a multiplication operation and an addition operation. Therefore, the data processing apparatus as shown in fig. 2 can also perform three-dimensional convolution operation. In the case where the input data corresponding to the feature data set is a three-dimensional tensor, the first feature data set referred to in the above embodiment may be regarded as a feature data subset of the feature data sets corresponding to the three-dimensional tensor. In the case where the feature data set is convolved using a plurality of weight data sets, the first weight data set may be regarded as one of the plurality of weight data sets. In case the weight data set also corresponds to a three-dimensional tensor, the first weight data set may be considered as one channel in the weight data set of the three-dimensional tensor.
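A minimal sketch of this decomposition (illustrative NumPy code; it assumes, as in a typical convolutional layer, that the per-channel two-dimensional convolution results of one filter are summed to form that filter's output, which is a common convention rather than a statement of this embodiment):

import numpy as np

def conv2d_valid(A, B):
    # "valid" 2-D convolution of one feature data subset with one weight channel
    n, m = B.shape
    H, W = A.shape
    out = np.zeros((H - n + 1, W - m + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(A[i:i + n, j:j + m] * B)
    return out

def conv3d_as_2d(feature_subsets, filt):
    # One filter = one weight channel per input channel; convolve channel-wise
    # and sum the per-channel results (assumed convention, see note above)
    return sum(conv2d_valid(a, b) for a, b in zip(feature_subsets, filt))

rng = np.random.default_rng(1)
feature_subsets = [rng.random((5, 5)) for _ in range(3)]              # red, green, blue
filters = [[rng.random((3, 3)) for _ in range(3)] for _ in range(3)]  # filter 1 to 3
outputs = [conv3d_as_2d(feature_subsets, f) for f in filters]         # three 3x3 outputs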
The procedure of the multidimensional convolution operation above three dimensions is similar to the three-dimensional convolution operation procedure, and the description is not necessary here.
Optionally, in the case that the feature data set is convolved by using a plurality of weight data sets, the first weight data set may also be a weight data set obtained by performing sparsification processing on the plurality of weight data sets. Specifically, the non-0 weight data included in the first weight data set is from one channel of the same weight data set or from the same channel of a different weight data set.
The thinning process of the plurality of weight data sets will be described below with reference to fig. 11.
Fig. 11 is a schematic diagram of performing thinning processing on the weight matrices of 3 filters according to an embodiment of the present application. Each of the 3 filters shown in fig. 11 includes 3 weight channels, and each weight channel includes 3×3 weight data.
As shown in fig. 11, the weight data of the weight data set 1 comes from the weight data in the channels 1 of the filters 1 and 2, and the weight data of the weight data set 4 comes from the weight data in the channels 1 of the filters 2 and 3. The weight data of the weight data set 2 is from the weight data in the channels 2 of the filter 1 and the filter 2, and the weight data of the weight data set 5 is from the weight data in the channels 2 of the filter 2 and the filter 3. The weight data of the weight data set 3 is from the weight data in the channels 3 of the filter 1 and the filter 2, and the weight data of the weight data set 6 is from the weight data in the channels 3 of the filter 2 and the filter 3.
As shown in fig. 11, the statement that the weight data comes from the same channel of different weight data sets means that the weight data may belong to different filters, but the channels within those different filters are the same. For example, the weight data of the weight data set 4 comes from the weight data in channel 1 of filter 2 and the weight data in channel 1 of filter 3.
For convenience of description, a weight data set obtained by thinning the weight data in a plurality of filters will be referred to hereinafter as a sparsified weight data set.
In some embodiments, the weight data included in the set of sparsified weight data may be from the same filter. The operation process of multiplying the feature data by the sparse weight data set and the process of determining the convolution operation result of the sparse weight data set and the feature data according to the operation result of the multiplying operation are the same as the above embodiments, and need not be described herein.
In some embodiments, the set of sparsified weight data may include weight data from different filters. The operation process of the sparse weight data set for multiplying the feature data is the same as the above embodiment, and need not be described here again. In the case where the weight data included in the thinned weight data set may come from different filters, the procedure of determining the convolution operation result of the thinned weight data set and the feature data from the operation result of the multiplication operation is not exactly the same as the above-described embodiment.
Specifically, it is assumed that the weight data included in the sparsified weight data set comes from P filters (P is a positive integer greater than or equal to 2). The sparsified weight data set may be divided into P sparsified weight data subsets, the p-th of which includes weight data from the p-th filter of the P filters, p = 1, …, P. Assume that the p-th sparsified weight data subset includes Num_p weight data, where Num_p is a positive integer greater than or equal to 1 and Num_p is less than n×m.
The N data calculation arrays are used to perform a Cartesian product operation on the sparsified weight data set and the feature data set, obtaining the multiplication results required for the convolution operation of each filter with the feature data set; the corresponding multiplication results are then added to obtain the convolution operation result of each filter with the feature data set.
Take the weight data stored in the three data calculation arrays shown in fig. 9 and fig. 10 as an example again. It is assumed that the weight data shown in fig. 9 and fig. 10 are obtained by thinning the weight data of channel 1 of filter 1 and the weight data of channel 1 of filter 2 shown in fig. 12. Performing the Cartesian product operation on {a11, a12, a13, a21, a22, a23, a31, a32, a33} using the three data calculation arrays, the following operation results can be obtained: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a11×b31, a12×b32, a13×b33. It can be seen that the sum of a11×b11, a12×b12, a13×b13, a21×b21, a22×b22 and a23×b23 is the operation result of the convolution operation performed by the weight data of channel 1 of filter 1 on {a11, a12, a13, a21, a22, a23, a31, a32, a33}; the sum of a11×b31, a12×b32 and a13×b33 is the operation result of the convolution operation performed by the weight data of channel 1 of filter 2 on {a11, a12, a13, a21, a22, a23, a31, a32, a33}.
Furthermore, the compression module can also carry out sparsification processing on the target data set, and delete 0 in the target data set.
By the technical scheme, the product of each characteristic data in the first characteristic data set and each weight data in the first weight data set can be obtained. After that, the convolution operation result of the first feature data set and the first weight data set can be obtained by adding the corresponding product results.
Further, in the above-described embodiment, in determining the convolution operation result of the first feature data set and the first weight set from the operation result of the cartesian product and the address operation result, each data calculation unit in the data calculation array adds the product of the weight data and the feature data to the data held in the target address determined by the corresponding address calculation unit, and writes the added data back to the target address. Thus, the final saved result of the target address is the convolution operation result.
In other embodiments, each data calculation unit in the data calculation array may perform only the multiplication operation, that is, multiply the weight data with the feature data and save the multiplication result to the target address determined by the corresponding address calculation unit; the multiplication results are then obtained from the corresponding target addresses and added to obtain the corresponding convolution operation result. For example, the result of a11×b11 is stored at target address 1, the result of a21×b21 is stored at target address 2, the result of a31×b31 is stored at target address 3, the result of a12×b12 is stored at target address 4, the result of a22×b22 is stored at target address 5, the result of a32×b32 is stored at target address 6, the result of a13×b13 is stored at target address 7, the result of a23×b23 is stored at target address 8, and the result of a33×b33 is stored at target address 9. When calculating the convolution result, the data stored at target addresses 1 to 9 can be added to obtain c11 as shown in formula 1.1.
In other embodiments, the storage module may include an adder unit. Each data calculation unit in the data calculation array may perform only the multiplication operation, that is, multiply the weight data with the feature data and output the multiplication result to the storage module. When the storage module stores the received data to the target address determined by the address calculation unit corresponding to that data calculation unit, it adds the received data to the data already stored at the target address and stores the sum at the target address. Thus, the final result held at the target address is the convolution operation result.
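Both variants amount to a read-modify-write of the value held at the target address. A minimal sketch, with a plain Python dictionary standing in for the storage module (illustrative only):

def accumulate(storage, target_address, product):
    # Add a multiplication result to the value already held at the target address
    # and write the sum back; the adder may sit in the data calculation unit or
    # in the storage module, the stored value ends up the same either way
    storage[target_address] = storage.get(target_address, 0) + product

storage = {}
for addr, prod in [(0, 2 * 3), (0, 4 * 5), (1, 7 * 1)]:
    accumulate(storage, addr, prod)
print(storage)   # {0: 26, 1: 7}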
Fig. 13 is a schematic flow chart of a data processing method according to an embodiment of the present application. The method shown in fig. 13 may be performed by the data processing apparatus shown in fig. 2 or 14.
1301, acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is expressed as n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2.
1302, a second weight matrix is obtained, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows.
And 1303, performing multiplication operation with a first characteristic data set by using a first weight matrix, wherein data in the first characteristic data set come from the same input channel.
1304, multiplying the first set of feature data with the second weight matrix.
1305, determining a target data set according to the operation result of the multiplication operation.
The specific implementation of the steps of the method shown in fig. 13 may be referred to in the descriptions of fig. 2 to 12, and need not be described herein.
Optionally, in some embodiments, the method further comprises: acquiring addresses of weight data in the first weight matrix and the second weight matrix; performing address operation by using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first characteristic data set; the determining the target data set according to the operation result of the multiplication operation comprises the following steps: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation. The specific implementation manner of each step may also refer to the descriptions of fig. 2 to 12, which are not repeated herein.
Optionally, in some embodiments, the method further comprises: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes, which are positioned in the same row, are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set. The specific implementation manner of each step may also refer to the descriptions of fig. 2 to 12, which are not repeated herein.
Optionally, in some embodiments, the target data set includes a result matrix that is a result of a convolution operation of the first feature data set and the first weight data set, the first feature data set being represented as a first feature matrix, the method further comprising: and determining a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size corresponding to the first feature matrix, the filling size and the weight size, wherein the weight size is n rows and m columns, and the filling size is the difference between the size of the first feature data set and the size of the result matrix. The specific implementation manner of each step may also refer to the descriptions of fig. 2 to 12, which are not repeated herein.
Optionally, in some embodiments, the method further comprises: acquiring a second characteristic data set, and removing an element with a value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing an element with a value of 0 in the second weight data set to obtain the first weight data set; an address of each feature data in the first set of feature data is determined, and an address of each weight in the first set of weight data is determined. The specific implementation manner of each step may also refer to the descriptions of fig. 2 to 12, which are not repeated herein.
Fig. 14 is a block diagram showing a structure of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 1400 shown in fig. 14 includes: the data processing module 1401 and the control module 1404, the data processing module 1401 includes N data calculation units, N is an integer greater than or equal to 2, wherein: a data processing module 1401, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the data in the first weight data set come from the same input channel, where n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; multiplying a first weight matrix with a first characteristic data set, wherein data in the first characteristic data set come from the same input channel; multiplying the first characteristic data set by using the second weight matrix; the control module 1404 is configured to determine a target data set according to the operation result of the multiplication operation.
Optionally, in some embodiments, the data processing apparatus 1400 further includes an address processing module 1402, the address processing module 1402 includes N address calculating units, the data calculating units and the address calculating units are in one-to-one correspondence, wherein: the address processing module 1402 is configured to: acquiring addresses of weight data in the first weight matrix and the second weight matrix; performing address operation by using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first characteristic data set; the control module 1404 is configured to determine a target data set according to the operation result of the multiplication operation, including: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
Optionally, in some embodiments, the data processing module 1401 is further configured to: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes, which are positioned in the same row, are different; the address processing module 1402 is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set.
Optionally, in some embodiments, the target data set includes a result matrix, where the result matrix is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The address processing module 1402 is further configured to determine a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size corresponding to the first feature matrix, the filling size, and the weight size, where the weight size is n rows and m columns, and the filling size is the difference between the size of the first feature data set and the size of the result matrix.
Optionally, in some embodiments, the data processing apparatus 1400 further comprises a compression module 1403 for: acquiring a second characteristic data set, and removing an element with a value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing an element with a value of 0 in the second weight data set to obtain the first weight data set; an address of each feature data in the first set of feature data is determined, and an address of each weight in the first set of weight data is determined.
The specific functions and advantages of the various modules in the data processing apparatus 1400 shown in fig. 14 may be referred to in the description of fig. 2 to 12, and need not be described herein.
In the embodiment of the application, the terminal equipment or the network equipment comprises a hardware layer, an operating system layer running on the hardware layer and an application layer running on the operating system layer. The hardware layer includes hardware such as a central processing unit (central processing unit, CPU), a memory management unit (memory management unit, MMU), and a memory (also referred to as a main memory). The operating system may be any one or more computer operating systems that implement business processes through processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. The application layer comprises applications such as a browser, an address book, word processing software, instant messaging software and the like. Further, the embodiment of the present application is not particularly limited to the specific structure of the execution body of the method provided by the embodiment of the present application, as long as the communication can be performed by the method provided according to the embodiment of the present application by running the program recorded with the code of the method provided by the embodiment of the present application, and for example, the execution body of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module in the terminal device or the network device that can call the program and execute the program.
Furthermore, various aspects or features of the application may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein encompasses a computer program accessible from any computer-readable device, carrier, or media. For example, computer-readable media can include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, or magnetic strips), optical discs (e.g., compact disc (CD), digital versatile disc (DVD)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives). Additionally, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instruction(s) and/or data.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A data processing apparatus, characterized in that the data processing apparatus comprises:
the storage module is used for storing the first characteristic data set;
the data processing module is used for acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is expressed as n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows;
performing a first multiplication operation with the first characteristic data set by using a first weight matrix;
performing a second multiplication operation with the first characteristic data set by using the second weight matrix;
during the first and second multiplication operations, at least one row of feature data in the first set of feature data is read only once from the memory module;
and the control module is used for determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
2. The data processing apparatus according to claim 1, characterized in that the data processing apparatus further comprises:
an address processing module, configured to: acquiring addresses of weight data in the first weight matrix and the second weight matrix;
performing address operation by using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first characteristic data set;
the control module is used for:
determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
3. A data processing apparatus according to claim 2, wherein,
the data processing module is further configured to: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes in the same row are different;
the address processing module is further configured to:
acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
And performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set.
4. The data processing apparatus according to claim 2, wherein the target data set includes a result matrix that is a result of a convolution operation of the first feature data set and the first weight data set, the first feature data set being represented as a first feature matrix;
the address processing module is further configured to determine a first target address according to an address of weight data stored in the array, an address of a first feature data set, a size of the first feature matrix, a filling size, and a weight size, where the weight size is n rows and m columns, the filling size includes a transverse filling size and a longitudinal filling size, the transverse filling size is (n-1)/2, and the longitudinal filling size is (m-1)/2.
5. The data processing apparatus according to any one of claims 1 to 4, further comprising a compression module for: acquiring a second characteristic data set, and removing elements with values of 0 in the second characteristic data set to obtain the first characteristic data set;
Acquiring a second weight data set, and removing an element with a value of 0 in the second weight data set to obtain the first weight data set;
an address of each feature data in the first set of feature data is determined, and an address of each weight in the first set of weight data is determined.
6. A method of data processing, the method comprising:
acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is expressed as n rows and m columns of weight data, and the data in the first weight data set come from the same input channel, wherein n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows;
performing a first multiplication operation by using the first weight matrix and the first characteristic data set;
performing a second multiplication operation with the first characteristic data set by using the second weight matrix;
during the first and second multiplication operations, at least one row of the feature data in the first feature data set is read only once from a memory module holding the first feature data set;
And determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
7. The method of claim 6, wherein the method further comprises:
acquiring addresses of weight data in the first weight matrix and the second weight matrix;
performing address operation by using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first characteristic data set;
the determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation includes:
and determining a target data set according to the operation result of the first multiplication operation, the operation result of the second multiplication operation and the operation result of the address operation.
8. The method of claim 7, wherein the method further comprises: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrixes according to rows, and any two row vectors in n row vectors of the first to nth weight matrixes in the same row are different;
Acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the addresses of the feature data in the first feature data set.
9. The method of claim 7, wherein the target data set comprises a result matrix that is the result of a convolution operation of the first feature data set with the first weight data set, the first feature data set being represented as a first feature matrix,
the method further comprises the steps of:
and determining a first target address according to the address of the weight data stored in each address calculation array, the address of a first characteristic data set, the size corresponding to the first characteristic matrix, the filling size and the weight size, wherein the weight size is n rows and m columns, the filling size comprises a transverse filling size and a longitudinal filling size, the transverse filling size is (n-1)/2, and the longitudinal filling size is (m-1)/2.
10. The method according to any one of claims 6 to 9, further comprising:
acquiring a second feature data set, and removing the elements whose value is 0 from the second feature data set to obtain the first feature data set;
acquiring a second weight data set, and removing the elements whose value is 0 from the second weight data set to obtain the first weight data set;
and determining an address of each feature data in the first feature data set and an address of each weight in the first weight data set.
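Claim 10 describes removing zero-valued elements from the raw feature and weight sets and recording an address for each surviving element. A minimal sketch of such a compaction step, using a simple coordinate-list style compression chosen here purely for illustration:

```python
def compact_nonzero(dense):
    # dense: 2-D list of raw values (e.g. the second feature data set).
    # Returns the compacted non-zero values (the "first" data set) together
    # with the flattened original address of each retained element.
    values, addresses = [], []
    width = len(dense[0])
    for r, row in enumerate(dense):
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                addresses.append(r * width + c)
    return values, addresses

# Example: a sparse feature map reduced to its non-zero entries.
features, feature_addresses = compact_nonzero([[0, 3, 0],
                                               [5, 0, 0],
                                               [0, 0, 7]])
# features -> [3, 5, 7]; feature_addresses -> [1, 3, 8]
```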
11. A data processing apparatus, characterized in that the data processing apparatus comprises:
a processor and a memory storing program code, the processor being configured to invoke the program code in the memory to perform the data processing method according to any one of claims 6 to 10.
CN201811148307.0A 2018-09-29 2018-09-29 Data processing method and device Active CN110968832B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811148307.0A CN110968832B (en) 2018-09-29 2018-09-29 Data processing method and device
PCT/CN2019/102252 WO2020063225A1 (en) 2018-09-29 2019-08-23 Data processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811148307.0A CN110968832B (en) 2018-09-29 2018-09-29 Data processing method and device

Publications (2)

Publication Number Publication Date
CN110968832A CN110968832A (en) 2020-04-07
CN110968832B true CN110968832B (en) 2023-10-20

Family

ID=69951080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811148307.0A Active CN110968832B (en) 2018-09-29 2018-09-29 Data processing method and device

Country Status (2)

Country Link
CN (1) CN110968832B (en)
WO (1) WO2020063225A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202247046A (en) * 2021-05-19 2022-12-01 神盾股份有限公司 Data processing method and circuit based on convolution computation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326985A (en) * 2016-08-18 2017-01-11 北京旷视科技有限公司 Neural network training method, neural network training device, data processing method and data processing device
CN107402905A (en) * 2016-05-19 2017-11-28 北京旷视科技有限公司 Computational methods and device based on neutral net
CN107844827A (en) * 2017-11-28 2018-03-27 北京地平线信息技术有限公司 The method and apparatus for performing the computing of the convolutional layer in convolutional neural networks
CN108122030A (en) * 2016-11-30 2018-06-05 华为技术有限公司 A kind of operation method of convolutional neural networks, device and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3523751A4 (en) * 2016-10-04 2020-05-06 Magic Leap, Inc. Efficient data layouts for convolutional neural networks
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402905A (en) * 2016-05-19 2017-11-28 北京旷视科技有限公司 Computational methods and device based on neutral net
CN106326985A (en) * 2016-08-18 2017-01-11 北京旷视科技有限公司 Neural network training method, neural network training device, data processing method and data processing device
CN108122030A (en) * 2016-11-30 2018-06-05 华为技术有限公司 A kind of operation method of convolutional neural networks, device and server
CN107844827A (en) * 2017-11-28 2018-03-27 北京地平线信息技术有限公司 The method and apparatus for performing the computing of the convolutional layer in convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Structural optimization and accelerated computation of convolutional neural network acoustic models; 王智超 (Wang Zhichao); 徐及 (Xu Ji); 张鹏远 (Zhang Pengyuan); 颜永红 (Yan Yonghong); Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), No. 03; full text *

Also Published As

Publication number Publication date
WO2020063225A1 (en) 2020-04-02
CN110968832A (en) 2020-04-07

Similar Documents

Publication Publication Date Title
JP7394104B2 (en) Executing kernel strides in hardware
EP3373210B1 (en) Transposing neural network matrices in hardware
CN111465924B (en) System and method for converting matrix input into vectorized input for matrix processor
CN107145939B (en) Computer vision processing method and device of low-computing-capacity processing equipment
JP7007488B2 (en) Hardware-based pooling system and method
KR20190066473A (en) Method and apparatus for processing convolution operation in neural network
JP2019087252A (en) Apparatus and method for performing deconvolution operation in neural network
JP2020524318A (en) Alternate loop limit
CN109754359B (en) Pooling processing method and system applied to convolutional neural network
CN112840356A (en) Operation accelerator, processing method and related equipment
US20220083857A1 (en) Convolutional neural network operation method and device
US6704760B2 (en) Optimized discrete fourier transform method and apparatus using prime factor algorithm
CN109918204B (en) Data processing system and method
EP3093757B1 (en) Multi-dimensional sliding window operation for a vector processor
US11763150B2 (en) Method and system for balanced-weight sparse convolution processing
KR20200081044A (en) Method and apparatus for processing convolution operation of neural network
CN110874636A (en) Neural network model compression method and device and computer equipment
US20230196113A1 (en) Neural network training under memory restraint
CN107808394B (en) Image processing method based on convolutional neural network and mobile terminal
CN110968832B (en) Data processing method and device
US11435941B1 (en) Matrix transpose hardware acceleration
CN111767243A (en) Data processing method, related device and computer readable medium
CN114758209B (en) Convolution result obtaining method and device, computer equipment and storage medium
CN116400884A (en) Control method and device of multiplier-adder computer device and storage medium
CN111079904B (en) Acceleration method of depth separable convolution and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant