CN110968832A - Data processing method and device - Google Patents
Data processing method and device
- Publication number
- CN110968832A (application number CN201811148307.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- weight
- data set
- address
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The application provides a data processing method and a data processing apparatus. The data processing apparatus comprises a data processing module configured to: obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data and the data in the first weight data set come from the same input channel; obtain a second weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; perform a multiplication operation using the first weight matrix and a first feature data set, where the data in the first feature data set come from the same input channel; perform a multiplication operation using the second weight matrix and the first feature data set; and determine a target data set according to the results of the multiplication operations. The technical scheme can reduce the number of accesses to the storage device.
Description
Technical Field
The present application relates to the field of information technology, and more particularly, to a method of processing data and a data processing apparatus.
Background
Convolutional neural networks (CNNs) are among the most widely used algorithms in deep learning and are applied in image classification, speech recognition, video understanding, face detection, and many other applications.
The core of convolutional neural network computation is the convolution operation. The amount of data that a convolution operation needs to process is typically large, so the storage and computing resources it requires are also large. It is increasingly difficult for current processors to satisfy the demands of convolution operations. In addition, with the development of mobile intelligent devices, these devices also need to perform convolution operations, yet mobile devices have limited computing and memory capabilities. Therefore, improving the efficiency of the convolution operation is an urgent problem to be solved.
Disclosure of Invention
The application provides a method for processing data and a data processing apparatus, which can reduce the number of accesses to a storage device.
In a first aspect, an embodiment of the present application provides a data processing apparatus, including: a data processing module, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquire a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained by rearranging the rows of the first weight matrix; perform a first multiplication operation using the first weight matrix and a first feature data set, where the data in the first feature data set come from the same input channel; and perform a second multiplication operation using the second weight matrix and the first feature data set. The data processing apparatus further comprises a control module, configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
The target data set comprises the products between elements in the first feature data set and elements in the first weight matrix. From these products, partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix can further be obtained and output from the data processing apparatus, so that convolution results can be predicted with a small amount of calculation and at a fast rate. For example, assume the first weight matrix is a matrix of 3 rows and 3 columns and the second weight matrix is the first weight matrix rearranged by rows. After certain 3 rows and 3 columns of data in the first feature data set are input into the data processing module and multiplied by the first weight matrix and the second weight matrix respectively, the convolution result of that feature data with the first weight matrix, together with partial convolution sums of the 3-row, 3-column feature data at adjacent positions with the first weight matrix, can be obtained from the target data set. Because feature data at adjacent positions are often continuous, the data processing apparatus can predict convolution results using the convolution results and partial convolution sums in the target data set. For example, when the data processing apparatus performs object identification using the feature data according to the scheme provided in the present application, and the convolution results and partial convolution sums obtained in the target data set do not fall within the expected value range, the candidate can be excluded directly without performing subsequent calculation, thereby saving computation. After the data processing apparatus performs object identification according to the technical scheme provided in the present application, other functions can further be implemented using the identification result, for example commodity sorting and target monitoring.
In the above scheme, the data processing apparatus obtains a second weight matrix from the first weight matrix, where the second weight matrix is the first weight matrix rearranged by rows, and performs multiplication operations on the first feature data set using both the first weight matrix and the second weight matrix. The feature data can therefore be multiplexed when obtaining the partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix, which improves operation efficiency.
Specifically, in the prior art, the convolution of a feature matrix with a weight matrix is calculated by sliding the weight matrix over the feature matrix and multiplying the elements of the weight matrix with the corresponding feature data. Because the same feature data in a feature matrix is used in the multiplication operations of many positions of the sliding weight matrix, the feature data needs to be loaded many times in the actual operation. That is, multiple read operations must be performed on the memory in which the feature data is stored, so the same feature data is fetched multiple times. Referring to fig. 1, when calculating the Cartesian product of the feature data set and the weight data set, multiple convolution steps are required. When the first convolution is executed, the feature data a21 is acquired by reading the memory, so as to calculate the product of a21 and b21. When the convolution of the fourth step is calculated (the weight matrix slides from top to bottom and from left to right), the feature data a21 needs to be obtained again by reading the memory in order to calculate the product of a21 and b11. That is, multiple read operations must be performed on the memory that stores the feature data a21, which increases overhead. According to the technical scheme of the present application, the weight matrix is rearranged so that one load of a piece of feature data allows multiplication with more weight matrix elements, which reduces the number of times the feature data is loaded. In addition, multiplexing of the acquired feature data is achieved by calculating both the products between the feature data and the elements of the first weight matrix and the products between the feature data and the elements of the second weight matrix. In conclusion, the scheme improves operation efficiency.
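For illustration, the following Python sketch counts how many times each feature datum participates in the sliding-window multiplications of the prior-art scheme, and hence how many times it would be re-read from memory; the 5 × 5 feature map, 3 × 3 kernel, and stride 1 are assumptions matching fig. 1 rather than part of the embodiments:

```python
# Minimal sketch (assumption: 5x5 feature map, 3x3 kernel, stride 1, no padding).
# Counts how often each feature element participates in the sliding-window
# convolution, i.e. how often it would be re-read in the naive scheme.
H = W = 5          # feature map size
KH = KW = 3        # kernel size

reads = [[0] * W for _ in range(H)]
for out_r in range(H - KH + 1):          # output rows
    for out_c in range(W - KW + 1):      # output columns
        for kr in range(KH):
            for kc in range(KW):
                reads[out_r + kr][out_c + kc] += 1

for row in reads:
    print(row)
# Interior elements such as a22 are needed up to 9 times; reusing a loaded
# element against several (rearranged) weight rows reduces these repeated loads.
```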
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes an address processing module, where the address processing module is configured to: acquire the addresses of the weight data in the first weight matrix and the second weight matrix; and perform address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses of the feature data in the first feature data set. The determining of a target data set according to the operation result of the multiplication operation includes: the control module determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
According to this scheme, an address processing module is introduced, and the address processing module calculates the addresses for the products of the weight data in the first weight matrix and the second weight matrix with the feature data in the first feature data set. The Cartesian product of the feature data and the weight matrix and the convolution results can thereby be further obtained as the target data set, which extends the functions of the data processing apparatus.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing module is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; the address processing module is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
According to the scheme, the first weight matrix with n rows is rearranged according to the rows to obtain n weight matrices, and any two row vectors in the n row vectors in the same row in the n weight matrices are different, so that after multiplication operation is carried out on the feature data and the n weight matrices, a Cartesian product of the feature data and the first weight matrix is obtained, the multiplexing degree of the feature data is improved, and the operation efficiency is further improved.
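As one possible illustration of such a rearrangement (a hypothetical sketch; the application does not prescribe this particular construction), cyclically rotating the rows of the first weight matrix yields n weight matrices in which any two row vectors occupying the same row position are different:

```python
def row_rearrangements(w):
    """Return the n cyclic row rotations of an n-row weight matrix.
    Rotation is one possible rearrangement satisfying the property that,
    across the n matrices, the row vectors occupying the same row differ."""
    n = len(w)
    return [[w[(r + k) % n] for r in range(n)] for k in range(n)]

first_weight_matrix = [["b11", "b12", "b13"],
                       ["b21", "b22", "b23"],
                       ["b31", "b32", "b33"]]
matrices = row_rearrangements(first_weight_matrix)
# matrices[0] is the first weight matrix, matrices[1] a second weight matrix, etc.
for k, m in enumerate(matrices, start=1):
    print(f"weight matrix {k}:", m)
# In any given row position, the three matrices hold three different row vectors.
```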
With reference to the first aspect, in a possible implementation manner of the first aspect, the target data set includes a result matrix, where the result matrix is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The address processing module is further configured to determine a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size of the first feature matrix, a padding size, and a weight size, where the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
The scheme further refines a method for obtaining the target data address according to the address of the weight data and the address of the feature data, thereby improving the realizability of the data processing device for obtaining the convolution result through the Cartesian product.
With reference to the first aspect, in a possible implementation manner of the first aspect, the data processing apparatus further includes a compression module, configured to: acquire a second feature data set and remove the elements whose values are 0 from the second feature data set to obtain the first feature data set; acquire a second weight data set and remove the elements whose values are 0 from the second weight data set to obtain the first weight data set; and determine the address of each feature datum in the first feature data set and the address of each weight datum in the first weight data set.
According to this scheme, the feature data and the weight data are sparsified, that is, the elements whose values are 0 are removed from the feature data set and the weight data set, which reduces the amount of computation of the convolution operation and improves the operation efficiency of the data processing apparatus.
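A minimal Python sketch of this thinning step, assuming a flattened (one-dimensional) representation of the data set; the function name is illustrative only, and the example values match the absolute-address example given later in the description:

```python
def sparsify(data):
    """Remove zero elements and record the absolute address of each survivor.

    Returns (values, addresses), e.g. [5, 0, 0, 32, 0, 0, 0, 0, 23]
    -> ([5, 32, 23], [0, 3, 8]).
    """
    values, addresses = [], []
    for addr, v in enumerate(data):
        if v != 0:
            values.append(v)
            addresses.append(addr)
    return values, addresses

second_feature_set = [5, 0, 0, 32, 0, 0, 0, 0, 23]
first_feature_set, feature_addrs = sparsify(second_feature_set)
print(first_feature_set, feature_addrs)   # [5, 32, 23] [0, 3, 8]
```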
In a second aspect, an embodiment of the present application provides a data processing method, where the method includes: acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; performing a first multiplication operation by using the first weight matrix and the first feature data set; performing a second multiplication operation by using the second weight matrix and the first feature data set; and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; determining a target data set according to operation results of the first multiplication operation and the second multiplication operation, including: and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation result of the address operation.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
With reference to the second aspect, in a possible implementation manner of the second aspect, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the method further includes: and determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size corresponding to the first feature matrix, the filling size and the weight size, wherein the weight size is n rows and m columns, the filling size comprises a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
With reference to the second aspect, in a possible implementation manner of the second aspect, the method further includes: acquiring a second feature data set, and removing the elements whose values are 0 from the second feature data set to obtain the first feature data set; acquiring a second weight data set, and removing the elements whose values are 0 from the second weight data set to obtain the first weight data set; and determining the address of each feature datum in the first feature data set and the address of each weight datum in the first weight data set.
In a third aspect, the present application provides a data processing apparatus comprising a processor and a memory, the memory storing program code, the processor being configured to invoke the program code in the memory to perform a method of data processing as provided in the second aspect of the present application.
Drawings
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application.
FIG. 3 is a schematic diagram of a data computation array provided by an embodiment of the present application.
Fig. 4 is a block diagram of a data calculation unit in a data calculation array according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a multiplication operation performed on a first feature data set according to an embodiment of the present application.
Fig. 6 is a schematic diagram of an address of a first feature data set and an address of a weight data set according to an embodiment of the present application.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.
Fig. 8 is a block diagram of a structure of an address calculation unit in an address calculation array according to an embodiment of the present application.
Fig. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
Fig. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
Fig. 11 is a schematic diagram of a weight matrix with 3 filters and performing sparsification according to an embodiment of the present application.
Fig. 12 is a schematic diagram of a weight matrix that is not subjected to thinning processing according to an embodiment of the present application.
Fig. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application.
Fig. 14 is a block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "At least one of the following" or a similar expression refers to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where a, b, and c may each be single or multiple. In addition, in the embodiments of the present application, the words "first", "second", and the like do not limit the quantity or the execution order.
It is noted that, in the present application, words such as "exemplary" or "for example" are used to mean exemplary, illustrative, or descriptive. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
Fig. 1 is a schematic diagram of a convolution operation process in the prior art.
Fig. 1 shows a feature data set, which comprises a total of 5 × 5 feature data. Fig. 1 also shows a weight data set, which comprises a total of 3 × 3 weight data. The weight data set can be used as a convolution kernel to perform a convolution operation with the feature data set.
Fig. 1 also shows a schematic of a two-step operation with step size 1 in the process of performing a convolution operation on the feature data set with the weight data set. As shown in fig. 1, the 3 × 3 weight data in the weight data set need to be multiplied by 3 × 3 data in the feature data set, respectively, and the results of the multiplications are added to obtain the value of one element of the convolution result. Specifically, according to fig. 1, the convolution result c11 can be expressed as Equation 1.1 and the convolution result c12 can be expressed as Equation 1.2:
c11 = a11×b11 + a12×b12 + a13×b13 + a21×b21 + a22×b22 + a23×b23 + a31×b31 + a32×b32 + a33×b33        (Equation 1.1)
c12 = a12×b11 + a13×b12 + a14×b13 + a22×b21 + a23×b22 + a24×b23 + a32×b31 + a33×b32 + a34×b33        (Equation 1.2)
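The two equations can be checked with a short Python sketch using hypothetical numeric values for the feature data and weight data (the values themselves are not from the embodiment):

```python
# A hypothetical 5x5 feature data set a_ij and 3x3 weight data set b_ij.
a = [[5 * i + j + 1 for j in range(5)] for i in range(5)]
b = [[3 * i + j + 1 for j in range(3)] for i in range(3)]

def conv_at(r, c):
    """Convolution output at position (r+1, c+1): the 3x3 window of the
    feature data anchored at (r, c), multiplied element-wise with the
    weight data and summed, as in fig. 1."""
    return sum(a[r + i][c + j] * b[i][j] for i in range(3) for j in range(3))

# Equation 1.1: c11 = a11*b11 + a12*b12 + ... + a33*b33
c11 = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
# Equation 1.2: c12 = a12*b11 + a13*b12 + a14*b13 + ... + a34*b33
c12 = sum(a[i][j + 1] * b[i][j] for i in range(3) for j in range(3))

assert c11 == conv_at(0, 0) and c12 == conv_at(0, 1)
```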
After the two-step operation shown in fig. 1 is completed, the weight data set continues to slide to the right over the feature data set and the next operation is performed, until the entire feature data set has been traversed.
Assume set E1 = {a11, a12, a13, a21, a22, a23, a31, a32, a33} and set F1 = {b11, b12, b13, b21, b22, b23, b31, b32, b33}. Performing a Cartesian product operation on set E1 and set F1 yields a set G1, and set G1 may include the multiplication results shown in Table 1.
TABLE 1
a11×b11 | a11×b12 | a11×b13 | a11×b21 | a11×b22 | a11×b23 | a11×b31 | a11×b32 | a11×b33 |
a21×b11 | a21×b12 | a21×b13 | a21×b21 | a21×b22 | a21×b23 | a21×b31 | a21×b32 | a21×b33 |
a31×b11 | a31×b12 | a31×b13 | a31×b21 | a31×b22 | a31×b23 | a31×b31 | a31×b32 | a31×b33 |
a12×b11 | a12×b12 | a12×b13 | a12×b21 | a12×b22 | a12×b23 | a12×b31 | a12×b32 | a12×b33 |
a22×b11 | a22×b12 | a22×b13 | a22×b21 | a22×b22 | a22×b23 | a22×b31 | a22×b32 | a22×b33 |
a32×b11 | a32×b12 | a32×b13 | a32×b21 | a32×b22 | a32×b23 | a32×b31 | a32×b32 | a32×b33 |
a13×b11 | a13×b12 | a13×b13 | a13×b21 | a13×b22 | a13×b23 | a13×b31 | a13×b32 | a13×b33 |
a23×b11 | a23×b12 | a23×b13 | a23×b21 | a23×b22 | a23×b23 | a23×b31 | a23×b32 | a23×b33 |
a33×b11 | a33×b12 | a33×b13 | a33×b21 | a33×b22 | a33×b23 | a33×b31 | a33×b32 | a33×b33 |
As shown in Table 1, the Cartesian product of set E1 and set F1 includes all the multiplication results required to calculate c11: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a31×b31, a32×b32, a33×b33. The results of the Cartesian product operation on set E1 and set F1 also include some of the multiplication results required to calculate c12: a12×b11, a13×b12, a22×b21, a23×b22, a32×b31, a33×b32.
Assume set E2 = {a12, a13, a14, a22, a23, a24, a32, a33, a34}. Performing a Cartesian product operation on set E2 and set F1 yields a set G2, and set G2 may include the multiplication results shown in Table 2.
TABLE 2
a12×b11 | a12×b12 | a12×b13 | a12×b21 | a12×b22 | a12×b23 | a12×b31 | a12×b32 | a12×b33 |
a22×b11 | a22×b12 | a22×b13 | a22×b21 | a22×b22 | a22×b23 | a22×b31 | a22×b32 | a22×b33 |
a32×b11 | a32×b12 | a32×b13 | a32×b21 | a32×b22 | a32×b23 | a32×b31 | a32×b32 | a32×b33 |
a13×b11 | a13×b12 | a13×b13 | a13×b21 | a13×b22 | a13×b23 | a13×b31 | a13×b32 | a13×b33 |
a23×b11 | a23×b12 | a23×b13 | a23×b21 | a23×b22 | a23×b23 | a23×b31 | a23×b32 | a23×b33 |
a33×b11 | a33×b12 | a33×b13 | a33×b21 | a33×b22 | a33×b23 | a33×b31 | a33×b32 | a33×b33 |
a14×b11 | a14×b12 | a14×b13 | a14×b21 | a14×b22 | a14×b23 | a14×b31 | a14×b32 | a14×b33 |
a24×b11 | a24×b12 | a24×b13 | a24×b21 | a24×b22 | a24×b23 | a24×b31 | a24×b32 | a24×b33 |
a34×b11 | a34×b12 | a34×b13 | a34×b21 | a34×b22 | a34×b23 | a34×b31 | a34×b32 | a34×b33 |
As shown in Table 2, the Cartesian product of set E2 and set F1 includes some of the multiplication results required to calculate c12: a14×b13, a24×b23, a34×b33.
The multiplication results shown in Tables 1 and 2 that are not needed for calculating c11 and c12 may also be used in subsequent convolution operations.
As can be seen from the analysis of the above convolution and Cartesian product operations, a convolution can be decomposed into Cartesian product operations. The results obtained from one Cartesian product operation can be used in multiple steps of the convolution operation, and one step of the convolution operation is obtained by adding the appropriate Cartesian product results one or more times.
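This decomposition can be illustrated with the following Python sketch, in which hypothetical numeric values are assigned to the elements of set E1 and set F1; the Cartesian product set G1 of Table 1 contains all nine products needed for c11 and six of the nine products needed for c12:

```python
import itertools

# Hypothetical numeric values for a 5x5 feature map and a 3x3 weight kernel.
a = {(i, j): 10 * i + j for i in range(1, 6) for j in range(1, 6)}   # a[(i, j)] ~ a_ij
b = {(i, j): i + j / 10 for i in range(1, 4) for j in range(1, 4)}   # b[(i, j)] ~ b_ij

E1 = [(i, j) for i in range(1, 4) for j in range(1, 4)]   # indices of a11..a33
F1 = [(i, j) for i in range(1, 4) for j in range(1, 4)]   # indices of b11..b33

# Set G1: all |E1| x |F1| = 81 products of Table 1, keyed by (feature index, weight index).
G1 = {(ea, fb): a[ea] * b[fb] for ea, fb in itertools.product(E1, F1)}

# c11 uses the 9 products a_ij * b_ij ...
c11 = sum(G1[(p, p)] for p in F1)
# ... and 6 of the 9 products needed for c12 (a_i(j+1) * b_ij with j+1 <= 3) are also in G1.
c12_partial = sum(G1[((i, j + 1), (i, j))] for i in range(1, 4) for j in range(1, 3))
print(c11, c12_partial)
```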
Fig. 2 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 200 shown in fig. 2 includes: a storage module 210, a data processing module 220, an address processing module 230, and a control module 240.
The storage module 210 is configured to store a first feature data set, an address of each feature data in the first feature data set, a first weight value set, and an address of each weight value in the first weight value set.
The data processing module 220 includes N data computation arrays. Each of the N data calculation arrays includes N × m data calculation units, where N is a positive integer greater than or equal to 2, and m is a positive integer greater than or equal to 2.
The address processing module 230 includes N address calculation arrays. Each of the N address calculation arrays includes N × m address calculation units.
Each data calculation array is configured to obtain n × m weight data from the storage module 210, and store the obtained weight data in n × m data calculation units of each data calculation array.
Each address calculation array is configured to obtain the addresses of n × m weight data from the storage module 210 and store the obtained addresses in the n × m address calculation units of that address calculation array. The addresses of the weight data stored in the N address calculation arrays are the addresses of the weight data stored in the N data calculation arrays. In other words, the N address calculation arrays correspond one-to-one to the N data calculation arrays, and each of the N address calculation arrays stores the addresses of the weight data stored in its corresponding data calculation array. For example, assume that the weight data stored in one of the N data calculation arrays are b11, b12, b13, b21, b22, b23, b31, b32, and b33; then the address calculation array corresponding to that data calculation array stores the address of b11, the address of b12, the address of b13, the address of b21, the address of b22, the address of b23, the address of b31, the address of b32, and the address of b33.
The N data calculation arrays multiply the first feature data set using the weight data they hold. During the operation on the first feature data set, the weight data stored in the N data calculation arrays remain unchanged.
Similarly, the N address calculation arrays perform address operations on the addresses of the first feature data set using the addresses of the weight data held by the N address calculation arrays, wherein the addresses of the weight data held by the N address calculation arrays remain unchanged during the address operations on the addresses of the first feature data set.
The control module 240 is configured to determine a target data set according to the results of the multiplication operations performed by the N data calculation arrays and the results of the address operations.
Therefore, the N data calculation arrays may determine, according to the multiplication result and the operation result of the address operation, an operation result of performing a convolution operation on the first feature data set by the weight data stored in the N data calculation arrays. In other words, in some embodiments, the target data set may be a set of data obtained by performing a convolution operation on the first feature data set by using the weight data stored in the N data calculation arrays.
The following describes, with reference to fig. 1 and fig. 3 to 5, how the N data calculation arrays operate on the first feature data set shown in fig. 1 using the weight data they store.
FIG. 3 is a schematic diagram of a data computation array provided by an embodiment of the present application. The data calculation array 300 shown in fig. 3 includes 9 data calculation units, which are a data calculation unit 311, a data calculation unit 312, a data calculation unit 313, a data calculation unit 321, a data calculation unit 322, a data calculation unit 323, a data calculation unit 331, a data calculation unit 332, and a data calculation unit 333.
It will be appreciated that the data computation array may include input-output cells (not shown) in addition to the data computation cells shown in FIG. 3. The input-output unit is used to acquire the data that needs to be input to the data computation array 300. The input/output unit is also used for inputting the data required to be output by the data calculation array 300 to the corresponding unit and/or module. For example, the input/output unit may obtain the weight data and the feature data from the storage module, and send the obtained weight data and feature data to the corresponding data calculation unit. The input and output unit is also used for acquiring the target data calculated by each data calculation unit and sending the target data to the storage module.
Optionally, in some embodiments, data transfer between the various compute units in the data compute array is unidirectional. Taking fig. 3 as an example, arrows for connecting the data computing units in fig. 3 may indicate a unidirectional transmission direction of data. Take the data calculation unit 311, the data calculation unit 312, and the data calculation unit 313 as an example. The data calculation unit 311 may transmit data (e.g., feature data) to the data calculation unit 312, and the data calculation unit 312 cannot transmit the data to the data calculation unit 311. The data calculation unit 312 may transmit data to the data calculation unit 313, and the data calculation unit 313 cannot transmit data to the data calculation unit 312.
Fig. 4 is a block diagram of a data calculation unit in a data calculation array according to an embodiment of the present application. As shown in fig. 4, the data calculation unit 400 may include a storage subunit 401 and a data calculation subunit 402. It will be appreciated that the data computation unit 400 may also include an input output subunit. The input and output subunit is used for acquiring the data required to be acquired by the data calculation unit and outputting the data required to be output by the data calculation unit.
Specifically, the data calculation array 300 shown in fig. 3 may obtain 3 × 3 weight data in the weight data set shown in fig. 1, and store the 3 × 3 weight data in the 3 × 3 data calculation units of the data calculation array 300, respectively.
Specifically, weight data b11 may be stored in the storage subunit of the data calculation unit 311, weight data b12 may be stored in the storage subunit of the data calculation unit 312, weight data b13 may be stored in the storage subunit of the data calculation unit 313, and so on. In this way, 3 × 3 weight data are stored in the data calculation array 300.
After 3 × 3 pieces of weight data are stored, the data calculation array 300 may slide the first feature data set in one direction, and multiply the first feature data set by using the weight data stored in the data calculation array 300. In the process of multiplying the first feature data set by the data calculation array 300, the weight data stored in the data calculation array 300 is not changed. In other words, during the multiplication operation of the first feature data by the data calculation array 300, the data calculation units in the data calculation array 300 do not delete the saved weight data. Accordingly, the data calculation unit will not read and store new weight data from the storage module.
The manner in which the first feature data set slides in one direction is illustrated in fig. 5. Fig. 5 is a schematic diagram of the process of multiplying the first feature data set according to an embodiment of the present application. As shown in fig. 5, the first feature data set may first be flipped 180 degrees: column 1 of the first feature data set becomes column 5 after flipping, column 2 becomes column 4 after flipping, and so on. It should be noted that in fig. 5 the first feature data set is first flipped 180 degrees and then slid to the right only for the convenience of describing the calculation of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 with the weight data b11, b21, b31, b12, b22, b32, b13, b23, and b33. In practical implementation, the first feature data set may be multiplied with the weight data stored in the data calculation array 300 by sliding to the right directly. The result of multiplying the first feature data set by sliding right directly has the same data values as the result of flipping it 180 degrees first and then sliding right in the manner shown in fig. 5; only the order of the resulting data differs.
The flipped first feature data set slides to the right in a single direction and is multiplied with the weight data stored in the data calculation array 300. Specifically, in the first operation, the feature data a11, a21, and a31 are multiplied by the weight data b11, b21, and b31, respectively. After the first operation, the flipped first feature data set slides to the right and the second operation is performed. In the second operation, the feature data a11, a21, and a31 are multiplied by the weight data b12, b22, and b32, respectively, and the feature data a12, a22, and a32 are multiplied by the weight data b11, b21, and b31, respectively. After the second operation, the flipped feature data set continues to slide to the right, the third operation is performed, and so on. In the above embodiment, the step size of each slide of the first feature data set is 1. Of course, in some other embodiments, the step size of each slide of the first feature data set may be a positive integer greater than 1.
Taking the first operation as an example, the data calculation unit 311 may obtain the feature data a11 from the first feature data set stored in the storage module 210 and store the acquired feature data a11 in the storage subunit of the data calculation unit 311. At this point, the storage subunit of the data calculation unit 311 holds the weight data b11 and the feature data a11. The data calculation subunit in the data calculation unit 311 multiplies the weight data b11 and the feature data a11 held in the storage subunit to obtain intermediate data k(11,11). The multiplication of the weight data b11 and the feature data a11 may be implemented by a multiplier in the data calculation subunit.
The data calculation unit 311 may also acquire the cache data r(11,11) held at the first target address, based on the target address determined by the address calculation unit corresponding to the data calculation unit 311. Specifically, the address calculation unit corresponding to the data calculation unit 311 may determine the first target address based on the address of the feature data a11 and the address of the weight data b11. The data calculation unit 311 may then acquire the current cache data r(11,11) held at the first target address. The manner in which the address calculation unit determines the first target address is described later. The data calculation subunit adds the intermediate data k(11,11) and the current cache data r(11,11) to obtain target data d(11,11). The addition of the intermediate data k(11,11) and the current cache data r(11,11) may be implemented by an adder in the data calculation subunit. The target data d(11,11) may be saved to the first target address. In other words, the current cache data r(11,11) held at the first target address is updated to the target data d(11,11).
Similarly, the data calculation unit 321 can determine, in the same manner, the product of the weight data b21 held by the data calculation unit 321 and the feature data a21 (hereinafter referred to as intermediate data k(21,21)). The target address determined by the address calculation unit corresponding to the data calculation unit 321 is also the first target address. The data calculation unit 321 adds the intermediate data k(21,21) to the current cache data stored at the first target address (which at this point has been updated to the target data d(11,11)) to obtain target data d(21,21). The target data d(21,21) may be saved to the first target address. In other words, the current cache data d(11,11) held at the first target address is updated to the target data d(21,21).
The data calculation unit 331 can determine, in the same manner, the product of the weight data b31 held by the data calculation unit 331 and the feature data a31 (hereinafter referred to as intermediate data k(31,31)). The target address determined by the address calculation unit corresponding to the data calculation unit 331 is also the first target address. The data calculation unit 331 adds the intermediate data k(31,31) to the current cache data stored at the first target address (which at this point has been updated to the target data d(21,21)) to obtain target data d(31,31). The target data d(31,31) may be saved to the first target address. In other words, the current cache data d(21,21) held at the first target address is updated to the target data d(31,31).
After the first operation, the target data stored at the first target address is a11×b11 + a21×b21 + a31×b31.
In a similar manner, the data computation array 300 may continue to operate on the first set of feature data using the weight data maintained by the data computation units in the data computation array 300.
After the third operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32. That is, in the third operation, the target address determined by the address calculation units corresponding to the data calculation unit 312, the data calculation unit 322, and the data calculation unit 332 is also the first target address. Therefore, after the third operation, the target data stored at the first target address is the sum of the data stored there after the first operation, a12×b12 determined by the data calculation unit 312, a22×b22 determined by the data calculation unit 322, and a32×b32 determined by the data calculation unit 332. After the fifth operation, the data stored at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32 + a13×b13 + a23×b23 + a33×b33. That is, in the fifth operation, the target address determined by the address calculation units corresponding to the data calculation unit 313, the data calculation unit 323, and the data calculation unit 333 is also the first target address. Therefore, after the fifth operation, the target data stored at the first target address is the sum of the data stored there after the third operation, a13×b13 determined by the data calculation unit 313, a23×b23 determined by the data calculation unit 323, and a33×b33 determined by the data calculation unit 333.
Thus, after five operations, the data stored at the first target address is the convolution result c11 shown in Equation 1.1. Similarly, the convolution operation of the first feature data set and the weight data set can be completed using the multiplication results and the address operation results.
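The accumulation described above can be modelled in software with the following Python sketch. It is an illustrative simplification, not the hardware itself: it assumes a single 3 × 3 data calculation array holding hypothetical weight values, a 5 × 5 feature data set sliding past it with step size 1, and only the row-aligned products (the vertically shifted products are contributed by the additional row-rearranged weight matrices described elsewhere in this application). After the fifth operation, the cache entry at the first target address holds exactly c11 of Equation 1.1:

```python
# Illustrative software model of the accumulation (a sketch under assumptions,
# not the hardware): one 3x3 array holds the weights, the feature columns slide
# past it, and every product is added to the cache entry at its target address.
A = [[(i + 1) * 10 + (j + 1) for j in range(5)] for i in range(5)]   # hypothetical a_ij
B = [[(i + 1) + (j + 1) / 10 for j in range(3)] for i in range(3)]   # hypothetical b_ij
cache = {}                                    # target address -> accumulated target data

for step in range(1, 6):                      # the five operations discussed above
    for w_col in range(1, 4):                 # weight column held by the array
        f_col = step - w_col + 1              # feature column currently aligned with it
        out_col = f_col - w_col + 1           # output column this product feeds
        if 1 <= f_col <= 5 and 1 <= out_col <= 3:
            for r in range(1, 4):             # the three rows of the array
                addr = (1, out_col)           # (1, 1) is the "first target address"
                cache[addr] = cache.get(addr, 0.0) + A[r - 1][f_col - 1] * B[r - 1][w_col - 1]

c11 = sum(A[i][j] * B[i][j] for i in range(3) for j in range(3))
assert abs(cache[(1, 1)] - c11) < 1e-9        # holds after the fifth operation
# Entries such as cache[(1, 2)] hold partial convolution sums of adjacent positions.
```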
The following describes, with reference to fig. 1 and fig. 3 to 8, how the N address calculation arrays perform address operations on the addresses of the first feature data set shown in fig. 1 using the stored addresses of the weight data.
Fig. 6 is a schematic diagram of the addresses of the first feature data set and the addresses of the weight data set according to an embodiment of the present application. The addresses of the first feature data set shown in fig. 6 are the addresses of the first feature data set shown in fig. 1. Specifically, the address Adda11 is the address of the feature data a11, the address Adda12 is the address of the feature data a12, and so on. The addresses of the weight data set shown in fig. 6 are the addresses of the weight data set shown in fig. 1. Specifically, the address Addb11 is the address of the weight data b11, the address Addb12 is the address of the weight data b12, and so on.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application. The address calculation array 700 shown in fig. 7 includes 9 address calculation units in total: an address calculation unit 711, an address calculation unit 712, an address calculation unit 713, an address calculation unit 721, an address calculation unit 722, an address calculation unit 723, an address calculation unit 731, an address calculation unit 732, and an address calculation unit 733.
It will be appreciated that the address calculation array may include input-output units (not shown) in addition to the address calculation unit shown in fig. 7. The input-output unit is used to obtain the data that needs to be input into the address calculation array 700. The input/output unit is further configured to input data that needs to be output by the address calculation array 700 to a corresponding unit and/or module. For example, the input/output unit may obtain the address of the weight data and the address of the feature data from the storage module, and send the obtained address of the weight data and the address of the feature data to the corresponding address calculation unit. The input and output unit is further used for acquiring the target address calculated by each address calculation unit and sending the target address to the corresponding data calculation unit.
The N address calculation arrays correspond one-to-one to the N data calculation arrays. The one-to-one correspondence here means that each of the N data calculation arrays corresponds to one of the N address calculation arrays, and different data calculation arrays correspond to different address calculation arrays. For example, assume that N is equal to 3, the 3 data calculation arrays are data calculation array 1, data calculation array 2, and data calculation array 3, and the 3 address calculation arrays are address calculation array 1, address calculation array 2, and address calculation array 3. Data calculation array 1 corresponds to address calculation array 1, data calculation array 2 corresponds to address calculation array 2, and data calculation array 3 corresponds to address calculation array 3. The address calculation array corresponding to a data calculation array is used to calculate the target address of each piece of target data in that data calculation array. Further, the data calculation units in a data calculation array correspond one-to-one to the address calculation units in the corresponding address calculation array. Assuming that the data calculation array shown in fig. 3 corresponds to the address calculation array shown in fig. 7, the data calculation unit 311 corresponds to the address calculation unit 711, the data calculation unit 312 corresponds to the address calculation unit 712, the data calculation unit 313 corresponds to the address calculation unit 713, and so on. The address calculation unit is used to determine the address of the target data of the corresponding data calculation unit. Specifically, the first target address from which the data calculation unit 311 acquires the cache data r(11,11), as described above, is obtained by the address calculation unit 711 through an address operation.
Fig. 8 is a block diagram of a structure of an address calculation unit in an address calculation array according to an embodiment of the present application. As shown in fig. 8, the address calculation unit 800 may include a storage sub-unit 801 and an address calculation sub-unit 802. It is understood that the address calculation unit 800 may further include an input-output subunit. The input and output subunit is used for acquiring the data required to be acquired by the address calculation unit and outputting the data required to be output by the address calculation unit.
Specifically, the address calculation array 700 shown in fig. 7 may obtain the addresses of 3 × 3 weight data among the addresses of the weight data set shown in fig. 6 and store the addresses of the 3 × 3 weight data in the 3 × 3 address calculation units of the address calculation array 700, respectively.
Specifically, the address Addb11 may be stored in the storage subunit of the address calculation unit 711, the address Addb12 may be stored in the storage subunit of the address calculation unit 712, the address Addb13 may be stored in the storage subunit of the address calculation unit 713, and so on. In this way, the addresses of 3 × 3 weight data are stored in the address calculation array 700.
After storing the addresses of 3 × 3 pieces of weight data, the address calculation array 700 may perform unidirectional sliding on the address of the first feature data set, and perform address operation on the address of the first feature data set using the address of the weight data stored in the address calculation array 700. In the process of performing address operation on the address of the first feature data set by the address calculation array 700, the address of the weight data stored in the address calculation array 700 is not changed. In other words, during the address operation of the address calculation array 700 on the address of the first feature data, the address calculation unit in the address calculation array 700 does not delete the address of the saved weight data. Correspondingly, the address calculation unit will not read and store the address of the new weight data from the storage module.
The process in which the addresses of the first feature data set slide unidirectionally to the right for address calculation is similar to the process in which the first feature data set slides unidirectionally to the right for multiplication, and is therefore not described again here.
How the address calculation unit performs the address operation will be described below.
For convenience of description, hereinafter, the address of the weight acquired by the address calculation unit 800 is referred to as the address of the first weight, the address of the feature data acquired by the address calculation unit 800 is referred to as the address of the first feature data, and the address obtained by the address calculation unit 800 performing the address operation is referred to as the first target address.
The input/output subunit in the address calculation unit 800 may obtain, in addition to the address of the first feature data and the address of the first weight data from the storage module, the following information: the size of the input data corresponding to the first feature data set, the padding size, and the weight size, where the weight size is the size of the address calculation array to which the address calculation unit 800 belongs and the padding size is a preset size. In this example, the weight size is 3 × 3. The size of the input data corresponding to the first feature data set, the padding size, and the weight size may also be saved in the storage subunit 801 of the address calculation unit 800. The address calculation subunit 802 may determine the first target address according to the address of the first weight data, the address of the first feature data, the size of the input data corresponding to the first feature data set, the padding size, and the weight size.
Assuming that the input picture size is a rows and b columns and the convolution kernel size is n rows and m columns, the size of the output picture after convolution is (a−n+1) × (b−m+1). This raises two problems: first, the output picture shrinks after every convolution operation; second, pixels at the corners and edges of the original picture contribute to fewer outputs, so much of the information at the edges of the picture is lost.
To address these issues, the original picture may be padded (Padding) on the boundary to increase the size of the matrix before the convolution operation is performed. 0 is usually used as a padding value.
If the numbers of pixels extended horizontally and vertically are p and q respectively, the size of the padded original picture is (a+2p) × (b+2q). With the convolution kernel unchanged at n rows and m columns, the size of the output picture becomes (a+2p−n+1) × (b+2q−m+1). The numbers of pixels p and q extended in each direction are the padding sizes. For the output picture to keep the same size as the input picture, the horizontal padding size p is (n−1)/2 and the vertical padding size q is (m−1)/2.
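The padding relationship can be checked with a short sketch (hypothetical sizes): with p = (n−1)/2 and q = (m−1)/2 for odd n and m, the output picture keeps the size of the input picture:

```python
def output_size(a, b, n, m, p, q):
    """Output size of an (a x b) picture padded by (p, q) and convolved
    with an (n x m) kernel at stride 1."""
    return (a + 2 * p - n + 1, b + 2 * q - m + 1)

a, b, n, m = 5, 5, 3, 3
p, q = (n - 1) // 2, (m - 1) // 2                      # padding sizes from the text
assert output_size(a, b, n, m, p, q) == (a, b)         # picture size is preserved
assert output_size(a, b, n, m, 0, 0) == (a - n + 1, b - m + 1)   # no padding
```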
The address calculation subunit 802 may determine the target address according to the following formula:
result_cord = (input_cord / input_size_x − w_cord / kernel_size_x + padding_size_x) × input_size_y + (input_cord % input_size_y − w_cord % kernel_size_y + padding_size_y)        (Equation 1.3)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the abscissa of the size of the input data corresponding to the first feature data set, input_size_y denotes the ordinate of the size of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the abscissa of the weight size, kernel_size_y denotes the ordinate of the weight size, padding_size_x denotes the horizontal padding size, and padding_size_y denotes the vertical padding size.
The address of the feature data and the address of the weight data in Equation 1.3 are absolute addresses. An absolute address refers to the absolute position of the feature data/weight data in the corresponding feature data set/weight data set. Suppose the feature data set includes X feature data; then the absolute address of the x-th feature datum among the X feature data is x−1, where x is a positive integer greater than or equal to 1 and less than or equal to X. For example, if the feature data set is 5, 0, 0, 32, 0, 0, 0, 0, 23, the absolute addresses of the feature data 5, 32, and 23 are 0, 3, and 8, respectively. The absolute addresses listed above refer to the positions of the feature data within the feature data set and may be converted into addresses composed of an abscissa and an ordinate according to the shape of the feature matrix. Similarly, the absolute address of a weight datum may be converted into an address composed of an abscissa and an ordinate.
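A possible software rendering of Equation 1.3, under the assumptions that "/" is integer division, the addresses are absolute row-major positions, and the sizes correspond to the 5 × 5 feature data set and 3 × 3 weight data set of fig. 1; it shows that all products contributing to c11 map to the same target address, which is what allows the data calculation units to accumulate them there:

```python
def target_address(input_cord, w_cord,
                   input_size_x=5, input_size_y=5,
                   kernel_size_x=3, kernel_size_y=3,
                   padding_size_x=1, padding_size_y=1):
    """Equation 1.3 with '/' read as integer division and '%' as remainder;
    addresses are absolute (flat, row-major) positions in the data sets."""
    return ((input_cord // input_size_x - w_cord // kernel_size_x + padding_size_x)
            * input_size_y
            + (input_cord % input_size_y - w_cord % kernel_size_y + padding_size_y))

flat = lambda r, c, width: (r - 1) * width + (c - 1)   # a_rc / b_rc -> absolute address

# Every product contributing to c11 lands at the same target address.
addr_11 = target_address(flat(1, 1, 5), flat(1, 1, 3))   # a11 x b11
addr_22 = target_address(flat(2, 2, 5), flat(2, 2, 3))   # a22 x b22
addr_33 = target_address(flat(3, 3, 5), flat(3, 3, 3))   # a33 x b33
assert addr_11 == addr_22 == addr_33
```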
Optionally, in some embodiments, the address calculation subunit 802 may instead determine the target address according to the following formula:
result_cord = ((base_input + input_cord) / input_size_x − (base_w + w_cord) / kernel_size_x + padding_size_x) × input_size_y + ((base_input + input_cord) % input_size_y − (base_w + w_cord) % kernel_size_y + padding_size_y)        (Equation 1.4)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the abscissa of the size of the input data corresponding to the first feature data set, input_size_y denotes the ordinate of the size of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the abscissa of the weight size, kernel_size_y denotes the ordinate of the weight size, padding_size_x denotes the horizontal padding size, padding_size_y denotes the vertical padding size, base_input denotes the base address of the feature data, and base_w denotes the base address of the weight data.
The address of the feature data and the address of the weight data in formula 1.4 are relative addresses. A relative address refers to the position of the feature data/weight data in the corresponding feature data set/weight data set relative to the address of the first feature data/weight data. Assuming that the address of the first feature data in the feature data set is Y, the address of the yth feature data in the feature data set is Y + y - 1, where y is a positive integer greater than or equal to 1.
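Formula 1.4 differs from formula 1.3 only in that the base addresses are added before the division and remainder operations. A literal transcription, under the same integer-division assumption and with illustrative names, is sketched below.

```python
# Literal transcription of formula 1.4 (relative addresses plus base addresses).
def target_address_relative(base_input, input_cord, base_w, w_cord,
                            input_size_x, input_size_y,
                            kernel_size_x, kernel_size_y,
                            padding_size_x, padding_size_y):
    abs_input = base_input + input_cord
    abs_w = base_w + w_cord
    row_part = abs_input // input_size_x - abs_w // kernel_size_x + padding_size_x
    col_part = abs_input % input_size_y - abs_w % kernel_size_y + padding_size_y
    return row_part * input_size_y + col_part   # result_cord
```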
Optionally, in some embodiments, after determining the target address, the address calculation unit may directly send the target address to the corresponding data calculation unit. The data calculation unit may determine the cache data in the target address according to the target address.
Optionally, in other embodiments, after determining the target address, the address calculation unit may determine cache data in the target address, and then send the cache data and the target address together to the corresponding data calculation unit.
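To make the overall flow concrete, the following self-contained sketch (illustrative only, with row/column coordinates used in place of the linear addresses of formulas 1.3/1.4 and padding omitted) multiplies every non-zero feature datum with every non-zero weight datum and accumulates each product at its target address, then checks the result against a direct sliding-window convolution.

```python
import numpy as np

def direct_conv(A, B):
    # Reference: direct sliding-window convolution (no padding).
    H, W = A.shape
    n, m = B.shape
    out = np.zeros((H - n + 1, W - m + 1))
    for p in range(out.shape[0]):
        for q in range(out.shape[1]):
            out[p, q] = np.sum(A[p:p + n, q:q + m] * B)
    return out

def cartesian_conv(A, B):
    # Cartesian product of non-zero feature data and non-zero weight data,
    # with each product accumulated at the target address it belongs to.
    H, W = A.shape
    n, m = B.shape
    out = np.zeros((H - n + 1, W - m + 1))
    feats = [(r, c, A[r, c]) for r in range(H) for c in range(W) if A[r, c]]
    weights = [(i, j, B[i, j]) for i in range(n) for j in range(m) if B[i, j]]
    for r, c, a in feats:
        for i, j, b in weights:
            p, q = r - i, c - j              # target address (row, column)
            if 0 <= p < out.shape[0] and 0 <= q < out.shape[1]:
                out[p, q] += a * b           # accumulate at the target address
    return out

A = np.random.randint(0, 3, (5, 5))
B = np.random.randint(0, 3, (3, 3))
assert np.allclose(direct_conv(A, B), cartesian_conv(A, B))
```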
The above describes how a data calculation array performs multiplication operations and how an address calculation array performs address operations.
As described above, the data processing apparatus may include 2 or more data calculation arrays and corresponding address calculation arrays.
The weight data set shown in fig. 1 only includes 3 × 3 weight data, and only one weight data set is used for performing the convolution operation on the feature data set. Optionally, in other embodiments, two or more weight data sets may also be used for performing the convolution operation on the feature data set.
Optionally, in some embodiments, each of the N data calculation arrays may obtain and store a weight data set, and perform a multiplication operation on the first feature data set by using the stored weight data. Correspondingly, each address calculation array in the N address calculation arrays may obtain and store an address of corresponding weight data, and perform multiplication operation on the address of the first feature data set by using the address of the stored weight data.
If the number of weight data sets used for performing the convolution operation on the feature data set is greater than N, the N data calculation arrays may obtain N weight data sets at a time and use them to multiply the first feature data set. If fewer than N weight data sets remain to be obtained in a pass, all of the remaining weight data sets are obtained and used to multiply the first feature data set. For example, assume that N is 4 and the number of weight data sets is 9. In this case, the 4 data calculation arrays may first obtain the 1st to 4th weight data sets to multiply the first feature data set, then obtain the 5th to 8th weight data sets to multiply the first feature data set, and finally obtain the 9th weight data set to multiply the first feature data set. The manner in which the N address calculation arrays perform the address operations is similar, and the description is not repeated.
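The batching described above can be sketched as a simple driver loop; this is illustrative only, and the function and variable names are assumptions made here rather than terms of the application.

```python
# Illustrative driver loop: process the weight data sets in batches of N arrays.
def multiply_in_batches(weight_sets, feature_set, N, multiply):
    results = []
    for start in range(0, len(weight_sets), N):
        batch = weight_sets[start:start + N]   # the last batch may contain fewer than N sets
        for weights in batch:
            results.append(multiply(feature_set, weights))
    return results
```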
Optionally, in other embodiments, the weight data stored in different data calculation arrays of the N data calculation arrays may be the result of rearranging the same weight data in rows. For example, it is assumed that the N data calculation arrays include a first data calculation array and a second data calculation array, and that the N × m weight data stored in the second data calculation array is N × m weight data obtained by rearranging the N × m weight data stored in the first data calculation array in rows.
Fig. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
As shown in FIG. 9, the data calculation array 1 stores 3 × 3 weight data, where the first row of weight data is b11, b12 and b13; the second row of weight data is b21, b22 and b23; and the third row of weight data is b31, b32 and b33. The data calculation array 2 stores 3 × 3 weight data, where the first row of weight data is b31, b32 and b33; the second row of weight data is b11, b12 and b13; and the third row of weight data is b21, b22 and b23. It can be seen that the weight data stored in the data calculation array 2 is the result of rearranging, by rows, the weight data stored in the data calculation array 1. Correspondingly, the weight data stored in the data calculation array 1 may also be regarded as the result of rearranging, by rows, the weight data stored in the data calculation array 2. For convenience of description, weight data obtained after such a row rearrangement is referred to as rearranged weight data, and the weight data stored in the two data calculation arrays shown in fig. 9 are referred to as mutually rearranged weight data.
FIG. 9 shows the relationship between the weight data held by two data calculation arrays. Optionally, in some embodiments, the weight data stored in any two of three or more data calculation arrays are also mutually rearranged weight data. For example, the N data calculation arrays further include a data calculation array 3 as shown in fig. 10. The data calculation array 3 stores 3 × 3 weight data, where the first row of weight data is b21, b22 and b23; the second row of weight data is b31, b32 and b33; and the third row of weight data is b11, b12 and b13. Therefore, the weight data stored in the data calculation array 1 shown in fig. 9 and in the data calculation array 3 are mutually rearranged weight data, and the weight data stored in the data calculation array 2 and in the data calculation array 3 are also mutually rearranged weight data. In summary, if the value of N is greater than or equal to n and the weight data includes n rows in total, the weight data may be rearranged at most n-1 times, and the weight data stored in the 2nd to nth data calculation arrays among the N data calculation arrays are all weight data obtained by rearranging, by rows, the weight data stored in the 1st data calculation array, where any two of the row vectors located in the same row across the weight data stored in the n data calculation arrays are different. N is a positive integer greater than or equal to n. In this case, the first data calculation array and the second data calculation array are any two of the n data calculation arrays. In other words, the first row of weight data stored in each of the n data calculation arrays appears as the second row to the nth row of weight data in the remaining n-1 data calculation arrays, respectively.
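The row rearrangement described above amounts to cyclically rotating the rows of the first weight matrix. The following sketch (illustrative only) generates the n rearrangements and checks that no two arrays hold the same row at the same row position, matching the relationship shown in fig. 9 and fig. 10.

```python
import numpy as np

def row_rearrangements(w):
    # The original weight matrix plus its n-1 cyclic row rotations,
    # one per data calculation array (cf. fig. 9 and fig. 10).
    n = w.shape[0]
    return [np.roll(w, k, axis=0) for k in range(n)]

w1 = np.arange(1, 10).reshape(3, 3)         # stands in for b11 ... b33
arrays = row_rearrangements(w1)
n = w1.shape[0]
for j in range(n):                          # at every row position ...
    rows = [tuple(a[j]) for a in arrays]
    assert len(set(rows)) == n              # ... all n arrays hold different rows
```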
Optionally, in some embodiments, the data calculation array 2 and the data calculation array 3 may obtain 3 × 3 pieces of weight data as shown in fig. 1, and then perform data rearrangement to obtain rearranged weight data.
Optionally, in other embodiments, the storage module may store the rearranged weight data, and the data calculation array 2 and the data calculation array 3 directly obtain the rearranged weight data from the storage module.
It can be understood that, since the data calculation array corresponds to the address calculation array, the address of the weight data held by the second address calculation array corresponding to the second data calculation array is also the result of the rearrangement by rows of the addresses of the weight data held by the first address calculation array corresponding to the first data calculation array.
Similarly, if the value of N is greater than or equal to n, the weight data includes n rows in total, and the addresses of the weight data also include n rows. The addresses of the weight data can be rearranged at most n-1 times, and the addresses of the weight data stored in the 2nd to nth address calculation arrays among the N address calculation arrays are all addresses obtained by rearranging, by rows, the addresses of the weight data stored in the 1st address calculation array. N is a positive integer greater than or equal to n. In this case, the first address calculation array and the second address calculation array are any two of the n address calculation arrays. In other words, the addresses of the first row of weight data stored in each of the n address calculation arrays appear as the addresses of the second row to the nth row of weight data in the remaining n-1 address calculation arrays, respectively.
After the weight data and the corresponding weight data addresses are rearranged by rows, the feature data can be reused, which further reduces the number of accesses made by the data calculation arrays and the address calculation arrays to the storage module.
For example, in the process of performing the convolution operation on the feature data set shown in fig. 1 by using the weight data set shown in fig. 1, it is also necessary to determine the operation result shown in formula 1.5:
c21 = a21×b11 + a22×b12 + a23×b13 + a31×b21 + a32×b22 + a33×b23 + a41×b31 + a42×b32 + a43×b33    (formula 1.5)
If the weight data stored in the second data calculation array after the rearrangement is as shown in fig. 9, a partial result of formula 1.5 can be obtained after a single access to the storage module.
Specifically, when the data calculation array 2 shown in fig. 9 multiplies the feature data set by the stored weight data, the operation results a21×b11, a22×b12, a23×b13, a31×b21, a32×b22 and a33×b23 can be obtained. According to the operation rule described above, the sum of these 6 operation results is stored at the same target address.
It is assumed that the data processing apparatus includes only the data calculation array 1 and the data calculation array 2, and that the weight data held by the data calculation array 1 and the data calculation array 2 are as shown in fig. 9. In the process of multiplying the feature data set shown in fig. 1 by using the data calculation array 1 and the data calculation array 2, after multiplying the feature data in the first row to the third row of the feature data set, the data calculation array 1 and the data calculation array 2 may multiply the feature data in the third row to the fifth row of the feature data set. In other words, while traversing the feature data set for the multiplication operation, the step size of sliding downward may be 2. If the weight data were not rearranged (in other words, if the data processing apparatus had only the data calculation array 1 shown in FIG. 9), then to obtain, for example, a21×b11, a22×b12 and a23×b13, the data calculation array 1 would have to multiply the feature data of the second row to the fourth row after the multiplication of the feature data of the first row to the third row is completed. That multiplication requires the feature data of the second row to the third row of the feature data set to be acquired again. In other words, the second row to the third row of the feature data set would need to be read a second time to obtain a21×b11, a22×b12, a23×b13 and so on, which results in the same feature data being read multiple times.
Because the weight data is rearranged, the operation result obtained when the data calculation array 2 multiplies the feature data in the second row to the third row of the feature data set is equivalent to the operation result obtained when the data calculation array 1 slides downward with a step size of 1 and then multiplies the feature data in the second row to the third row. In other words, the feature data in the second row to the third row of the feature data set can be read once while achieving its multiplication by two weight data sets, so more partial Cartesian products can be obtained with a single reading of the feature data. Since the convolution operation in fact only needs partial Cartesian products of the feature data set and the weight data set, rearranging the weight data by rows, multiplying the feature data set by the original weight data and by the rearranged weight data respectively, and obtaining a target data set including the partial Cartesian products from the results reduces the number of accesses to the storage module and increases the data processing speed.
When the first weight matrix with n rows is rearranged n-1 times, and any two of the row vectors of the resulting n weight matrices that are located in the same row are different, then after the feature data set is multiplied with the n weight matrices, the Cartesian product of the feature data set and the first weight matrix can be obtained, and the convolution of the feature data set with the first weight matrix can further be obtained, with each feature data in the feature data set loaded into the data processing unit only once.
Fig. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
In the process of multiplying the first row to the third row of feature data of the feature data set shown in fig. 1 by the weight data shown in fig. 10, the operation results a31×b11, a32×b12 and a33×b13 can be obtained. After the multiplication of the feature data of the first row to the third row of the feature data set, the first data calculation array, the second data calculation array and the third data calculation array may multiply the feature data of the fourth row to the fifth row of the feature data set. In other words, while traversing the feature data set for the multiplication operation, the step size of sliding downward may be 3.
Assuming that three of the N data calculation arrays are, respectively, the data calculation array 1 and the data calculation array 2 shown in fig. 9 and the data calculation array 3 shown in fig. 10, these three data calculation arrays can complete the Cartesian product operation on the feature data set.
Take the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 as an example again. The three data calculation arrays may each perform, with the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33, the multiplication process shown in fig. 5. The operation results obtained when the three data calculation arrays use their respectively stored weight data to complete the multiplication of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 are shown in table 1.
In summary, if the weight data includes n rows, the weight data can be rearranged n-1 times at most. If the weight data is rearranged for one time, the step length of downward sliding can be 2 in the process of traversing the characteristic data set to carry out multiplication operation; if the weight data is rearranged twice, the step length of downward sliding can be 3 in the process of traversing the characteristic data set to carry out multiplication operation; if the weight data is rearranged for n-1 times, the step length of downward sliding can be n in the process of traversing the feature data set to perform multiplication operation.
Optionally, in some embodiments, the first feature data set is a feature data set obtained by performing thinning processing on a second feature data set, and the first weight data set is a weight data set obtained after thinning processing. The data processing apparatus 200 shown in fig. 2 may further comprise a compression module. The compression module is configured to acquire the second feature data set and perform thinning processing on it to obtain the first feature data set, where the second feature data set includes feature data corresponding to the input data. The compression module is further configured to acquire a second weight data set and perform thinning processing on it to obtain the first weight data set. The compression module is further configured to determine the address of each feature data in the first feature data set and the address of each weight data in the first weight data set. The compression module sends the first feature data set, the first weight data set, the address of each feature data in the first feature data set and the address of each weight data in the first weight data set to the storage module, and the storage module stores them. If the number of weight data remaining after thinning is less than n × m, the remaining positions are padded with 0.
The input data referred to in the embodiments of the present application may be any data on which multiplication, Cartesian product operations and/or convolution operations can be performed, for example image data or voice data. The input data is a general term for all data input to the data processing apparatus, and may consist of feature data. The feature data corresponding to the input data may be all of the data included in the input data or only part of the feature data of the input data. Taking image data as an example, assuming that the input data is a whole image, all data of the image are referred to as feature data. The second feature data set may include all feature data of the input data, or may be all or part of the feature data of the image after some processing. For example, the second feature data set may be the feature data of a partial image obtained by segmenting the image.
Assume that the second feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0, 0, 0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 5, 8. The first feature data set obtained after the thinning includes: 5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8. Assuming that the address of the first feature data in the second feature data set is 0, the address of the second feature data is 1, the address of the third feature data is 2, and the address of the nth feature data is n-1, the addresses (absolute addresses) of the first feature data set are: 0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35.
Assuming that the second weight data set includes 8, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 24, 54, 0, 0, 0, 0, 0, 0, 12, 0, 0, 22, 3, 45, 0, 0, 0, 0, 67, 44, 0, 0, 0, 0, 0, 0, 0, 0, 35, 65, 75, the thinned second weight data set includes: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75. It can be seen that the thinned second weight data set includes 14 weight data. Assume that each data calculation array includes 3 × 3 data calculation units. The number of weight data in the thinned second weight data set is therefore less than the number of data calculation units included in 2 data calculation arrays. Accordingly, the thinned second weight data set is finally padded with four 0s to obtain the first weight data set. The first weight data set corresponding to the second weight data set is therefore: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75, 0, 0, 0, 0. Assuming that the address of the first weight data in the second weight data set is 0, the address of the second weight data is 1, the address of the third weight data is 2, and the address of the nth weight data is n-1, the addresses (absolute addresses) of the first weight data set are: 0, 1, 6, 16, 17, 23, 26, 27, 28, 33, 34, 43, 44, 45.
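The thinning and zero-padding described in the two examples above can be sketched as follows. The helper name is illustrative only; the padding is applied only to the weight data, up to a multiple of the n × m data calculation units per array (pad_to = 9 for a 3 × 3 array, so 14 non-zero weights are padded with four 0s to fill two arrays).

```python
def thin(dense, pad_to=None):
    # Remove zeros, record the absolute address of every remaining datum,
    # and (for weight data) pad the value list with zeros up to a multiple
    # of pad_to, e.g. pad_to = 9 for a 3 x 3 data calculation array.
    values = [v for v in dense if v != 0]
    addresses = [i for i, v in enumerate(dense) if v != 0]
    if pad_to and len(values) % pad_to:
        values += [0] * (pad_to - len(values) % pad_to)
    return values, addresses
```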
In some embodiments, the first feature data set may also be a feature data set that has not been subjected to a thinning process. In other words, the first set of characteristic data may be equal to the second set of characteristic data.
The first feature data set in the above embodiment corresponds to a matrix, and correspondingly, the weight data for performing the convolution operation on the first feature data set also corresponds to a matrix. In other words, the convolution operation described in the above embodiment is a two-dimensional convolution operation.
The technical solution of the embodiment of the present application may also be applied to T-dimensional multiplication, cartesian product calculation, and/or convolution calculation (T is a positive integer greater than or equal to 3). In addition, a plurality of weight data sets for performing convolution operation on the first feature data set may be provided.
The following describes the technical solution of the present application by taking a three-dimensional convolution operation as an example.
If the input data corresponding to the first feature data set is color picture data, the first feature data set may be a three-dimensional tensor, and a three-dimensional convolution operation may be performed on the first feature data set.
The first set of feature data includes three subsets: a feature data subset 1, a feature data subset 2, and a feature data subset 3. The feature data of the three subsets correspond to the three input channels, red, green and blue, respectively. The feature data in each of the three subsets may correspond to a matrix.
It is assumed that three sets of weight data are used to perform a three-dimensional convolution operation on the first set of feature data. The set of weight data used to perform the convolution operation on the set of feature data may also be referred to as a Filter (Filter). Thus, the three sets of weight data may be referred to as filter 1, filter 2, and filter 3. Each of the three weight data sets includes three weight channels, which are channel 1, channel 2, and channel 3. The weight data included in each of the three weight channels may correspond to a matrix. The three weight channels correspond to the three feature data subsets one to one. For example, lane 1 corresponds to feature data subset 1, lane 2 corresponds to feature data subset 2, and lane 3 corresponds to feature data subset 3. The weight channel may perform convolution operations on the corresponding feature data subsets. Filter 1, filter 2, and filter 3 may each perform a three-dimensional convolution operation on the first feature data set. That is, the channel 1 of the filter 1 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 1 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 1 performs convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 2 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 2 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 2 performs convolution operation on the feature data subset 3 of the first feature data set; the channel 1 of the filter 3 performs convolution operation on the feature data subset 1 of the first feature data set, the channel 2 of the filter 3 performs convolution operation on the feature data subset 2 of the first feature data set, and the channel 3 of the filter 3 performs convolution operation on the feature data subset 3 of the first feature data set.
It can be seen that the process of performing the three-dimensional convolution operation on the first feature data set by each of the three filters can be decomposed into three two-dimensional convolution operation processes. The specific implementation of these three two-dimensional convolution operations is similar to that of the two-dimensional convolution operation in the above-described embodiment. Taking the example that the channel 1 performs convolution operation on the feature data subset 1, the channel 1 may be regarded as the weight data set shown in fig. 1, and the feature data subset 1 may be regarded as the feature data set shown in fig. 1. The process of performing convolution operation on the feature data subset by the channel 1 is the process of performing convolution operation on the feature data set by the weight data set as shown in fig. 1. As described above, the convolution process can be decomposed into multiplication and addition operations. Therefore, the data processing apparatus shown in fig. 2 can also perform a three-dimensional convolution operation. In the case where the input data corresponding to the feature data set is a three-dimensional tensor, the first feature data set referred to in the above embodiments may be regarded as a subset of feature data in the feature data set corresponding to the three-dimensional tensor. In the case of performing convolution operation on the feature data set by using a plurality of weight data sets, the first weight data set may be regarded as one weight data set of the plurality of weight data sets. In the case that the set of weight data also corresponds to a three-dimensional tensor, the first set of weight data can be considered as one channel of the set of weight data of the three-dimensional tensor.
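The decomposition of the three-dimensional convolution into per-channel two-dimensional convolutions can be sketched as below; conv2d stands for any two-dimensional convolution routine (for example the cartesian_conv sketch above), and the data layout assumed here is illustrative only.

```python
def conv3d_per_channel(feature_subsets, filters, conv2d):
    # feature_subsets: one 2-D feature matrix per input channel (e.g. R, G, B).
    # filters: for each filter, one 2-D weight matrix per input channel.
    # Each filter convolves every channel and the per-channel results are summed.
    outputs = []
    for filt in filters:
        acc = None
        for channel_weights, subset in zip(filt, feature_subsets):
            partial = conv2d(subset, channel_weights)
            acc = partial if acc is None else acc + partial
        outputs.append(acc)
    return outputs
```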
The process of the multidimensional convolution operation of three or more dimensions is similar to the three-dimensional convolution operation process, and the description is not repeated here.
Optionally, in a case where a plurality of weight data sets are used to perform convolution operation on the feature data set, the first weight data set may also be a weight data set obtained by performing thinning processing on the plurality of weight data sets. Specifically, the non-0 weight data included in the first weight data set is from one channel of the same weight data set or the same channel of different weight data sets.
The thinning process for the multiple weight data sets will be described below with reference to fig. 11.
Fig. 11 is a schematic diagram of the weight matrices of 3 filters and of the sparsification performed on them according to an embodiment of the present application. Each of the 3 filters shown in fig. 11 includes 3 weight channels, and each weight channel includes 3 × 3 weight data.
As shown in FIG. 11, the weight data of weight data set 1 is from the weight data in channel 1 of filter 1 and filter 2, and the weight data of weight data set 4 is from the weight data in channel 1 of filter 2 and filter 3. The weight data of the weight data set 2 comes from the weight data in the channel 2 of the filter 1 and the filter 2, and the weight data of the weight data set 5 comes from the weight data in the channel 2 of the filter 2 and the filter 3. The weight data of the weight data set 3 is from the weight data in the channel 3 of the filter 1 and the filter 2, and the weight data of the weight data set 6 is from the weight data in the channel 3 of the filter 2 and the filter 3.
As shown in fig. 11, weight data coming from the same channel of different weight data sets means that the weight data may belong to different filters, but the channel index within those filters is the same. For example, the weight data of weight data set 4 comes from the weight data in channel 1 of filter 2 and the weight data in channel 1 of filter 3.
For convenience of description, a weight data set obtained by thinning the weight data in the plurality of filters is hereinafter referred to as a thinned weight data set.
In some embodiments, the weight data included in the set of sparse weight data may come from the same filter. The operation process of the feature data multiplication by the sparse weight data set and the process of determining the convolution operation result of the sparse weight data set and the feature data according to the operation result of the multiplication are the same as those in the above embodiments, and thus, the description thereof is not repeated.
In some embodiments, the weight data included in the set of sparse weight data may come from different filters. The operation process of multiplying the feature data by the sparse weight data set is the same as that of the above embodiment, and thus, the description is not repeated here. In the case that the weight data included in the thinned weight data set can come from different filters, the process of determining the convolution operation result of the thinned weight data set and the feature data according to the operation result of the multiplication operation is not exactly the same as the above-described embodiment.
Specifically, it is assumed that the thinned weight data set includes weight data from P filters (P is a positive integer greater than or equal to 2). The thinned weight data set may be divided into P thinned weight data subsets, the pth thinned weight data subset of the P thinned weight data subsets comprising weight data from the pth filter of the P filters, p = 1, ..., P. Suppose the pth thinned weight data subset includes Num_p weight data, where Num_p is a positive integer greater than or equal to 1 and Num_p is less than n × m.
The N data calculation arrays are used to perform a Cartesian product operation on the thinned weight data set and the feature data set, so as to obtain the multiplication results required by the convolution operation of each filter with the feature data set; the corresponding multiplication results are then added to obtain the convolution operation result of each filter with the feature data set.
Take the weight data stored in the three data calculation arrays shown in fig. 9 and fig. 10 as an example again. Suppose the weight data shown in fig. 9 and fig. 10 are obtained by thinning the weight data of channel 1 of filter 1 and the weight data of channel 1 of filter 2 shown in fig. 12. The three data calculation arrays are used to perform a Cartesian product operation on {a11, a12, a13, a21, a22, a23, a31, a32, a33}, and the following operation results are obtained: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a11×b31, a12×b32, a13×b33. It can be seen that the sum of a11×b11, a12×b12, a13×b13, a21×b21, a22×b22 and a23×b23 is the operation result of the convolution of the weight data of channel 1 of filter 1 with {a11, a12, a13, a21, a22, a23, a31, a32, a33}, and the sum of a11×b31, a12×b32 and a13×b33 is the operation result of the convolution of the weight data of channel 1 of filter 2 with {a11, a12, a13, a21, a22, a23, a31, a32, a33}.
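When the thinned weight data set mixes weight data from P filters, the partial products only need to be accumulated separately per filter (and per target address). An illustrative sketch follows; the tuple layout of the inputs is assumed here for illustration only.

```python
from collections import defaultdict

def accumulate_per_filter(partial_products):
    # partial_products: iterable of (filter_id, target_address, value).
    # Products are summed per filter and per target address, so a single
    # Cartesian-product pass yields every filter's convolution result.
    acc = defaultdict(float)
    for filter_id, address, value in partial_products:
        acc[(filter_id, address)] += value
    return acc
```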
Further, the compression module may also perform sparsification on the target data set, and delete 0 in the target data set.
Through the technical scheme, the product of each feature data in the first feature data set and each weight data in the first weight data set can be obtained. After that, the convolution operation result of the first feature data set and the first weight data set can be obtained by adding the corresponding product results.
In addition, in the above embodiment, in the process of determining the convolution operation result of the first feature data set and the first weight value set according to the operation result of the cartesian product and the address operation result, each data calculation unit in the data calculation array adds the product of the weight value data and the feature data to the data stored in the target address determined by the corresponding address calculation unit, and writes the added data back to the target address. Thus, the final result held by the target address is the convolution operation result.
In other embodiments, each data calculation unit in the data calculation array may perform only a multiplication operation, that is, multiply the weight data by the feature data and store the multiplication result to the target address determined by the corresponding address calculation unit; the multiplication results are then obtained from the corresponding target addresses and added to obtain the corresponding convolution operation result. For example, the result of a11×b11 is stored at target address 1, the result of a21×b21 is stored at target address 2, the result of a31×b31 is stored at target address 3, the result of a12×b12 is stored at target address 4, the result of a22×b22 is stored at target address 5, the result of a32×b32 is stored at target address 6, the result of a13×b13 is stored at target address 7, the result of a23×b23 is stored at target address 8, and the result of a33×b33 is stored at target address 9. When calculating the convolution result, the data stored in target addresses 1 to 9 can be added to obtain c11 shown in formula 1.1.
In other embodiments, the memory module may include an addition unit. Each data calculation unit in the data calculation array can only perform multiplication operation, namely, data is multiplied by the characteristic data, the multiplication result is output to the storage module, and when the storage module stores the received data to the target address determined by the address calculation unit corresponding to the data calculation unit, the storage module firstly adds the received data and the data stored in the target address, and stores the added data to the target address. Thus, the final result held by the target address is the result of the convolution operation.
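The read-add-write behaviour of a storage module with an addition unit can be sketched as follows; this is a minimal illustration only, not a description of the actual hardware.

```python
class AccumulatingStore:
    # Storage module with an addition unit: writing to a target address adds
    # the incoming product to the data already stored at that address.
    def __init__(self):
        self.mem = {}

    def write(self, address, value):
        self.mem[address] = self.mem.get(address, 0) + value

    def read(self, address):
        return self.mem.get(address, 0)
```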
Fig. 13 is a schematic flow chart of a data processing method provided according to an embodiment of the present application. The method shown in fig. 13 may be performed by the data processing apparatus shown in fig. 2 or fig. 14.
1301, a first weight matrix in a first weight data set is obtained, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set are from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2.
1302, a second weight matrix is obtained, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows.
1303, a multiplication operation is performed on the first weight matrix and a first feature data set, where data in the first feature data set are from the same input channel.
And 1304, multiplying the first feature data set by the second weight matrix.
1305, a target data set is determined based on the operation result of the multiplication.
Specific implementation manners of the steps of the method shown in fig. 13 can be seen from the descriptions of fig. 2 to fig. 12, and thus, detailed descriptions thereof are omitted.
Optionally, in some embodiments, the method further includes: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; the determining a target data set according to the operation result of the multiplication operation includes: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the method further includes: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the target data set includes a result matrix, where the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the method further includes: and determining a first target address according to the address of the weight data stored in each address calculation array, the address of a first feature data set, the size corresponding to the first feature matrix, a filling size and a weight size, wherein the weight size is n rows and m columns, and the filling size is the difference between the size of the first feature data set and the size of the result matrix. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Optionally, in some embodiments, the method further includes: acquiring a second characteristic data set, and removing elements with the median value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set, and determining the address of each weight value in the first weight value data set. The specific implementation manner of each step described above can also refer to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
Fig. 14 is a block diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus 1400 shown in fig. 14 includes: data processing module 1401 and control module 1404, data processing module 1401 includes N data calculation units, N is an integer greater than or equal to 2, wherein: a data processing module 1401, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented by n rows and m columns of weight data, and data in the first weight data set are from the same input channel, where n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; acquiring a second weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix according to rows; performing multiplication operation on the first weight matrix and a first characteristic data set, wherein data in the first characteristic data set come from the same input channel; performing multiplication operation by using the second weight matrix and the first characteristic data set; the control module 1404 is configured to determine a target data set according to an operation result of the multiplication.
Optionally, in some embodiments, the data processing apparatus 1400 further includes an address processing module 1402, where the address processing module 1402 includes N address calculation units, and the data calculation units and the address calculation units are in one-to-one correspondence, where: the address processing module 1402 is configured to: acquiring addresses of weight data in the first weight matrix and the second weight matrix; address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set; the control module 1404 is configured to determine, according to an operation result of the multiplication operation, a target data set including: and determining a target data set according to the operation result of the multiplication operation and the operation result of the address operation.
Optionally, in some embodiments, the data processing module 1401 is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix according to rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different; the address processing module 1402 is further configured to: acquiring addresses of weight data in the third weight matrix to the nth weight matrix; and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
Optionally, in some embodiments, the target data set includes a result matrix, where the result matrix is a result of performing a convolution operation on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix, and the address processing module 1402 is further configured to determine a first target address according to an address of the weight data stored in each address calculation array, an address of the first feature data set, a size corresponding to the first feature matrix, a padding size, and a weight size, where the weight size is n rows and m columns, and the padding size is a difference between a size of the first feature data set and a size of the result matrix.
Optionally, in some embodiments, the data processing apparatus 1400 further comprises a compression module 1403 for: acquiring a second characteristic data set, and removing elements with the median value of 0 in the second characteristic data set to obtain the first characteristic data set; acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set, and determining the address of each weight value in the first weight value data set.
The detailed functions and advantages of the modules in the data processing apparatus 1400 shown in fig. 14 can be referred to the descriptions of fig. 2 to fig. 12, and thus, the detailed description is not necessary here.
In the embodiment of the application, the terminal device or the network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. The hardware layer includes hardware such as a Central Processing Unit (CPU), a Memory Management Unit (MMU), and a memory (also referred to as a main memory). The operating system may be any one or more computer operating systems that implement business processing through processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. The application layer comprises applications such as a browser, an address list, word processing software, instant messaging software and the like. Furthermore, the embodiment of the present application does not particularly limit the specific structure of the execution main body of the method provided by the embodiment of the present application, as long as the communication can be performed according to the method provided by the embodiment of the present application by running the program recorded with the code of the method provided by the embodiment of the present application, for example, the execution main body of the method provided by the embodiment of the present application may be a terminal device or a network device, or a functional module capable of calling the program and executing the program in the terminal device or the network device.
In addition, various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. For example, computer-readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips, etc.), optical disks (e.g., Compact Disk (CD), Digital Versatile Disk (DVD), etc.), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), card, stick, or key drive, etc.). In addition, various storage media described herein can represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" can include, without being limited to, wireless channels and various other media capable of storing, containing, and/or carrying instruction(s) and/or data.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (11)
1. A data processing apparatus, characterized in that the data processing apparatus comprises:
the data processing module is used for acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows;
performing a first multiplication operation using the first weight matrix and the first feature data set
Performing a second multiplication operation by using the second weight matrix and the first feature data set;
and the control module is used for determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
2. The data processing apparatus of claim 1, further comprising:
an address processing module to: acquiring addresses of weight data in the first weight matrix and the second weight matrix;
address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set;
the control module is configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation result of the address operation.
3. The data processing apparatus of claim 2,
the data processing module is further configured to: acquiring a third weight matrix to an nth weight matrix in the first weight data set, wherein the third weight matrix to the nth weight matrix are matrixes obtained by rearranging the first weight matrix in rows, and any two row vectors in n row vectors of the first weight matrix to the nth weight matrix which are positioned in the same row are different;
the address processing module is further configured to:
acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
4. A data processing apparatus according to claim 2 or 3, wherein the target data set comprises a result matrix, the result matrix being a result of a convolution operation of the first feature data set with the first weight data set, the first feature data set being represented as a first feature matrix;
the address processing module is further configured to determine a first target address according to an address of weight data stored in each address calculation array, an address of a first feature data set, a size of the first feature matrix, a filling size, and a weight size, where the weight size is n rows and m columns, the filling size includes a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
5. The data processing apparatus according to any of claims 1 to 4, further comprising a compression module for: acquiring a second characteristic data set, and removing elements with a median value of 0 in the second characteristic data set to obtain the first characteristic data set;
acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set;
determining an address of each feature data in the first feature data set, and determining an address of each weight value in the first weight value data set.
6. A method of data processing, the method comprising:
acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented by n rows and m columns of weight data, the data in the first weight data set come from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2;
acquiring a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix obtained by rearranging the first weight matrix in rows;
performing a first multiplication operation by using the first weight matrix and the first feature data set;
performing a second multiplication operation by using the second weight matrix and the first feature data set;
and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
7. The method of claim 6, further comprising:
acquiring addresses of weight data in the first weight matrix and the second weight matrix;
address operation is carried out by using addresses of the weight data in the first weight matrix and the second weight matrix and addresses in the first characteristic data set;
determining a target data set according to operation results of the first multiplication operation and the second multiplication operation, including:
and determining a target data set according to the operation result of the first multiplication operation, the operation result of the second multiplication operation and the operation result of the address operation.
8. The method of claim 7, further comprising: acquiring third to nth weight matrixes in the first weight data set, wherein the third to nth weight matrixes are matrixes obtained by rearranging the first weight matrix in rows, and any two row vectors in the n row vectors in the same row of the first to nth weight matrixes are different;
acquiring addresses of weight data in the third weight matrix to the nth weight matrix;
and performing address operation by using the addresses of the weight data of the third to nth weight matrixes and the address of the feature data in the first feature data set.
9. The method according to claim 7 or 8, wherein the target data set comprises a result matrix, the result matrix being a result of a convolution operation of the first feature data set with the first weight data set, the first feature data set being represented as a first feature matrix,
the method further comprises the following steps:
and determining a first target address according to the address of the weight data stored in each address calculation array, the address of a first feature data set, the size corresponding to the first feature matrix, a filling size and a weight size, wherein the weight size is n rows and m columns, the filling size comprises a horizontal filling size and a vertical filling size, the horizontal filling size is (n-1)/2, and the vertical filling size is (m-1)/2.
10. The method according to any one of claims 6 to 9, further comprising:
acquiring a second characteristic data set, and removing elements with a median value of 0 in the second characteristic data set to obtain the first characteristic data set;
acquiring a second weight data set, and removing elements with the median value of 0 in the second weight data set to obtain the first weight data set;
determining an address of each feature data in the first feature data set, and determining an address of each weight value in the first weight value data set.
11. A data processing apparatus, characterized in that the data processing apparatus comprises:
a processor and a memory, the memory storing program code, the processor for invoking the program code in the memory to perform a method of data processing according to any one of claims 6 to 10.
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI799169B (en) * | 2021-05-19 | 2023-04-11 | 神盾股份有限公司 | Data processing method and circuit based on convolution computation |
- 2018-09-29: CN application CN201811148307.0A filed; granted as CN110968832B (status: Active)
- 2019-08-23: PCT application PCT/CN2019/102252 filed; published as WO2020063225A1 (status: Application Filing)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402905A (en) * | 2016-05-19 | 2017-11-28 | 北京旷视科技有限公司 | Computational methods and device based on neutral net |
CN106326985A (en) * | 2016-08-18 | 2017-01-11 | 北京旷视科技有限公司 | Neural network training method, neural network training device, data processing method and data processing device |
US20180096226A1 (en) * | 2016-10-04 | 2018-04-05 | Magic Leap, Inc. | Efficient data layouts for convolutional neural networks |
CN108122030A (en) * | 2016-11-30 | 2018-06-05 | 华为技术有限公司 | A kind of operation method of convolutional neural networks, device and server |
US20180165575A1 (en) * | 2016-12-08 | 2018-06-14 | Via Alliance Semiconductor Co., Ltd. | Neural network unit with mixed data and weight size computation capability |
CN107844827A (en) * | 2017-11-28 | 2018-03-27 | 北京地平线信息技术有限公司 | The method and apparatus for performing the computing of the convolutional layer in convolutional neural networks |
Non-Patent Citations (1)
Title |
---|
WANG Zhichao; XU Ji; ZHANG Pengyuan; YAN Yonghong: "Structural Optimization and Accelerated Computation of Convolutional Neural Network Acoustic Models", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), No. 03 *
Also Published As
Publication number | Publication date |
---|---|
CN110968832B (en) | 2023-10-20 |
WO2020063225A1 (en) | 2020-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112840356B (en) | Operation accelerator, processing method and related equipment | |
US10909447B2 (en) | Transposing neural network matrices in hardware | |
CN107145939B (en) | Computer vision processing method and device of low-computing-capacity processing equipment | |
KR102315346B1 (en) | Performing Average Pooling in Hardware | |
CN111465924B (en) | System and method for converting matrix input into vectorized input for matrix processor | |
JP7007488B2 (en) | Hardware-based pooling system and method | |
KR20190066473A (en) | Method and apparatus for processing convolution operation in neural network | |
US11328395B2 (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN110188869B (en) | Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm | |
KR20200100190A (en) | Image Transformation for Machine Learning | |
CN112749666B (en) | Training and action recognition method of action recognition model and related device | |
KR20200081044A (en) | Method and apparatus for processing convolution operation of neural network | |
CN110109646B (en) | Data processing method, data processing device, multiplier-adder and storage medium | |
US12106222B2 (en) | Neural network training under memory restraint | |
WO2022041188A1 (en) | Accelerator for neural network, acceleration method and device, and computer storage medium | |
CN113918120A (en) | Computing device, neural network processing apparatus, chip, and method of processing data | |
CN114138231B (en) | Method, circuit and SOC for executing matrix multiplication operation | |
CN111767243A (en) | Data processing method, related device and computer readable medium | |
US20170076211A1 (en) | Feature-converting device, feature-conversion method, learning device, and recording medium | |
CN118193914A (en) | LU decomposition method, device, equipment and storage medium for distributed platform | |
CN117851742A (en) | Data storage method, data processing method, data memory and data processor | |
CN110968832B (en) | Data processing method and device | |
CN116400884A (en) | Control method and device of multiplier-adder computer device and storage medium | |
CN116304677A (en) | Channel pruning method and device for model, computer equipment and storage medium | |
CN115424038A (en) | Multi-scale image processing method, system and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||