WO2020063225A1 - Data processing method and apparatus - Google Patents

Data processing method and apparatus

Info

Publication number
WO2020063225A1
Authority
WO
WIPO (PCT)
Prior art keywords
weight
data
data set
address
matrix
Prior art date
Application number
PCT/CN2019/102252
Other languages
French (fr)
Chinese (zh)
Inventor
梁晓峣
景乃锋
崔晓松
廖健行
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2020063225A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/153Multidimensional correlation or convolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52Multiplying; Dividing
    • G06F7/523Multiplying only
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of information technology, and more particularly, to a method and a data processing apparatus for processing data.
  • the core of a convolutional neural network (CNN) operation is the convolution operation.
  • the amount of data that a convolution operation needs to process is usually large, so the convolution operation occupies a large amount of storage and computing resources.
  • today's processors increasingly struggle to meet the demands of convolution operations.
  • with the development of mobile smart devices, mobile smart devices also require convolution operations, but mobile devices have limited computing and storage capabilities. Therefore, how to improve the efficiency of the convolution operation is an urgent problem.
  • the present application provides a method and a data processing apparatus for processing data, which can reduce the number of times to access a storage device.
  • an embodiment of the present application provides a data processing apparatus.
  • the data processing apparatus includes a data processing module, which is configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data.
  • the data in the first weight data set comes from the same input channel, where n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2.
  • a second weight matrix is obtained according to the first weight matrix, where the second weight matrix is a matrix obtained after the rows of the first weight matrix are rearranged; the first weight matrix is used to perform a first multiplication operation with the first feature data set; and the second weight matrix is used to perform a second multiplication operation with the first feature data set;
  • the data processing apparatus further includes a control module, configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
  • the target data set includes product results between elements in the first feature data set and elements in the first weight matrix. Based on these product results, a partial Cartesian product and a partial convolution result of the first feature data set and the first weight matrix can be further obtained; the partial Cartesian product and partial convolution result can be output from the data processing apparatus, so that prediction of the convolution result can be realized with a small amount of calculation and at a fast calculation rate.
  • for example, when the first weight matrix is a matrix with 3 rows and 3 columns, and the second weight matrix is the first weight matrix rearranged by rows, the convolution result of the feature data and the first weight matrix can be obtained, as well as partial convolution sums of the first weight matrix with the 3-row, 3-column feature data at adjacent positions; since feature data at adjacent positions often have continuity, the data processing apparatus can use the convolution results and the partial convolution sums in the above target data set to make predictions.
  • for example, when the data processing apparatus uses feature data to perform object recognition according to the solution provided in this application, if the convolution results and partial convolution sums in the obtained target data set do not match the expected range of values, subsequent calculations can be skipped directly, saving computation.
  • after the data processing apparatus implements object recognition according to the technical solution provided in the present application, it can further use the object recognition result to realize other functions; for example, it can use the object recognition result to sort products, monitor targets, and so on.
  • the data processing apparatus obtains a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix in which the rows of the first weight matrix are rearranged. By performing multiplication operations with the first weight matrix and the second weight matrix on the first feature data set, the feature data can be reused when obtaining a partial Cartesian product and a partial convolution result of the first feature data set and the first weight matrix, thereby improving the operation efficiency.
  • in this way, the acquired feature data is multiplexed, which improves the efficiency of the operation.
  • the data processing apparatus further includes an address processing module, where the address processing module is configured to obtain the addresses of the weight data in the first weight matrix and the second weight matrix, and to perform address operations between the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set; the control module is configured to determine the target data set according to the operation results of the multiplication operations and the operation results of the address operations.
  • the address processing module calculates the address at which the product of the weight data in the first weight matrix and the second weight matrix and the feature data in the first feature data set is stored, so that the Cartesian product of the feature data and the weight matrix and the convolution result can be further obtained as the target data set, thereby expanding the functions of the data processing apparatus.
  • the data processing module is further configured to obtain a third weight matrix to an nth weight matrix in the first weight data set, where the third to nth weight matrices are matrices in which the rows of the first weight matrix are rearranged, and among the row vectors of the first to nth weight matrices located in the same row, any two row vectors are not the same; the address processing module is further configured to obtain the addresses of the weight data in the third to nth weight matrices, and to perform address operations between the addresses of the weight data of the third to nth weight matrices and the addresses of the feature data in the first feature data set.
  • the first weight matrix with n rows is rearranged by rows to obtain n weight matrices, and among the n row vectors of the n weight matrices located in the same row, any two are different. Therefore, after the feature data is multiplied with the n weight matrices, the Cartesian product of the feature data and the first weight matrix is obtained, thereby increasing the degree of reuse of the feature data and further improving the operation efficiency.
  • the target data set includes a result matrix
  • the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set.
  • the first feature data set is represented as a first feature matrix
  • the address processing module is further configured to determine a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size of the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
  • this solution further refines the method of obtaining the target data address based on the address of the weight data and the address of the feature data, thereby improving the feasibility of the data processing apparatus obtaining the convolution result through the Cartesian product.
  • the data processing apparatus further includes a compression module, configured to: obtain a second feature data set, and remove elements with a value of 0 in the second feature data set to obtain the first feature data set; obtain a second weight data set, and remove elements with a value of 0 in the second weight data set to obtain the first weight data set; and determine the address of each feature data in the first feature data set and the address of each weight in the first weight data set.
  • this solution sparsifies the feature data and weight data, that is, removes elements with a value of 0 in the feature data set and the weight data set, which reduces the amount of convolution operations and thus improves the operation efficiency of the data processing apparatus.
  • an embodiment of the present application provides a data processing method.
  • the method includes: obtaining a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, the data in the first weight data set comes from the same input channel, n is an integer greater than or equal to 2, and m is an integer greater than or equal to 2; obtaining a second weight matrix, where the second weight matrix is a matrix obtained after the rows of the first weight matrix are rearranged; using the first weight matrix to perform a first multiplication operation with the first feature data set; using the second weight matrix to perform a second multiplication operation with the first feature data set; and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
  • the method further includes: obtaining the addresses of the weight data in the first weight matrix and the second weight matrix; and performing address operations between the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set; and determining the target data set according to the operation results of the first multiplication operation and the second multiplication operation includes: determining the target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation results of the address operations.
  • the method further includes: obtaining a third weight matrix to an nth weight matrix in the first weight data set, where the third to nth weight matrices are matrices in which the rows of the first weight matrix are rearranged, and among the row vectors of the first to nth weight matrices located in the same row, any two row vectors are not the same; obtaining the addresses of the weight data in the third to nth weight matrices; and performing address operations between the addresses of the weight data in the third to nth weight matrices and the addresses of the feature data in the first feature data set.
  • the target data set includes a result matrix
  • the result matrix is a result of a convolution operation performed on the first feature data set and the first weight data set.
  • the first feature data set is represented as a first feature matrix
  • the method further includes: determining a first target address according to the address of the weight data stored in the array, the address of the first feature data set, the size corresponding to the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
  • the method further includes: obtaining a second feature data set, and removing elements with a value of 0 in the second feature data set to obtain the first feature data set; obtaining a second weight data set, and removing elements with a value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set and the address of each weight in the first weight data set.
  • the present application provides a data processing device.
  • the data processing device includes a processor and a memory.
  • the memory stores program code, and the processor is configured to call the program code in the memory to execute the data processing method provided in the second aspect of the application.
  • FIG. 1 is a schematic diagram of a convolution operation process in the prior art.
  • FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data calculation array according to an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a data calculation unit in a data calculation array provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of performing a multiplication operation on a first feature data set according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an address of a first feature data set and an address of a weight data set provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.
  • FIG. 8 is a structural block diagram of an address calculation unit in an address calculation array according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a weight matrix with three filters and thinning processing provided in an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a weight matrix that has not undergone thinning processing according to an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 14 is a structural block diagram of a data processing apparatus provided by an embodiment of the present application.
  • “at least one” means one or more, and “multiple” means two or more.
  • “and/or” describes an association relationship between related objects and indicates that three relationships can exist; for example, A and/or B can represent: A exists alone, A and B exist simultaneously, and B exists alone, where A and B can be singular or plural.
  • the character “/” generally indicates that the related objects are in an “or” relationship.
  • “at least one of the following” or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • for example, “at least one of a, b, or c” may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may be single or multiple.
  • the words “first”, “second”, and the like do not limit the number or the execution order.
  • FIG. 1 is a schematic diagram of a convolution operation process in the prior art.
  • FIG. 1 shows a feature data set, which includes a total of 5×5 feature data.
  • FIG. 1 also shows a weight data set, which includes a total of 3×3 weight data.
  • the weight data set can be used as a convolution kernel to perform a convolution operation with the feature data set.
  • FIG. 1 also shows a schematic diagram of a two-step operation with a step size of 1 during a convolution operation on a feature data set using a weight data set.
  • in each step, the 3×3 weight data in the weight data set needs to be multiplied with 3×3 feature data in the feature data set, respectively.
  • the results of the multiplication operations are added to obtain the value of one data of the convolution result.
  • the convolution result c11 can be expressed as Formula 1.1, and the convolution result c12 can be expressed as Formula 1.2:
  • c11 = a11×b11 + a12×b12 + a13×b13 + a21×b21 + a22×b22 + a23×b23 + a31×b31 + a32×b32 + a33×b33  (Formula 1.1)
  • c12 = a12×b11 + a13×b12 + a14×b13 + a22×b21 + a23×b22 + a24×b23 + a32×b31 + a33×b32 + a34×b33  (Formula 1.2)
  • the feature data set continues to slide to the right, and the next operation is continued until the entire feature data set is traversed.
  • the Cartesian product of the set E1 and the set F1 includes all the multiplication results needed to calculate c11: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a31×b31, a32×b32, and a33×b33.
  • the results of the Cartesian product of the sets E1 and F1 also include some of the multiplication results needed to calculate c12: a12×b11, a13×b12, a22×b21, a23×b22, a32×b31, and a33×b32.
  • the results of the Cartesian product of the sets E2 and F1 include some of the multiplication results needed to calculate c12: a14×b13, a24×b23, and a34×b33.
  • the convolution operation can be decomposed into Cartesian product operations.
  • the result obtained by one Cartesian product operation can be used for multiple steps of the convolution operation.
  • the result of a one-step convolution operation may be the sum of the results of one or more Cartesian product operations.
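  • as an illustration only (not part of the patent text), the following Python sketch decomposes the convolution step of Formula 1.1 into elementwise products, i.e., into a subset of the Cartesian product of the feature data and the weight data; the values of a and b are made up for the example.

    # hypothetical values standing in for the 5x5 feature data and 3x3 weight data
    a = [[(i + 1) * 10 + (j + 1) for j in range(5)] for i in range(5)]
    b = [[(i + 1) * 10 + (j + 1) for j in range(3)] for i in range(3)]

    # Formula 1.1: c11 is the sum of nine products a_ij * b_ij
    c11 = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))

    # the full Cartesian product of the window {a_ij} and the kernel {b_kl}
    # contains those nine products, plus products reused by neighbouring
    # convolution steps such as c12 (a12*b11, a13*b12, ...)
    cartesian = {(i, j, k, l): a[i][j] * b[k][l]
                 for i in range(3) for j in range(3)
                 for k in range(3) for l in range(3)}
    assert c11 == sum(cartesian[(i, j, i, j)] for i in range(3) for j in range(3))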
  • FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present application.
  • the data processing apparatus 200 shown in FIG. 2 includes a storage module 210, a data processing module 220, an address processing module 230, and a control module 240.
  • the storage module 210 is configured to store a first feature data set, an address of each feature data in the first feature data set, a first weight set, and an address of each weight in the first weight set.
  • the data processing module 220 includes N data calculation arrays.
  • each of the N data calculation arrays includes n×m data calculation units, where N is a positive integer greater than or equal to 2, n is a positive integer greater than or equal to 2, and m is a positive integer greater than or equal to 2.
  • the address processing module 230 includes N address calculation arrays, and each of the N address calculation arrays includes n×m address calculation units.
  • each data calculation array is configured to obtain n×m weight data from the storage module 210 and save the obtained weight data to the n×m data calculation units of that data calculation array.
  • each address calculation array is configured to obtain the addresses of n×m weight data from the storage module 210 and save the obtained addresses to the n×m address calculation units of that address calculation array.
  • the addresses of the weight data stored in the N address calculation arrays are the addresses of the weight data stored in the N data calculation arrays.
  • the N address calculation arrays are in one-to-one correspondence with the N data calculation arrays, and each address calculation array in the N address calculation arrays holds the addresses of the weight data stored in the corresponding data calculation array.
  • for example, if the weight data stored by one of the N data calculation arrays are b11, b12, b13, b21, b22, b23, b31, b32, and b33, the addresses stored in the address calculation array corresponding to that data calculation array are the addresses of b11, b12, b13, b21, b22, b23, b31, b32, and b33.
  • the N data calculation arrays use the weight data they store to perform multiplication operations on the first feature data set; during the operations on the first feature data set, the weight data stored in the N data calculation arrays remain unchanged.
  • the N address calculation arrays use the addresses of the weight data they store to perform address operations on the addresses of the first feature data set; during the operations on the addresses of the first feature data set, the addresses of the weight data stored in the N address calculation arrays remain unchanged.
  • the control module 240 is configured to determine a target data set according to the operation results of the multiplication operations performed by the N data calculation arrays and the operation results of the address operations.
  • the N data calculation arrays can determine the operation result of the convolution operation on the first feature data set based on the weight data stored by the N data calculation arrays according to the multiplication operation result and the operation result of the address operation.
  • the target data set may be a data set obtained by performing a convolution operation on the first feature data set with weight data stored by the N data calculation arrays.
  • FIG. 3 is a schematic diagram of a data calculation array according to an embodiment of the present application.
  • the data calculation array 300 shown in FIG. 3 includes a total of 9 data calculation units: a data calculation unit 311, a data calculation unit 312, a data calculation unit 313, a data calculation unit 321, a data calculation unit 322, a data calculation unit 323, a data calculation unit 331, a data calculation unit 332, and a data calculation unit 333.
  • the data calculation array may further include an input/output unit (not shown in the figure).
  • the input/output unit is used to acquire data that needs to be input to the data calculation array 300.
  • the input/output unit is further configured to send data to be output by the data calculation array 300 to the corresponding unit and/or module.
  • for example, the input/output unit may obtain weight data and feature data from the storage module, and send the obtained weight data and feature data to the corresponding data calculation units.
  • the input/output unit is further configured to obtain the target data calculated by each data calculation unit and send the target data to the storage module.
  • the data transfer between the computing units in the data computing array is unidirectional.
  • the arrows used to connect the data calculation units in FIG. 3 may indicate a unidirectional transmission direction of data.
  • the data calculation unit 311 can send data (for example, feature data) to the data calculation unit 312, but the data calculation unit 312 cannot send data to the data calculation unit 311.
  • the data calculation unit 312 can send data to the data calculation unit 313, but the data calculation unit 313 cannot send data to the data calculation unit 312.
  • FIG. 4 is a structural block diagram of a data calculation unit in a data calculation array provided by an embodiment of the present application.
  • the data calculation unit 400 may include a storage subunit 401 and a data calculation subunit 402. It can be understood that the data calculation unit 400 may further include an input/output subunit.
  • the input/output subunit is configured to obtain the data required by the data calculation unit and output the data that the data calculation unit needs to output.
  • the data calculation array 300 shown in FIG. 3 may obtain the 3×3 weight data in the weight data set shown in FIG. 1, and save the 3×3 weight data respectively to the 3×3 data calculation units of the data calculation array 300.
  • the weight data b11 may be stored in the storage subunit of the data calculation unit 311, the weight data b12 may be stored in the storage subunit of the data calculation unit 312, the weight data b13 may be stored in the storage subunit of the data calculation unit 313, and so on.
  • at this point, the data calculation array 300 stores 3×3 weight data.
  • the data calculation array 300 may slide the first feature data set unidirectionally, and use the weight data saved by the data calculation array 300 to perform multiplication operations on the first feature data set.
  • during this process, the weight data stored in the data calculation array 300 do not change.
  • in other words, the data calculation units in the data calculation array 300 will not delete the saved weight data, and will not read and save new weight data from the storage module.
  • FIG. 5 is a schematic diagram of a process of multiplying the first feature data set according to an embodiment of the present application.
  • the first feature data set may be flipped 180 degrees first.
  • the first column of the first feature data set becomes the fifth column after the flip, the second column becomes the fourth column, and so on. It should be noted that, as shown in FIG. 5, the first feature data set is first flipped 180 degrees and then slid to the right for the convenience of describing the calculation process of the feature data a11, a21, a31, a12, a22, a32, a13, a23, and a33 with the weight data b11, b21, b31, b12, b22, b32, b13, b23, and b33.
  • in practice, the first feature data set can also be directly multiplied with the weight data stored in the data calculation array 300 by sliding rightward without flipping; the values of the calculation results are the same as those of the calculation performed by first flipping the first feature data set 180 degrees and then sliding it to the right in the manner shown in FIG. 5, and only the order of the final data is different.
  • the flipped first feature data set slides to the right unidirectionally and performs multiplication operations with the weight data stored in the data calculation array 300. Specifically, in the first operation, the feature data a11, a21, and a31 are multiplied with the weight data b11, b21, and b31, respectively. After the first operation, the flipped first feature data set slides to the right to perform the second operation. In the second operation, the feature data a11, a21, and a31 are multiplied by the weight data b12, b22, and b32, respectively, and the feature data a12, a22, and a32 are multiplied by the weight data b11, b21, and b31, respectively.
  • the flipped feature data set then continues to slide to the right for the third operation, and so on.
  • the step size of each sliding of the first feature data set is 1.
  • the step size of each sliding of the first feature data set may also be a positive integer greater than 1.
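  • the sliding schedule can be sketched as follows (an illustrative Python fragment, not the patent's hardware): after the 180-degree flip, feature column j meets weight column w during operation t exactly when j + w = t + 1, which reproduces the pairings described above (operation 1: feature column 1 with weight column 1; operation 2: feature columns 1 and 2 with weight columns 2 and 1; and so on).

    # sketch of the column pairings produced by the unidirectional slide
    feat_cols, w_cols = 3, 3  # the a11..a33 block and the 3x3 weight array
    for t in range(1, feat_cols + w_cols):
        pairs = [(j, w) for j in range(1, feat_cols + 1)
                 for w in range(1, w_cols + 1) if j + w == t + 1]
        print(f"operation {t}:",
              ", ".join(f"feature column {j} x weight column {w}" for j, w in pairs))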
  • the data calculation unit 311 may acquire the feature data a11 of the first feature data set stored in the storage module 210, and store the acquired feature data a11 in the storage subunit of the data calculation unit 311.
  • at this point, the storage subunit of the data calculation unit 311 holds the weight data b11 and the feature data a11.
  • the data calculation sub-unit in the data calculation unit 311 multiplies the weight data b 11 and the feature data a 11 stored in the storage sub-unit to obtain intermediate data k (11,11) .
  • the multiplication operation of the weight data b11 and the feature data a11 may be implemented by a multiplier in the data calculation subunit.
  • the data calculation unit 311 may also obtain the cache data r (11,11) stored in the first target address according to the target address determined by the address calculation unit corresponding to the data calculation unit 311. Specifically, the address calculation unit corresponding to the data calculation unit 311 may determine the first target address according to the address of the characteristic data a 11 and the address of the weight data b 11 . The data calculation unit 311 may obtain the current cache data r (11,11) stored in the first target address. The manner in which the address calculation unit determines the first target address will be described later. The data calculation subunit adds the intermediate data k (11,11) and the current buffer data r (11,11) to obtain target data d (11,11) .
  • the addition operation of the intermediate data k (11,11) and the current buffered data r (11,11) can be implemented by an adder in a data calculation subunit.
  • the target data d (11,11) can be stored in the first target address.
  • the current cache data r (11,11) stored in the first target address is updated to the target data d (11,11) .
  • the data calculation unit 321 can determine the product of the weight data b 21 and the feature data a 21 (hereinafter referred to as the intermediate data k (21, 21) ) held by the data calculation unit 321 in the same manner.
  • the target address determined by the address calculation unit corresponding to the data calculation unit 321 is also the first target address.
  • the data calculation unit 321 adds the intermediate data k(21,21) and the current cache data stored at the first target address (the current cache data has been updated to the target data d(11,11)) to obtain the target data d(21,21).
  • the target data d(21,21) can be stored in the first target address; in other words, the current cache data d(11,11) stored in the first target address is updated to the target data d(21,21).
  • the data calculation unit 331 can determine the product of the weight data b 31 and the feature data a 31 (hereinafter referred to as the intermediate data k (31, 31) ) held by the data calculation unit 331 in the same manner.
  • the target address determined by the address calculation unit corresponding to the data calculation unit 331 is also the first target address.
  • the data calculation unit 331 adds the intermediate data k(31,31) and the current cache data at the first target address (the current cache data has been updated to the target data d(21,21)) to obtain the target data d(31,31).
  • the target data d(31,31) can be stored in the first target address; in other words, the current cache data d(21,21) stored in the first target address is updated to the target data d(31,31).
  • the target data stored in the first target address is a11×b11 + a21×b21 + a31×b31.
  • the data calculation array 300 may continue to perform operations on the first feature data set using the weight data saved by the data calculation unit in the data calculation array 300.
  • after the third operation, the data stored in the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32. That is, during the third operation, the target address determined by the address calculation units corresponding to the data calculation unit 312, the data calculation unit 322, and the data calculation unit 332 is also the first target address.
  • therefore, after the third operation, the target data stored in the first target address is the sum of the data stored in the first target address after the first operation, the a12×b12 determined by the data calculation unit 312, the a22×b22 determined by the data calculation unit 322, and the a32×b32 determined by the data calculation unit 332.
  • after the fifth operation, the data stored in the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32 + a13×b13 + a23×b23 + a33×b33.
  • that is, during the fifth operation, the target address determined by the address calculation units corresponding to the data calculation unit 313, the data calculation unit 323, and the data calculation unit 333 is also the first target address. Therefore, after the fifth operation, the target data stored in the first target address is the sum of the data stored in the first target address after the third operation, the a13×b13 determined by the data calculation unit 313, the a23×b23 determined by the data calculation unit 323, and the a33×b33 determined by the data calculation unit 333.
  • at this point, the data stored in the first target address is the convolution result c11 shown in Formula 1.1.
  • in this way, the multiplication operations and the address operation results can be used to complete the convolution operation of the first feature data set and the weight data set.
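  • the accumulate-at-address behaviour described above can be summarized with a small sketch (hypothetical Python, with a dict standing in for the addressed cache): every product whose address operation yields the same target address is added into the same cache entry, so that entry ends up holding one convolution result.

    cache = {}

    def accumulate(target_addr, product):
        # r <- r + k : add the intermediate data to the current cache data
        cache[target_addr] = cache.get(target_addr, 0) + product

    # made-up values: all nine products contributing to c11 share one target address
    a = {(i, j): i * 10 + j for i in range(1, 4) for j in range(1, 4)}
    b = {(i, j): i + j for i in range(1, 4) for j in range(1, 4)}
    for key in a:
        accumulate("first_target_address", a[key] * b[key])
    assert cache["first_target_address"] == sum(a[k] * b[k] for k in a)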
  • FIG. 6 is a schematic diagram of an address of a first feature data set and an address of a weight data set provided in an embodiment of the present application.
  • the address of the first feature data set shown in FIG. 6 is the address of the first feature data set shown in FIG. 1.
  • the address Add a11 is the address of the feature data a11, the address Add a12 is the address of the feature data a12, and so on.
  • the address of the weight data set shown in FIG. 6 is the address of the weight data set shown in FIG. 1.
  • the address Add b11 is the address of the weight data b 11
  • the address Add b12 is the address of the weight data b 12 , and so on.
  • FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.
  • the address calculation array 700 shown in FIG. 7 includes nine address calculation units: an address calculation unit 711, an address calculation unit 712, an address calculation unit 713, an address calculation unit 721, an address calculation unit 722, an address calculation unit 723, an address calculation unit 731, an address calculation unit 732, and an address calculation unit 733.
  • the address calculation array may further include an input/output unit (not shown in the figure).
  • the input/output unit is used to obtain data that needs to be input to the address calculation array 700.
  • the input/output unit is further configured to send data to be output by the address calculation array 700 to the corresponding unit and/or module.
  • for example, the input/output unit may obtain the addresses of the weight data and the addresses of the feature data from the storage module, and send the obtained addresses to the corresponding address calculation units.
  • the input/output unit is further configured to obtain the target address calculated by each address calculation unit, and send the target address to the corresponding data calculation unit.
  • the N address calculation arrays are in one-to-one correspondence with the N data calculation arrays.
  • the one-to-one correspondence here means that one data calculation array among the N data calculation arrays corresponds to one address calculation array among the N address calculation arrays, and different data calculation arrays correspond to different address calculation arrays. For example, suppose N is equal to 3, the three data calculation arrays are data calculation array 1, data calculation array 2, and data calculation array 3, and the three address calculation arrays are address calculation array 1, address calculation array 2, and address calculation array 3.
  • the data calculation array 1 corresponds to the address calculation array 1, the data calculation array 2 corresponds to the address calculation array 2, and the data calculation array 3 corresponds to the address calculation array 3.
  • the address calculation array corresponding to the data calculation array is used to calculate a target address of each target data in the data calculation array.
  • the data calculation units in the data calculation array and the address calculation units in the address calculation array also correspond one-to-one. Assuming that the data calculation array shown in FIG. 3 corresponds to the address calculation array shown in FIG. 7, the data calculation unit 311 corresponds to the address calculation unit 711, the data calculation unit 312 corresponds to the address calculation unit 712, the data calculation unit 313 corresponds to the address calculation unit 713, and so on.
  • the address calculation unit is used to determine the address of the target data of the corresponding data calculation unit. Specifically, as described above, the first target address from which the data calculation unit 311 obtains the cache data r(11,11) is obtained after the address calculation unit 711 performs an address operation.
  • FIG. 8 is a structural block diagram of an address calculation unit in an address calculation array according to an embodiment of the present application.
  • the address calculation unit 800 may include a storage subunit 801 and an address calculation subunit 802. It can be understood that the address calculation unit 800 may further include an input/output subunit.
  • the input/output subunit is configured to obtain the data required by the address calculation unit and output the data that the address calculation unit needs to output.
  • the address calculation array 700 shown in FIG. 7 may obtain the addresses of the 3×3 weight data among the addresses of the weight data set shown in FIG. 6, and save the addresses of the 3×3 weight data respectively in the 3×3 address calculation units of the address calculation array 700.
  • the address Add b11 may be stored in the storage subunit of the address calculation unit 711, the address Add b12 may be stored in the storage subunit of the address calculation unit 712, the address Add b13 may be stored in the storage subunit of the address calculation unit 713, and so on.
  • at this point, the address calculation array 700 stores the addresses of 3×3 weight data.
  • the address calculation array 700 may unidirectionally slide the addresses of the first feature data set, and use the addresses of the weight data stored by the address calculation array 700 to perform address operations on the addresses of the first feature data set.
  • during this process, the addresses of the weight data stored in the address calculation array 700 do not change.
  • in other words, the address calculation units in the address calculation array 700 will not delete the saved addresses of the weight data, and will not read and save addresses of new weight data from the storage module.
  • the process of sliding the addresses of the first feature data set to the right unidirectionally to perform the address operations is similar to the process of sliding the first feature data set to the right to perform the multiplication operations, and is not repeated here.
  • for ease of description, the address of the weight data obtained by the address calculation unit 800 is referred to as the address of the first weight data, the address of the feature data obtained by the address calculation unit 800 is referred to as the address of the first feature data, and the address obtained after the address calculation unit 800 performs an address operation is called the first target address.
  • the input/output subunit in the address calculation unit 800 can obtain, from the storage module, the following information in addition to the address of the first feature data and the address of the first weight data: the size of the input data corresponding to the first feature data set, the padding size, and the weight size.
  • the weight size is the size of the address calculation array to which the address calculation unit 800 belongs, and the padding size is a preset size. In this example, the weight size is 3×3.
  • the size of the input data corresponding to the first feature data set, the padding size, and the weight size may also be stored in the storage subunit 801 of the address calculation unit 800.
  • the address calculation subunit 802 may determine the first target address according to the address of the first weight data, the address of the first feature data, the size of the input data corresponding to the first feature data set, the padding size, and the weight size.
  • assuming that the size of the input picture is a×b and the convolution kernel has n rows and m columns, the size of the output picture after convolution is (a-n+1)×(b-m+1).
  • this has two disadvantages: 1. the size of the output picture is reduced after each convolution operation; 2. the corners and edges of the original picture are used less in the output, so the output picture loses much of the information about edge positions.
  • to solve these problems, the original picture may be padded on the boundary to increase the size of the matrix, and 0 is usually used as the padding value.
  • if the size of the original picture after padding is (a+2p)×(b+2q) and the size of the convolution kernel remains n rows and m columns, the output picture size is (a+2p-n+1)×(b+2q-m+1). The numbers of pixels p and q expanded in each direction are the padding sizes; for the output picture size to remain equal to the input picture size, it can be concluded that the horizontal padding size p is equal to (n-1)/2, and the vertical padding size q is equal to (m-1)/2.
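  • a minimal sketch of this padding arithmetic (assuming odd n and m so that (n-1)/2 and (m-1)/2 are integers):

    def padding_sizes(n, m):
        # horizontal padding p and vertical padding q from the text
        return (n - 1) // 2, (m - 1) // 2

    def output_size(a, b, n, m, p, q):
        return (a + 2 * p - n + 1), (b + 2 * q - m + 1)

    p, q = padding_sizes(3, 3)                      # p = q = 1 for a 3x3 kernel
    assert output_size(5, 5, 3, 3, p, q) == (5, 5)  # output size equals input size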
  • the address calculation subunit 802 may specifically determine the target address according to the following formula:
  • result_cord = (input_cord/input_size_x - w_cord/kernel_size_x + padding_size_x) × input_size_y + (input_cord%input_size_y - w_cord%kernel_size_y + padding_size_y)  (Formula 1.3)
  • % represents the modulo (remainder) operation
  • result_cord represents the target address
  • input_cord represents the address of the feature data
  • input_size_x represents the abscissa of the size of the input data corresponding to the first feature data set
  • input_size_y represents the ordinate of the size of the input data corresponding to the first feature data set
  • w_cord represents the address of the weight data
  • kernel_size_x represents the abscissa of the weight size
  • kernel_size_y represents the ordinate of the weight size
  • padding_size_x represents the horizontal padding size
  • padding_size_y represents the vertical padding size.
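  • a direct transcription of Formula 1.3 into Python (a sketch only; integer division and modulo are assumed, with the addresses taken as the flattened absolute addresses described below):

    def target_address(input_cord, w_cord,
                       input_size_x, input_size_y,
                       kernel_size_x, kernel_size_y,
                       padding_size_x, padding_size_y):
        return ((input_cord // input_size_x - w_cord // kernel_size_x
                 + padding_size_x) * input_size_y
                + (input_cord % input_size_y - w_cord % kernel_size_y
                   + padding_size_y))

    # a11 (address 0) x b11 (address 0) and a21 (address 5) x b21 (address 3)
    # both contribute to c11, so they must map to the same target address
    assert target_address(0, 0, 5, 5, 3, 3, 1, 1) == \
           target_address(5, 3, 5, 5, 3, 3, 1, 1)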
  • the address of the feature data and the address of the weight data in Formula 1.3 are absolute addresses.
  • the absolute address refers to the absolute position of the feature data/weight data in the corresponding feature data set/weight data set.
  • assuming the feature data set includes X feature data, the absolute address of the x-th feature data among the X feature data is x-1, where X is a positive integer greater than 1, and x is a positive integer less than or equal to X.
  • for example, the feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23; the absolute addresses of the feature data 5, 32, and 23 are 0, 3, and 8, respectively.
  • the absolute address listed above refers to the position of the feature data in the feature data set, and can be converted into an address composed of the abscissa and the ordinate according to the specifications of the feature matrix. Similarly, the absolute address of the weight data can also be converted into an address composed of the abscissa and the ordinate.
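  • for illustration, a flattened absolute address can be converted to abscissa/ordinate form as follows (a sketch assuming row-major layout and a known matrix width):

    def to_coordinates(abs_addr, width):
        return abs_addr // width, abs_addr % width

    # the example above, read as a 3x3 feature matrix: addresses 0, 3 and 8
    assert [to_coordinates(x, 3) for x in (0, 3, 8)] == [(0, 0), (1, 0), (2, 2)]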
  • the address calculation subunit 802 may further determine the target address according to the following formula:
  • result_cord = ((base_input + input_cord)/input_size_x - (base_w + w_cord)/kernel_size_x + padding_size_x) × input_size_y + ((base_input + input_cord)%input_size_y - (base_w + w_cord)%kernel_size_y + padding_size_y)  (Formula 1.4)
  • % represents the modulo (remainder) operation
  • result_cord represents the target address
  • input_cord represents the address of the feature data
  • input_size_x represents the abscissa of the size of the input data corresponding to the first feature data set
  • input_size_y represents the ordinate of the size of the input data corresponding to the first feature data set
  • w_cord represents the address of the weight data
  • kernel_size_x represents the abscissa of the weight size
  • kernel_size_y represents the ordinate of the weight size
  • padding_size_x represents the horizontal padding size
  • padding_size_y represents the vertical padding size
  • base_input represents the base address of the address of the feature data
  • base_w represents the base address of the address of the weight data.
  • the address of the feature data and the address of the weight data in Formula 1.4 are relative addresses.
  • the relative address refers to the position of the feature data/weight data in the corresponding feature data set/weight data set relative to the address of the first feature data/weight data. Assuming that the address of the first feature data in the feature data set is Y, the address of the y-th feature data in the feature data set is Y+y-1, where Y and y are both positive integers greater than or equal to 1.
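  • a sketch of Formula 1.4 under the same assumptions as the Formula 1.3 sketch above; the only difference is that the relative addresses are first rebased by base_input and base_w:

    def target_address_relative(input_cord, w_cord, base_input, base_w,
                                input_size_x, input_size_y,
                                kernel_size_x, kernel_size_y,
                                padding_size_x, padding_size_y):
        i = base_input + input_cord   # absolute address of the feature data
        w = base_w + w_cord           # absolute address of the weight data
        return ((i // input_size_x - w // kernel_size_x + padding_size_x)
                * input_size_y
                + (i % input_size_y - w % kernel_size_y + padding_size_y))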
  • after determining the target address, the address calculation unit may directly send the target address to the corresponding data calculation unit, and the data calculation unit may determine the cache data at the target address according to the target address.
  • alternatively, the address calculation unit may determine the cache data at the target address, and then send the cache data and the target address together to the corresponding data calculation unit.
  • the data processing apparatus may include two or more data calculation arrays and corresponding address calculation arrays.
  • the weight data set shown in FIG. 1 includes only 3×3 weight data, and only one weight data set is used for the convolution operation on the feature data set.
  • in practice, two or more weight data sets may also be used to perform the convolution operation on the feature data set.
  • in this case, each of the N data calculation arrays may obtain and save a weight data set, and use the saved weight data to perform multiplication operations on the first feature data set.
  • each of the N address calculation arrays may obtain and save the addresses of the corresponding weight data, and use the saved addresses of the weight data to perform address operations on the addresses of the first feature data set.
  • the N data calculation arrays can obtain N weight data sets at a time and perform multiplication operations on the first feature data set; if the number of weight data sets that remain to be acquired is less than N, all the remaining weight data sets are acquired to perform multiplication operations on the first feature data set. Assume that the value of N is 4 and the number of weight data sets is 9. In this case, the four data calculation arrays can first obtain the first to fourth weight data sets and perform multiplication operations on the first feature data set; the four data calculation arrays can then obtain the fifth to eighth weight data sets and perform multiplication operations on the first feature data set; and finally the four data calculation arrays obtain the ninth weight data set and perform multiplication operations on the first feature data set.
  • the manner in which the N address calculation arrays perform address operations is similar, and it is unnecessary to repeat them here.
  • the weight data stored in different data calculation arrays in the N data calculation arrays may be the result of rearranging the same weight data in rows.
  • for example, the N data calculation arrays include a first data calculation array and a second data calculation array, and the n×m weight data stored in the second data calculation array are the n×m weight data stored in the first data calculation array after row rearrangement.
  • FIG. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.
  • the data calculation array 1 stores 3×3 weight data, where the weight data of the first row are b11, b12, and b13; the weight data of the second row are b21, b22, and b23; and the weight data of the third row are b31, b32, and b33.
  • the data calculation array 2 holds 3×3 weight data, where the weight data of the first row are b31, b32, and b33; the weight data of the second row are b11, b12, and b13; and the weight data of the third row are b21, b22, and b23.
  • the result of rearranging the weight data stored in the data calculation array 1 is the weight data stored in the data calculation array 2.
  • the weight data stored in the data calculation array 1 may also be considered as a result of rearranging the weight data stored in the data calculation array 2 in rows.
  • for ease of description, the weight data obtained after rearrangement by rows is referred to as rearranged weight data, and the weight data stored by the two data calculation arrays shown in FIG. 9 are referred to as mutual rearrangement weight data.
  • Figure 9 shows the relationship between the weight data stored in the two data calculation arrays.
  • the weight data stored in any two of three or more data calculation arrays may also be mutual rearrangement weight data.
  • suppose the N data calculation arrays also include a data calculation array 3 as shown in FIG. 10. The data calculation array 3 stores 3×3 weight data, where the weight data of the first row are b21, b22, and b23; the weight data of the second row are b31, b32, and b33; and the weight data of the third row are b11, b12, and b13. It can be seen that the weight data stored in the data calculation array 1 and the data calculation array 3 shown in FIG. 9 and FIG. 10 are mutual rearrangement weight data, and the weight data stored in the data calculation array 2 and the data calculation array 3 are also mutual rearrangement weight data.
  • the weight data can be rearranged at most n-1 times.
  • in other words, the weight data stored in the 2nd to nth data calculation arrays are the weight data stored in the first data calculation array of the n data calculation arrays after row rearrangement, where any two row vectors located in the same row among the n weight data sets stored in the n data calculation arrays are different.
  • N is a positive integer greater than or equal to n.
  • the first data calculation array and the second data calculation array are any two data calculation arrays among the n data calculation arrays.
  • in other words, the first row of weight data held by each of the n data calculation arrays is one of the second row to the nth row of weight data held by the remaining n-1 data calculation arrays.
  • the data calculation array 2 and the data calculation array 3 may first obtain the 3×3 weight data shown in FIG. 1, and then perform row rearrangement to obtain the rearranged weight data.
  • alternatively, the storage module may store the rearranged weight data, and the data calculation array 2 and the data calculation array 3 directly obtain the rearranged weight data from the storage module.
  • correspondingly, the addresses of the weight data stored in the second address calculation array corresponding to the second data calculation array are also the result of row-wise rearrangement of the addresses of the weight data held by the first address calculation array corresponding to the first data calculation array.
  • the weight data includes a total of n rows, and the addresses of the weight data also include n rows.
  • similarly, the addresses of the weight data can be rearranged at most n-1 times.
  • in other words, the addresses of the weight data stored in the 2nd to nth address calculation arrays among the n address calculation arrays of the N address calculation arrays are the addresses of the weight data stored in the first address calculation array after row rearrangement.
  • N is a positive integer greater than or equal to n.
  • the first address calculation array and the second address calculation array are any two address calculation arrays among the n address calculation arrays.
  • in other words, the addresses of the first row of weight data stored in each of the n address calculation arrays are, respectively, among the addresses of the second row to the nth row of weight data in the remaining n-1 address calculation arrays.
  • in this way, the feature data can be reused, further reducing the number of times the data calculation arrays and the address calculation arrays access the storage module.
  • c21 = a21×b11 + a22×b12 + a23×b13 + a31×b21 + a32×b22 + a33×b23 + a41×b31 + a42×b32 + a43×b33  (Formula 1.5)
  • when the data calculation array 2 performs multiplication operations on the feature data of the first to third rows, the operation results of a21×b11, a22×b12, a23×b13, a31×b21, a32×b22, and a33×b23 can be obtained; according to the operation rules described earlier, the sum of these six operation results is saved to the same target address.
  • assuming the data processing apparatus includes only the data calculation array 1 and the data calculation array 2, and the weight data stored in the data calculation array 1 and the data calculation array 2 are as shown in FIG. 9, after the data calculation array 1 and the data calculation array 2 have performed multiplication operations on the feature data of the first to third rows of the feature data set, the feature data of the third to fifth rows of the feature data set may be multiplied next; that is, the step size for sliding down may be 2.
  • if the weight data is not rearranged (in other words, the data processing device has only the data calculation array 1 shown in FIG. 9), the data calculation array 1 is used to perform the multiplication on the feature data of the second to fourth rows, and this multiplication needs to obtain the feature data of the second to third rows of the feature data set again.
  • the feature data in the second to third rows of the feature data set needs to be read a second time to obtain the operation results of a 21 × b 11, a 22 × b 12, a 23 × b 13, etc.
  • the same feature data needs to be read multiple times.
  • the data calculation array 2 performs a multiplication operation on the feature data in the second to third rows of the feature data set, which is equivalent to sliding the data calculation array 1 down with a step size of 1 and then multiplying it with the feature data in the second to third rows. In other words, as long as the feature data of the second to third rows of the feature data set is read once, the multiplication of the two weight data sets with the feature data of the second to third rows can be realized. In this way, more partial Cartesian products can be obtained from a single read of the feature data.
  • after the feature data set is multiplied with the n weight matrices, the Cartesian product of the feature data set and the first weight matrix can be obtained, and the convolution of the feature data set and the first weight matrix can be further obtained.
  • each feature data in the data set needs to be loaded into the data processing unit only once.
  • FIG. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
  • the third data calculation array may perform multiplication on the feature data in the third to fifth rows.
  • the step size for sliding down may be 3.
  • Three data calculation arrays can complete the Cartesian product operation on the feature data set.
  • the feature data a 11 , a 21 , a 31 , a 12 , a 22 , a 32 , a 13 , a 23 , and a 33 are also taken as examples.
  • the three data calculation arrays can perform the multiplication process shown in FIG. 5 with the characteristic data a 11 , a 21 , a 31 , a 12 , a 22 , a 32 , a 13 , a 23 , and a 33, respectively.
  • these three data calculation arrays use the weight data they store to complete the multiplication of the feature data a 11, a 21, a 31, a 12, a 22, a 32, a 13, a 23, a 33, as shown in Table 1.
  • the weight data can be rearranged at most n-1 times. If the weight data is rearranged once, the step size for sliding down while traversing the feature data set for multiplication may be 2; if the weight data is rearranged twice, the step size may be 3; and if the weight data is rearranged n-1 times, the step size may be n.
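The relationship between the number of rearrangements and the sliding step size can be illustrated with a hedged sketch (Python; the helper and its control flow are assumptions introduced for illustration, not this application's exact datapath):

```python
import numpy as np

def traverse(features, w, k):
    """Slide an n-row window down the feature data set with step k+1,
    multiplying each loaded block against the k+1 row-rotated copies
    of w, so every block of feature rows is loaded only once."""
    n, m = w.shape
    kernels = [np.roll(w, -j, axis=0) for j in range(k + 1)]
    partial_products = []
    for top in range(0, features.shape[0] - n + 1, k + 1):
        block = features[top:top + n, :m]          # one load of n rows
        for ker in kernels:                        # reuse the same block
            partial_products.append(block * ker)   # element-wise products
    return partial_products
```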
  • the first feature data set is a feature data set obtained by thinning the second feature data set.
  • the first weight data set is a weight data set obtained after thinning.
  • the data processing apparatus 200 shown in FIG. 2 may further include a compression module.
  • the compression module is configured to obtain a second feature data set, and perform thinning processing on the second feature data set to obtain the first feature data set.
  • the second feature data set includes feature data corresponding to the input data.
  • the compression module is further configured to obtain a second weight data set, and perform thinning processing on the second weight data set to obtain the first weight data set.
  • the compression module is further configured to determine an address of each feature data in the first feature data set, and determine an address of each weight data in the first weight data set.
  • the compression module sends the first feature data set, the first weight data set, the address of each feature data in the first feature data set, and the address of each weight data in the first weight data set to the storage module, which saves them. If the number of thinned weight data is less than n × m, the remaining positions are padded with zeros.
  • the input data referred to in the embodiments of the present application may be any data capable of performing a multiplication operation, a Cartesian product operation, and / or a convolution operation.
  • it may be image data, voice data, and the like.
  • the input data is a collective term for all data input to the data processing device.
  • the input data may consist of characteristic data.
  • the feature data corresponding to the input data may be all data included in the input data, or may be part of the feature data of the input data. Taking image data as an example, assuming that the input data is an entire image, all the data of the image is called feature data.
  • the second feature data set may include all feature data of the input data, or may be all or part of the feature data of the image after some processing.
  • for example, the second feature data set may be the feature data of a partial image obtained after the image is segmented.
  • the second feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0, 0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 0, 5, 8; the first feature data set obtained after thinning then includes: 5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8.
  • the address of the first feature data in the second feature data set is 0, the address of the second feature data is 1, the address of the third feature data is 2, and the address of the nth feature data is n-1.
  • the address (absolute address) of the first feature data set is: 0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35.
  • the second weight data set includes: 8, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 24, 54, 0, 0, 0, 0, 0, 12, 0, 0, 22, 3, 45, 0, 0, 0, 67, 44, 0, 0, 0, 0, 0, 0, 0, 0, 35, 65, 75
  • the thinned second weight data set includes: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75.
  • the thinned second weight data set includes 14 weight data. It is assumed that each data calculation array includes 3 × 3 data calculation units.
  • the number of weight data in the thinned second weight data set is less than the number of data calculation units included in two data calculation arrays. Therefore, 4 zeros are appended to the thinned second weight data set to obtain the first weight data set. The first weight data set corresponding to the second weight data set is therefore: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75, 0, 0, 0, 0.
  • assuming the address of the first weight data in the second weight data set is 0, the address of the second weight data is 1, the address of the third weight data is 2, and the address of the n-th weight data is n-1, the addresses (absolute addresses) of the first weight data set are: 0, 1, 6, 16, 17, 23, 26, 27, 28, 33, 34, 43, 44, 45.
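A minimal sketch of this compression step (Python; `sparsify` is a hypothetical helper name, and the input below is made up for illustration):

```python
def sparsify(flat, n, m):
    """Drop zero elements, record each kept element's absolute address
    (its index in the original flattened data set), and zero-pad the
    kept values up to a multiple of n*m so that they fill whole
    n x m data calculation arrays."""
    addrs = [i for i, v in enumerate(flat) if v != 0]
    vals = [flat[i] for i in addrs]
    vals += [0] * ((-len(vals)) % (n * m))
    return vals, addrs

# Hypothetical input, for illustration only:
vals, addrs = sparsify([5, 0, 0, 32, 0, 23], 3, 3)
# vals  == [5, 32, 23, 0, 0, 0, 0, 0, 0]   (padded to one 3x3 array)
# addrs == [0, 3, 5]
```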
  • the first feature data set may also be a feature data set that has not been thinned. In other words, the first feature data set may be equal to the second feature data set.
  • the first feature data set in the above embodiment corresponds to a matrix, and accordingly, the weight data used to perform the convolution operation on the first feature data set also corresponds to a matrix.
  • the convolution operation described in the above embodiment is a two-dimensional convolution operation.
  • T is a positive integer greater than or equal to 3.
  • the first feature data set may be a three-dimensional tensor.
  • a three-dimensional convolution operation may be performed on the first feature data set.
  • the first feature data set includes three subsets: feature data subset 1, feature data subset 2, and feature data subset 3.
  • the feature data of the three subsets correspond to the three input channels of red, green, and blue, respectively.
  • the feature data in each of the three subsets may correspond to a matrix.
  • the weight data set used to perform the convolution operation on the feature data set may also be referred to as a filter. Therefore, the three weight data sets can be referred to as filter 1, filter 2, and filter 3.
  • Each of the three weight data sets includes three weight channels, namely channel 1, channel 2 and channel 3.
  • the weight data included in each of the three weight channels may correspond to a matrix.
  • the three weight channels correspond one-to-one with the three feature data subsets. For example, channel 1 corresponds to feature data subset 1, channel 2 corresponds to feature data subset 2, and channel 3 corresponds to feature data subset 3.
  • the weight channel can perform a convolution operation on the corresponding feature data subset.
  • the filter 1, filter 2, and filter 3 may each perform a three-dimensional convolution operation on the first feature data set. That is: channel 1 of filter 1 performs a convolution operation on feature data subset 1 of the first feature data set; channel 2 of filter 1 performs a convolution operation on feature data subset 2; channel 3 of filter 1 performs a convolution operation on feature data subset 3; channel 1 of filter 2 performs a convolution operation on feature data subset 1; channel 2 of filter 2 performs a convolution operation on feature data subset 2; channel 3 of filter 2 performs a convolution operation on feature data subset 3; channel 1 of filter 3 performs a convolution operation on feature data subset 1; channel 2 of filter 3 performs a convolution operation on feature data subset 2; and channel 3 of filter 3 performs a convolution operation on feature data subset 3.
  • the process of performing a three-dimensional convolution operation on the first feature data set by each of the three filters can be decomposed into three two-dimensional convolution operation processes.
  • the specific implementations of the three two-dimensional convolution operations are similar to the specific implementations of the two-dimensional convolution operation in the foregoing embodiment.
  • channel 1 for convolution operation on the characteristic data subset 1 can be considered as the weight data set shown in FIG. 1
  • the characteristic data subset 1 can be considered as the characteristic data set shown in FIG. 1.
  • the process of performing a convolution operation on the feature data subset by channel 1 is a process of performing a convolution operation on the feature data set by the weight data set shown in FIG. 1.
  • the convolution operation process can be decomposed into a multiplication operation and an addition operation. Therefore, the data processing apparatus shown in FIG. 2 can also perform a three-dimensional convolution operation.
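The decomposition just described can be written down directly. The sketch below (Python; assuming, as is standard for CNNs, that the per-channel 2D results are summed into one output map) is illustrative, not this application's hardware datapath:

```python
import numpy as np

def conv2d(feat, w):
    """Plain 2D convolution (stride 1, no padding), as in the
    two-dimensional embodiments above."""
    n, m = w.shape
    out = np.zeros((feat.shape[0] - n + 1, feat.shape[1] - m + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(feat[r:r + n, c:c + m] * w)
    return out

def conv3d_by_channels(subsets, filt):
    """One filter's 3D convolution, decomposed into one 2D convolution
    per (weight channel, feature data subset) pair, with the per-channel
    results summed."""
    return sum(conv2d(s, ch) for s, ch in zip(subsets, filt))
```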
  • the first feature data set referred to in the above embodiment may be considered as a feature data subset in the feature data set corresponding to the three-dimensional tensor.
  • the first weight data set may be considered as one weight data set among the multiple weight data sets.
  • the weight data set also corresponds to a three-dimensional tensor
  • the first weight data set can be considered as a channel in the weight data set of the three-dimensional tensor.
  • the first weight data set may also be a weight value obtained by performing thinning processing on multiple weight data sets.
  • FIG. 11 is a schematic diagram of a weight matrix with three filters and thinning processing provided in an embodiment of the present application.
  • Each of the three filters shown in FIG. 11 includes three weight channels, and each weight channel includes 3 ⁇ 3 weight data.
  • the weight data of weight data set 1 comes from the weight data in channel 1 of filter 1 and filter 2
  • the weight data of weight data set 4 comes from the weight data in channel 1 of filter 2 and filter 3
  • the weight data of weight data set 2 comes from the weight data in channel 2 of filter 1 and filter 2
  • the weight data of weight data set 5 comes from the weight data in channel 2 of filter 2 and filter 3
  • the weight data of weight data set 3 comes from the weight data in channel 3 of filter 1 and filter 2
  • the weight data of weight data set 6 comes from the weight data in channel 3 of filter 2 and filter 3
  • saying that weight data comes from the same channel of different filters means that the weight data can belong to different filters, but the channel index within those filters is the same.
  • the weight data of the weight data set 4 comes from the weight data in channel 1 of filter 2 and the weight data in channel 1 of filter 3.
  • the weight data set obtained by thinning the weight data in multiple filters is hereinafter referred to as the sparse weight data set
  • the weight data included in the sparse weight data set may come from the same filter.
  • the process of multiplying the feature data by the sparse weight data set, and the process of determining the result of the convolution operation of the sparse weight data set and the feature data according to the operation result of the multiplication, are the same as in the above embodiments and need not be repeated here.
  • the weight data included in the sparse weight data set may come from different filters.
  • the operation process of multiplying the feature data by the sparse weighted data set is the same as the above embodiment, and it is unnecessary to repeat it here.
  • when the weight data included in the thinned weight data set comes from different filters, the process of determining the convolution operation result of the thinned weight data set and the feature data according to the operation result of the multiplication operation is not exactly the same as in the above embodiments.
  • the weight data included in the thinning weight data set comes from P filters (P is a positive integer greater than or equal to 2).
  • the sparse weight data set can be divided into P sparse weight data subsets, and the p-th sparse weight data subset of the P sparse weight data subsets includes the weight data from the p-th filter of the P filters, p = 1, ..., P.
  • the p-th thinned weight data subset includes Num p weight data, where Num p is a positive integer greater than or equal to 1, and Num p is less than n × m.
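A small sketch of this grouping (Python; the bookkeeping list `filter_ids`, recording the source filter of each weight, is an assumption introduced for illustration):

```python
def split_by_filter(weights, filter_ids):
    """Group the weight data of a thinned weight data set into P
    subsets according to the source filter of each weight,
    numbered 1..P."""
    subsets = {}
    for w, p in zip(weights, filter_ids):
        subsets.setdefault(p, []).append(w)
    return [subsets[p] for p in sorted(subsets)]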
  • the weight data stored in the three data calculation arrays shown in FIG. 9 and FIG. 10 are again taken as an example. It is assumed that the weight data shown in FIG. 9 and FIG. 10 are obtained by thinning the weight data of channel 1 of filter 1 and the weight data of channel 1 of filter 2 shown in FIG. 12. Using the three data calculation arrays shown in FIG. 9 and FIG. 10, the operation results of a 11 × b 11, a 12 × b 12, a 13 × b 13, a 21 × b 21, a 22 × b 22, and a 23 × b 23 can be obtained; the sum of these results is a partial result of the weight data of channel 1 of filter 1 convolving the feature data, while the sum of the results that include a 13 × b 33 belongs to the weight data of channel 1 of filter 2 convolving {a 11, a 12, a 13, a 21, a 22, a 23, a 31, a 32, a 33}.
  • the compression module may also perform thinning processing on the target data set, deleting the zeros in the target data set.
  • a product of each feature data in the first feature data set and each weight data in the first weight data set can be obtained. After that, the corresponding product results can be added to obtain the convolution operation result of the first feature data set and the first weight data set.
  • each data calculation unit in the data calculation array adds the product of the weight data and the feature data to the data stored at the target address determined by the corresponding address calculation unit, and writes the sum back to the target address. In this way, the value finally saved at the target address is the result of the convolution operation.
  • alternatively, each data calculation unit in the data calculation array may perform only a multiplication operation, that is, multiply the weight data with the feature data and save the result to the target address determined by the corresponding address calculation unit; the multiplication results are then read from the corresponding target addresses and added to obtain the corresponding convolution operation result.
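Both modes can be sketched as follows (Python; `memory` is a hypothetical address-to-value store standing in for the storage module):

```python
def mac_into(memory, addr, weight, feature):
    """First mode: the data calculation unit multiplies and
    accumulates directly into the target address (read-modify-write),
    so the address finally holds the convolution result."""
    memory[addr] = memory.get(addr, 0) + weight * feature

def multiply_only(weight, feature):
    """Second mode: the unit only multiplies; the products are written
    out and summed later, e.g. by an addition unit in the storage
    module."""
    return weight * feature
```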
  • the result of a 11 × b 11 is stored at target address 1, the result of a 21 × b 21 at target address 2, the result of a 31 × b 31 at target address 3, the result of a 12 × b 12 at target address 4, the result of a 22 × b 22 at target address 5, the result of a 32 × b 32 at target address 6, the result of a 13 × b 13 at target address 7, the result of a 23 × b 23 at target address 8, and the result of a 33 × b 33 at target address 9.
  • the data stored in the target address 1 to the target address 9 can be added to obtain c 11 as shown in formula 1.1.
  • the storage module may include an addition unit.
  • Each data calculation unit in the data calculation array can only perform multiplication operations, that is, multiply the data with the characteristic data, and output the result of the multiplication to the storage module.
  • the storage module stores the received data at the target address determined by the corresponding address calculation unit; specifically, the received data is first added to the data already stored at the target address, and the sum is then saved to the target address. In this way, the value finally saved at the target address is the result of the convolution operation.
  • FIG. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application. The method shown in FIG. 13 may be executed by the data processing apparatus shown in FIG. 2 or FIG. 14.
  • the method further includes: obtaining the addresses of the weight data in the first weight matrix and the second weight matrix; and performing an address operation using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set. Determining the target data set according to the operation result of the multiplication operation includes: determining the target data set according to the operation result of the multiplication operation and the operation result of the address operation.
  • the method further includes: obtaining a third weight matrix to an n-th weight matrix in the first weight data set, where the third weight matrix to the n-th weight matrix are matrices in which the first weight matrix is rearranged in rows, and any two of the n row vectors located in the same row of the first weight matrix to the n-th weight matrix are different; obtaining the addresses of the weight data in the third to n-th weight matrices; and performing address operations using the addresses of the weight data in the third to n-th weight matrices and the addresses of the feature data in the first feature data set.
  • the target data set includes a result matrix, which is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is expressed as a first feature matrix.
  • the method further includes: determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size of the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, and the padding size is the difference between the size of the first feature data set and the size of the result matrix.
  • the method further includes: obtaining a second feature data set, removing elements with a value of 0 in the second feature data set to obtain the first feature data set; obtaining second weight data Set, removing elements with a value of 0 in the second weight data set to obtain the first weight data set; determining the address of each feature data in the first feature data set, and determining the first weight data set The address of each weight in.
  • FIG. 14 is a structural block diagram of a data processing apparatus according to an embodiment of the present application.
  • the data processing device 1400 shown in FIG. 14 includes a data processing module 1401 and a control module 1404.
  • the data processing module 1401 includes N data calculation units, where N is an integer greater than or equal to 2. The data processing module 1401 is used to: obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data and the data in the first weight data set comes from the same input channel, n being an integer greater than or equal to 2 and m an integer greater than or equal to 2; obtain a second weight matrix, where the second weight matrix is a matrix obtained after the first weight matrix is rearranged in rows; and perform multiplication operations of the first weight matrix and the second weight matrix with a first feature data set.
  • the control module 1404 is configured to determine a target data set according to an operation result of the multiplication operation.
  • the data processing device 1400 further includes an address processing module 1402, and the address processing module 1402 includes N address calculation units.
  • the data calculation units and the address calculation units correspond one-to-one, where the address processing module 1402 is configured to: obtain the addresses of the weight data in the first weight matrix and the second weight matrix; and perform address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set. The control module 1404 being configured to determine the target data set according to the operation result of the multiplication operation includes: determining the target data set according to the operation result of the multiplication operation and the operation result of the address operation.
  • the data processing module 1401 is further configured to obtain a third weight matrix to an n-th weight matrix in the first weight data set, where the third weight matrix to the n-th weight matrix are matrices in which the first weight matrix is rearranged in rows, and any two of the n row vectors located in the same row of the first weight matrix to the n-th weight matrix are different.
  • the address processing module 1402 is further configured to: obtain the addresses of the weight data in the third to n-th weight matrix; use the addresses of the weight data in the third to n-th weight matrix and The address of the feature data in the first feature data set is subjected to an address operation.
  • the target data set includes a result matrix, which is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix.
  • the address processing module 1402 is further configured to determine the first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size of the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, and the padding size is the difference between the size of the first feature data set and the size of the result matrix.
  • the data processing device 1400 further includes a compression module 1403, configured to: obtain a second feature data set, and remove elements having a value of 0 in the second feature data set to obtain the first feature data A set; obtaining a second weight data set, removing elements with a value of 0 in the second weight data set to obtain the first weight data set; determining an address of each feature data in the first feature data set, An address for each weight in the first weight data set is determined.
  • the terminal device or the network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer.
  • This hardware layer includes hardware such as a central processing unit (CPU), a memory management unit (MMU), and a memory (also called main memory).
  • the operating system may be any one or more computer operating systems that implement business processing through processes, such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a Windows operating system.
  • This application layer contains applications such as browsers, address books, word processing software, and instant messaging software.
  • the embodiments of the present application do not specifically limit the specific structure of the execution subject of the method provided by the embodiments of the present application, as long as it can run a program recording the code of the method provided by the embodiments of the present application and thereby perform processing according to the method described above.
  • the method execution subject provided in the embodiments of the present application may be a terminal device or a network device, or a function module in the terminal device or the network device that can call a program and execute the program.
  • various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and / or engineering techniques.
  • article of manufacture encompasses a computer program accessible from any computer-readable device, carrier, or medium.
  • computer-readable media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks, or magnetic tapes), optical discs (e.g., compact discs (CD), digital versatile discs (DVD), etc.), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • various storage media described herein may represent one or more devices and / or other machine-readable media used to store information.
  • machine-readable medium may include, but is not limited to, wireless channels and various other media capable of storing, containing, and / or carrying instruction (s) and / or data.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division; in actual implementation, there may be other divisions: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, mobile hard disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, optical discs, and other media that can store program code.

Abstract

Provided are a data processing method and a data processing apparatus. The data processing apparatus comprises a data processing module, and the data processing module is used for: acquiring a first weight matrix in a first weight data set, wherein the first weight matrix is represented as n rows and m columns of weight data, and data in the first weight data set are from the same input channel; acquiring a second weight matrix, wherein the second weight matrix is a matrix obtained after rearranging the first weight matrix in rows; using the first weight matrix to perform a multiplication operation with a first feature data set, wherein data in the first feature data set are from the same input channel; using the second weight matrix to perform a multiplication operation with the first feature data set; and according to the operation result of the multiplication operation, determining a target data set. The technical solution can reduce the number of times a storage device is accessed.

Description

Data processing method and device

Technical Field

The present application relates to the field of information technology, and more particularly, to a method and a data processing apparatus for processing data.

Background
Convolutional neural networks (CNNs) are the most widely used algorithms in deep learning. They are widely used in image classification, speech recognition, video understanding, face detection, and many other applications.

The core of convolutional neural network computation is the convolution operation. The amount of data that a convolution operation needs to process is usually large, so the convolution operation occupies considerable storage and computing resources. Current processors find it increasingly difficult to meet the demands of convolution operations. In addition, with the development of mobile smart devices, these devices also need to perform convolution operations, yet the computing power and storage capacity they can provide are limited. Therefore, how to improve the efficiency of the convolution operation is an urgent problem.
Summary of the Invention

The present application provides a method and a data processing apparatus for processing data, which can reduce the number of times a storage device is accessed.
In a first aspect, an embodiment of the present application provides a data processing apparatus. The data processing apparatus includes a data processing module configured to: obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data and the data in the first weight data set comes from the same input channel, n being an integer greater than or equal to 2 and m an integer greater than or equal to 2; obtain a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained after the first weight matrix is rearranged in rows; perform a first multiplication operation using the first weight matrix and a first feature data set; and perform a second multiplication operation using the second weight matrix and the first feature data set. The data processing apparatus further includes a control module configured to determine a target data set according to the operation results of the first multiplication operation and the second multiplication operation.
The target data set includes the product results between elements of the first feature data set and elements of the first weight matrix. Based on these product results, partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix can be further obtained and output from the data processing apparatus, so that convolution results can be predicted with a small amount of computation at a fast rate. For example, assume the first weight matrix is a 3-row, 3-column matrix and the second weight matrix is the first weight matrix rearranged by rows. When certain 3 rows and 3 columns of data in the first feature data set are input to the data processing module and multiplied with the first weight matrix and the second weight matrix respectively, the target data set yields the convolution result of that feature data with the first weight matrix, as well as the partial convolution sum of the 3-row, 3-column feature data at the adjacent position with the first weight matrix. Because feature data at adjacent positions often exhibit continuity, the data processing apparatus can use the convolution results and partial convolution sums in the target data set to predict convolution results. For example, when the data processing apparatus uses feature data to perform object recognition according to the solution provided in this application, and the convolution results and partial convolution sums in the obtained target data set do not match the expected range of values, they can be excluded directly without subsequent computation, which saves computation. After the data processing apparatus implements object recognition according to the technical solution provided in this application, the recognition result can further be used to realize other functions, for example, sorting goods or monitoring targets.
In the above solution, the data processing apparatus obtains the second weight matrix according to the first weight matrix, where the second weight matrix is the first weight matrix rearranged by rows, and performs multiplication operations of the first weight matrix and the second weight matrix with the first feature data set. The feature data can thus be reused when obtaining the partial Cartesian products and partial convolution results of the first feature data set and the first weight matrix, which improves the efficiency of the operation.
Specifically, in the prior art, the convolution of a feature matrix and a weight matrix is computed by sliding the weight matrix over the feature matrix and multiplying the weight matrix elements with the corresponding feature data. Because the feature data in the same feature matrix often needs to be used in the multiplication operations of several sliding positions of the weight matrix, the feature data needs to be loaded multiple times in actual operation. That is, multiple read operations must be performed on the memory storing the feature data. Referring to FIG. 1, when calculating the Cartesian product of the feature data set and the weight data set, a multi-step convolution needs to be performed. When performing the first step of the convolution, the memory must be read to obtain the feature data a 21 in order to calculate the product of a 21 and b 21. When calculating the fourth step of the convolution (the weight matrix slides from top to bottom and from left to right), the memory must be read again to obtain the feature data a 21 and calculate the product of a 21 and b 11. In other words, multiple read operations are performed on the memory storing the feature data a 21, which increases overhead. In the technical solution provided by this application, by rearranging the weight matrix, the feature data can be multiplied with more weight matrix elements after being loaded once, which reduces the number of times the feature data is loaded. In addition, by calculating the products between the feature data and the elements of the first weight matrix, and between the feature data and the elements of the second weight matrix, the acquired feature data is reused. In summary, the above solution improves the efficiency of the operation.
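The repeated reads described above can be made concrete with a small counting sketch (Python; illustrative only):

```python
import numpy as np

def load_counts(feat_shape, w_shape):
    """Count how many times each feature element is read by a naive
    sliding-window convolution: once per window that covers it."""
    fh, fw = feat_shape
    n, m = w_shape
    loads = np.zeros(feat_shape, dtype=int)
    for r in range(fh - n + 1):
        for c in range(fw - m + 1):
            loads[r:r + n, c:c + m] += 1   # every element of the window is read
    return loads

# For the 5x5 feature data set and 3x3 kernel of FIG. 1, the central
# feature element is read 9 times.
```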
With reference to the first aspect, in a possible implementation of the first aspect, the data processing apparatus further includes an address processing module configured to: obtain the addresses of the weight data in the first weight matrix and the second weight matrix; and perform address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set. The data processing module being configured to determine the target data set according to the operation result of the multiplication operation includes: the control module being configured to determine the target data set according to the operation result of the multiplication operation and the operation result of the address operation.

This solution introduces an address processing module. By using the address processing module to calculate the addresses of the products of the weight data in the first weight matrix and the second weight matrix with the feature data in the first data set, the Cartesian product and the convolution result of the feature data and the weight matrix can further be obtained as the target data set, which extends the functions of the data processing apparatus.

With reference to the first aspect, in a possible implementation of the first aspect, the data processing module is further configured to: obtain a third weight matrix to an n-th weight matrix in the first weight data set, where the third weight matrix to the n-th weight matrix are matrices in which the first weight matrix is rearranged in rows, and any two of the n row vectors located in the same row of the first weight matrix to the n-th weight matrix are different. The address processing module is further configured to: obtain the addresses of the weight data in the third weight matrix to the n-th weight matrix; and perform address operations using the addresses of the weight data in the third to n-th weight matrices and the addresses of the feature data in the first feature data set.

In this solution, the first weight matrix with n rows is rearranged in rows to obtain n weight matrices, and any two of the n row vectors located in the same row of these n weight matrices are different. After the feature data is multiplied with these n weight matrices, the Cartesian product of the feature data and the first weight matrix is obtained, which increases the degree of reuse of the feature data and further improves the efficiency of the operation.

With reference to the first aspect, in a possible implementation of the first aspect, the target data set includes a result matrix, which is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The address processing module is further configured to determine a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size of the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, and the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size being (n-1)/2 and the vertical padding size being (m-1)/2.

This solution further refines the method of obtaining the target data address based on the address of the weight data and the address of the feature data, which improves the realizability of the data processing apparatus obtaining the convolution result through the Cartesian product.

With reference to the first aspect, in a possible implementation of the first aspect, the data processing apparatus further includes a compression module configured to: obtain a second feature data set, and remove elements with a value of 0 in the second feature data set to obtain the first feature data set; obtain a second weight data set, and remove elements with a value of 0 in the second weight data set to obtain the first weight data set; and determine the address of each feature data in the first feature data set and the address of each weight in the first weight data set.

This solution sparsifies the feature data and the weight data, that is, removes the elements with a value of 0 in the feature data set and the weight data set, which reduces the amount of computation of the convolution operation and thus improves the operating efficiency of the data processing apparatus.
In a second aspect, an embodiment of the present application provides a data processing method. The method includes: obtaining a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data and the data in the first weight data set comes from the same input channel, n being an integer greater than or equal to 2 and m an integer greater than or equal to 2; obtaining a second weight matrix according to the first weight matrix, where the second weight matrix is a matrix obtained after the first weight matrix is rearranged in rows; performing a first multiplication operation using the first weight matrix and a first feature data set; performing a second multiplication operation using the second weight matrix and the first feature data set; and determining a target data set according to the operation results of the first multiplication operation and the second multiplication operation.

With reference to the second aspect, in a possible implementation of the second aspect, the method further includes: obtaining the addresses of the weight data in the first weight matrix and the second weight matrix; and performing address operations using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set. Determining the target data set according to the operation results of the first multiplication operation and the second multiplication operation includes: determining the target data set according to the operation results of the first multiplication operation and the second multiplication operation and the operation results of the address operation.

With reference to the second aspect, in a possible implementation of the second aspect, the method further includes: obtaining a third weight matrix to an n-th weight matrix in the first weight data set, where the third weight matrix to the n-th weight matrix are matrices in which the first weight matrix is rearranged in rows, and any two of the n row vectors located in the same row of the first weight matrix to the n-th weight matrix are different; obtaining the addresses of the weight data in the third weight matrix to the n-th weight matrix; and performing address operations using the addresses of the weight data in the third to n-th weight matrices and the addresses of the feature data in the first feature data set.

With reference to the second aspect, in a possible implementation of the second aspect, the target data set includes a result matrix, which is the result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The method further includes: determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size of the first feature matrix, the padding size, and the weight size, where the weight size is n rows and m columns, and the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size being (n-1)/2 and the vertical padding size being (m-1)/2.

With reference to the second aspect, in a possible implementation of the second aspect, the method further includes: obtaining a second feature data set, and removing elements with a value of 0 in the second feature data set to obtain the first feature data set; obtaining a second weight data set, and removing elements with a value of 0 in the second weight data set to obtain the first weight data set; and determining the address of each feature data in the first feature data set and the address of each weight in the first weight data set.

In a third aspect, the present application provides a data processing apparatus. The data processing apparatus includes a processor and a memory storing program code, and the processor is configured to call the program code in the memory to execute the data processing method provided in the second aspect of this application.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a convolution operation process in the prior art.

FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of the present application.

FIG. 3 is a schematic diagram of a data calculation array according to an embodiment of the present application.

FIG. 4 is a structural block diagram of a data calculation unit in a data calculation array according to an embodiment of the present application.

FIG. 5 is a schematic diagram of performing a multiplication operation on a first feature data set according to an embodiment of the present application.

FIG. 6 is a schematic diagram of the addresses of a first feature data set and the addresses of a weight data set according to an embodiment of the present application.

FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of the present application.

FIG. 8 is a structural block diagram of an address calculation unit in an address calculation array according to an embodiment of the present application.

FIG. 9 is a schematic diagram of weight data stored in two data calculation arrays according to an embodiment of the present application.

FIG. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.

FIG. 11 is a schematic diagram of a weight matrix with three filters after thinning processing according to an embodiment of the present application.

FIG. 12 is a schematic diagram of a weight matrix without thinning processing according to an embodiment of the present application.

FIG. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application.

FIG. 14 is a structural block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description

The technical solutions in this application will be described below with reference to the drawings.

In this application, "at least one" means one or more, and "multiple" means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships can exist; for example, A and/or B can represent: A alone, both A and B, and B alone, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple. In addition, in the embodiments of this application, the words "first", "second", and so on do not limit the quantity or the execution order.

It should be noted that, in this application, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete manner.
FIG. 1 is a schematic diagram of a convolution operation process in the prior art.

FIG. 1 shows a feature data set, which includes 5×5 feature data in total. FIG. 1 also shows a weight data set, which includes 3×3 weight data in total. The weight data set can serve as a convolution kernel for a convolution operation with the feature data set.

FIG. 1 also shows two steps, with a stride of 1, of the convolution operation performed on the feature data set using the weight data set. As shown in FIG. 1, the 3×3 weight data in the weight data set needs to be multiplied with 3×3 data in the feature data set, respectively. Adding the results of the multiplication operations yields the value of one element of the convolution result. Specifically, as shown in FIG. 1, the convolution result c 11 can be expressed as formula 1.1, and the convolution result c 12 can be expressed as formula 1.2:
c 11 = a 11×b 11 + a 12×b 12 + a 13×b 13 + a 21×b 21 + a 22×b 22 + a 23×b 23 + a 31×b 31 + a 32×b 32 + a 33×b 33 (formula 1.1)

c 12 = a 12×b 11 + a 13×b 12 + a 14×b 13 + a 22×b 21 + a 23×b 22 + a 24×b 23 + a 32×b 31 + a 33×b 32 + a 34×b 33 (formula 1.2)
在完成了如图1所示的两步运算后,该特征数据集合继续向右滑动,继续下一步运算,直到遍历完整个特征数据集合。After the two-step operation shown in FIG. 1 is completed, the feature data set continues to slide to the right, and the next operation is continued until the entire feature data set is traversed.
假设集合E 1={a 11,a 12,a 13,a 21,a 22,a 23,a 31,a 32,a 33},集合F 1={b 11,b 12,b 13,b 21,b 22,b 23,b 31,b 32,b 33}。对集合E 1和集合F 1进行笛卡尔积运算,可以得到集合G 1,集合G 1可以包括如表1所示的多个乘法结果。 Suppose the set E 1 = {a 11 , a 12 , a 13 , a 21 , a 22 , a 23 , a 31 , a 32 , a 33 }, and the set F 1 = {b 11 , b 12 , b 13 , b 21 , B 22 , b 23 , b 31 , b 32 , b 33 }. Performing a Cartesian product operation on the sets E 1 and F 1 can obtain the set G 1 , and the set G 1 can include multiple multiplication results as shown in Table 1.
Table 1
a11×b11  a11×b12  a11×b13  a11×b21  a11×b22  a11×b23  a11×b31  a11×b32  a11×b33
a21×b11  a21×b12  a21×b13  a21×b21  a21×b22  a21×b23  a21×b31  a21×b32  a21×b33
a31×b11  a31×b12  a31×b13  a31×b21  a31×b22  a31×b23  a31×b31  a31×b32  a31×b33
a12×b11  a12×b12  a12×b13  a12×b21  a12×b22  a12×b23  a12×b31  a12×b32  a12×b33
a22×b11  a22×b12  a22×b13  a22×b21  a22×b22  a22×b23  a22×b31  a22×b32  a22×b33
a32×b11  a32×b12  a32×b13  a32×b21  a32×b22  a32×b23  a32×b31  a32×b32  a32×b33
a13×b11  a13×b12  a13×b13  a13×b21  a13×b22  a13×b23  a13×b31  a13×b32  a13×b33
a23×b11  a23×b12  a23×b13  a23×b21  a23×b22  a23×b23  a23×b31  a23×b32  a23×b33
a33×b11  a33×b12  a33×b13  a33×b21  a33×b22  a33×b23  a33×b31  a33×b32  a33×b33
As shown in Table 1, the Cartesian product of set E1 and set F1 includes all the multiplication results needed to calculate c11: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a31×b31, a32×b32, and a33×b33. The Cartesian product of set E1 and set F1 also includes some of the multiplication results needed to calculate c12: a12×b11, a13×b12, a22×b21, a23×b22, a32×b31, and a33×b32.
Suppose set E2 = {a12, a13, a14, a22, a23, a24, a32, a33, a34}. Performing a Cartesian product operation on set E2 and set F1 yields set G2, which includes the multiplication results shown in Table 2.
Table 2
a12×b11  a12×b12  a12×b13  a12×b21  a12×b22  a12×b23  a12×b31  a12×b32  a12×b33
a22×b11  a22×b12  a22×b13  a22×b21  a22×b22  a22×b23  a22×b31  a22×b32  a22×b33
a32×b11  a32×b12  a32×b13  a32×b21  a32×b22  a32×b23  a32×b31  a32×b32  a32×b33
a13×b11  a13×b12  a13×b13  a13×b21  a13×b22  a13×b23  a13×b31  a13×b32  a13×b33
a23×b11  a23×b12  a23×b13  a23×b21  a23×b22  a23×b23  a23×b31  a23×b32  a23×b33
a33×b11  a33×b12  a33×b13  a33×b21  a33×b22  a33×b23  a33×b31  a33×b32  a33×b33
a14×b11  a14×b12  a14×b13  a14×b21  a14×b22  a14×b23  a14×b31  a14×b32  a14×b33
a24×b11  a24×b12  a24×b13  a24×b21  a24×b22  a24×b23  a24×b31  a24×b32  a24×b33
a34×b11  a34×b12  a34×b13  a34×b21  a34×b22  a34×b23  a34×b31  a34×b32  a34×b33
As shown in Table 2, the Cartesian product of set E2 and set F1 includes some of the multiplication results needed to calculate c12: a14×b13, a24×b23, and a34×b33.
The multiplication results in Table 1 and Table 2 that are not needed to calculate c11 or c12 can also be used in subsequent convolution operations.
From the foregoing analysis of the convolution operation and the Cartesian product operation, it can be seen that a convolution operation can be decomposed into Cartesian product operations. The results of one Cartesian product operation can be used in multiple steps of the convolution operation, and the result of one convolution step can be obtained by adding one or more Cartesian product results.
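A minimal sketch of this decomposition, with illustrative values: the Cartesian product G1 of Table 1 is built as a dictionary keyed by (feature position, weight position), and c11 is recovered by summing the nine entries whose feature and weight positions coincide, per Formula 1.1.

```python
import numpy as np
from itertools import product

A = np.arange(1, 26).reshape(5, 5)   # feature data set (illustrative values)
B = np.arange(1, 10).reshape(3, 3)   # weight data set (illustrative values)

# E1 is the 3x3 window of A at the top-left corner; F1 is the kernel.
E1 = {(i, j): A[i, j] for i in range(3) for j in range(3)}
F1 = {(i, j): B[i, j] for i in range(3) for j in range(3)}

# G1: every feature datum times every weight datum, as laid out in Table 1.
G1 = {(fa, fb): E1[fa] * F1[fb] for fa, fb in product(E1, F1)}

# c11 needs exactly the entries where the window position matches the
# kernel position (Formula 1.1); the other 72 products serve later steps.
c11 = sum(G1[(pos, pos)] for pos in F1)
assert c11 == (A[0:3, 0:3] * B).sum()
```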
FIG. 2 is a structural block diagram of a data processing apparatus according to an embodiment of this application. The data processing apparatus 200 shown in FIG. 2 includes a storage module 210, a data processing module 220, an address processing module 230, and a control module 240.
The storage module 210 is configured to store a first feature data set, an address of each feature data in the first feature data set, a first weight set, and an address of each weight in the first weight set.
The data processing module 220 includes N data calculation arrays. Each of the N data calculation arrays includes n×m data calculation units, where N is a positive integer greater than or equal to 2, n is a positive integer greater than or equal to 2, and m is a positive integer greater than or equal to 2.
The address processing module 230 includes N address calculation arrays. Each of the N address calculation arrays includes n×m address calculation units.
Each data calculation array is configured to obtain n×m weight data from the storage module 210 and save the obtained weight data into the n×m data calculation units of that data calculation array.
Each address calculation array is configured to obtain the addresses of n×m weight data from the storage module 210 and save the obtained addresses into the n×m address calculation units of that address calculation array. The addresses of the weight data saved by the N address calculation arrays are the addresses of the weight data saved by the N data calculation arrays. In other words, the N address calculation arrays are in one-to-one correspondence with the N data calculation arrays, and each of the N address calculation arrays saves the addresses of the weight data saved by its corresponding data calculation array. For example, assuming that the weight data saved by one of the N data calculation arrays are b11, b12, b13, b21, b22, b23, b31, b32, and b33, the address calculation array corresponding to that data calculation array saves the addresses of b11, b12, b13, b21, b22, b23, b31, b32, and b33.
The N data calculation arrays use the weight data they have saved to perform multiplication operations on the first feature data set. During the operations on the first feature data set, the weight data saved in the N data calculation arrays remain unchanged.
Similarly, the N address calculation arrays use the addresses of the weight data they have saved to perform address operations on the addresses of the first feature data set. During these address operations, the addresses of the weight data saved in the N address calculation arrays remain unchanged.
The control module 240 is configured to determine a target data set according to the results of the multiplication operations performed by the N data calculation arrays and the results of the address operations.
Therefore, a result of performing a convolution operation on the first feature data set with the weight data saved by the N data calculation arrays can be determined from the multiplication results and the address operation results. In other words, in some embodiments, the target data set may be the set of data obtained by performing a convolution operation on the first feature data set with the weight data saved by the N data calculation arrays.
The following describes, with reference to FIG. 1 and FIG. 3 to FIG. 5, how the N data calculation arrays use the saved weight data to operate on the first feature data set shown in FIG. 1.
FIG. 3 is a schematic diagram of a data calculation array according to an embodiment of this application. The data calculation array 300 shown in FIG. 3 includes nine data calculation units in total: data calculation units 311, 312, 313, 321, 322, 323, 331, 332, and 333.
It can be understood that, in addition to the data calculation units shown in FIG. 3, the data calculation array may further include an input/output unit (not shown in the figure). The input/output unit is configured to obtain the data that needs to be input into the data calculation array 300, and to deliver the data that the data calculation array 300 needs to output to the corresponding units and/or modules. For example, the input/output unit may obtain weight data and feature data from the storage module and send the obtained weight data and feature data to the corresponding data calculation units. The input/output unit is further configured to obtain the target data calculated by each data calculation unit and send the target data to the storage module.
Optionally, in some embodiments, data transfer between the calculation units in the data calculation array is unidirectional. Taking FIG. 3 as an example, the arrows connecting the data calculation units in FIG. 3 indicate the unidirectional direction of data transfer. Take the data calculation units 311, 312, and 313 as an example: the data calculation unit 311 can send data (for example, feature data) to the data calculation unit 312, but the data calculation unit 312 cannot send data to the data calculation unit 311; the data calculation unit 312 can send data to the data calculation unit 313, but the data calculation unit 313 cannot send data to the data calculation unit 312.
FIG. 4 is a structural block diagram of a data calculation unit in the data calculation array according to an embodiment of this application. As shown in FIG. 4, the data calculation unit 400 may include a storage subunit 401 and a data calculation subunit 402. It can be understood that the data calculation unit 400 may further include an input/output subunit, which is configured to obtain the data that the data calculation unit needs to obtain and to output the data that the data calculation unit needs to output.
Specifically, the data calculation array 300 shown in FIG. 3 may obtain the 3×3 weight data of the weight data set shown in FIG. 1 and save the 3×3 weight data into the 3×3 data calculation units of the data calculation array 300, respectively.
Specifically, the weight data b11 may be saved in the storage subunit of the data calculation unit 311, the weight data b12 may be saved in the storage subunit of the data calculation unit 312, the weight data b13 may be saved in the storage subunit of the data calculation unit 313, and so on. In this way, the data calculation array 300 saves the 3×3 weight data.
After the 3×3 weight data are saved, the data calculation array 300 may slide the first feature data set in one direction and use the saved weight data to perform multiplication operations on the first feature data set. While the data calculation array 300 performs the multiplication operations on the first feature data set, the weight data saved in the data calculation array 300 do not change. In other words, during these multiplication operations, the data calculation units in the data calculation array 300 do not delete the saved weight data, nor do they read and save new weight data from the storage module.
The one-directional sliding of the first feature data set may be understood with reference to FIG. 5. FIG. 5 is a schematic diagram of the multiplication process performed on the first feature data set according to an embodiment of this application. As shown in FIG. 5, the first feature data set may first be flipped by 180 degrees: the first column of the first feature data set becomes the fifth column after the flip, the second column becomes the fourth column, and so on. It should be noted that flipping the first feature data set by 180 degrees before sliding it to the right, as shown in FIG. 5, is only for ease of describing the computation of the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33 with the weight data b11, b21, b31, b12, b22, b32, b13, b23, and b33. In an actual implementation, the first feature data set can be multiplied with the weight data saved in the data calculation array 300 by sliding to the right directly. The values produced when the first feature data set is multiplied by sliding to the right directly are the same as those produced when it is first flipped by 180 degrees and then slid to the right as shown in FIG. 5; only the order in which the final data are obtained differs.
The flipped first feature data set slides to the right in one direction and is multiplied with the weight data saved in the data calculation array 300. Specifically, in the first operation, the feature data a11, a21, and a31 are multiplied with the weight data b11, b21, and b31, respectively. After the first operation, the flipped first feature data set slides to the right for the second operation. In the second operation, the feature data a11, a21, and a31 are multiplied with the weight data b12, b22, and b32, respectively, and the feature data a12, a22, and a32 are multiplied with the weight data b11, b21, and b31, respectively. After the second operation, the flipped feature data set continues to slide to the right for the third operation, and so on. In the foregoing embodiment, the stride of each slide of the first feature data set is 1. Of course, in some other embodiments, the stride of each slide of the first feature data set may also be a positive integer greater than 1.
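The column-by-column schedule of this rightward slide can be sketched as follows (a behavioural model, not the hardware itself; values are illustrative): at step t, feature column c meets weight column w whenever c + w = t + 1 in one-based numbering, which reproduces the pairings of the first and second operations described above.

```python
import numpy as np

A = np.arange(1, 26).reshape(5, 5)   # feature data set (illustrative values)
B = np.arange(1, 10).reshape(3, 3)   # stationary weights held by array 300

def products_at_step(t):
    """Elementwise column products produced at slide step t (t = 1, 2, ...)."""
    prods = []
    for w in range(3):       # zero-based weight column index
        c = t - 1 - w        # feature column aligned with weight column w
        if 0 <= c < 5:
            prods.append(A[0:3, c] * B[:, w])
    return prods

step1 = products_at_step(1)  # [a11*b11, a21*b21, a31*b31]
step2 = products_at_step(2)  # column 2 meets weight column 1, column 1 meets column 2
```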
Taking the first operation as an example, the data calculation unit 311 may obtain the feature data a11 from the first feature data set saved in the storage module 210 and save the obtained feature data a11 in the storage subunit of the data calculation unit 311. In this case, the storage subunit of the data calculation unit 311 holds the weight data b11 and the feature data a11. The data calculation subunit of the data calculation unit 311 multiplies the weight data b11 and the feature data a11 saved in the storage subunit to obtain intermediate data k(11,11). The multiplication of the weight data b11 and the feature data a11 may be implemented by a multiplier in the data calculation subunit.
The data calculation unit 311 may further obtain, according to the target address determined by the address calculation unit corresponding to the data calculation unit 311, the cached data r(11,11) saved at the first target address. Specifically, the address calculation unit corresponding to the data calculation unit 311 may determine the first target address according to the address of the feature data a11 and the address of the weight data b11; the data calculation unit 311 may then obtain the current cached data r(11,11) saved at the first target address. The manner in which the address calculation unit determines the first target address is described later. The data calculation subunit adds the intermediate data k(11,11) and the current cached data r(11,11) to obtain target data d(11,11). The addition of the intermediate data k(11,11) and the current cached data r(11,11) may be implemented by an adder in the data calculation subunit. The target data d(11,11) may be saved to the first target address. In other words, the current cached data r(11,11) saved at the first target address is updated to the target data d(11,11).
Similarly, the data calculation unit 321 may determine, in the same manner, the product of the weight data b21 saved by the data calculation unit 321 and the feature data a21 (hereinafter referred to as intermediate data k(21,21)). The target address determined by the address calculation unit corresponding to the data calculation unit 321 is also the first target address. The data calculation unit 321 adds the intermediate data k(21,21) and the current cached data saved at the first target address (at this point the current cached data has been updated to the target data d(11,11)) to obtain target data d(21,21). The target data d(21,21) may be saved to the first target address. In other words, the current cached data d(11,11) saved at the first target address is updated to the target data d(21,21).
The data calculation unit 331 may determine, in the same manner, the product of the weight data b31 saved by the data calculation unit 331 and the feature data a31 (hereinafter referred to as intermediate data k(31,31)). The target address determined by the address calculation unit corresponding to the data calculation unit 331 is also the first target address. The data calculation unit 331 adds the intermediate data k(31,31) and the current cached data saved at the first target address (at this point the current cached data has been updated to the target data d(21,21)) to obtain target data d(31,31). The target data d(31,31) may be saved to the first target address. In other words, the current cached data d(21,21) saved at the first target address is updated to the target data d(31,31).
After the first operation, the target data saved at the first target address is a11×b11 + a21×b21 + a31×b31.
In a similar manner, the data calculation array 300 may continue to use the weight data saved by its data calculation units to operate on the first feature data set.
After the third operation, the data saved at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32. That is, during the third operation, the target address determined by the address calculation units corresponding to the data calculation units 312, 322, and 332 is also the first target address. Therefore, after the third operation, the target data saved at the first target address is the sum of the data saved at the first target address after the first operation, a12×b12 determined by the data calculation unit 312, a22×b22 determined by the data calculation unit 322, and a32×b32 determined by the data calculation unit 332. After the fifth operation, the data saved at the first target address is a11×b11 + a21×b21 + a31×b31 + a12×b12 + a22×b22 + a32×b32 + a13×b13 + a23×b23 + a33×b33. That is, during the fifth operation, the target address determined by the address calculation units corresponding to the data calculation units 313, 323, and 333 is also the first target address. Therefore, after the fifth operation, the target data saved at the first target address is the sum of the data saved at the first target address after the third operation, a13×b13 determined by the data calculation unit 313, a23×b23 determined by the data calculation unit 323, and a33×b33 determined by the data calculation unit 333.
In this way, after five operations, the data saved at the first target address is exactly the convolution result c11 shown in Formula 1.1. Similarly, the convolution operation of the first feature data set and the weight data set can be completed by using the multiplication results together with the address operation results.
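This accumulate-by-address behaviour can be checked with a small software model (a sketch that assumes a 3×3 "valid" output region; values are illustrative): each product of a feature datum at position (fi, fj) with a weight at position (wi, wj) is added into a buffer entry keyed by the output position (fi−wi, fj−wj), and the entry standing in for the first target address ends up holding exactly c11.

```python
import numpy as np
from collections import defaultdict

A = np.arange(1, 26).reshape(5, 5)   # feature data set (illustrative values)
B = np.arange(1, 10).reshape(3, 3)   # weight data set (illustrative values)

# Accumulation buffer standing in for the target addresses.
buf = defaultdict(int)
for fi in range(5):
    for fj in range(5):
        for wi in range(3):
            for wj in range(3):
                oi, oj = fi - wi, fj - wj          # target position of this product
                if 0 <= oi <= 2 and 0 <= oj <= 2:  # keep the 3x3 valid outputs
                    buf[(oi, oj)] += A[fi, fj] * B[wi, wj]

# The first target address has gathered all nine products of Formula 1.1.
assert buf[(0, 0)] == (A[0:3, 0:3] * B).sum()   # c11
```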
The following describes, with reference to FIG. 1 and FIG. 3 to FIG. 8, how the N address calculation arrays use the addresses of the saved weight data to perform address operations on the addresses of the first feature data set shown in FIG. 1.
FIG. 6 is a schematic diagram of the addresses of the first feature data set and the addresses of the weight data set according to an embodiment of this application. The addresses of the first feature data set shown in FIG. 6 are the addresses of the first feature data set shown in FIG. 1. Specifically, the address Add_a11 is the address of the feature data a11, the address Add_a12 is the address of the feature data a12, and so on. The addresses of the weight data set shown in FIG. 6 are the addresses of the weight data set shown in FIG. 1. Specifically, the address Add_b11 is the address of the weight data b11, the address Add_b12 is the address of the weight data b12, and so on.
FIG. 7 is a schematic diagram of an address calculation array according to an embodiment of this application. The address calculation array 700 shown in FIG. 7 includes nine address calculation units in total: address calculation units 711, 712, 713, 721, 722, 723, 731, 732, and 733.
It can be understood that, in addition to the address calculation units shown in FIG. 7, the address calculation array may further include an input/output unit (not shown in the figure). The input/output unit is configured to obtain the data that needs to be input into the address calculation array 700, and to deliver the data that the address calculation array 700 needs to output to the corresponding units and/or modules. For example, the input/output unit may obtain the addresses of the weight data and the addresses of the feature data from the storage module and send the obtained addresses to the corresponding address calculation units. The input/output unit is further configured to obtain the target address calculated by each address calculation unit and send the target address to the corresponding data calculation unit.
The N address calculation arrays are in one-to-one correspondence with the N data calculation arrays. One-to-one correspondence here means that each of the N data calculation arrays corresponds to one of the N address calculation arrays, and different data calculation arrays correspond to different address calculation arrays. For example, suppose N equals 3, the three data calculation arrays are data calculation arrays 1, 2, and 3, and the three address calculation arrays are address calculation arrays 1, 2, and 3. Data calculation array 1 corresponds to address calculation array 1, data calculation array 2 corresponds to address calculation array 2, and data calculation array 3 corresponds to address calculation array 3. The address calculation array corresponding to a data calculation array is used to calculate the target address of each target data in that data calculation array. Further, the data calculation units in a data calculation array are also in one-to-one correspondence with the address calculation units in the corresponding address calculation array. Assuming that the data calculation array shown in FIG. 3 corresponds to the address calculation array shown in FIG. 7, the data calculation unit 311 corresponds to the address calculation unit 711, the data calculation unit 312 corresponds to the address calculation unit 712, the data calculation unit 313 corresponds to the address calculation unit 713, and so on. An address calculation unit is used to determine the address of the target data of its corresponding data calculation unit. Specifically, the first target address at which the cached data r(11,11) obtained by the data calculation unit 311 is located, as described above, is obtained by the address calculation unit 711 through an address operation.
FIG. 8 is a structural block diagram of an address calculation unit in the address calculation array according to an embodiment of this application. As shown in FIG. 8, the address calculation unit 800 may include a storage subunit 801 and an address calculation subunit 802. It can be understood that the address calculation unit 800 may further include an input/output subunit, which is configured to obtain the data that the address calculation unit needs to obtain and to output the data that the address calculation unit needs to output.
Specifically, the address calculation array 700 shown in FIG. 7 may obtain the addresses of the 3×3 weight data from the addresses of the weight data set shown in FIG. 6 and save the addresses of the 3×3 weight data into the 3×3 address calculation units of the address calculation array 700, respectively.
Specifically, the address Add_b11 may be saved in the storage subunit of the address calculation unit 711, the address Add_b12 may be saved in the storage subunit of the address calculation unit 712, the address Add_b13 may be saved in the storage subunit of the address calculation unit 713, and so on. In this way, the address calculation array 700 saves the addresses of the 3×3 weight data.
After the addresses of the 3×3 weight data are saved, the address calculation array 700 may slide the addresses of the first feature data set in one direction and use the saved addresses of the weight data to perform address operations on the addresses of the first feature data set. While the address calculation array 700 performs the address operations, the addresses of the weight data saved in the address calculation array 700 do not change. In other words, during these address operations, the address calculation units in the address calculation array 700 do not delete the saved addresses of the weight data, nor do they read and save the addresses of new weight data from the storage module.
The process of sliding the addresses of the first feature data set to the right in one direction for address calculation is similar to the process of sliding the first feature data set to the right for the multiplication operations, and is not repeated here.
The following describes how an address calculation unit performs an address operation.
For ease of description, the address of the weight obtained by the address calculation unit 800 is referred to below as the address of the first weight, the address of the feature data obtained by the address calculation unit 800 is referred to as the address of the first feature data, and the address obtained after the address calculation unit 800 performs the address operation is referred to as the first target address.
In addition to obtaining the address of the first feature data and the address of the first weight data from the storage module, the input/output subunit of the address calculation unit 800 may further obtain the following information: the size of the input data corresponding to the first feature data set, the padding size, and the weight size. The weight size is the size of the address calculation array to which the address calculation unit 800 belongs, and the padding size is a preset size. In this example, the weight size is 3×3. The size of the input data corresponding to the first feature data set, the padding size, and the weight size may also be saved in the storage subunit 801 of the address calculation unit 800. The address calculation subunit 802 may determine the first target address according to the address of the first weight data, the address of the first feature data, the size of the input data corresponding to the first feature data set, the padding size, and the weight size.
Assume that the size of the input picture is a rows and b columns, and the size of the convolution kernel is n rows and m columns; then the size of the output picture after convolution is (a−n+1)×(b−m+1). This causes two problems: first, the size of the output picture shrinks after each convolution operation; second, the pixels at the corners and edges of the original picture are used fewer times in the output, so the output picture loses much of the information at the edge positions.
To solve these problems, the original picture can be padded at its boundary before the convolution operation to increase the size of the matrix. Usually 0 is used as the padding value.
Let the numbers of pixels extended in the horizontal and vertical directions be p and q, respectively; then the size of the padded picture is (a+2p)×(b+2q). With the kernel size kept at n rows and m columns, the output picture size becomes (a+2p−n+1)×(b+2q−m+1). The numbers p and q of pixels extended in each direction are the padding sizes. For the output picture to keep the same size as the input picture, the horizontal padding size p must equal (n−1)/2 and the vertical padding size q must equal (m−1)/2.
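A short sketch of these size relations (the helper names are illustrative; odd kernel dimensions are assumed so that (n−1)/2 and (m−1)/2 are integers):

```python
def padding_for_same_size(n, m):
    """Padding (p, q) that keeps the output the same size as the input
    for an n x m kernel: p = (n - 1) / 2, q = (m - 1) / 2."""
    return (n - 1) // 2, (m - 1) // 2

def output_size(a, b, n, m, p, q):
    """Output size (a + 2p - n + 1) x (b + 2q - m + 1) derived above."""
    return a + 2 * p - n + 1, b + 2 * q - m + 1

p, q = padding_for_same_size(3, 3)               # (1, 1) for the 3x3 kernel
assert output_size(5, 5, 3, 3, p, q) == (5, 5)   # a 5x5 input stays 5x5
assert output_size(5, 5, 3, 3, 0, 0) == (3, 3)   # without padding it shrinks
```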
The address calculation subunit 802 may determine the target address specifically according to the following formula:
result_cord = (input_cord / input_size_x − w_cord / kernel_size_x + padding_size_x) × input_size_y + (input_cord % input_size_y − w_cord % kernel_size_y + padding_size_y)    (Formula 1.3)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the horizontal dimension of the input data corresponding to the first feature data set, input_size_y denotes the vertical dimension of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the horizontal dimension of the weight size, kernel_size_y denotes the vertical dimension of the weight size, padding_size_x denotes the horizontal padding size, and padding_size_y denotes the vertical padding size.
The address of the feature data and the address of the weight data in Formula 1.3 are absolute addresses. An absolute address is the absolute position of the feature data/weight data in the corresponding feature data set/weight data set. Assuming that the feature data set includes X feature data, the absolute address of the x-th feature data among the X feature data is x−1, where X is a positive integer greater than 1 and x is a positive integer greater than or equal to 1 and less than or equal to X. For example, if the feature data set is 5, 0, 0, 32, 0, 0, 0, 0, 23, the absolute addresses of the feature data 5, 32, and 23 are 0, 3, and 8, respectively. The absolute addresses listed above refer to the positions of the feature data in the feature data set and can be converted into an address composed of a horizontal coordinate and a vertical coordinate according to the dimensions of the feature matrix. Similarly, the absolute address of the weight data can also be converted into an address composed of a horizontal coordinate and a vertical coordinate.
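The conversion from an absolute address to coordinates follows directly from the row-major layout; a sketch (reading the nine-element example above as a 3×3 matrix is an assumption made here for illustration):

```python
def abs_to_coord(addr, num_cols):
    """Row-major absolute address -> (row, column) coordinates."""
    return addr // num_cols, addr % num_cols

# Feature data set 5, 0, 0, 32, 0, 0, 0, 0, 23 read as a 3x3 matrix:
# 5, 32 and 23 sit at absolute addresses 0, 3 and 8.
assert abs_to_coord(0, 3) == (0, 0)   # feature datum 5
assert abs_to_coord(3, 3) == (1, 0)   # feature datum 32
assert abs_to_coord(8, 3) == (2, 2)   # feature datum 23
```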
Optionally, in some embodiments, the address calculation subunit 802 may also determine the target address according to the following formula:
result_cord = ((base_input + input_cord) / input_size_x − (base_w + w_cord) / kernel_size_x + padding_size_x) × input_size_y + ((base_input + input_cord) % input_size_y − (base_w + w_cord) % kernel_size_y + padding_size_y)    (Formula 1.4)
where % denotes the remainder operation, result_cord denotes the target address, input_cord denotes the address of the feature data, input_size_x denotes the horizontal dimension of the input data corresponding to the first feature data set, input_size_y denotes the vertical dimension of the input data corresponding to the first feature data set, w_cord denotes the address of the weight data, kernel_size_x denotes the horizontal dimension of the weight size, kernel_size_y denotes the vertical dimension of the weight size, padding_size_x denotes the horizontal padding size, padding_size_y denotes the vertical padding size, base_input denotes the base address of the addresses of the feature data, and base_w denotes the base address of the addresses of the weight data.
The address of the feature data and the address of the weight data in Formula 1.4 are relative addresses. A relative address is the position of the feature data/weight data in the corresponding feature data set/weight data set relative to the address of the first feature data/weight data. Assuming that the address of the first feature data in the feature data set is Y, the address of the y-th feature data in the feature data set is Y+y−1, where Y and y are both positive integers greater than or equal to 1.
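Formulas 1.3 and 1.4 transcribe directly into Python (a sketch: "/" is read as integer division and "%" as the remainder, consistent with the row/column decomposition of a row-major absolute address). The sanity check below uses the 5×5 input and 3×3 kernel of FIG. 1 with padding (1, 1): the products a11×b11, a22×b22, and a33×b33 all belong to c11, so their target addresses coincide.

```python
def target_address(input_cord, w_cord,
                   input_size_x, input_size_y,
                   kernel_size_x, kernel_size_y,
                   padding_size_x, padding_size_y):
    """Formula 1.3 on absolute addresses."""
    return ((input_cord // input_size_x
             - w_cord // kernel_size_x
             + padding_size_x) * input_size_y
            + (input_cord % input_size_y
               - w_cord % kernel_size_y
               + padding_size_y))

def target_address_relative(base_input, input_cord, base_w, w_cord, **sizes):
    """Formula 1.4: the same computation on relative addresses, offset by
    the base addresses of the feature data and of the weight data."""
    return target_address(base_input + input_cord, base_w + w_cord, **sizes)

sizes = dict(input_size_x=5, input_size_y=5,
             kernel_size_x=3, kernel_size_y=3,
             padding_size_x=1, padding_size_y=1)
# a11 (address 0) with b11 (address 0), a22 (6) with b22 (4),
# a33 (12) with b33 (8): all three products map to the same target address.
assert (target_address(0, 0, **sizes)
        == target_address(6, 4, **sizes)
        == target_address(12, 8, **sizes))
assert target_address_relative(0, 6, 0, 4, **sizes) == target_address(6, 4, **sizes)
```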
Optionally, in some embodiments, after determining the target address, the address calculation unit may directly send the target address to the corresponding data calculation unit. The data calculation unit may determine the cached data at the target address according to the target address.
Optionally, in other embodiments, after determining the target address, the address calculation unit may determine the cached data at the target address and then send the cached data together with the target address to the corresponding data calculation unit.
The foregoing describes how one data calculation array performs multiplication operations and how one address calculation array performs address operations.
As described above, the data processing apparatus may include two or more data calculation arrays and the corresponding address calculation arrays.
The weight data set shown in FIG. 1 includes only 3×3 weight data, and only one weight data set is used for the convolution operation on the feature data set. Optionally, in other embodiments, two or more weight data sets may also be used for the convolution operation on the feature data set.
Optionally, in some embodiments, each of the N data calculation arrays may obtain and save one weight data set and use the saved weight data to perform multiplication operations on the first feature data set. Correspondingly, each of the N address calculation arrays may obtain and save the addresses of the corresponding weight data and use the saved addresses of the weight data to perform address operations on the addresses of the first feature data set.
If the number of weight data sets used for the convolution operation on the feature data set is greater than N, the N data calculation arrays may obtain N weight data sets at a time to perform multiplication operations on the first feature data set. If the number of weight data sets that can be obtained at one time is less than N, all the remaining weight data sets are obtained to perform multiplication operations on the first feature data set. Suppose N is 4 and the number of weight data sets is 9. In this case, the four data calculation arrays may first obtain the first to fourth weight data sets and multiply them with the first feature data set, then obtain the fifth to eighth weight data sets and multiply them with the first feature data set, and finally obtain the ninth weight data set and multiply it with the first feature data set. The N address calculation arrays perform the address operations in a similar manner, which is not repeated here.
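A sketch of this batching (the schedule function and zero-based indices are illustrative, not part of this application):

```python
def schedule(num_weight_sets, N):
    """Split the weight data sets into passes of at most N sets; each pass
    loads its sets into the N data calculation arrays and slides over the
    whole first feature data set."""
    return [list(range(start, min(start + N, num_weight_sets)))
            for start in range(0, num_weight_sets, N)]

# N = 4 arrays and 9 weight data sets: sets 1-4, then 5-8, then set 9.
assert schedule(9, 4) == [[0, 1, 2, 3], [4, 5, 6, 7], [8]]
```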
Optionally, in other embodiments, the weight data saved by different data calculation arrays among the N data calculation arrays may be the result of rearranging the same weight data by rows. For example, suppose that the N data calculation arrays include a first data calculation array and a second data calculation array; the n×m weight data saved by the second data calculation array are the n×m weight data obtained by rearranging, by rows, the n×m weight data saved by the first data calculation array.
FIG. 9 is a schematic diagram of the weight data saved by two data calculation arrays according to an embodiment of this application.
As shown in FIG. 9, data calculation array 1 saves 3×3 weight data, where the first row of weight data is b11, b12, and b13; the second row is b21, b22, and b23; and the third row is b31, b32, and b33. Data calculation array 2 saves 3×3 weight data, where the first row of weight data is b31, b32, and b33; the second row is b11, b12, and b13; and the third row is b21, b22, and b23. It can be seen that the weight data saved by data calculation array 2 is the result of rearranging, by rows, the weight data saved by data calculation array 1. Correspondingly, the weight data saved by data calculation array 1 can also be regarded as the result of rearranging, by rows, the weight data saved by data calculation array 2. For ease of description, the weight data obtained after such row rearrangement is referred to below as rearranged weight data, and the weight data saved by the two data calculation arrays shown in FIG. 9 are said to be row rearrangements of each other.
FIG. 9 shows the relationship between the weight data saved by two data calculation arrays. Optionally, in some embodiments, the weight data saved by any two of three or more data calculation arrays are also row rearrangements of each other. For example, the N data calculation arrays further include data calculation array 3 shown in FIG. 10, which saves 3×3 weight data, where the first row of weight data is b21, b22, and b23; the second row is b31, b32, and b33; and the third row is b11, b12, and b13. It can be seen that the weight data saved by data calculation array 1 shown in FIG. 9 and the weight data saved by data calculation array 3 are row rearrangements of each other, and the weight data saved by data calculation array 2 and data calculation array 3 are also row rearrangements of each other. In summary, if the value of N is greater than or equal to n and the weight data includes n rows in total, the weight data can be rearranged at most n−1 times. Among n data calculation arrays of the N data calculation arrays, the weight data saved by the 2nd to n-th data calculation arrays are all weight data obtained by rearranging, by rows, the weight data saved by the 1st data calculation array, where among the n weight data sets saved in the n data calculation arrays, any two row vectors located in the same row position are different. N is a positive integer greater than or equal to n. In this case, the first data calculation array and the second data calculation array are any two of the n data calculation arrays. In other words, the first row of weight data saved by each of the n data calculation arrays appears in the remaining n−1 data calculation arrays as the second row of weight data through the n-th row of weight data, respectively.
Optionally, in some embodiments, data calculation array 2 and data calculation array 3 may first obtain the 3×3 weight data shown in FIG. 1 and then perform data rearrangement to obtain the rearranged weight data.
Optionally, in other embodiments, the storage module may save the rearranged weight data, and data calculation array 2 and data calculation array 3 obtain the rearranged weight data directly from the storage module.
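The row rearrangement itself amounts to a cyclic rotation of the rows; a sketch (with numeric labels standing in for b11 through b33) that reproduces the layouts of FIG. 9 and FIG. 10:

```python
import numpy as np

def row_rearrangements(W):
    """The n cyclic row rearrangements of an n-row weight matrix; any two
    of them place a different row vector at each row position."""
    n = W.shape[0]
    return [np.roll(W, shift, axis=0) for shift in range(n)]

B = np.array([[11, 12, 13],
              [21, 22, 23],
              [31, 32, 33]])    # labels standing in for b11 ... b33

arr1, arr2, arr3 = row_rearrangements(B)
# arr1: rows b1*, b2*, b3*  (data calculation array 1, FIG. 9)
# arr2: rows b3*, b1*, b2*  (data calculation array 2, FIG. 9)
# arr3: rows b2*, b3*, b1*  (data calculation array 3, FIG. 10)
```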
It can be understood that, since the data calculation arrays correspond to the address calculation arrays, the addresses of the weight data saved by the second address calculation array corresponding to the second data calculation array are also the result of rearranging, by rows, the addresses of the weight data saved by the first address calculation array corresponding to the first data calculation array.
Similarly, if the value of N is greater than or equal to n and the weight data includes n rows in total, the addresses of the weight data also include n rows. The addresses of the weight data can be rearranged at most n−1 times. Among n address calculation arrays of the N address calculation arrays, the addresses of the weight data saved by the 2nd to n-th address calculation arrays are all addresses obtained by rearranging, by rows, the addresses of the weight data saved by the 1st address calculation array. N is a positive integer greater than or equal to n. In this case, the first address calculation array and the second address calculation array are any two of the n address calculation arrays. In other words, the addresses of the first row of weight data saved by each of the n address calculation arrays appear in the remaining n−1 address calculation arrays as the addresses of the second row of weight data through the addresses of the n-th row of weight data, respectively.
After the weight data and the corresponding addresses of the weight data are rearranged by rows, the feature data can be reused, further reducing the number of times the data calculation arrays and the address calculation arrays access the storage module.
For example, in the process of performing the convolution operation on the feature data set shown in FIG. 1 using the weight data set shown in FIG. 1, the operation result shown in Formula 1.5 also needs to be determined:
c21 = a21×b11 + a22×b12 + a23×b13 + a31×b21 + a32×b22 + a33×b23 + a41×b31 + a42×b32 + a43×b33    (Formula 1.5)
If the weight data saved by the second data calculation array after the rearrangement is as shown in FIG. 9, a partial result of Formula 1.5 can be obtained after one access to the storage module.
Specifically, when data calculation array 2 shown in FIG. 9 uses its saved weight data to multiply with the feature data set, the operation results a21×b11, a22×b12, a23×b13, a31×b21, a32×b22, and a33×b23 can be obtained. According to the operation rules described earlier, the sum of these six operation results is saved to the same target address.
Suppose that the data processing apparatus includes only data calculation array 1 and data calculation array 2, and the weight data saved by data calculation array 1 and data calculation array 2 are as shown in FIG. 9. In the process of using data calculation array 1 and data calculation array 2 to multiply with the feature data set shown in FIG. 1, after data calculation array 1 and data calculation array 2 have multiplied with the first to third rows of feature data of the feature data set, they can multiply with the third to fifth rows of feature data of the feature data set. In other words, in the process of traversing the feature data set for the multiplication operations, the downward sliding stride can be 2. If the weight data were not rearranged (in other words, if the data processing apparatus had only data calculation array 1 shown in FIG. 9), then to obtain the operation results a21×b11, a22×b12, a23×b13, and so on, data calculation array 1 would have to multiply with the second to fourth rows of feature data after completing the multiplication with the first to third rows. This multiplication would require fetching the second to third rows of feature data of the feature data set again. In other words, the second to third rows of feature data would have to be read a second time to obtain the operation results a21×b11, a22×b12, a23×b13, and so on, so the same feature data would have to be read multiple times.
由于对该权值数据进行重排，该数据计算阵列2对该特征数据集合的第二行至第三行特征数据进行乘法运算的运算结果就相当于使用该数据计算阵列1以步长为1向下滑动后对第二行至第三行的特征数据进行乘法运算的运算结果。换句话说，该特征数据集合的第二行至第三行特征数据只要被读取一次，就可以实现两个权值数据集合对该第二行至第三行特征数据的乘法运算。这样可以通过对特征数据的一次读取即可获得更多的部分笛卡尔积。由于在实践中，也有利用特征数据集合与权值数据集合的部分笛卡尔积来进行预测的做法，因此通过将权值数据按行重排，将特征数据集合分别与原权值数据和重排后的权值数据进行乘法运算，并根据其结果得到包括部分笛卡尔积在内的目标数据集合，可以减少对存储模块的访问次数，并提高数据处理的速度。Because the weight data is rearranged, the result of data calculation array 2 multiplying the second and third rows of feature data of the feature data set is equivalent to the result obtained by sliding data calculation array 1 down with a step size of 1 and then multiplying the second and third rows of feature data. In other words, as long as the second and third rows of feature data are read once, the multiplication of two weight data sets with those rows can be realized. In this way, more partial Cartesian products can be obtained with a single read of the feature data. Since in practice predictions are also made using partial Cartesian products of the feature data set and the weight data set, rearranging the weight data by rows, multiplying the feature data set by both the original weight data and the rearranged weight data, and obtaining from the results a target data set that includes partial Cartesian products can reduce the number of accesses to the storage module and increase the speed of data processing.
而当具有n行的第一权值矩阵重排(n-1)次，所得到的n个权值矩阵的位于同一行的n个行向量中的任意两个行向量不相同时，将特征数据集合与该n个权值矩阵进行乘法运算后，则可以得到特征数据集合与第一权值矩阵的笛卡尔积，并可以进一步得到特征数据集合与第一权值矩阵的卷积，而特征数据集合中的每个特征数据都只要被加载到数据处理单元中一次即可。When the first weight matrix with n rows is rearranged (n-1) times, and any two of the n row vectors located in the same row of the resulting n weight matrices are different, then after the feature data set is multiplied with the n weight matrices, the Cartesian product of the feature data set and the first weight matrix can be obtained, and further the convolution of the feature data set and the first weight matrix can be obtained, while each feature data in the feature data set needs to be loaded into the data processing unit only once.
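To make this property concrete, here is a minimal NumPy sketch (an illustrative reading of the scheme, not the patented hardware; the matrix values are invented). It builds the n weight matrices as cyclic row shifts of the first one, checks that the row vectors occupying any given row position are pairwise distinct, and verifies that one multiplication pass over a block of n feature rows covers every pairing of feature row and weight row, so each feature row is loaded once:

```python
import numpy as np

n, m = 3, 3  # weight matrix of n rows and m columns, as in the description

# Hypothetical first weight matrix W0 (values are arbitrary illustrations).
W0 = np.arange(1, n * m + 1, dtype=np.int64).reshape(n, m)

# The n weight matrices: W0 plus its (n-1) row rearrangements, taken here
# as cyclic row shifts, so that the row vectors occupying any given row
# position are pairwise distinct.
Ws = [np.roll(W0, -k, axis=0) for k in range(n)]
for r in range(n):
    rows = [tuple(W[r]) for W in Ws]
    assert len(set(rows)) == n  # any two row vectors in row r differ

# A feature block of n rows (one "window" that is read from memory once).
F = np.arange(10, 10 + n * m, dtype=np.int64).reshape(n, m)

# One elementwise multiplication of the block with each of the n matrices
# yields every (feature row, weight row) pairing, i.e. the row-level
# Cartesian product, although each feature row was loaded only once.
pairs = set()
for k, W in enumerate(Ws):
    for r in range(n):
        # W[r] is row (r + k) mod n of W0.
        pairs.add((r, (r + k) % n))
        _ = F[r] * W[r]  # the products a_{r,*} x b_{(r+k) mod n,*}
assert pairs == {(i, j) for i in range(n) for j in range(n)}
print("all", n * n, "row pairings covered with one load per feature row")
```

This is also why the sliding step size grows with the number of rearrangements, as summarized further below.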
图10是本申请实施例提供的一个数据计算阵列保存的权值数据的示意图。FIG. 10 is a schematic diagram of weight data stored in a data calculation array according to an embodiment of the present application.
在使用如图10所示的权值数据对如图1所示的特征数据集合的第一行至第三行特征数据进行乘法运算的过程中，可以获得a31×b11、a32×b12和a33×b13的运算结果。该第一数据计算阵列、该第二数据计算阵列和该第三数据计算阵列在对该特征数据集合的第一行至第三行特征数据进行乘法运算后，可以对该特征数据集合的第四行至第五行特征数据进行乘法运算。换句话说，在遍历该特征数据集合进行乘法运算的过程中，向下滑动的步长可以为3。In the process of using the weight data shown in FIG. 10 to multiply the first to third rows of feature data of the feature data set shown in FIG. 1, the operation results of a31×b11, a32×b12 and a33×b13 can be obtained. After the first data calculation array, the second data calculation array and the third data calculation array multiply the first to third rows of feature data of the feature data set, they can multiply the fourth and fifth rows of feature data. In other words, in the process of traversing the feature data set for multiplication, the step size for sliding down can be 3.
假设该N个数据计算阵列中存在三个数据计算阵列，该三个数据计算阵列分别为如图9所示的数据计算阵列1、数据计算阵列2以及图10所示的数据计算阵列3，则该三个数据计算阵列可以完成对特征数据集合的笛卡尔积运算。Assuming that among the N data calculation arrays there are three data calculation arrays, namely data calculation array 1 and data calculation array 2 shown in FIG. 9 and data calculation array 3 shown in FIG. 10, the three data calculation arrays can complete the Cartesian product operation on the feature data set.
还以特征数据a11、a21、a31、a12、a22、a32、a13、a23、a33为例。这三个数据计算阵列可以分别与特征数据a11、a21、a31、a12、a22、a32、a13、a23、a33进行如图5所示的乘法运算过程。这三个数据计算阵列使用各自保存的权值数据完成对特征数据a11、a21、a31、a12、a22、a32、a13、a23、a33的乘法运算的运算结果如表1所示。The feature data a11, a21, a31, a12, a22, a32, a13, a23 and a33 are again taken as an example. The three data calculation arrays can each perform the multiplication process shown in FIG. 5 with the feature data a11, a21, a31, a12, a22, a32, a13, a23, a33. The operation results of the multiplications that the three data calculation arrays complete on this feature data using their respectively stored weight data are shown in Table 1.
综上所述，若权值数据共包括n行，则最多可以对该权值数据进行n-1次重排。若对权值数据进行一次重排，在遍历该特征数据集合进行乘法运算的过程中，向下滑动的步长可以为2；若对权值数据进行两次重排，向下滑动的步长可以为3；若对权值数据进行n-1次重排，向下滑动的步长可以为n。In summary, if the weight data includes a total of n rows, the weight data can be rearranged at most n-1 times. If the weight data is rearranged once, the step size for sliding down while traversing the feature data set for multiplication can be 2; if the weight data is rearranged twice, the step size for sliding down can be 3; and if the weight data is rearranged n-1 times, the step size for sliding down can be n.
可选的，在一些实施例中，该第一特征数据集合是第二特征数据集合经过稀疏化处理后得到的特征数据集合。该第一权值数据集合是经过稀疏化处理后得到的权值数据集合。如图2所示的数据处理装置200还可以包括压缩模块。该压缩模块用于获取第二特征数据集合，并对该第二特征数据集合进行稀疏化处理得到该第一特征数据集合，该第二特征数据集合包括对应于输入数据的特征数据。该压缩模块还用于获取第二权值数据集合，对该第二权值数据集合进行稀疏化处理得到该第一权值数据集合。该压缩模块还用于确定该第一特征数据集合中的每个特征数据的地址，确定该第一权值数据集合中的每个权值数据的地址。该压缩模块将获取到的第一特征数据集合、第一权值数据集合、该第一特征数据集合中的每个特征数据的地址以及该第一权值数据集合中的每个权值数据的地址发送至存储模块，由该存储模块保存。若稀疏化后的权值数据个数少于n×m，则将剩余位补0。Optionally, in some embodiments, the first feature data set is a feature data set obtained by thinning the second feature data set, and the first weight data set is a weight data set obtained after thinning. The data processing apparatus 200 shown in FIG. 2 may further include a compression module. The compression module is configured to obtain a second feature data set and perform thinning processing on the second feature data set to obtain the first feature data set, where the second feature data set includes feature data corresponding to the input data. The compression module is further configured to obtain a second weight data set and perform thinning processing on the second weight data set to obtain the first weight data set. The compression module is further configured to determine the address of each feature data in the first feature data set and the address of each weight data in the first weight data set. The compression module sends the obtained first feature data set, first weight data set, address of each feature data in the first feature data set, and address of each weight data in the first weight data set to the storage module, which saves them. If the number of weight data after thinning is less than n×m, the remaining positions are padded with zeros.
本申请实施例中所称的输入数据可以是任何能够进行乘法运算、笛卡尔积运算和/或卷积运算的数据。例如，可以是图像数据、语音数据等。输入数据是输入到数据处理装置的全部数据的统称。输入数据可以由特征数据组成。对应于该输入数据的特征数据可以是该输入数据包括的全部数据，也可以是该输入数据的部分特征数据。以图像数据为例，假设输入数据是一整幅图像，该图像的所有数据被称为特征数据。该第二特征数据集合可以包括该输入数据的全部特征数据，也可以是该图像经过一些处理后的全部或部分特征数据。例如，该第二特征数据可以是该图像经过分割后得到的部分图像的特征数据。The input data referred to in the embodiments of the present application may be any data on which a multiplication operation, a Cartesian product operation and/or a convolution operation can be performed, for example, image data or speech data. The input data is a collective term for all the data input to the data processing device, and may consist of feature data. The feature data corresponding to the input data may be all the data included in the input data, or part of the feature data of the input data. Taking image data as an example, assume that the input data is an entire image; all the data of the image is called feature data. The second feature data set may include all the feature data of the input data, or all or part of the feature data of the image after some processing. For example, the second feature data may be the feature data of a partial image obtained after the image is segmented.
假设第二特征数据集合包括：5,0,0,32,0,0,0,0,23,0,0,0,0,0,43,54,0,0,0,1,4,9,34,0,0,0,0,0,0,87,0,0,0,0,5,8，则稀疏化后得到的第一特征数据集合包括：5,32,23,43,54,1,4,9,34,87,5,8。假设该第二特征数据集合中的第一个特征数据的地址为0，第二个特征数据的地址为1，第三个特征数据的地址为2，第n个特征数据的地址为n-1，则该第一特征数据集合的地址（绝对地址）为：0,3,8,14,15,19,20,21,22,29,34,35。Assume that the second feature data set includes: 5, 0, 0, 32, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 43, 54, 0, 0, 0, 1, 4, 9, 34, 0, 0, 0, 0, 0, 0, 87, 0, 0, 0, 0, 5, 8. The first feature data set obtained after thinning then includes: 5, 32, 23, 43, 54, 1, 4, 9, 34, 87, 5, 8. Assume that the address of the first feature data in the second feature data set is 0, the address of the second is 1, the address of the third is 2, and the address of the n-th is n-1. The addresses (absolute addresses) of the first feature data set are then: 0, 3, 8, 14, 15, 19, 20, 21, 22, 29, 34, 35.
假设第二权值数据集合包括：8,4,0,0,0,0,2,0,0,0,0,0,0,0,0,0,24,54,0,0,0,0,0,12,0,0,22,3,45,0,0,0,0,67,44,0,0,0,0,0,0,0,0,35,65,75，则稀疏化后的第二权值数据集合包括：8,4,2,24,54,12,22,3,45,67,44,35,65,75。可以看出稀疏化后的第二权值数据集合包括14个权值数据。假设每个数据计算阵列包括3×3个数据计算单元，因此稀疏化后的第二权值数据集合的权值数据数目少于2个数据计算阵列包括的数据计算单元数目。因此，在稀疏化后的第二权值数据集合最后补充4个0，得到该第一权值数据集合。因此，对应于该第二权值数据集合的第一权值数据集合为：8,4,2,24,54,12,22,3,45,67,44,35,65,75,0,0,0,0。假设该第二权值数据集合中的第一个权值数据的地址为0，第二个权值数据的地址为1，第三个权值数据的地址为2，第n个权值数据的地址为n-1，则该第一权值数据集合的地址（绝对地址）为：0,1,6,16,17,23,26,27,28,33,34,43,44,45。Assume that the second weight data set includes: 8, 4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 54, 0, 0, 0, 0, 0, 12, 0, 0, 22, 3, 45, 0, 0, 0, 0, 67, 44, 0, 0, 0, 0, 0, 0, 0, 0, 35, 65, 75. The second weight data set after thinning then includes: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75. It can be seen that the thinned second weight data set includes 14 weight data. Assume that each data calculation array includes 3×3 data calculation units; the number of weight data in the thinned second weight data set is therefore smaller than the number of data calculation units included in two data calculation arrays. Accordingly, 4 zeros are appended at the end of the thinned second weight data set to obtain the first weight data set. The first weight data set corresponding to the second weight data set is thus: 8, 4, 2, 24, 54, 12, 22, 3, 45, 67, 44, 35, 65, 75, 0, 0, 0, 0. Assume that the address of the first weight data in the second weight data set is 0, the address of the second is 1, the address of the third is 2, and the address of the n-th is n-1. The addresses (absolute addresses) of the first weight data set are then: 0, 1, 6, 16, 17, 23, 26, 27, 28, 33, 34, 43, 44, 45.
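The compression step can be illustrated with a short, self-contained sketch (an illustrative model rather than the patented circuit; the helper names sparsify and pad_weights are invented). It removes the zeros, records the absolute address of each retained element, and pads the retained weight data with zeros up to a whole number of 3×3 arrays, reproducing the numbers of the two examples above:

```python
def sparsify(data):
    """Drop zero elements and keep (values, absolute addresses); the address
    of the k-th element of the dense set is k (0-based), as in the examples."""
    values = [v for v in data if v != 0]
    addresses = [i for i, v in enumerate(data) if v != 0]
    return values, addresses

def pad_weights(values, array_size=9):
    """Pad the sparsified weight data with zeros so its length is a whole
    multiple of the n*m data calculation units of one array (3*3 = 9 here)."""
    remainder = len(values) % array_size
    if remainder:
        values = values + [0] * (array_size - remainder)
    return values

features = [5,0,0,32,0,0,0,0,23,0,0,0,0,0,43,54,0,0,0,1,
            4,9,34,0,0,0,0,0,0,87,0,0,0,0,5,8]
weights  = [8,4,0,0,0,0,2,0,0,0,0,0,0,0,0,0,24,54,0,0,
            0,0,0,12,0,0,22,3,45,0,0,0,0,67,44,0,0,0,0,0,
            0,0,0,35,65,75]

f_vals, f_addr = sparsify(features)
w_vals, w_addr = sparsify(weights)
assert f_addr == [0,3,8,14,15,19,20,21,22,29,34,35]
assert w_addr == [0,1,6,16,17,23,26,27,28,33,34,43,44,45]
assert len(w_vals) == 14
assert pad_weights(w_vals) == w_vals + [0,0,0,0]  # 14 -> 18 = 2 arrays of 9
```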
在一些实施例中,该第一特征数据集合也可以是未经过稀疏化处理的特征数据集合。换句话说,该第一特征数据集合可以与第二特征数据集合相等。In some embodiments, the first feature data set may also be a feature data set that has not been thinned. In other words, the first feature data set may be equal to the second feature data set.
上述实施例中的第一特征数据集合对应于一个矩阵,相应的,用于对该第一特征数据集合进行卷积运算的权值数据也是对应于一个矩阵。换句话说,以上实施例所描述的卷积运算是二维卷积运算。The first feature data set in the above embodiment corresponds to a matrix, and accordingly, the weight data used to perform the convolution operation on the first feature data set also corresponds to a matrix. In other words, the convolution operation described in the above embodiment is a two-dimensional convolution operation.
本申请实施例的技术方案也可以应用到T维乘法运算、笛卡尔积运算和/或卷积运算(T为大于或等于3的正整数)。此外,用于对第一特征数据集合进行卷积运算的权值数据集合也可以是多个。The technical solutions of the embodiments of the present application can also be applied to T-dimensional multiplication operations, Cartesian product operations, and / or convolution operations (T is a positive integer greater than or equal to 3). In addition, there may be multiple weight data sets for performing a convolution operation on the first feature data set.
下面以三维卷积运算为例对本申请的技术方案进行描述。The technical solution of the present application is described below by taking a three-dimensional convolution operation as an example.
若对应于第一特征数据集合的输入数据是一个彩色图像数据，则该第一特征数据集合可以是一个三维张量。可以对该第一特征数据集合进行三维卷积运算。If the input data corresponding to the first feature data set is color image data, the first feature data set may be a three-dimensional tensor. A three-dimensional convolution operation may be performed on the first feature data set.
该第一特征数据集合包括三个子集合:特征数据子集合1、特征数据子集合2和特征数据子集合3。该三个子集合的特征数据分别对应于红、绿、蓝三个输入通道。该三个子集合中的每个子集合中的特征数据可以对应于一个矩阵。The first feature data set includes three subsets: feature data subset 1, feature data subset 2, and feature data subset 3. The feature data of the three subsets correspond to the three input channels of red, green, and blue, respectively. The feature data in each of the three subsets may correspond to a matrix.
假设使用三个权值数据集合对该第一特征数据集合进行三维卷积运算。用于对特征数据集合进行卷积运算的权值数据集合也可以称为过滤器(Filter)。因此,该三个权值数据集合可以称为过滤器1、过滤器2和过滤器3。该三个权值数据集合中的每个权值数据集合包括三个权值通道,分别为通道1、通道2和通道3。三个权值通道中的每个权值通道所包括的权值数据可以对应于一个矩阵。该三个权值通道与三个特征数据子集一一对应。例如,通道1对应于特征数据子集合1,通道2对应于特征数据子集合2,通道 3对应于特征数据子集合3。权值通道可以对对应的特征数据子集合进行卷积运算。过滤器1、过滤器2和过滤器3可以分别对该第一特征数据集合进行三维卷积运算。也就是说,过滤器1的通道1对该第一特征数据集合的特征数据子集合1进行卷积运算,过滤器1的通道2对该第一特征数据集合的特征数据子集合2进行卷积运算,过滤器1的通道3对该第一特征数据集合的特征数据子集合3进行卷积运算;过滤器2的通道1对该第一特征数据集合的特征数据子集合1进行卷积运算,过滤器2的通道2对该第一特征数据集合的特征数据子集合2进行卷积运算,过滤器2的通道3对该第一特征数据集合的特征数据子集合3进行卷积运算;过滤器3的通道1对该第一特征数据集合的特征数据子集合1进行卷积运算,过滤器3的通道2对该第一特征数据集合的特征数据子集合2进行卷积运算,过滤器3的通道3对该第一特征数据集合的特征数据子集合3进行卷积运算。It is assumed that a three-dimensional convolution operation is performed on the first feature data set using three weight data sets. The weight data set used to perform the convolution operation on the feature data set may also be referred to as a filter. Therefore, the three weight data sets can be referred to as filter 1, filter 2, and filter 3. Each of the three weight data sets includes three weight channels, namely channel 1, channel 2 and channel 3. The weight data included in each of the three weight channels may correspond to a matrix. The three weight channels correspond one-to-one with the three feature data subsets. For example, channel 1 corresponds to feature data subset 1, channel 2 corresponds to feature data subset 2, and channel 3 corresponds to feature data subset 3. The weight channel can perform a convolution operation on the corresponding feature data subset. The filter 1, filter 2 and filter 3 may respectively perform a three-dimensional convolution operation on the first feature data set. That is, channel 1 of filter 1 performs a convolution operation on the characteristic data subset 1 of the first characteristic data set, and channel 2 of filter 1 performs a convolution on the characteristic data subset 2 of the first characteristic data set. Operation, channel 3 of filter 1 performs a convolution operation on the characteristic data subset 3 of the first characteristic data set; channel 1 of filter 2 performs a convolution operation on the characteristic data subset 1 of the first characteristic data set, Channel 2 of filter 2 performs a convolution operation on the feature data subset 2 of the first feature data set, and channel 3 of filter 2 performs a convolution operation on the feature data subset 3 of the first feature data set; filter Channel 1 of 3 performs a convolution operation on the feature data subset 1 of the first feature data set, and channel 2 of filter 3 performs a convolution operation on the feature data subset 2 of the first feature data set. Channel 3 performs a convolution operation on the feature data subset 3 of the first feature data set.
由此可见,三个过滤器中的每个过滤器对该第一特征数据集合进行三维卷积运算的过程可以分解为三个二维卷积运算过程。这三个二维卷积运算的具体实现方式与上述实施例中二维卷积运算的具体实现方式类似。以通道1对特征数据子集合1进行卷积运算为例,通道1可以认为是如图1所示的权值数据集合,特征数据子集合1可以认为是如图1所示的特征数据集合。通道1对特征数据子集合进行卷积运算的过程就是如图1所示的权值数据集合对特征数据集合进行卷积运算的过程。如上所述,卷积运算过程可以分解为乘法运算和加法运算。因此,如图2所示的数据处理装置也可以进行三维卷积运算。在特征数据集合对应的输入数据是一个三维张量的情况下,上述实施例中所称的第一特征数据集合可以认为是对应于三维张量的特征数据集合中的一个特征数据子集合。在使用多个权值数据集合对该特征数据集合进行卷积运算的情况下,该第一权值数据集合可以认为多个权值数据集合中的一个权值数据集合。在该权值数据集合也对应于三维张量的情况下,该第一权值数据集合可以认为是三维张量的权值数据集合中的一个通道。It can be seen that the process of performing a three-dimensional convolution operation on the first feature data set by each of the three filters can be decomposed into three two-dimensional convolution operation processes. The specific implementations of the three two-dimensional convolution operations are similar to the specific implementations of the two-dimensional convolution operation in the foregoing embodiment. Taking channel 1 for convolution operation on the characteristic data subset 1 as an example, channel 1 can be considered as the weight data set shown in FIG. 1, and the characteristic data subset 1 can be considered as the characteristic data set shown in FIG. 1. The process of performing a convolution operation on the feature data subset by channel 1 is a process of performing a convolution operation on the feature data set by the weight data set shown in FIG. 1. As described above, the convolution operation process can be decomposed into a multiplication operation and an addition operation. Therefore, the data processing apparatus shown in FIG. 2 can also perform a three-dimensional convolution operation. In the case where the input data corresponding to the feature data set is a three-dimensional tensor, the first feature data set referred to in the above embodiment may be considered as a feature data subset in the feature data set corresponding to the three-dimensional tensor. In a case where a convolution operation is performed on the feature data set using multiple weight data sets, the first weight data set may be considered as one weight data set among the multiple weight data sets. In the case where the weight data set also corresponds to a three-dimensional tensor, the first weight data set can be considered as a channel in the weight data set of the three-dimensional tensor.
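The decomposition just described can be sketched compactly. The following NumPy sketch (for illustration only; all shapes and values are invented) convolves a three-channel feature tensor with each of three filters by performing one two-dimensional convolution per channel and summing the per-channel results:

```python
import numpy as np

def conv2d_valid(feature, weight):
    """Plain 2-D convolution (cross-correlation, 'valid' mode, stride 1)."""
    fh, fw = feature.shape
    wh, ww = weight.shape
    out = np.zeros((fh - wh + 1, fw - ww + 1), dtype=feature.dtype)
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(feature[r:r + wh, c:c + ww] * weight)
    return out

rng = np.random.default_rng(0)
# Hypothetical three-channel feature tensor (e.g. red/green/blue), 5x5 each.
features = rng.integers(0, 5, size=(3, 5, 5))
# Three filters, each with three weight channels of 3x3 weight data.
filters = rng.integers(0, 3, size=(3, 3, 3, 3))

# A 3-D convolution of one filter decomposes into three 2-D convolutions,
# one per channel, whose results are added elementwise.
outputs = []
for f in filters:
    channel_results = [conv2d_valid(features[ch], f[ch]) for ch in range(3)]
    outputs.append(sum(channel_results))

print(np.stack(outputs).shape)  # (3, 3, 3): one 3x3 result per filter
```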
三维以上的多维卷积运算的过程与三维卷积运算过程类似,在此就不必重复描述。The process of multi-dimensional convolution operations above 3D is similar to the process of 3D convolution operations, and it is unnecessary to repeat the description here.
可选的，在使用多个权值数据集合对该特征数据集合进行卷积运算的情况下，该第一权值数据集合还可以是对多个权值数据集合进行稀疏化处理后得到的权值数据集合。具体地，该第一权值数据集合所包括的非0权值数据来自于同一个权值数据集合的一个通道，或者不同权值数据集合的同一个通道。Optionally, in the case of performing a convolution operation on the feature data set using multiple weight data sets, the first weight data set may also be a weight data set obtained by performing thinning processing on the multiple weight data sets. Specifically, the non-zero weight data included in the first weight data set comes from one channel of the same weight data set, or from the same channel of different weight data sets.
下面将结合图11对多个权值数据集合进行稀疏化处理进行描述。The following describes the thinning processing performed on multiple weight data sets with reference to FIG. 11.
图11是本申请实施例提供的具有3个过滤器并进行稀疏化处理的权值矩阵示意图。如图11所示的3个过滤器中每个过滤器包括3个权值通道,每个权值通道包括3×3个权值数据。FIG. 11 is a schematic diagram of a weight matrix with three filters and thinning processing provided in an embodiment of the present application. Each of the three filters shown in FIG. 11 includes three weight channels, and each weight channel includes 3 × 3 weight data.
如图11所示，权值数据集合1的权值数据来自于过滤器1和过滤器2的通道1中的权值数据，权值数据集合4的权值数据来自于过滤器2和过滤器3的通道1中的权值数据。权值数据集合2的权值数据来自于过滤器1和过滤器2的通道2中的权值数据，权值数据集合5的权值数据来自于过滤器2和过滤器3的通道2中的权值数据。权值数据集合3的权值数据来自于过滤器1和过滤器2的通道3中的权值数据，权值数据集合6的权值数据来自于过滤器2和过滤器3的通道3中的权值数据。As shown in FIG. 11, the weight data of weight data set 1 comes from the weight data in channel 1 of filter 1 and filter 2, and the weight data of weight data set 4 comes from the weight data in channel 1 of filter 2 and filter 3. The weight data of weight data set 2 comes from the weight data in channel 2 of filter 1 and filter 2, and the weight data of weight data set 5 comes from the weight data in channel 2 of filter 2 and filter 3. The weight data of weight data set 3 comes from the weight data in channel 3 of filter 1 and filter 2, and the weight data of weight data set 6 comes from the weight data in channel 3 of filter 2 and filter 3.
如图11所示，权值数据来自于不同权值数据集合的同一个通道是指权值数据可以是属于不同过滤器的，但是在不同的过滤器中的通道是相同的。如权值数据集合4的权值数据来自于过滤器2的通道1中的权值数据以及过滤器3的通道1中的权值数据。As shown in FIG. 11, weight data coming from the same channel of different weight data sets means that the weight data may belong to different filters, but the channel within each filter is the same. For example, the weight data of weight data set 4 comes from the weight data in channel 1 of filter 2 and the weight data in channel 1 of filter 3.
为便于描述，以下将对多个过滤器中的权值数据进行稀疏化处理后得到的权值数据集合称为稀疏化权值数据集合。For ease of description, the weight data set obtained by thinning the weight data in multiple filters is hereinafter referred to as the sparsified weight data set.
在一些实施例中,稀疏化权值数据集合包括的权值数据可以来自于同一个过滤器。该稀疏化权值数据集合对特征数据进行乘法运算的运算过程,以及根据乘法运算的运算结果确定该稀疏化权值数据集合与特征数据的卷积运算结果的过程与上述实施例相同,在此就不必赘述。In some embodiments, the weight data included in the sparse weight data set may come from the same filter. The process of multiplying feature data by the sparse weighted data set and the process of determining the result of the convolution operation of the set of sparsely weighted data set and feature data according to the operation result of the multiplication are the same as the above embodiments, here No need to repeat them.
在一些实施例中，稀疏化权值数据集合包括的权值数据可以来自于不同的过滤器。该稀疏化权值数据集合对特征数据进行乘法运算的运算过程与上述实施例相同，在此就不必赘述。在稀疏化权值数据集合包括的权值数据来自于不同的过滤器的情况下，根据乘法运算的运算结果确定该稀疏化权值数据集合与特征数据的卷积运算结果的过程与上述实施例并不完全相同。In some embodiments, the weight data included in the sparsified weight data set may come from different filters. The operation process of multiplying the feature data by the sparsified weight data set is the same as in the above embodiments and need not be repeated here. However, in the case where the weight data included in the sparsified weight data set comes from different filters, the process of determining the convolution operation result of the sparsified weight data set and the feature data according to the operation results of the multiplications is not exactly the same as in the above embodiments.
具体地，假设该稀疏化权值数据集合包括的权值数据来自于P个过滤器（P为大于或等于2的正整数）。该稀疏化权值数据集合可以划分为P个稀疏化权值数据子集合，该P个稀疏化权值数据子集合中的第p个稀疏化权值数据子集合包括来自于该P个过滤器中的第p个过滤器的权值数据，p=1,……,P。假设第p个稀疏化权值数据子集合包括Num_p个权值数据，其中Num_p为大于或等于1的正整数，且Num_p小于n×m。Specifically, assume that the weight data included in the sparsified weight data set comes from P filters (P is a positive integer greater than or equal to 2). The sparsified weight data set can be divided into P sparsified weight data subsets, where the p-th of the P subsets includes the weight data from the p-th of the P filters, p = 1, ..., P. Assume that the p-th sparsified weight data subset includes Num_p weight data, where Num_p is a positive integer greater than or equal to 1 and Num_p is less than n×m.
利用该N个数据计算阵列对该稀疏化权值数据集合与该特征数据集合进行笛卡尔积运算，可以得到每个过滤器与该特征数据集合进行卷积运算所需的乘法结果，再将相应的乘法结果相加，则可以得到每个过滤器与该特征数据集合的卷积运算结果。By using the N data calculation arrays to perform a Cartesian product operation between the sparsified weight data set and the feature data set, the multiplication results required for each filter to perform a convolution operation with the feature data set can be obtained; adding the corresponding multiplication results then yields the convolution operation result of each filter with the feature data set.
还以图9和图10所示的三个数据计算阵列保存的权值数据为例。假设图9和图10所示的权值数据是基于图12所示的过滤器1的通道1的权值数据与过滤器2的通道1的权值数据进行稀疏化后得到的。利用图9和图10所示的三个数据计算阵列对{a11,a12,a13,a21,a22,a23,a31,a32,a33}进行笛卡尔积运算，可以得到如下运算结果：a11×b11、a12×b12、a13×b13、a21×b21、a22×b22、a23×b23、a11×b31、a12×b32、a13×b33。可以看出a11×b11、a12×b12、a13×b13、a21×b21、a22×b22、a23×b23的和为过滤器1的通道1的权值数据对{a11,a12,a13,a21,a22,a23,a31,a32,a33}进行卷积运算的运算结果；a11×b31、a12×b32、a13×b33的和为过滤器2的通道1的权值数据对{a11,a12,a13,a21,a22,a23,a31,a32,a33}进行卷积运算的运算结果。The weight data stored in the three data calculation arrays shown in FIG. 9 and FIG. 10 is again taken as an example. Assume that the weight data shown in FIG. 9 and FIG. 10 is obtained by sparsifying the weight data of channel 1 of filter 1 and the weight data of channel 1 of filter 2 shown in FIG. 12. Using the three data calculation arrays shown in FIG. 9 and FIG. 10 to perform a Cartesian product operation on {a11, a12, a13, a21, a22, a23, a31, a32, a33}, the following operation results can be obtained: a11×b11, a12×b12, a13×b13, a21×b21, a22×b22, a23×b23, a11×b31, a12×b32, a13×b33. It can be seen that the sum of a11×b11, a12×b12, a13×b13, a21×b21, a22×b22 and a23×b23 is the result of the weight data of channel 1 of filter 1 convolving {a11, a12, a13, a21, a22, a23, a31, a32, a33}, and the sum of a11×b31, a12×b32 and a13×b33 is the result of the weight data of channel 1 of filter 2 convolving {a11, a12, a13, a21, a22, a23, a31, a32, a33}.
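The per-filter bookkeeping in this example can be sketched as follows (an illustrative model only; the tags and numeric values are invented, and only the assignment of b11–b23 to filter 1 and b31–b33 to filter 2 follows the example above). Each sparsified weight carries its filter of origin, the Cartesian-product terms are computed once, and the partial sums are then reduced per filter:

```python
# Sparsified channel-1 weight data of two hypothetical filters (values are
# invented; positions b11..b23 belong to filter 1 and b31..b33 to filter 2).
weights = {
    "b11": ("filter1", 2), "b12": ("filter1", 3), "b13": ("filter1", 4),
    "b21": ("filter1", 5), "b22": ("filter1", 6), "b23": ("filter1", 7),
    "b31": ("filter2", 8), "b32": ("filter2", 9), "b33": ("filter2", 1),
}
# Invented feature data values for a11..a33.
features = {"a11": 1, "a21": 2, "a31": 3, "a12": 4, "a22": 5,
            "a32": 6, "a13": 7, "a23": 8, "a33": 9}

# The nine products listed in the example above, reduced per filter so that
# each filter's convolution result is the sum of its own products only.
products = [("a11", "b11"), ("a12", "b12"), ("a13", "b13"),
            ("a21", "b21"), ("a22", "b22"), ("a23", "b23"),
            ("a11", "b31"), ("a12", "b32"), ("a13", "b33")]

partial_sums = {}
for a, b in products:
    filt, w = weights[b]
    partial_sums[filt] = partial_sums.get(filt, 0) + features[a] * w

print(partial_sums)  # e.g. {'filter1': ..., 'filter2': ...}
```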
进一步，该压缩模块还可以对目标数据集合进行稀疏化处理，将该目标数据集合中的0删除。Further, the compression module may also perform thinning processing on the target data set, removing the zeros from the target data set.
通过上述技术方案,可以得到第一特征数据集合中的每个特征数据与第一权值数据集合中的每个权值数据的乘积。在此之后,可以将对应的乘积结果相加就可以得到该第一特征数据集合与该第一权值数据集合的卷积运算结果。Through the above technical solution, a product of each feature data in the first feature data set and each weight data in the first weight data set can be obtained. After that, the corresponding product results can be added to obtain the convolution operation result of the first feature data set and the first weight data set.
此外，上述实施例中，在根据笛卡尔积的运算结果和地址运算结果确定该第一特征数据集合与该第一权值数据集合的卷积运算结果的过程中，数据计算阵列中的每个数据计算单元是将权值数据与特征数据的乘积与对应的地址计算单元确定的目标地址中保存的数据相加，并将相加后的数据写回到该目标地址。这样，该目标地址最终保存的结果就是卷积运算结果。In addition, in the above embodiments, in the process of determining the convolution operation result of the first feature data set and the first weight data set according to the Cartesian product operation results and the address operation results, each data calculation unit in the data calculation array adds the product of the weight data and the feature data to the data stored at the target address determined by the corresponding address calculation unit, and writes the sum back to that target address. In this way, the value finally stored at the target address is the convolution operation result.
在另一些实施例中，数据计算阵列中的每个数据计算单元可以只进行乘法运算，即将权值数据与特征数据相乘，并将相乘的结果保存到对应的地址计算单元确定的目标地址，然后从对应的目标地址获取乘法结果，并将获取到的乘法结果相加，得到对应的卷积运算结果。例如，a11×b11的结果保存在目标地址1，a21×b21的结果保存在目标地址2，a31×b31的结果保存在目标地址3，a12×b12的结果保存在目标地址4，a22×b22的结果保存在目标地址5，a32×b32的结果保存在目标地址6，a13×b13的结果保存在目标地址7，a23×b23的结果保存在目标地址8，a33×b33的结果保存在目标地址9。在计算卷积结果时，可以将保存在目标地址1至目标地址9中的数据相加，即得到如公式1.1所示的c11。In other embodiments, each data calculation unit in the data calculation array may perform only a multiplication operation, that is, multiply the weight data with the feature data and save the multiplied result to the target address determined by the corresponding address calculation unit; the multiplication results are then fetched from the corresponding target addresses and added to obtain the corresponding convolution operation result. For example, the result of a11×b11 is stored at target address 1, the result of a21×b21 at target address 2, the result of a31×b31 at target address 3, the result of a12×b12 at target address 4, the result of a22×b22 at target address 5, the result of a32×b32 at target address 6, the result of a13×b13 at target address 7, the result of a23×b23 at target address 8, and the result of a33×b33 at target address 9. When calculating the convolution result, the data stored at target address 1 to target address 9 can be added to obtain c11 as shown in Formula 1.1.
在另一些实施例中，存储模块可以包括一个加法单元。数据计算阵列中的每个数据计算单元可以只进行乘法运算，即将权值数据与特征数据相乘，并将相乘的结果输出到存储模块。存储模块在将接收到的数据存储到与该数据计算单元对应的地址计算单元确定的目标地址时，先将接收到的数据与该目标地址中保存的数据相加，再将相加后的数据保存到该目标地址。这样，该目标地址最终保存的结果就是卷积运算的结果。In other embodiments, the storage module may include an addition unit. Each data calculation unit in the data calculation array may perform only multiplication, that is, multiply the weight data with the feature data and output the result to the storage module. When storing the received data at the target address determined by the address calculation unit corresponding to that data calculation unit, the storage module first adds the received data to the data stored at that target address and then saves the sum to the target address. In this way, the value finally stored at the target address is the result of the convolution operation.
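All three accumulation variants above share the same index arithmetic: the product of feature datum a(r+k-1, c+l-1) and weight b(k, l) is accumulated into result position (r, c). The following is a minimal sketch of the read-modify-write variant (an illustrative model inferred from Formulas 1.1 and 1.4, not the hardware implementation):

```python
import numpy as np

def scatter_accumulate_conv(feature, weight):
    """Model of the read-modify-write accumulation: each product of a weight
    and a feature datum is added into the data already stored at its target
    address, so each target address ends up holding one convolution result."""
    fh, fw = feature.shape
    wh, ww = weight.shape
    oh, ow = fh - wh + 1, fw - ww + 1
    out = np.zeros((oh, ow), dtype=feature.dtype)  # the target addresses
    for fr in range(fh):
        for fc in range(fw):
            for wr in range(wh):
                for wc in range(ww):
                    # Product a_{fr,fc} x b_{wr,wc} targets output position
                    # (fr-wr, fc-wc) (0-based); products falling outside the
                    # result matrix are dropped.
                    tr, tc = fr - wr, fc - wc
                    if 0 <= tr < oh and 0 <= tc < ow:
                        out[tr, tc] += feature[fr, fc] * weight[wr, wc]
    return out

rng = np.random.default_rng(1)
A = rng.integers(0, 5, size=(5, 5))
B = rng.integers(0, 5, size=(3, 3))
# Cross-check against a direct sliding-window computation.
ref = np.array([[np.sum(A[r:r+3, c:c+3] * B) for c in range(3)]
                for r in range(3)])
assert np.array_equal(scatter_accumulate_conv(A, B), ref)
```

The store-then-sum variant differs only in buffering each product at its target address and deferring the additions, and the adder-in-storage variant moves the same addition into the storage module.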
图13是根据本申请实施例提供的一种数据处理方法的示意性流程图。图13所示的方法可以由图2或图14所示的数据处理装置执行。FIG. 13 is a schematic flowchart of a data processing method according to an embodiment of the present application. The method shown in FIG. 13 may be executed by the data processing apparatus shown in FIG. 2 or FIG. 14.
1301,获取第一权值数据集合中的第一权值矩阵,其中,该第一权值矩阵被表示为n行m列个权值数据,该第一权值数据集合中的数据来自相同的输入通道,其中,n为大于或等于2的整数,m为大于或等于2的整数。1301: Obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the data in the first weight data set is from the same Input channel, where n is an integer greater than or equal to 2 and m is an integer greater than or equal to 2.
1302,获取第二权值矩阵,其中,该第二权值矩阵是对该第一权值矩阵进行按行重排后的矩阵。1302: Obtain a second weight matrix, where the second weight matrix is a matrix after the first weight matrix is rearranged in rows.
1303,使用第一权值矩阵与第一特征数据集合进行乘法运算,其中,该第一特征数据集合中的数据来自相同的输入通道。1303. Use a first weight matrix to perform a multiplication operation with a first feature data set, where the data in the first feature data set comes from the same input channel.
1304,使用该第二权值矩阵与该第一特征数据集合进行乘法运算。1304. Perform a multiplication operation using the second weight matrix and the first feature data set.
1305,根据该乘法运算的运算结果,确定目标数据集合。1305. Determine a target data set according to an operation result of the multiplication operation.
图13所示方法的各个步骤的具体实现方式,可以参见图2至图12的描述,在此就不必赘述。For a specific implementation manner of each step of the method shown in FIG. 13, reference may be made to the description of FIGS. 2 to 12, and details are not described herein again.
可选的,在一些实施例中,该方法还包括:获取该第一权值矩阵和第二权值矩阵中的权值数据的地址;使用该第一权值矩阵和第二权值矩阵中的权值数据的地址与该第一特征数据集合中的地址进行地址运算;该根据该乘法运算的运算结果,确定目标数据集合,包括:根据该乘法运算的运算结果以及该地址运算的运算结果,确定目标数据集合。上述各个步骤的具体实现方式,也可以参见图2至图12的描述,在此就不必赘述。Optionally, in some embodiments, the method further includes: obtaining addresses of weight data in the first weight matrix and the second weight matrix; using the first weight matrix and the second weight matrix An address operation is performed on the address of the weight data and the address in the first characteristic data set; the determining the target data set according to the operation result of the multiplication operation includes: according to the operation result of the multiplication operation and the operation result of the address operation To determine the target data set. For specific implementation manners of the foregoing steps, reference may also be made to the description of FIG. 2 to FIG. 12, and details are not described herein again.
可选的,在一些实施例中,该方法还包括:获取该第一权值数据集合中的第三权值矩阵至第n权值矩阵,其中,该第三权值矩阵至第n权值矩阵为对该第一权值矩阵按行重排后的矩阵,且该第一权值矩阵至第n权值矩阵的位于同一行的n个行向量中的任意两个行向量不相同;获取该第三权值矩阵至第n权值矩阵中的权值数据的地址;使用该第三至第n权值矩阵的权值数据的地址与该第一特征数据集合中的特征数据的地址进行地址运算。上述各个步骤的具体实现方式,也可以参见图2至图12的描述,在此就不必赘述。Optionally, in some embodiments, the method further includes: obtaining a third weight matrix to an n-th weight matrix in the first weight data set, wherein the third weight matrix to the n-th weight matrix The matrix is a matrix in which the first weight matrix is rearranged in rows, and any two row vectors of n row vectors in the same row of the first weight matrix to the nth weight matrix are different; obtain The addresses of the weight data in the third to n-th weight matrices; using the addresses of the weight data in the third to n-th weight matrices and the addresses of the feature data in the first feature data set Address calculation. For specific implementation manners of the foregoing steps, reference may also be made to the description of FIG. 2 to FIG. 12, and details are not described herein again.
可选的，在一些实施例中，该目标数据集合包括结果矩阵，该结果矩阵是该第一特征数据集合与该第一权值数据集合进行卷积运算的结果，该第一特征数据集合被表示为第一特征矩阵，该方法还包括：根据每个地址计算阵列保存的权值数据的地址、第一特征数据集合的地址、对应于该第一特征矩阵的尺寸、填充尺寸和权值尺寸，确定第一目标地址，其中，该权值尺寸为n行m列，该填充尺寸为该第一特征数据集合的尺寸与该结果矩阵的尺寸的差值。Optionally, in some embodiments, the target data set includes a result matrix, the result matrix being the result of a convolution operation between the first feature data set and the first weight data set, and the first feature data set is represented as a first feature matrix. The method further includes: determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size corresponding to the first feature matrix, the padding size and the weight size, where the weight size is n rows and m columns, and the padding size is the difference between the size of the first feature data set and the size of the result matrix. For the specific implementation of the above steps, reference may also be made to the description of FIG. 2 to FIG. 12, and details are not described herein again.
可选的,在一些实施例中,该方法还包括:获取第二特征数据集合,将该第二特征数据集合中值为0的元素去除得到该第一特征数据集合;获取第二权值数据集合,将该第二权值数据集合中值为0的元素去除得到该第一权值数据集合;确定该第一特征数据集合中的每个特征数据的地址,确定该第一权值数据集合中的每个权值的地址。上述各个步骤的具体实现方式,也可以参见图2至图12的描述,在此就不必赘述。Optionally, in some embodiments, the method further includes: obtaining a second feature data set, removing elements with a value of 0 in the second feature data set to obtain the first feature data set; obtaining second weight data Set, removing elements with a value of 0 in the second weight data set to obtain the first weight data set; determining the address of each feature data in the first feature data set, and determining the first weight data set The address of each weight in. For specific implementation manners of the foregoing steps, reference may also be made to the description of FIG. 2 to FIG. 12, and details are not described herein again.
图14是本申请实施例提供一种数据处理装置的结构框图。如图14所示的数据处理装置1400包括:数据处理模块1401和控制模块1404,数据处理模块1401包括N个数据计算单元,N为大于或等于2的整数,其中:数据处理模块1401,用于获取第一权值数据集合中的第一权值矩阵,其中,该第一权值矩阵被表示为n行m列个权值数据,该第一权值数据集合中的数据来自相同的输入通道,其中,n为大于或等于2的整数,m为大于或等于2的整数;获取第二权值矩阵,其中,该第二权值矩阵是对该第一权值矩阵进行按行重排后的矩阵;使用第一权值矩阵与第一特征数据集合进行乘法运算,其中,该第一特征数据集合中的数据来自相同的输入通道;使用该第二权值矩阵与该第一特征数据集合进行乘法运算;控制模块1404用于,根据该乘法运算的运算结果,确定目标数据集合。FIG. 14 is a structural block diagram of a data processing apparatus according to an embodiment of the present application. The data processing device 1400 shown in FIG. 14 includes a data processing module 1401 and a control module 1404. The data processing module 1401 includes N data calculation units, where N is an integer greater than or equal to 2, where: the data processing module 1401 is used for: Obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the data in the first weight data set is from the same input channel Where n is an integer greater than or equal to 2 and m is an integer greater than or equal to 2; a second weight matrix is obtained, where the second weight matrix is after the first weight matrix is rearranged in rows. Matrix; use the first weight matrix to multiply with the first feature data set, where the data in the first feature data set comes from the same input channel; use the second weight matrix and the first feature data set Perform a multiplication operation; the control module 1404 is configured to determine a target data set according to an operation result of the multiplication operation.
可选的,在一些实施例中,数据处理装置1400还包括地址处理模块1402,地址处理模块1402包括N个地址计算单元,该数据计算单元和地址计算单元一一对应,其中:地址处理模块1402用于:获取该第一权值矩阵和第二权值矩阵中的权值数据的地址;使用该第一权值矩阵和第二权值矩阵中的权值数据的地址与该第一特征数据集合中的地址进行地址运算;控制模块1404用于根据该乘法运算的运算结果,确定目标数据集合包括:根据该乘法运算的运算结果以及该地址运算的运算结果,确定目标数据集合。Optionally, in some embodiments, the data processing device 1400 further includes an address processing module 1402, and the address processing module 1402 includes N address calculation units. The data calculation unit and the address calculation unit correspond one-to-one, where: the address processing module 1402 Configured to: obtain the addresses of the weight data in the first weight matrix and the second weight matrix; use the addresses of the weight data in the first weight matrix and the second weight matrix and the first feature data The address in the set performs an address operation; the control module 1404 is configured to determine the target data set according to the operation result of the multiplication operation, and includes: determining the target data set according to the operation result of the multiplication operation and the operation result of the address operation.
可选的，在一些实施例中，数据处理模块1401还用于：获取该第一权值数据集合中的第三权值矩阵至第n权值矩阵，其中，该第三权值矩阵至第n权值矩阵为对该第一权值矩阵按行重排后的矩阵，且该第一权值矩阵至第n权值矩阵的位于同一行的n个行向量中的任意两个行向量不相同；地址处理模块1402还用于：获取该第三权值矩阵至第n权值矩阵中的权值数据的地址；使用该第三至第n权值矩阵的权值数据的地址与该第一特征数据集合中的特征数据的地址进行地址运算。Optionally, in some embodiments, the data processing module 1401 is further configured to obtain a third weight matrix to an n-th weight matrix in the first weight data set, where the third to n-th weight matrices are matrices obtained by rearranging the first weight matrix by rows, and any two of the n row vectors located in the same row of the first to n-th weight matrices are different. The address processing module 1402 is further configured to: obtain the addresses of the weight data in the third to n-th weight matrices; and perform an address operation using the addresses of the weight data of the third to n-th weight matrices and the addresses of the feature data in the first feature data set.
可选的,在一些实施例中,该目标数据集合包括结果矩阵,该结果矩阵是该第一特征数据集合与该第一权值数据集合进行卷积运算的结果,该第一特征数据集合被表示为第一特征矩阵,地址处理模块1402,还用于根据该每个地址计算阵列保存的权值数据的地址、第一特征数据集合的地址、对应于该第一特征矩阵的尺寸、填充尺寸和权值尺寸,确定第一目标地址,其中,该权值尺寸为n行m列,该填充尺寸为该第一特征数据集合的尺寸与该结果矩阵的尺寸的差值。Optionally, in some embodiments, the target data set includes a result matrix, which is a result of a convolution operation performed on the first feature data set and the first weight data set, and the first feature data set is Represented as a first feature matrix, the address processing module 1402 is further configured to calculate an address of the weight data stored in the array, an address of a first feature data set, a size corresponding to the first feature matrix, and a padding size according to each address. And the weight size to determine the first target address, where the weight size is n rows and m columns, and the padding size is the difference between the size of the first feature data set and the size of the result matrix.
可选的,在一些实施例中,数据处理装置1400还包括压缩模块1403,用于:获取第二特征数据集合,将该第二特征数据集合中值为0的元素去除得到该第一特征数据集合;获取第二权值数据集合,将该第二权值数据集合中值为0的元素去除得到该第一权值数据集合;确定该第一特征数据集合中的每个特征数据的地址,确定该第一权值数据集合中的每个权值的地址。Optionally, in some embodiments, the data processing device 1400 further includes a compression module 1403, configured to: obtain a second feature data set, and remove elements having a value of 0 in the second feature data set to obtain the first feature data A set; obtaining a second weight data set, removing elements with a value of 0 in the second weight data set to obtain the first weight data set; determining an address of each feature data in the first feature data set, An address for each weight in the first weight data set is determined.
图14所示的数据处理装置1400中各个模块的具体功能和有益效果,可以参见图2至图12的描述,在此就不必赘述。For specific functions and beneficial effects of each module in the data processing apparatus 1400 shown in FIG. 14, reference may be made to the description of FIGS. 2 to 12, and details are not described herein.
在本申请实施例中,终端设备或网络设备包括硬件层、运行在硬件层之上的操作系统层,以及运行在操作系统层上的应用层。该硬件层包括中央处理器(central processing unit,CPU)、内存管理单元(memory management unit,MMU)和内存(也称为主存)等硬件。该操作系统可以是任意一种或多种通过进程(process)实现业务处理的计算机操作系统,例如,Linux操作系统、Unix操作系统、Android操作系统、iOS操作系统或windows操作系统等。该应用层包含浏览器、通讯录、文字处理软件、即时通信软件等应用。并且,本申请实施例并未对本申请实施例提供的方法的执行主体的具体结构特别限定,只要能够通过运行记录有本申请实施例的提供的方法的代码的程序,以根据本申请实施例提供的方法进行通信即可,例如,本申请实施例提供的方法的执行主体可以是终端设备或网络设备,或者,是终端设备或网络设备中能够调用程序并执行程序的功能模块。In the embodiment of the present application, the terminal device or the network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. This hardware layer includes hardware such as a central processing unit (CPU), a memory management unit (MMU), and a memory (also called main memory). The operating system may be any one or more computer operating systems that implement business processing through processes, such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. This application layer contains applications such as browsers, address books, word processing software, and instant messaging software. In addition, the embodiment of the present application does not specifically limit the specific structure of the execution subject of the method provided by the embodiment of the present application, as long as the program that records the code of the method provided by the embodiment of the application can be run to provide the program according to the embodiment of the application. The communication may be performed by using the method described above. For example, the method execution subject provided in the embodiments of the present application may be a terminal device or a network device, or a function module in the terminal device or the network device that can call a program and execute the program.
另外,本申请的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本申请中使用的术语“制品”涵盖可从任何计算机可读器件、载体或介质访问的计算机程序。例如,计算机可读介质可以包括,但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,压缩盘(compact disc,CD)、数字通用盘(digital versatile disc,DVD)等),智能卡和闪存器件(例如,可擦写可编程只读存储器(erasable programmable read-only memory,EPROM)、卡、棒或钥匙驱动器等)。另外,本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。In addition, various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and / or engineering techniques. The term "article of manufacture" as used in this application encompasses a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to: magnetic storage devices (eg, hard disks, floppy disks, or magnetic tapes, etc.), optical disks (eg, compact discs (CD), digital versatile discs (DVD) Etc.), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks or key drives, etc.). In addition, the various storage media described herein may represent one or more devices and / or other machine-readable media used to store information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and / or carrying instruction (s) and / or data.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices, and units described above can refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置 或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the unit is only a logical function division. In actual implementation, there may be another division manner. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application is essentially a part that contributes to the existing technology or a part of the technical solution can be embodied in the form of a software product. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage media include: U disks, mobile hard disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks or compact discs and other media that can store program codes .
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。The above is only a specific implementation of this application, but the scope of protection of this application is not limited to this. Any person skilled in the art can easily think of changes or replacements within the technical scope disclosed in this application. It should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (11)

  1. 一种数据处理装置,其特征在于,所述数据处理装置包括:A data processing device, characterized in that the data processing device includes:
    数据处理模块,用于:获取第一权值数据集合中的第一权值矩阵,其中,所述第一权值矩阵被表示为n行m列个权值数据,所述第一权值数据集合中的数据来自相同的输入通道,其中,n为大于或等于2的整数,m为大于或等于2的整数;A data processing module, configured to obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the first weight data The data in the set comes from the same input channel, where n is an integer greater than or equal to 2 and m is an integer greater than or equal to 2;
    根据所述第一权值矩阵获取第二权值矩阵,其中,所述第二权值矩阵是对所述第一权值矩阵进行按行重排后的矩阵;Obtaining a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix in which the first weight matrix is rearranged in rows;
使用第一权值矩阵与第一特征数据集合进行第一乘法运算；Perform a first multiplication operation using the first weight matrix and the first feature data set;
    使用所述第二权值矩阵与所述第一特征数据集合进行第二乘法运算;Perform a second multiplication operation using the second weight matrix and the first feature data set;
    控制模块,用于根据所述第一乘法运算和所述第二乘法运算的运算结果,确定目标数据集合。The control module is configured to determine a target data set according to an operation result of the first multiplication operation and the second multiplication operation.
  2. 根据权利要求1所述的数据处理装置,其特征在于,所述数据处理装置还包括:The data processing device according to claim 1, wherein the data processing device further comprises:
    地址处理模块,用于:获取所述第一权值矩阵和第二权值矩阵中的权值数据的地址;An address processing module, configured to obtain addresses of weight data in the first weight matrix and the second weight matrix;
    使用所述第一权值矩阵和第二权值矩阵中的权值数据的地址与所述第一特征数据集合中的地址进行地址运算;Perform an address operation using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set;
    所述控制模块用于,The control module is used for:
    根据所述乘法运算的运算结果以及所述地址运算的运算结果,确定目标数据集合。A target data set is determined according to an operation result of the multiplication operation and an operation result of the address operation.
  3. 根据权利要求2所述的数据处理装置,其特征在于,The data processing device according to claim 2, wherein:
所述数据处理模块，还用于：获取所述第一权值数据集合中的第三权值矩阵至第n权值矩阵，其中，所述第三权值矩阵至所述第n权值矩阵为对所述第一权值矩阵按行重排后的矩阵，且所述第一权值矩阵至所述第n权值矩阵的位于同一行的n个行向量中的任意两个行向量不相同；The data processing module is further configured to: obtain a third weight matrix to an n-th weight matrix in the first weight data set, wherein the third weight matrix to the n-th weight matrix are matrices obtained by rearranging the first weight matrix by rows, and any two of the n row vectors located in the same row of the first weight matrix to the n-th weight matrix are different;
    所述地址处理模块,还用于:The address processing module is further configured to:
    获取所述第三权值矩阵至第n权值矩阵中的权值数据的地址;Obtaining addresses of weight data in the third weight matrix to the n-th weight matrix;
    使用所述第三至第n权值矩阵的权值数据的地址与所述第一特征数据集合中的特征数据的地址进行地址运算。An address operation is performed using the addresses of the weight data of the third to n-th weight matrixes and the addresses of the feature data in the first feature data set.
  4. 根据权利要求2或3所述的数据处理装置,其特征在于,所述目标数据集合包括结果矩阵,所述结果矩阵是所述第一特征数据集合与所述第一权值数据集合进行卷积运算的结果,所述第一特征数据集合被表示为第一特征矩阵;The data processing device according to claim 2 or 3, wherein the target data set includes a result matrix, and the result matrix is a convolution of the first feature data set and the first weight data set A result of the operation, the first feature data set is represented as a first feature matrix;
所述地址处理模块，还用于根据所述每个地址计算阵列保存的权值数据的地址、第一特征数据集合的地址、所述第一特征矩阵的尺寸、填充尺寸和权值尺寸，确定第一目标地址，其中，所述权值尺寸为n行m列，所述填充尺寸包括横向填充尺寸和纵向填充尺寸，所述横向填充尺寸是(n-1)/2，所述纵向填充尺寸是(m-1)/2。The address processing module is further configured to determine a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size of the first feature matrix, the padding size and the weight size, wherein the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
5. 根据权利要求1-4任一项所述的数据处理装置，其特征在于，所述数据处理装置还包括压缩模块，用于：获取第二特征数据集合，将所述第二特征数据集合中值为0的元素去除得到所述第一特征数据集合；The data processing device according to any one of claims 1-4, wherein the data processing device further comprises a compression module, configured to: obtain a second feature data set, and remove elements with a value of 0 from the second feature data set to obtain the first feature data set;
    获取第二权值数据集合,将所述第二权值数据集合中值为0的元素去除得到所述第一权值数据集合;Acquiring a second weight data set, and removing elements with a value of 0 in the second weight data set to obtain the first weight data set;
    确定所述第一特征数据集合中的每个特征数据的地址,确定所述第一权值数据集合中的每个权值的地址。Determining an address of each feature data in the first feature data set, and determining an address of each weight in the first weight data set.
  6. 一种数据处理方法,其特征在于,所述方法包括:A data processing method, characterized in that the method includes:
    获取第一权值数据集合中的第一权值矩阵,其中,所述第一权值矩阵被表示为n行m列个权值数据,所述第一权值数据集合中的数据来自相同的输入通道,其中,n为大于或等于2的整数,m为大于或等于2的整数;Obtain a first weight matrix in a first weight data set, where the first weight matrix is represented as n rows and m columns of weight data, and the data in the first weight data set is from the same Input channel, where n is an integer greater than or equal to 2 and m is an integer greater than or equal to 2;
    根据所述第一权值矩阵获取第二权值矩阵,其中,所述第二权值矩阵是对所述第一权值矩阵进行按行重排后的矩阵;Obtaining a second weight matrix according to the first weight matrix, wherein the second weight matrix is a matrix in which the first weight matrix is rearranged in rows;
    使用第一权值矩阵与第一特征数据集合进行第一乘法运算;Perform a first multiplication operation using a first weight matrix and a first feature data set;
    使用所述第二权值矩阵与所述第一特征数据集合进行第二乘法运算;Perform a second multiplication operation using the second weight matrix and the first feature data set;
    根据所述第一乘法运算和所述第二乘法运算的运算结果,确定目标数据集合。A target data set is determined according to an operation result of the first multiplication operation and the second multiplication operation.
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:The method according to claim 6, further comprising:
    获取所述第一权值矩阵和第二权值矩阵中的权值数据的地址;Obtaining addresses of weight data in the first weight matrix and the second weight matrix;
    使用所述第一权值矩阵和第二权值矩阵中的权值数据的地址与所述第一特征数据集合中的地址进行地址运算;Perform an address operation using the addresses of the weight data in the first weight matrix and the second weight matrix and the addresses in the first feature data set;
    所述根据所述第一乘法运算和所述第二乘法运算的运算结果,确定目标数据集合,包括:The determining a target data set according to an operation result of the first multiplication operation and the second multiplication operation includes:
    根据所述第一乘法运算的运算结果、所述第二乘法运算的运算结果以及所述地址运算的运算结果,确定目标数据集合。A target data set is determined according to an operation result of the first multiplication operation, an operation result of the second multiplication operation, and an operation result of the address operation.
  8. 根据权利要求7所述的方法,其特征在于,所述方法还包括:获取所述第一权值数据集合中的第三权值矩阵至第n权值矩阵,其中,所述第三权值矩阵至第n权值矩阵为对所述第一权值矩阵按行重排后的矩阵,且所述第一权值矩阵至第n权值矩阵的位于同一行的n个行向量中的任意两个行向量不相同;The method according to claim 7, further comprising: obtaining a third weight matrix to an n-th weight matrix in the first weight data set, wherein the third weight The matrix to the n-th weight matrix is a matrix in which the first weight matrix is rearranged in rows, and any of the n row vectors of the first to n-th weight matrices located in the same row is in the same row. The two row vectors are not the same;
    获取所述第三权值矩阵至第n权值矩阵中的权值数据的地址;Obtaining addresses of weight data in the third weight matrix to the n-th weight matrix;
    使用所述第三至第n权值矩阵的权值数据的地址与所述第一特征数据集合中的特征数据的地址进行地址运算。An address operation is performed using the addresses of the weight data of the third to n-th weight matrixes and the addresses of the feature data in the first feature data set.
  9. 根据权利要求7或8所述的方法,其特征在于,所述目标数据集合包括结果矩阵,所述结果矩阵是所述第一特征数据集合与所述第一权值数据集合进行卷积运算的结果,所述第一特征数据集合被表示为第一特征矩阵,The method according to claim 7 or 8, wherein the target data set includes a result matrix, and the result matrix is a convolution operation performed on the first feature data set and the first weight data set. As a result, the first feature data set is represented as a first feature matrix,
    所述方法还包括:The method further includes:
根据所述每个地址计算阵列保存的权值数据的地址、第一特征数据集合的地址、对应于所述第一特征矩阵的尺寸、填充尺寸和权值尺寸，确定第一目标地址，其中，所述权值尺寸为n行m列，所述填充尺寸包括横向填充尺寸和纵向填充尺寸，所述横向填充尺寸是(n-1)/2，所述纵向填充尺寸是(m-1)/2。determining a first target address according to the address of the weight data stored in each address calculation array, the address of the first feature data set, the size corresponding to the first feature matrix, the padding size and the weight size, wherein the weight size is n rows and m columns, the padding size includes a horizontal padding size and a vertical padding size, the horizontal padding size is (n-1)/2, and the vertical padding size is (m-1)/2.
  10. The method according to any one of claims 5 to 9, wherein the method further comprises:
    obtaining a second feature data set, and removing elements with a value of 0 from the second feature data set to obtain the first feature data set;
    obtaining a second weight data set, and removing elements with a value of 0 from the second weight data set to obtain the first weight data set; and
    determining an address of each feature data in the first feature data set, and determining an address of each weight in the first weight data set.
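(The zero-removal of claim 10 amounts to compressing a dense tensor into (value, address) pairs so that the address operations of claims 7-9 can still locate each surviving element; a coordinate-list sketch, with hypothetical names:)

    import numpy as np

    def remove_zeros(dense):
        """Return (values, addresses) for the nonzero elements of dense."""
        rows, cols = np.nonzero(dense)
        return dense[rows, cols], list(zip(rows.tolist(), cols.tolist()))

    x2 = np.array([[0., 5., 0.],
                   [3., 0., 0.],
                   [0., 0., 7.]], dtype=np.float32)  # second feature data set
    x1_values, x1_addresses = remove_zeros(x2)       # first feature data set plus addresses
    # The same compression applies to the second weight data set.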
  11. A data processing apparatus, wherein the data processing apparatus comprises:
    a processor and a memory, where the memory stores program code, and the processor is configured to invoke the program code in the memory to perform the data processing method according to any one of claims 6 to 10.
PCT/CN2019/102252 2018-09-29 2019-08-23 Data processing method and apparatus WO2020063225A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811148307.0 2018-09-29
CN201811148307.0A CN110968832B (en) 2018-09-29 2018-09-29 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2020063225A1 true WO2020063225A1 (en) 2020-04-02

Family

ID=69951080

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102252 WO2020063225A1 (en) 2018-09-29 2019-08-23 Data processing method and apparatus

Country Status (2)

Country Link
CN (1) CN110968832B (en)
WO (1) WO2020063225A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI799169B (en) * 2021-05-19 2023-04-11 神盾股份有限公司 Data processing method and circuit based on convolution computation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402905B (en) * 2016-05-19 2021-04-09 北京旷视科技有限公司 Neural network-based computing method and device
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326985A (en) * 2016-08-18 2017-01-11 北京旷视科技有限公司 Neural network training method, neural network training device, data processing method and data processing device
US20180096226A1 (en) * 2016-10-04 2018-04-05 Magic Leap, Inc. Efficient data layouts for convolutional neural networks
CN108122030A (en) * 2016-11-30 2018-06-05 华为技术有限公司 A kind of operation method of convolutional neural networks, device and server
CN107844827A (en) * 2017-11-28 2018-03-27 北京地平线信息技术有限公司 The method and apparatus for performing the computing of the convolutional layer in convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG, WEIKE ET AL.: "Design and Implementation of CNN Acceleration Module based on Rocket-Chip Open Source Processor", MICROELECTRONICS & COMPUTER, vol. 35, no. 4, 30 April 2018 (2018-04-30) *
ZHENG, SHIXUAN: "An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference", 2018 55TH ACM/ESDA/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 20 September 2018 (2018-09-20), XP033405809, DOI: 10.1109/DAC.2018.8465573 *

Also Published As

Publication number Publication date
CN110968832A (en) 2020-04-07
CN110968832B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
US10482156B2 (en) Sparsity-aware hardware accelerators
KR20190066473A (en) Method and apparatus for processing convolution operation in neural network
CN109754359B (en) Pooling processing method and system applied to convolutional neural network
US20220083857A1 (en) Convolutional neural network operation method and device
CN112840356A (en) Operation accelerator, processing method and related equipment
US20200327185A1 (en) Signal Processing Method and Apparatus
CN111382867A (en) Neural network compression method, data processing method and related device
US11763150B2 (en) Method and system for balanced-weight sparse convolution processing
US11775807B2 (en) Artificial neural network and method of controlling fixed point in the same
US20200218777A1 (en) Signal Processing Method and Apparatus
TWI775210B (en) Data dividing method and processor for convolution operation
WO2021147276A1 (en) Data processing method and apparatus, and chip, electronic device and storage medium
US20230196113A1 (en) Neural network training under memory restraint
WO2022041188A1 (en) Accelerator for neural network, acceleration method and device, and computer storage medium
WO2020063225A1 (en) Data processing method and apparatus
CN112200310B (en) Intelligent processor, data processing method and storage medium
US11435941B1 (en) Matrix transpose hardware acceleration
US20180150741A1 (en) Accelerated Convolution in Convolutional Neural Networks
CN112966729A (en) Data processing method and device, computer equipment and storage medium
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
US11636569B1 (en) Matrix transpose hardware acceleration
JP2021005242A (en) Information processing device, information processing program, and information processing method
US20220318604A1 (en) Sparse machine learning acceleration
KR20200023154A (en) Method and apparatus for processing convolution neural network
WO2021179117A1 (en) Method and apparatus for searching number of neural network channels

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19866662
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19866662
    Country of ref document: EP
    Kind code of ref document: A1