CN112966729B - Data processing method and device, computer equipment and storage medium - Google Patents

Data processing method and device, computer equipment and storage medium

Publication number
CN112966729B
Authority
CN
China
Prior art keywords
processing
array
weight
elements
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110221235.3A
Other languages
Chinese (zh)
Other versions
CN112966729A (en
Inventor
周军
常亮
周亮
王文强
吴飞
徐宁仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Original Assignee
University of Electronic Science and Technology of China
Chengdu Sensetime Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China and Chengdu Sensetime Technology Co Ltd
Priority to CN202110221235.3A
Publication of CN112966729A
Priority to PCT/CN2021/115789 (WO2022179075A1)
Application granted
Publication of CN112966729B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Abstract

The present disclosure provides a data processing method and apparatus, a computer device, and a storage medium. The method comprises: determining, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles, wherein the to-be-processed image feature matrix corresponds to a plurality of weight matrices; in response to the arrival of any processing cycle, acquiring, by each processing engine (PE) in a PE array, the target feature element and the corresponding target weight element for that processing cycle, and performing a preset operation to obtain intermediate processing data, wherein, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to the target weight elements that correspond to them in different weight matrices; and obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the to-be-processed image feature matrix.

Description

Data processing method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer device, and a storage medium.
Background
Convolutional neural networks (CNNs), an important class of deep learning models, are widely used in image recognition, natural language processing, and other fields. A convolutional neural network includes a plurality of different network layers, such as convolutional layers, pooling layers, activation layers, and fully connected layers.
When each node of a fully connected layer is computed, the volume of processing data, such as the input data and the related parameters, is large, so each computation spends a long time transmitting the required data to the operation unit, and processing efficiency is low.
Disclosure of Invention
Embodiments of the present disclosure provide at least a data processing method and apparatus, a computer device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: determining, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles, wherein the to-be-processed image feature matrix corresponds to a plurality of weight matrices; in response to the arrival of any processing cycle, acquiring, by each processing engine (PE) in a PE array, the target feature element and the corresponding target weight element for that processing cycle, and performing a preset operation to obtain intermediate processing data, wherein, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to the target weight elements that correspond to them in different weight matrices; and obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the to-be-processed image feature matrix.
In this way, the feature elements transmitted to the PE array are multiplexed (reused) within each processing cycle, which reduces the amount of data that must be read into the PE array in each processing cycle, reduces the time consumed in reading data into the PE array, and improves the processing efficiency of the PE array.
In one possible implementation, before determining the target feature elements and the target weight elements respectively corresponding to the plurality of processing cycles, the method further includes: performing size transformation on an original to-be-processed image feature matrix and an original weight matrix based on the size of the PE array, to obtain the to-be-processed image feature matrix and the weight matrix.
In this way, the sizes of the original to-be-processed image feature matrix and the original weight matrix are transformed to match the PE array, so that the processing logic in subsequent processing is simpler.
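As a minimal sketch of what such a size transformation could look like: the text above does not fix a particular scheme, so the flattening and zero-padding below, the helper names `flatten` and `pad_to_multiple`, and the example sizes are all assumptions made for illustration. The feature matrix and a weight matrix are flattened and padded with zeros to a length that is a multiple of the number of PE rows; the padded zeros do not change any weighted sum.

```python
def flatten(matrix):
    """Flatten a 2-D list row by row into a 1-D list."""
    return [element for row in matrix for element in row]


def pad_to_multiple(values, multiple, fill=0):
    """Pad `values` with `fill` so that its length is a multiple of `multiple`."""
    remainder = len(values) % multiple
    if remainder == 0:
        return list(values)
    return list(values) + [fill] * (multiple - remainder)


# Hypothetical sizes: a 3x3 feature matrix and a PE array with 4 rows.
pe_rows = 4
feature_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
weight_matrix = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

features = pad_to_multiple(flatten(feature_matrix), pe_rows)
weights = pad_to_multiple(flatten(weight_matrix), pe_rows)

# The padded vectors split evenly into chunks of pe_rows elements,
# and the padded zeros contribute nothing to the weighted sum.
weighted_sum = sum(f * w for f, w in zip(features, weights))
```

With this assumed padding, every processing cycle can take exactly `pe_rows` feature elements, so no special handling of a short final chunk is needed.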
In one possible implementation, the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the to-be-processed image feature matrix; and the target weight elements respectively corresponding to the processing cycles include, in at least some of the weight matrices, the weight elements at the positions corresponding to the target feature elements processed in the corresponding processing cycle.
In one possible implementation, each row of the PE array includes repeated feature elements; and the step in which, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transmitting target feature elements from the to-be-processed image feature matrix, equal in number to the rows of the PE array, to one column of PEs in the PE array, and copying the target feature elements in that column of PEs to the PEs in the other columns as first operands of the corresponding PEs; transmitting the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs; and performing, by the PE array, the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In this way, feature elements of the to-be-processed image feature matrix are multiplexed within each row of the PE array, which reduces the amount of feature data that must be transmitted to the PE array and improves data processing efficiency.
In one possible implementation, performing, by the PE array, the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle includes: in the processing cycle, performing weighted summation on each column of target feature elements in the first operands and the corresponding column of weight elements in the second operands, to obtain intermediate sub-data respectively corresponding to the different weight matrices; and obtaining the intermediate processing data corresponding to the processing cycle based on the intermediate sub-data respectively corresponding to the different weight matrices.
In this way, the processing tasks corresponding to a plurality of weight matrices are processed in parallel within one processing cycle and are completed over the plurality of processing cycles; the elements of the to-be-processed image feature matrix are multiplexed in each cycle, which improves data processing efficiency.
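The row-wise reuse and column-wise weighted summation described above can be sketched as follows. This is an illustrative software model only, not the hardware implementation; the function name `row_reuse_cycle` and the example values are assumptions. Only `m` feature elements are supplied for the cycle, each is copied across the `n` PEs of its row, and each column produces one partial result for one weight matrix.

```python
def row_reuse_cycle(features, weights):
    """Model one processing cycle with row-wise feature reuse.

    features: the m target feature elements sent to one column of PEs.
    weights:  weights[r][c] is the weight that weight matrix c assigns
              to features[r]; the array has m rows and n columns.
    Returns one partial result (intermediate sub-data) per weight matrix.
    """
    m = len(features)
    n = len(weights[0])
    # First operand: the single column of features copied to every column.
    first_operand = [[features[r]] * n for r in range(m)]
    # Second operand: one weight per PE, a different weight matrix per column.
    second_operand = weights
    # Column-wise weighted summation over the m rows.
    return [
        sum(first_operand[r][c] * second_operand[r][c] for r in range(m))
        for c in range(n)
    ]


# Hypothetical 2x3 PE array: 2 feature elements, 3 weight matrices.
partials = row_reuse_cycle([1, 2], [[1, 2, 3], [4, 5, 6]])
```

Here `partials[c]` is the contribution of this cycle's two feature elements to the output associated with weight matrix `c`.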
In one possible implementation, each column of the PE array includes repeated feature elements; and the step in which, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transmitting target feature elements from the to-be-processed image feature matrix, equal in number to the columns of the PE array, to one row of PEs in the PE array, and copying the target feature elements in that row of PEs to the PEs in the other rows as first operands of the corresponding PEs; transmitting the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs; and performing, by the PE array, the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In this way, feature elements of the to-be-processed image feature matrix are multiplexed within each column of the PE array, which reduces the amount of feature data that must be transmitted to the PE array and further improves data processing efficiency.
In one possible implementation, performing, by the PE array, the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle includes: in the processing cycle, performing weighted summation on each row of target feature elements in the first operands and the corresponding row of weight elements in the second operands, to obtain intermediate sub-data respectively corresponding to the different weight matrices; and obtaining the intermediate processing data corresponding to the processing cycle based on the intermediate sub-data respectively corresponding to the different weight matrices.
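The column-wise reuse variant can be sketched in the same spirit. Again this is a software model under assumptions (the name `column_reuse_cycle` and the example values are illustrative): `n` feature elements are supplied to one row of PEs and copied to the other rows, and each row produces one partial result for one weight matrix.

```python
def column_reuse_cycle(features, weights):
    """Model one processing cycle with column-wise feature reuse.

    features: the n target feature elements sent to one row of PEs.
    weights:  weights[r][c] is the weight that weight matrix r assigns
              to features[c]; the array has m rows and n columns.
    Returns one partial result (intermediate sub-data) per weight matrix.
    """
    m = len(weights)
    n = len(features)
    # First operand: the single row of features copied to every row.
    first_operand = [list(features) for _ in range(m)]
    # Row-wise weighted summation over the n columns.
    return [
        sum(first_operand[r][c] * weights[r][c] for c in range(n))
        for r in range(m)
    ]


# Hypothetical 2x3 PE array: 3 feature elements, 2 weight matrices.
partials = column_reuse_cycle([1, 2, 3], [[1, 2, 3], [4, 5, 6]])
```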
In one possible implementation, each PE of the PE array includes a repeated feature element; and the step in which, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transmitting one target feature element from the to-be-processed image feature matrix to one PE in the PE array, and copying the target feature element in that PE to the other PEs as first operands of the corresponding PEs; transmitting the weight elements that correspond to the target feature element in weight matrices equal in number to the PEs in the PE array to the respective PEs of the PE array, as second operands of the corresponding PEs; and performing, by the PE array, the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
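For the case in which a single feature element is shared by every PE, a minimal sketch looks like this. It assumes that the preset operation in each PE is a multiplication and that PE (r, c) is paired with its own weight matrix; the name `broadcast_cycle` is hypothetical.

```python
def broadcast_cycle(feature, weights):
    """Model one processing cycle in which all PEs share one feature element.

    feature: the single target feature element copied to every PE.
    weights: weights[r][c] is the weight that the weight matrix paired
             with PE (r, c) assigns to `feature`.
    Returns the per-PE products, one partial result per weight matrix.
    """
    return [[feature * weight for weight in row] for row in weights]


# Hypothetical 2x2 PE array: 1 feature element, 4 weight matrices.
partials = broadcast_cycle(2, [[1, 2], [3, 4]])
```

In this model one cycle transfers a single feature element and `m × n` weight elements, and advances `m × n` outputs by one term each.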
In one possible implementation, obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, the result data of processing the to-be-processed image feature matrix includes: accumulating, across the intermediate processing data of the plurality of processing cycles, the intermediate sub-data belonging to the same weight matrix, to obtain sub-result data corresponding to each weight matrix; and obtaining the result data of processing the to-be-processed image feature matrix based on the sub-result data respectively corresponding to the plurality of weight matrices.
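The accumulation step can be illustrated with a short sketch. It assumes that each processing cycle yields a list with one partial result per weight matrix, in a fixed order; the function name `accumulate_partials` and the numbers are illustrative only.

```python
def accumulate_partials(per_cycle_partials):
    """Sum the partial results that belong to the same weight matrix.

    per_cycle_partials: one list per processing cycle; position j of each
    list holds the intermediate sub-data for weight matrix j.
    Returns the sub-result data, one value per weight matrix.
    """
    totals = [0] * len(per_cycle_partials[0])
    for partials in per_cycle_partials:
        for j, value in enumerate(partials):
            totals[j] += value
    return totals


# Hypothetical partial results from two processing cycles, three weight matrices.
sub_results = accumulate_partials([[9, 12, 15], [1, 1, 1]])
```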
In one possible implementation, the preset operation corresponding to any processing cycle includes: a sub-operation of a fully connected operation performed on the to-be-processed image feature matrix.
In this way, fully connected processing of the to-be-processed image feature matrix is realized with higher efficiency, and the processing speed of a neural network that performs fully connected processing in this manner is improved.
In a second aspect, an embodiment of the present disclosure provides a data processing apparatus, including a controller and a processing engine (PE) array. The controller is configured to determine, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles, wherein the to-be-processed image feature matrix corresponds to a plurality of weight matrices. The PE array is configured such that, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, and is further configured to obtain, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the to-be-processed image feature matrix. For any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to the target weight elements that correspond to them in different weight matrices.
In one possible implementation, before determining the target feature elements and the target weight elements respectively corresponding to the plurality of processing cycles, the controller is further configured to:
perform size transformation on an original to-be-processed image feature matrix and an original weight matrix based on the size of the PE array, to obtain the to-be-processed image feature matrix and the weight matrix.
In one possible implementation, the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the to-be-processed image feature matrix;
and the target weight elements respectively corresponding to the processing cycles include, in at least some of the weight matrices, the weight elements at the positions corresponding to the target feature elements processed in the corresponding processing cycle.
In one possible implementation, each row of the PE array includes repeated feature elements;
when, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data, the PE array is configured to:
in response to the arrival of any processing cycle, transmit target feature elements from the to-be-processed image feature matrix, equal in number to the rows of the PE array, to one column of PEs in the PE array, and copy the target feature elements in that column of PEs to the PEs in the other columns as first operands of the corresponding PEs;
transmit the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs;
and perform the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In one possible implementation, when performing the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array is configured to:
in the processing cycle, perform weighted summation on each column of target feature elements in the first operands and the corresponding column of weight elements in the second operands, to obtain intermediate sub-data respectively corresponding to the different weight matrices;
and obtain the intermediate processing data corresponding to the processing cycle based on the intermediate sub-data respectively corresponding to the different weight matrices.
In one possible implementation, each column of the PE array includes repeated feature elements;
when, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data, the PE array is configured to:
in response to the arrival of any processing cycle, transmit target feature elements from the to-be-processed image feature matrix, equal in number to the columns of the PE array, to one row of PEs in the PE array, and copy the target feature elements in that row of PEs to the PEs in the other rows as first operands of the corresponding PEs;
transmit the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs;
and perform the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In one possible implementation, when performing the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array is configured to:
in the processing cycle, perform weighted summation on each row of target feature elements in the first operands and the corresponding row of weight elements in the second operands, to obtain intermediate sub-data respectively corresponding to the different weight matrices;
and obtain the intermediate processing data corresponding to the processing cycle based on the intermediate sub-data respectively corresponding to the different weight matrices.
In one possible implementation, each PE of the PE array includes a repeated feature element;
when, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data, the PE array is configured to:
in response to the arrival of any processing cycle, transmit one target feature element from the to-be-processed image feature matrix to one PE in the PE array, and copy the target feature element in that PE to the other PEs as first operands of the corresponding PEs;
transmit the weight elements that correspond to the target feature element in weight matrices equal in number to the PEs in the PE array to the respective PEs of the PE array, as second operands of the corresponding PEs;
and perform the preset operation on the first operands and the second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In one possible implementation, when obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, the result data of processing the to-be-processed image feature matrix, the PE array is configured to:
accumulate, across the intermediate processing data of the plurality of processing cycles, the intermediate sub-data belonging to the same weight matrix, to obtain sub-result data corresponding to each weight matrix;
and obtain the result data of processing the to-be-processed image feature matrix based on the sub-result data respectively corresponding to the plurality of weight matrices.
In one possible implementation, the preset operation corresponding to any processing cycle includes: a sub-operation of a fully connected operation performed on the to-be-processed image feature matrix.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory, and a data processing apparatus as described in the second aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed, performs the steps in the first aspect or in any one of the possible implementations of the first aspect.
For a description of the effects of the data processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the data processing method, which is not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;
Fig. 2 illustrates an example of processing target feature elements by using a PE array in the data processing method provided by an embodiment of the present disclosure;
Fig. 3 illustrates another example of processing target feature elements by using a PE array in the data processing method provided by an embodiment of the present disclosure;
Fig. 4 illustrates a further example of processing target feature elements by using a PE array in the data processing method provided by an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research has found that the hardware architecture of currently common artificial intelligence (AI) accelerators mainly includes a storage unit, a computing unit, a control unit, and the like, where the core computing unit generally consists of a processing engine (PE) array and a register array (local register file). In fully connected processing in a neural network, the feature values of the feature points in a feature map generally need to be weighted and summed using multiple sets of fully connected weights to obtain the fully connected processing result of the feature map. When an AI accelerator performs fully connected processing on image data, in each of a plurality of processing cycles at least some of the feature elements of the feature map are read into the PE array, together with the weight elements that correspond to those feature elements in one set of fully connected weights, and the PE array performs weighted summation on the feature elements and weight elements that have been read in. After processing over the plurality of processing cycles, the result of fully connected processing of the feature map is obtained. However, the bandwidth with which the PE array reads data is limited: for a PE array of size m × n, m × n feature elements and the corresponding m × n weight elements must be read in each processing cycle. Data reading is therefore inefficient, which increases the time required for fully connected processing of the feature map and lowers the processing efficiency of the PE array.
Based on the above research, the present disclosure provides a data processing method that multiplexes the feature elements transmitted to the PE array within each processing cycle, thereby reducing the amount of data that must be read into the PE array in each processing cycle, reducing the time required to read data into the PE array, and improving the processing efficiency of the PE array.
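The saving can be made concrete with a small counting sketch. It assumes that every PE still receives its own weight element in each cycle and that only the feature transfers change; the function name `elements_read_per_cycle`, the scheme labels, and the 4 × 8 array size are illustrative assumptions rather than values from the disclosure.

```python
def elements_read_per_cycle(pe_rows, pe_cols, reuse="none"):
    """Count the elements transferred into an m x n PE array in one cycle.

    reuse="none":   every PE receives its own feature element.
    reuse="row":    m features go to one column and are copied across columns.
    reuse="column": n features go to one row and are copied across rows.
    reuse="all":    one feature is copied to every PE.
    """
    weight_reads = pe_rows * pe_cols  # one weight element per PE in all schemes
    feature_reads = {
        "none": pe_rows * pe_cols,
        "row": pe_rows,
        "column": pe_cols,
        "all": 1,
    }[reuse]
    return feature_reads + weight_reads


# Hypothetical 4 x 8 PE array.
baseline = elements_read_per_cycle(4, 8)
with_row_reuse = elements_read_per_cycle(4, 8, reuse="row")
with_column_reuse = elements_read_per_cycle(4, 8, reuse="column")
with_full_reuse = elements_read_per_cycle(4, 8, reuse="all")
```

Under these assumptions the number of feature transfers per cycle drops from `m × n` to `m`, `n`, or 1, while the number of weight transfers stays at `m × n`.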
The drawbacks described above were identified by the inventors through practice and careful study; therefore, both the discovery of these problems and the solutions that the present disclosure proposes for them should be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The following describes a data processing method provided by an embodiment of the present disclosure.
Referring to Fig. 1, a flowchart of a data processing method provided by an embodiment of the present disclosure is shown. The method includes steps S101 to S103:
S101: determining, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles, wherein the to-be-processed image feature matrix corresponds to a plurality of weight matrices;
S102: in response to the arrival of any processing cycle, acquiring, by each PE in the processing engine (PE) array, the target feature element and the corresponding target weight element for that processing cycle, and performing a preset operation to obtain intermediate processing data;
wherein, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to the target weight elements that correspond to them in different weight matrices;
S103: obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the to-be-processed image feature matrix.
In this method, the target feature elements and the target weight elements respectively corresponding to a plurality of processing cycles are determined from the to-be-processed image feature matrix and the weight matrices; in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation; and the result data of processing the to-be-processed image feature matrix is obtained based on the intermediate processing data respectively corresponding to the plurality of processing cycles.
The following describes details of S101 to S103.
In the above S101, when the target feature element and the target weight element corresponding to each of the plurality of processing cycles are determined from the to-be-processed image feature matrix and the weight matrix, for example, the number of required processing cycles may be determined based on the number of the to-be-processed image feature matrix and the weight matrix, and then the target feature element and the target weight element may be determined for each processing cycle.
Illustratively, the number of target feature elements determined for each processing cycle is less than the number of PEs in the PE array. The target feature elements determined for each processing cycle are transferred from the external memory to only part of the PEs in the PE array; the data required by the other PEs in the PE array are repeated feature elements that reuse the target feature elements already transferred to the PE array, so they need not be transferred from outside again but are copied from the PEs that store them.
In practical applications, the feature map on which the fully connected operation is to be performed is usually output by a convolutional layer and may comprise a plurality of feature maps. The feature matrix of the image to be processed may be one of the plurality of feature maps, or a combined feature map obtained by combining several of them. Where it is one of the plurality of feature maps, the data processing method provided by the present disclosure may be executed separately for each feature map, and the data obtained from each execution is then combined to obtain the final fully connected operation result.
In a possible implementation manner, the target feature elements respectively corresponding to a plurality of processing cycles include at least one image feature element in the to-be-processed image feature matrix;
and the target weight elements respectively corresponding to the processing periods comprise the weight elements corresponding to the positions of the target characteristic elements processed by the corresponding processing periods in at least part of the weight matrixes.
In the embodiment of the present disclosure, for convenience of processing, for example, before determining the target feature element and the target weight element corresponding to each processing cycle, size transformation may be performed on the original image feature matrix to be processed and the original weight matrix based on the size of the PE array, so as to obtain the image feature matrix to be processed and the weight matrix.
For example, if the size of the original to-be-processed image feature matrix is M × N × S, the size of the corresponding original weight matrix is also M × N × S. If the size of the PE array is a × a, then after size transformation of the original to-be-processed image feature matrix, the size of the resulting to-be-processed image feature matrix is a × a × W, where W = (M × N × S)/(a × a). The size of each weight matrix in the plurality of weight matrices is also a × a × W.
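The size transformation above can be illustrated with a minimal sketch. It assumes a row-major flattening order and that M × N × S is an exact multiple of a × a; the text states neither, and the helper name `reshape_to_pe_tiles` is hypothetical.

```python
def reshape_to_pe_tiles(flat, a):
    """Split a flat list of M*N*S elements into W tiles of size a x a.

    Assumption (not from the source): elements are taken in row-major order
    and the element count divides evenly by a*a.
    """
    tile = a * a
    if len(flat) % tile != 0:
        raise ValueError("element count must be a multiple of a*a")
    w = len(flat) // tile
    return [
        [flat[t * tile + r * a : t * tile + (r + 1) * a] for r in range(a)]
        for t in range(w)
    ]


M, N, S, a = 8, 8, 2, 4               # made-up sizes for illustration
flat = list(range(M * N * S))         # stand-in for the original feature data
tiles = reshape_to_pe_tiles(flat, a)
W = len(tiles)                        # W = (M * N * S) / (a * a)
```

With these sizes each tile matches the 4 × 4 PE array, so every tile can be processed with the per-cycle schemes described below.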
In addition, if the size of one feature subgraph in the feature matrix of the original image to be processed is smaller than or equal to the size of the PE array, the feature matrix of the original image to be processed can be subjected to size transformation, or the feature matrix of the original image to be processed is not subjected to size transformation. Under the condition of not carrying out size transformation on the original image feature matrix to be processed, if the size of the feature subgraph is smaller than that of the PE array, only a part of PEs in the PE array are used in the process of processing by using the PE array, and the PEs are not used completely.
After the target feature elements and the target weight elements corresponding to each processing cycle are determined, that is, after any processing cycle arrives, the target feature elements and the target weight elements corresponding to the processing cycle are subjected to preset processing to obtain intermediate processing data.
Here, the preset processing performed on the target feature elements and target weight elements corresponding to the processing cycle includes, for example, performing a sub-operation of the fully connected operation on the feature matrix of the image to be processed.
The sub-operations performed are, for example, the sub-operations corresponding to each weight matrix.
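Illustratively, the feature matrix of the image to be processed is represented as:
$$\begin{bmatrix} a1 & a2 & a3 & a4 \\ a5 & a6 & a7 & a8 \\ a9 & a10 & a11 & a12 \\ a13 & a14 & a15 & a16 \end{bmatrix}$$
The weight matrix is represented as:
$$Wi = \begin{bmatrix} wi_{1} & wi_{2} & wi_{3} & wi_{4} \\ wi_{5} & wi_{6} & wi_{7} & wi_{8} \\ wi_{9} & wi_{10} & wi_{11} & wi_{12} \\ wi_{13} & wi_{14} & wi_{15} & wi_{16} \end{bmatrix}$$
where i represents the ith weight matrix. Taking the first set of weight parameters W1 as an example, there are 16 weight data, denoted for convenience of description as w1_1, w1_2, w1_3, ..., w1_16, which form the weight matrix:
$$W1 = \begin{bmatrix} w1_{1} & w1_{2} & w1_{3} & w1_{4} \\ w1_{5} & w1_{6} & w1_{7} & w1_{8} \\ w1_{9} & w1_{10} & w1_{11} & w1_{12} \\ w1_{13} & w1_{14} & w1_{15} & w1_{16} \end{bmatrix}$$
When the weight matrices are used to perform the fully connected operation on the feature matrix of the image to be processed, the fully connected operation corresponding to the ith weight matrix can be represented as:
$$Oi = a1 \times wi_{1} + a2 \times wi_{2} + \cdots + a16 \times wi_{16}$$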
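Taking the weight matrix W1 (i = 1) as an example, the sub-operations corresponding to the weight matrix include:
$$O1_{1} = a1 \times w1_{1} + a2 \times w1_{2} + a3 \times w1_{3} + a4 \times w1_{4}$$
$$O1_{2} = a5 \times w1_{5} + a6 \times w1_{6} + a7 \times w1_{7} + a8 \times w1_{8}$$
$$O1_{3} = a9 \times w1_{9} + a10 \times w1_{10} + a11 \times w1_{11} + a12 \times w1_{12}$$
$$O1_{4} = a13 \times w1_{13} + a14 \times w1_{14} + a15 \times w1_{15} + a16 \times w1_{16}$$
where $O1 = O1_{1} + O1_{2} + O1_{3} + O1_{4}$.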
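The decomposition into four sub-operations can be checked numerically. The sketch below uses made-up values for a1 to a16 and w1_1 to w1_16 and confirms that the four partial sums add up to the undivided fully connected sum; the variable names are illustrative only.

```python
# Made-up example values; a[k] stands for a(k+1) and w1[k] for w1_(k+1).
a = [float(k) for k in range(1, 17)]
w1 = [0.5 * k for k in range(1, 17)]

# O1_1 .. O1_4: each sub-operation covers four consecutive feature elements.
partials = [
    sum(a[k] * w1[k] for k in range(4 * g, 4 * g + 4))
    for g in range(4)
]
O1 = sum(partials)

# Reference: the undivided fully connected sum over all 16 elements.
direct = sum(x * y for x, y in zip(a, w1))
```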
For the above S102 and S103: in response to the arrival of any processing cycle, each PE in the processing engine (PE) array acquires the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation, for example as follows. After the processing cycle arrives, the target feature elements and target weight elements corresponding to the processing cycle are read from the external memory, and the target feature elements are stored in part of the PEs in the PE array. The target feature elements are then copied to the other PEs in the PE array to form the first operand, the read target weight elements are used as the second operand, and the preset operation is performed on the first operand and the second operand to obtain the intermediate processing data corresponding to the processing cycle.
The embodiment of the present disclosure takes the example that the size of the feature matrix of the image to be processed is consistent with the size of the PE array, and details of the processing in each processing cycle are described. Here, the following examples (1) to (3) are only examples of determining the target feature element and storing the target feature element in the PE array, and may also determine the target feature element in other manners, and when determining the target feature element, the target feature element may not be determined according to the order of the feature elements, as long as it is ensured that each feature element in the to-be-processed image feature matrix is processed by using the corresponding weight element in the N weight matrices, and the finally obtained processing result corresponding to each weight matrix is the result of weighted summation of all the feature elements in the to-be-processed image feature matrix and the weight matrix.
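The remark that the feature elements need not be processed in order can be illustrated with a small sketch. It processes one feature element per step against four flattened weight matrices, once in natural order and once in a shuffled order; the function name `fc_any_order` and all values are hypothetical, and integers are used so the comparison is exact.

```python
import random


def fc_any_order(features, weight_sets, order):
    """Process one feature element per step, in the given order of indices."""
    results = [0] * len(weight_sets)
    for k in order:
        for i, weights in enumerate(weight_sets):
            results[i] += features[k] * weights[k]
    return results


features = list(range(1, 17))  # a1..a16 as made-up integers
# W1..W4, each flattened to 16 made-up integer weights.
weight_sets = [[(i + 1) * (k + 1) for k in range(16)] for i in range(4)]

natural = fc_any_order(features, weight_sets, range(16))

order = list(range(16))
random.Random(0).shuffle(order)          # any permutation of the 16 indices
shuffled = fc_any_order(features, weight_sets, order)
```

Both runs give the same weighted sums, because each feature element is still paired with its own weight element in every weight matrix.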
(1) Each row of the PE array includes repeating feature elements. In this case, in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element and a corresponding target weight element corresponding to the processing cycle and performs a preset operation to obtain intermediate processing data, including:
in response to the arrival of any processing cycle, transmitting as many target feature elements from the feature matrix of the image to be processed as there are rows in the PE array to one column of PEs in the PE array, and copying the target feature elements in that column of PEs to the PEs in the other columns as the first operands of the corresponding PEs; transmitting the weight elements, taken from different weight matrices, that correspond to the target feature element in each row of PEs to the PEs at the positions of that target feature element in the PE array, as the second operands of the corresponding PEs; and performing the preset operation on the first operands and second operands stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the processing cycle.
When the PE array is used to perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the following manner may be adopted, for example: in the processing cycle, performing a weighted summation of each column of target feature elements in the first operand with the corresponding column of weight elements in the second operand to obtain the intermediate data corresponding to the different weight matrices; and obtaining the intermediate processing data corresponding to the processing cycle based on the intermediate data corresponding to the different weight matrices.
In this case, the target feature elements determined for a plurality of processing cycles include, for example, a row of feature elements in a feature matrix of the image to be processed. Since the number of the feature elements in one row in the feature matrix of the image to be processed is the same as the number of columns in the PE array, when reading a row of feature elements into the PE array, for example, a row of feature elements is read into a column of PEs in the PE array.
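For example, the feature matrix of the image to be processed is
$$\begin{bmatrix} a1 & a2 & a3 & a4 \\ a5 & a6 & a7 & a8 \\ a9 & a10 & a11 & a12 \\ a13 & a14 & a15 & a16 \end{bmatrix}$$
and the weight matrices are
$$Wi = \begin{bmatrix} wi_{1} & wi_{2} & wi_{3} & wi_{4} \\ wi_{5} & wi_{6} & wi_{7} & wi_{8} \\ wi_{9} & wi_{10} & wi_{11} & wi_{12} \\ wi_{13} & wi_{14} & wi_{15} & wi_{16} \end{bmatrix}$$
where, for example, i is an integer from 1 to 4.
In the first processing cycle, the image feature elements a1, a2, a3 and a4 in the first row of the feature matrix of the image to be processed are taken as the target feature elements and copied to the other columns of the PE array to obtain the first operand of the first processing cycle, which forms the matrix
$$\begin{bmatrix} a1 & a1 & a1 & a1 \\ a2 & a2 & a2 & a2 \\ a3 & a3 & a3 & a3 \\ a4 & a4 & a4 & a4 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{1} & w2_{1} & w3_{1} & w4_{1} \\ w1_{2} & w2_{2} & w3_{2} & w4_{2} \\ w1_{3} & w2_{3} & w3_{3} & w4_{3} \\ w1_{4} & w2_{4} & w3_{4} & w4_{4} \end{bmatrix}$$
which is the second operand.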
When the first operand and the second operand are stored in the PE array, as shown in fig. 2 (a), PE1, PE2, PE3, and PE4 all store a1, and respectively store the weight values corresponding to a1 in W1 to W4: w1_1, w2_1, w3_1, w4_1.
Similarly, PE5 to PE8 each store a2, and store the weight values corresponding to a2 in W1 to W4, respectively: w1_2, w2_2, w3_2, w4_2.
PE9 to PE12 store a3, and respectively store the weight values corresponding to a3 in W1 to W4: w1_3, w2_3, w3_3, w4_3.
PE13 to PE16 each store a4, and respectively store the weight values corresponding to a4 in W1 to W4: w1_4, w2_4, w3_4, w4_4.
Then, taking w1_1, w1_2, w1_3 and w1_4 as weights, the first column a1, a2, a3 and a4 is weighted and summed to obtain intermediate data $O1_{1}$ corresponding to the weight matrix W1.
Taking w2_1, w2_2, w2_3 and w2_4 as weights, the second column a1, a2, a3 and a4 is weighted and summed to obtain intermediate data $O2_{1}$ corresponding to the weight matrix W2.
Taking w3_1, w3_2, w3_3 and w3_4 as weights, the third column a1, a2, a3 and a4 is weighted and summed to obtain intermediate data $O3_{1}$ corresponding to the weight matrix W3.
Taking w4_1, w4_2, w4_3 and w4_4 as weights, the fourth column a1, a2, a3 and a4 is weighted and summed to obtain intermediate data $O4_{1}$ corresponding to the weight matrix W4.
$O1_{1}$, $O2_{1}$, $O3_{1}$ and $O4_{1}$ are then combined as the intermediate processing data corresponding to the first processing cycle.
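The operand layout and column-wise summation of one processing cycle in scheme (1) can be sketched as follows. The function `mode1_cycle`, the 0-based indices, and the integer test values are assumptions made for illustration; only the placement of features and weights follows the description above.

```python
def mode1_cycle(A, W, t):
    """Return (first operand, second operand, column sums) for 0-based cycle t.

    A is an n x n feature matrix; W is a list of n weight matrices (n x n each).
    """
    n = len(A)
    targets = A[t]  # row t holds this cycle's target feature elements
    # PE row r holds targets[r] in every column (the repeated feature elements).
    first = [[targets[r]] * n for r in range(n)]
    # PE (r, c) holds the weight for targets[r] taken from weight matrix W[c].
    second = [[W[c][t][r] for c in range(n)] for r in range(n)]
    # Column c yields the partial result for weight matrix W[c].
    col_sums = [
        sum(first[r][c] * second[r][c] for r in range(n)) for c in range(n)
    ]
    return first, second, col_sums


# Made-up values: a1..a16 = 1..16, and W[c][i][j] = 10*(c+1) + 4*i + j.
A = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
W = [[[10 * (c + 1) + 4 * i + j for j in range(4)] for i in range(4)]
     for c in range(4)]

first, second, col_sums = mode1_cycle(A, W, 0)   # first processing cycle
```

Here `col_sums[c]` plays the role of the intermediate data for the (c+1)th weight matrix in the first processing cycle.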
In addition, in the second processing cycle, the image feature elements a5, a6, a7 and a8 in the second row of the feature matrix of the image to be processed are taken as the target feature elements and copied to the other columns of the PE array to obtain the first operand of the second processing cycle, which forms the matrix
$$\begin{bmatrix} a5 & a5 & a5 & a5 \\ a6 & a6 & a6 & a6 \\ a7 & a7 & a7 & a7 \\ a8 & a8 & a8 & a8 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{5} & w2_{5} & w3_{5} & w4_{5} \\ w1_{6} & w2_{6} & w3_{6} & w4_{6} \\ w1_{7} & w2_{7} & w3_{7} & w4_{7} \\ w1_{8} & w2_{8} & w3_{8} & w4_{8} \end{bmatrix}$$
which is the second operand.
When the first operand and the second operand are stored in the PE array, as shown in fig. 2 (b), PE1, PE2, PE3, and PE4 all store a5, and respectively store the weight values corresponding to a5 in W1 to W4: w1_5, w2_5, w3_5, w4_5.
Similarly, PE5 to PE8 each store a6, and store the weight values corresponding to a6 in W1 to W4, respectively: w1_6, w2_6, w3_6, w4_6.
PE9 to PE12 store a7, and respectively store weight values corresponding to a7 in W1 to W4: w1_7, w2_7, w3_7, w4_7.
PE13 to PE16 each store a8, and respectively store weight values corresponding to a8 in W1 to W4: w1_8, w2_8, w3_8, w4_8.
Then, taking w1_5, w1_6, w1_7 and w1_8 as weights, the first column a5, a6, a7 and a8 is weighted and summed to obtain intermediate data $O1_{2}$ corresponding to the weight matrix W1.
Taking w2_5, w2_6, w2_7 and w2_8 as weights, the second column a5, a6, a7 and a8 is weighted and summed to obtain intermediate data $O2_{2}$ corresponding to the weight matrix W2.
Taking w3_5, w3_6, w3_7 and w3_8 as weights, the third column a5, a6, a7 and a8 is weighted and summed to obtain intermediate data $O3_{2}$ corresponding to the weight matrix W3.
Taking w4_5, w4_6, w4_7 and w4_8 as weights, the fourth column a5, a6, a7 and a8 is weighted and summed to obtain intermediate data $O4_{2}$ corresponding to the weight matrix W4.
$O1_{2}$, $O2_{2}$, $O3_{2}$ and $O4_{2}$ are then combined as the intermediate processing data corresponding to the second processing cycle.
……
In the third processing cycle, in a similar manner, the following are obtained: intermediate data $O1_{3}$ generated from w1_9, w1_10, w1_11 and w1_12 in the weight matrix W1 and the corresponding a9, a10, a11 and a12; intermediate data $O2_{3}$ generated from w2_9, w2_10, w2_11 and w2_12 in the weight matrix W2 and the corresponding a9, a10, a11 and a12; intermediate data $O3_{3}$ generated from w3_9, w3_10, w3_11 and w3_12 in the weight matrix W3 and the corresponding a9, a10, a11 and a12; and intermediate data $O4_{3}$ generated from w4_9, w4_10, w4_11 and w4_12 in the weight matrix W4 and the corresponding a9, a10, a11 and a12.
In the fourth processing cycle, in a similar manner, the following are obtained: intermediate data $O1_{4}$ generated from w1_13, w1_14, w1_15 and w1_16 in the weight matrix W1 and the corresponding a13, a14, a15 and a16; intermediate data $O2_{4}$ generated from w2_13, w2_14, w2_15 and w2_16 in the weight matrix W2 and the corresponding a13, a14, a15 and a16; intermediate data $O3_{4}$ generated from w3_13, w3_14, w3_15 and w3_16 in the weight matrix W3 and the corresponding a13, a14, a15 and a16; and intermediate data $O4_{4}$ generated from w4_13, w4_14, w4_15 and w4_16 in the weight matrix W4 and the corresponding a13, a14, a15 and a16.
After 4 processing cycles, the processing of the feature matrix of the image to be processed (the 4 × 4 matrix of elements a1 to a16) using the weight matrices W1, W2, W3 and W4 is complete.
Then $O1_{1}$, $O1_{2}$, $O1_{3}$ and $O1_{4}$ are added to obtain the result value O1 corresponding to the weight matrix W1. $O2_{1}$, $O2_{2}$, $O2_{3}$ and $O2_{4}$ are added to obtain the result value O2 corresponding to the weight matrix W2, and the result value O3 corresponding to the weight matrix W3 and the result value O4 corresponding to the weight matrix W4 are obtained in a similar manner. If the weight matrices are only W1, W2, W3 and W4, then O1, O2, O3 and O4 are combined as the result of processing the feature matrix of the image to be processed. If there are more weight matrices than W1, W2, W3 and W4, similar processing is performed using the other weight matrices. Finally, the result values corresponding to all the weight matrices are combined to obtain the processing result data of the feature matrix of the image to be processed, that is, the fully connected operation result.
It should be noted that $O2_{1}$, $O2_{2}$, $O2_{3}$ and $O2_{4}$ may be added together after all processing cycles have been performed.
Alternatively, in each processing cycle other than the first, the result value obtained for each weight matrix in the current cycle may be added to the running sum of the result values of all previous cycles. The last processing cycle can then directly output the sum of $O2_{1}$, $O2_{2}$, $O2_{3}$ and $O2_{4}$.
For example, after the first processing cycle ends, the PE stores the obtained $O2_{1}$ in a register. In the second processing cycle, after $O2_{2}$ is obtained, $O2_{1}$ is read from the register and added to $O2_{2}$ to obtain the running sum $O2_{1} + O2_{2}$, which is stored in the register. In the third processing cycle, after $O2_{3}$ is obtained, the running sum $O2_{1} + O2_{2}$ from the second processing cycle is read from the register and added to $O2_{3}$ to obtain the running sum $O2_{1} + O2_{2} + O2_{3}$ for the third processing cycle; and so on, so that the last processing cycle obtains the result O2 corresponding to W2.
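The register-based accumulation across processing cycles can be sketched as below for scheme (1). The name `fc_mode1`, the list `registers`, and the integer test values are hypothetical; the sketch only mirrors the read, add and write-back pattern described above and compares the outcome with a direct element-wise sum.

```python
def fc_mode1(A, W):
    """Fully connected results for all weight matrices, one feature row per cycle."""
    n = len(A)
    registers = [0] * n                  # one running sum per weight matrix
    for t in range(n):                   # processing cycle t handles feature row t
        for c in range(n):               # PE column c serves weight matrix W[c]
            partial = sum(A[t][r] * W[c][t][r] for r in range(n))
            registers[c] += partial      # read running sum, add, write back
    return registers


# Made-up values: a1..a16 = 1..16, and W[c][i][j] = 10*(c+1) + 4*i + j.
A = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
W = [[[10 * (c + 1) + 4 * i + j for j in range(4)] for i in range(4)]
     for c in range(4)]

results = fc_mode1(A, W)

# Reference: weighted sum over all 16 positions, computed without cycles.
reference = [
    sum(A[i][j] * Wc[i][j] for i in range(4) for j in range(4)) for Wc in W
]
```

Accumulating in every cycle avoids keeping the four partial values per weight matrix until the end, at the cost of one register read and write per cycle.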
(2) Each column of the PE array includes repeated feature elements. In this case, in response to the arrival of any processing cycle, each PE in the processing engine (PE) array acquiring the target feature element and the corresponding target weight element for the processing cycle and performing the preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transmitting as many target feature elements from the feature matrix of the image to be processed as there are columns in the PE array to one row of PEs in the PE array, and copying the target feature elements in that row of PEs to the PEs in the other rows as the first operands of the corresponding PEs; transmitting the weight elements, taken from different weight matrices, that correspond to the target feature elements in each row of PEs to the PEs at the positions of those target feature elements in the PE array, as the second operands of the corresponding PEs; and performing the preset operation on the first operands and second operands stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the processing cycle.
When the PE array is used to perform a preset operation on the first operand and the second operand stored in the PE array to obtain intermediate processing data corresponding to the corresponding processing cycle, for example, the following manner may be adopted: in the corresponding processing cycle, performing weighted summation on each row of target characteristic elements in the first operand and each row of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
In this case, the target feature elements determined for a plurality of processing cycles include, for example, a column of feature elements in a feature matrix of the image to be processed. Since the number of a column of feature elements in the feature matrix of the image to be processed is the same as the number of rows in the PE array, when a column of feature elements is read into the PE array, for example, a column of feature elements is read into a row of PEs in the PE array.
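For example, the feature matrix of the image to be processed is
$$\begin{bmatrix} a1 & a2 & a3 & a4 \\ a5 & a6 & a7 & a8 \\ a9 & a10 & a11 & a12 \\ a13 & a14 & a15 & a16 \end{bmatrix}$$
and the weight matrices are
$$Wi = \begin{bmatrix} wi_{1} & wi_{2} & wi_{3} & wi_{4} \\ wi_{5} & wi_{6} & wi_{7} & wi_{8} \\ wi_{9} & wi_{10} & wi_{11} & wi_{12} \\ wi_{13} & wi_{14} & wi_{15} & wi_{16} \end{bmatrix}$$
where, for example, i is an integer from 1 to 4.
In the first processing cycle, the image feature elements a1, a5, a9 and a13 in the first column of the feature matrix of the image to be processed are taken as the target feature elements, stored in the first row of PEs, and copied to the other rows to obtain the first operand of the PE array in the first processing cycle, which forms the matrix
$$\begin{bmatrix} a1 & a5 & a9 & a13 \\ a1 & a5 & a9 & a13 \\ a1 & a5 & a9 & a13 \\ a1 & a5 & a9 & a13 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{1} & w1_{5} & w1_{9} & w1_{13} \\ w2_{1} & w2_{5} & w2_{9} & w2_{13} \\ w3_{1} & w3_{5} & w3_{9} & w3_{13} \\ w4_{1} & w4_{5} & w4_{9} & w4_{13} \end{bmatrix}$$
which is the second operand.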
When the first operand and the second operand are stored in the PE array, as shown in fig. 3 (a), PE1, PE2, PE3, PE4 store a1, a5, a9, a13, respectively, and store the weight values corresponding to a1, a5, a9, a13, respectively, in W1: w1_1, w1_5, w1_9, w1_13.
Similarly, PE5 to PE8 store a1, a5, a9, a13, respectively, and store weight values corresponding to a1, a5, a9, a13, respectively, in W2: w2_1, w2_5, w2_9, w2_13.
PE9 to PE12 each store a1, a5, a9, a13, respectively, and store a weight value corresponding to a1, a5, a9, a13, respectively, in W3: w3_1, w3_5, w3_9, w3_13.
PE13 to PE16 store a1, a5, a9, and a13, respectively, and store the weight values corresponding to a1, a5, a9, and a13, respectively, in W4: w4_1, w4_5, w4_9, w4_13.
Then, taking w1_1, w1_5, w1_9 and w1_13 as weights, the first row a1, a5, a9 and a13 is weighted and summed to obtain intermediate data $O1_{1}$ corresponding to the weight matrix W1.
Taking w2_1, w2_5, w2_9 and w2_13 as weights, the second row a1, a5, a9 and a13 is weighted and summed to obtain intermediate data $O2_{1}$ corresponding to the weight matrix W2.
Taking w3_1, w3_5, w3_9 and w3_13 as weights, the third row a1, a5, a9 and a13 is weighted and summed to obtain intermediate data $O3_{1}$ corresponding to the weight matrix W3.
Taking w4_1, w4_5, w4_9 and w4_13 as weights, the fourth row a1, a5, a9 and a13 is weighted and summed to obtain intermediate data $O4_{1}$ corresponding to the weight matrix W4.
$O1_{1}$, $O2_{1}$, $O3_{1}$ and $O4_{1}$ are then combined as the intermediate processing data corresponding to the first processing cycle.
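In addition, in the second processing cycle, the image feature elements a2, a6, a10 and a14 in the second column of the feature matrix of the image to be processed are taken as the target feature elements, stored in the first row of PEs, and copied to the other rows to obtain the first operand of the PE array in the second processing cycle, which forms the matrix
$$\begin{bmatrix} a2 & a6 & a10 & a14 \\ a2 & a6 & a10 & a14 \\ a2 & a6 & a10 & a14 \\ a2 & a6 & a10 & a14 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{2} & w1_{6} & w1_{10} & w1_{14} \\ w2_{2} & w2_{6} & w2_{10} & w2_{14} \\ w3_{2} & w3_{6} & w3_{10} & w3_{14} \\ w4_{2} & w4_{6} & w4_{10} & w4_{14} \end{bmatrix}$$
which is the second operand.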
When the first operand and the second operand are stored in the PE array, as shown in fig. 3 (b), PE1, PE2, PE3, PE4 store a2, a6, a10, a14, respectively, and store the weight values corresponding to a2, a6, a10, a14, respectively, in W1: w1_2, w1_6, w1_10, w1_14.
Similarly, PE5 to PE8 store a2, a6, a10, a14, respectively, and store weight values corresponding to a2, a6, a10, a14, respectively, in W2: w2_2, w2_6, w2_10, w2_14.
PE9 to PE12 each store a2, a6, a10, a14, respectively, and store the weight values corresponding to a2, a6, a10, a14, respectively, in W3: w3_2, w3_6, w3_10, w3_14.
PE13 to PE16 store a2, a6, a10, and a14, respectively, and store weight values corresponding to a2, a6, a10, and a14, respectively, in W4: w4_2, w4_6, w4_10, w4_14.
Then, taking w1_2, w1_6, w1_10 and w1_14 as weights, the first row a2, a6, a10 and a14 is weighted and summed to obtain intermediate data $O1_{2}$ corresponding to the weight matrix W1.
Taking w2_2, w2_6, w2_10 and w2_14 as weights, the second row a2, a6, a10 and a14 is weighted and summed to obtain intermediate data $O2_{2}$ corresponding to the weight matrix W2.
Taking w3_2, w3_6, w3_10 and w3_14 as weights, the third row a2, a6, a10 and a14 is weighted and summed to obtain intermediate data $O3_{2}$ corresponding to the weight matrix W3.
Taking w4_2, w4_6, w4_10 and w4_14 as weights, the fourth row a2, a6, a10 and a14 is weighted and summed to obtain intermediate data $O4_{2}$ corresponding to the weight matrix W4.
$O1_{2}$, $O2_{2}$, $O3_{2}$ and $O4_{2}$ are then combined as the intermediate processing data corresponding to the second processing cycle.
……
In the third processing cycle, in a similar manner, the following are obtained: intermediate data $O1_{3}$ generated from w1_3, w1_7, w1_11 and w1_15 in the weight matrix W1 and the corresponding a3, a7, a11 and a15; intermediate data $O2_{3}$ generated from w2_3, w2_7, w2_11 and w2_15 in the weight matrix W2 and the corresponding a3, a7, a11 and a15; intermediate data $O3_{3}$ generated from w3_3, w3_7, w3_11 and w3_15 in the weight matrix W3 and the corresponding a3, a7, a11 and a15; and intermediate data $O4_{3}$ generated from w4_3, w4_7, w4_11 and w4_15 in the weight matrix W4 and the corresponding a3, a7, a11 and a15.
In the fourth processing cycle, in a similar manner, the following are obtained: intermediate data $O1_{4}$ generated from w1_4, w1_8, w1_12 and w1_16 in the weight matrix W1 and the corresponding a4, a8, a12 and a16; intermediate data $O2_{4}$ generated from w2_4, w2_8, w2_12 and w2_16 in the weight matrix W2 and the corresponding a4, a8, a12 and a16; intermediate data $O3_{4}$ generated from w3_4, w3_8, w3_12 and w3_16 in the weight matrix W3 and the corresponding a4, a8, a12 and a16; and intermediate data $O4_{4}$ generated from w4_4, w4_8, w4_12 and w4_16 in the weight matrix W4 and the corresponding a4, a8, a12 and a16.
After 4 processing cycles, the processing of the feature matrix of the image to be processed (the 4 × 4 matrix of elements a1 to a16) using the weight matrices W1, W2, W3 and W4 is complete.
Then $O1_{1}$, $O1_{2}$, $O1_{3}$ and $O1_{4}$ are added to obtain the result value O1 corresponding to the weight matrix W1. $O2_{1}$, $O2_{2}$, $O2_{3}$ and $O2_{4}$ are added to obtain the result value O2 corresponding to the weight matrix W2, and the result value O3 corresponding to the weight matrix W3 and the result value O4 corresponding to the weight matrix W4 are obtained in a similar manner. If the weight matrices are only W1, W2, W3 and W4, then O1, O2, O3 and O4 are combined as the result of processing the feature matrix of the image to be processed. If there are more weight matrices than W1, W2, W3 and W4, similar processing is performed using the other weight matrices. Finally, the result values corresponding to all the weight matrices are combined to obtain the processing result data of the feature matrix of the image to be processed, that is, the fully connected operation result. The process of adding the result values corresponding to the same weight matrix in different processing cycles is similar to that in (1) above and is not described again here.
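Scheme (2) can be sketched in the same style: each cycle takes one column of the feature matrix, every PE row serves one weight matrix, and each row sum is accumulated across cycles. The name `fc_mode2` and the integer test values are assumptions for illustration.

```python
def fc_mode2(A, W):
    """Fully connected results, one feature column per cycle, one PE row per weight matrix."""
    n = len(A)
    registers = [0] * n
    for t in range(n):                              # cycle t handles feature column t
        # Loaded into one PE row, then copied to the other rows.
        targets = [A[c][t] for c in range(n)]
        for r in range(n):                          # PE row r serves weight matrix W[r]
            registers[r] += sum(targets[c] * W[r][c][t] for c in range(n))
    return registers


# Made-up values: a1..a16 = 1..16, and W[r][i][j] = 10*(r+1) + 4*i + j.
A = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
W = [[[10 * (r + 1) + 4 * i + j for j in range(4)] for i in range(4)]
     for r in range(4)]

results_mode2 = fc_mode2(A, W)

# Reference: weighted sum over all 16 positions, computed without cycles.
reference_mode2 = [
    sum(A[i][j] * Wr[i][j] for i in range(4) for j in range(4)) for Wr in W
]
```

The per-weight-matrix results match those of the row-wise scheme, since only the order in which feature elements are visited differs.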
(3) Each PE of the array of PEs includes a repeating feature element. In this case, in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element and a corresponding target weight element corresponding to the processing cycle and performs a preset operation to obtain intermediate processing data, including:
in response to the arrival of any processing cycle, transmitting one target feature element from the feature matrix of the image to be processed to one PE in the PE array, and copying the target feature element in that PE to the other PEs as the first operands of the corresponding PEs;
transmitting the weight elements corresponding to the target feature element, taken from as many weight matrices as there are PEs in the PE array, to the respective PEs in the PE array as the second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
In this case, the target feature element determined for a plurality of processing cycles, for example, includes one feature element in the image feature to be processed, and in the corresponding processing cycle, the one feature element is read into one PE in the PE array and copied by the one PE to the other PE.
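For example, the feature matrix of the image to be processed is
$$\begin{bmatrix} a1 & a2 & a3 & a4 \\ a5 & a6 & a7 & a8 \\ a9 & a10 & a11 & a12 \\ a13 & a14 & a15 & a16 \end{bmatrix}$$
and the weight matrices are
$$Wi = \begin{bmatrix} wi_{1} & wi_{2} & wi_{3} & wi_{4} \\ wi_{5} & wi_{6} & wi_{7} & wi_{8} \\ wi_{9} & wi_{10} & wi_{11} & wi_{12} \\ wi_{13} & wi_{14} & wi_{15} & wi_{16} \end{bmatrix}$$
where, for example, i is an integer from 1 to 16.
In the first processing cycle, one image feature element a1 in the feature matrix of the image to be processed is taken as the target feature element, stored in the first PE, and copied to the other PEs to obtain the first operand of the PE array in the first processing cycle, which forms the matrix
$$\begin{bmatrix} a1 & a1 & a1 & a1 \\ a1 & a1 & a1 & a1 \\ a1 & a1 & a1 & a1 \\ a1 & a1 & a1 & a1 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{1} & w2_{1} & w3_{1} & w4_{1} \\ w5_{1} & w6_{1} & w7_{1} & w8_{1} \\ w9_{1} & w10_{1} & w11_{1} & w12_{1} \\ w13_{1} & w14_{1} & w15_{1} & w16_{1} \end{bmatrix}$$
which is the second operand.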
When the first operand and the second operand are stored in the PE array, as shown in fig. 4 (a), PE1 to PE16 all store a1, and respectively store the weight values corresponding to a1 in W1 to W16: w1_1 to w16_1.
Then, the product of w1_1 and a1 is calculated to obtain intermediate data $O1_{1}$ corresponding to the weight matrix W1.
The product of w2_1 and a1 is calculated to obtain intermediate data $O2_{1}$ corresponding to the weight matrix W2.
...
The product of w16_1 and a1 is calculated to obtain intermediate data $O16_{1}$ corresponding to the weight matrix W16.
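In the second processing cycle, a2 is taken as the target feature element of the PE array in the second processing cycle, and the first operand of the PE array in the second processing cycle is formed as:
$$\begin{bmatrix} a2 & a2 & a2 & a2 \\ a2 & a2 & a2 & a2 \\ a2 & a2 & a2 & a2 \\ a2 & a2 & a2 & a2 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{2} & w2_{2} & w3_{2} & w4_{2} \\ w5_{2} & w6_{2} & w7_{2} & w8_{2} \\ w9_{2} & w10_{2} & w11_{2} & w12_{2} \\ w13_{2} & w14_{2} & w15_{2} & w16_{2} \end{bmatrix}$$
which is the second operand.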
When the first operand and the second operand are stored in the PE array, as shown in fig. 4 (b), PE1 to PE16 each store a2, and respectively store the weight values corresponding to a2 in W1 to W16: w1_2 to w16_2.
Then, the product of w1_2 and a2 is calculated to obtain intermediate data $O1_{2}$ corresponding to the weight matrix W1.
The product of w2_2 and a2 is calculated to obtain intermediate data $O2_{2}$ corresponding to the weight matrix W2.
...
The product of w16_2 and a2 is calculated to obtain intermediate data $O16_{2}$ corresponding to the weight matrix W16.
……
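In the 16th processing cycle, a16 is taken as the target feature element of the PE array in the 16th processing cycle, and the first operand of the 16th processing cycle is formed as:
$$\begin{bmatrix} a16 & a16 & a16 & a16 \\ a16 & a16 & a16 & a16 \\ a16 & a16 & a16 & a16 \\ a16 & a16 & a16 & a16 \end{bmatrix}$$
The corresponding weight data form the matrix
$$\begin{bmatrix} w1_{16} & w2_{16} & w3_{16} & w4_{16} \\ w5_{16} & w6_{16} & w7_{16} & w8_{16} \\ w9_{16} & w10_{16} & w11_{16} & w12_{16} \\ w13_{16} & w14_{16} & w15_{16} & w16_{16} \end{bmatrix}$$
which is the second operand.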
Then, the product of w1_16 and a16 is calculated to obtain intermediate data $O1_{16}$ corresponding to the weight matrix W1.
The product of w2_16 and a16 is calculated to obtain intermediate data $O2_{16}$ corresponding to the weight matrix W2.
...
The product of w16_16 and a16 is calculated to obtain intermediate data $O16_{16}$ corresponding to the weight matrix W16.
After 16 processing cycles, the processing of the feature matrix of the image to be processed (the 4 × 4 matrix of elements a1 to a16) using the weight matrices W1 to W16 is complete.
Then $O1_{1}$ to $O1_{16}$ are added to obtain the result value O1 corresponding to the weight matrix W1; $O2_{1}$ to $O2_{16}$ are added to obtain the result value O2 corresponding to the weight matrix W2; ...; and $O16_{1}$ to $O16_{16}$ are added to obtain the result value O16 corresponding to the weight matrix W16. Finally, the result values O1 to O16 corresponding to the 16 weight matrices are combined to obtain the processing result data of the feature matrix of the image to be processed, that is, the fully connected operation result.
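Scheme (3) can be sketched as a broadcast of one feature element per cycle to all 16 PEs, each of which serves its own weight matrix. The name `fc_mode3`, the 0-based indices, and the integer test values are assumptions for illustration only.

```python
def fc_mode3(A, W):
    """Fully connected results, one feature element per cycle, one PE per weight matrix."""
    n = len(A)
    flat = [x for row in A for x in row]       # a1..a16 in row-major order
    registers = [0] * len(W)                   # one running sum per PE / weight matrix
    for k, feature in enumerate(flat):         # cycle k broadcasts a single feature element
        for p in range(len(W)):                # PE p multiplies it by its own weight
            registers[p] += feature * W[p][k // n][k % n]
    return registers


# Made-up values: a1..a16 = 1..16, and 16 weight matrices W[p][i][j] = (p+1) + 4*i + j.
A = [[4 * i + j + 1 for j in range(4)] for i in range(4)]
W = [[[(p + 1) + 4 * i + j for j in range(4)] for i in range(4)]
     for p in range(16)]

results_mode3 = fc_mode3(A, W)

# Reference: weighted sum over all 16 positions for each of the 16 weight matrices.
reference_mode3 = [
    sum(A[i][j] * Wp[i][j] for i in range(4) for j in range(4)) for Wp in W
]
```

In this scheme only one feature element is read from outside per cycle, while 16 weight matrices are served in parallel.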
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, a data processing apparatus corresponding to the data processing method is also provided in the embodiments of the present disclosure, and because the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the data processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 5, a schematic diagram of a data processing apparatus provided in an embodiment of the present disclosure is shown, where the apparatus includes: a controller 51 and a PE array 52;
the controller 51 is configured to determine target feature elements and target weight elements corresponding to a plurality of processing cycles from the feature matrix and the weight matrix of the image to be processed; the characteristic matrix of the image to be processed corresponds to a plurality of weight matrixes;
the PE array 52 is configured to respond to an arrival of any processing cycle, and each PE in the PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data; obtaining result data for processing the characteristic matrix of the image to be processed based on intermediate processing data respectively corresponding to a plurality of processing cycles;
for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to target weight elements corresponding to the repeated feature elements in different weight matrices.
The data processing device provided by the embodiment of the disclosure may include a chip, an AI chip, and the like. The computer device provided by the embodiment of the present disclosure may include an intelligent terminal such as a mobile phone, or may also be other devices, servers, and the like that may be used for data processing, and is not limited herein.
In one possible embodiment, before determining the target feature elements and the target weight elements corresponding to the plurality of processing cycles, the controller 51 is further configured to:
and carrying out size transformation on the original image feature matrix to be processed and the original weight matrix based on the size of the PE array to obtain the image feature matrix to be processed and the weight matrix.
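One possible size transformation can be sketched as follows. The embodiment here does not state how the transformation is performed, so this sketch assumes that a flattened feature vector and a flattened weight vector are zero-padded and regrouped so that every group holds as many elements as the PE array has rows; the function name `reshape_for_pe_array` and the parameter `group_size` are hypothetical.

```python
def reshape_for_pe_array(flat, group_size, pad_value=0):
    """Pad a flat sequence and split it into groups of group_size elements.

    group_size stands for a dimension of the PE array (assumed here to be
    its row count). Zero padding leaves any later weighted sum unchanged.
    """
    padded = list(flat)
    remainder = len(padded) % group_size
    if remainder:
        padded.extend([pad_value] * (group_size - remainder))
    return [padded[i:i + group_size] for i in range(0, len(padded), group_size)]


features = [1, 2, 3, 4, 5, 6]
weights = [10, 20, 30, 40, 50, 60]
feature_groups = reshape_for_pe_array(features, 4)
weight_groups = reshape_for_pe_array(weights, 4)

# The weighted sum over the padded groups equals the original dot product.
padded_sum = sum(
    f * w
    for fg, wg in zip(feature_groups, weight_groups)
    for f, w in zip(fg, wg)
)
```

Each group can then be fed to the PE array in one processing cycle; the padding elements contribute zero to the accumulated result.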
In a possible implementation manner, the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the to-be-processed image feature matrix;
and the target weight elements respectively corresponding to the processing periods comprise the weight elements corresponding to the positions of the target characteristic elements processed by the corresponding processing periods in at least part of the weight matrixes.
In one possible embodiment, each row of the PE array includes a repeating feature element;
in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element, and performs a preset operation to obtain intermediate processing data, where the PE array 52 is configured to:
in response to the arrival of any processing cycle, transmitting target feature elements of the image feature matrix to be processed, equal in number to the rows of the PE array, to one column of PEs in the PE array, and copying the target feature elements in that column of PEs to the PEs in the other columns as first operands of the corresponding PEs; and
transmitting the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
In a possible implementation manner, when the PE array is used to perform a preset operation on the first operand and the second operand stored in the PE array to obtain intermediate processing data corresponding to the corresponding processing cycle, the PE array 52 is configured to:
in the corresponding processing cycle, performing weighted summation on each column of target characteristic elements in the first operand and each column of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
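The single-cycle data flow of this embodiment can be sketched as follows. The sketch assumes that each column of the PE array is assigned to one weight matrix and that the weight element at index r of a matrix corresponds to the feature element held in row r; the names `row_repeat_cycle`, `features` and `weights_per_matrix` are illustrative only.

```python
def row_repeat_cycle(features, weights_per_matrix):
    """Simulate one processing cycle in which each PE row repeats a feature.

    features[r] is the target feature element for PE row r.
    weights_per_matrix[c][r] is the weight element of weight matrix c that
    corresponds to features[r] (assumed layout: one matrix per PE column).
    """
    rows = len(features)
    cols = len(weights_per_matrix)
    # First operand: features are loaded into one column and copied to all others.
    first = [[features[r] for _ in range(cols)] for r in range(rows)]
    # Second operand: PE (r, c) receives the matching weight from matrix c.
    second = [[weights_per_matrix[c][r] for c in range(cols)] for r in range(rows)]
    # Column-wise weighted summation gives one intermediate value per matrix.
    return [
        sum(first[r][c] * second[r][c] for r in range(rows))
        for c in range(cols)
    ]


features = [1, 2, 3, 4]
weights_per_matrix = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [2, 2, 2, 2],
]
sub_data = row_repeat_cycle(features, weights_per_matrix)
```

Because every feature element is reused across all columns, one read of the feature elements serves as many weight matrices as the array has columns.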
In one possible embodiment, each column of the PE array includes a repeating feature element;
in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element, and performs a preset operation to obtain intermediate processing data, where the PE array 52 is configured to:
in response to the arrival of any processing cycle, transmitting target feature elements of the image feature matrix to be processed, equal in number to the columns of the PE array, to one row of PEs in the PE array, and copying the target feature elements in that row of PEs to the PEs in the other rows as first operands of the corresponding PEs; and
transmitting the weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs at the positions corresponding to those target feature elements in the PE array, as second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
In a possible implementation manner, when the PE array is used to perform a preset operation on the first operand and the second operand stored in the PE array to obtain intermediate processing data corresponding to the corresponding processing cycle, the PE array 52 is configured to:
in the corresponding processing cycle, performing weighted summation on each row of target characteristic elements in the first operand and each row of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
In one possible embodiment, each PE of the PE array includes a repeating feature element;
in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element, and performs a preset operation to obtain intermediate processing data, where the PE array 52 is configured to:
responding to any processing period, transmitting a target feature element in the image feature matrix to be processed to one PE in the PE array, and copying the target image feature element in the one PE to other PEs as a first operand of the corresponding PE;
transmitting, to each PE in the PE array, the weight elements that correspond to the target feature element and come from as many weight matrices as there are PEs in the PE array, as second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
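A single cycle of this embodiment can be sketched as follows, assuming the preset operation inside each PE is a multiplication of its two operands and that each PE is assigned its own weight matrix. The name `broadcast_cycle` and the grid layout `weights_grid[r][c]` are hypothetical.

```python
def broadcast_cycle(feature, weights_grid):
    """Simulate one cycle in which every PE holds the same feature element.

    weights_grid[r][c] is the weight element, taken from the weight matrix
    assigned to PE (r, c), that corresponds to the broadcast feature element.
    Returns the per-PE products, one intermediate value per weight matrix.
    """
    return [[feature * weight for weight in row] for row in weights_grid]


# A 2 x 2 PE array, so four weight matrices are served in the same cycle.
products = broadcast_cycle(3, [[1, 2], [3, 4]])
```

Accumulating these per-PE products over successive cycles, one feature element per cycle, yields one result value per weight matrix.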
In a possible embodiment, when obtaining result data for processing the to-be-processed image feature matrix based on intermediate processing data corresponding to a plurality of processing cycles, the PE array 52 is configured to:
accumulating intermediate subdata belonging to the same weight matrix in intermediate processing data respectively corresponding to a plurality of processing cycles to obtain sub-result data corresponding to each weight matrix;
and obtaining result data for processing the characteristic matrix of the image to be processed based on the sub-result data respectively corresponding to the plurality of weight matrixes.
In a possible embodiment, the preset operation corresponding to any processing cycle includes: and performing sub-operation of full connection operation on the image feature matrix to be processed.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 6, which is a schematic structural diagram of the computer device provided in the embodiment of the present disclosure, and the computer device includes:
a processor 61, a memory 62 and a data processing device 63 as provided by the present disclosure.
The memory 62 includes an internal memory 621 and an external memory 622. The internal memory 621 temporarily stores operation data of the processor 61 and data exchanged with the external memory 622, such as a hard disk; the processor 61 exchanges data with the external memory 622 through the internal memory 621.
For the specific execution process of the instruction, reference may be made to the steps of the data processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the data processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the data processing method in the foregoing method embodiments, which may be referred to specifically in the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present disclosure in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

1. A data processing method, comprising:
determining target characteristic elements and target weight elements corresponding to a plurality of processing cycles respectively from the characteristic matrix and the weight matrix of the image to be processed; the characteristic matrix of the image to be processed corresponds to a plurality of weight matrixes;
in response to any processing period coming, each PE in the processing engine PE array acquires a target characteristic element corresponding to the processing period and a corresponding target weight element and performs preset operation to obtain intermediate processing data; wherein, when each row of the PE array includes a repeated feature element, in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data, including: responding to any processing cycle, transmitting the number of target characteristic elements of the PE array row in the image characteristic matrix to be processed to one column of PE in the PE array, and copying the target characteristic elements in the one column of PE to the PEs in other columns as a first operand of the corresponding PE; transmitting the weight elements from different weight matrixes respectively corresponding to the target characteristic elements in each row of PEs to the PEs corresponding to the positions of the target characteristic elements in the PE array to serve as second operands of the corresponding PEs; performing preset operation on a first operand and a second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to a corresponding processing period;
for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to target weight elements corresponding to the repeated feature elements in different weight matrices;
and obtaining result data for processing the characteristic matrix of the image to be processed based on the intermediate processing data respectively corresponding to the plurality of processing cycles.
2. The data processing method of claim 1, wherein before determining the target feature element and the target weight element corresponding to each of the plurality of processing cycles, the method further comprises:
and carrying out size transformation on the original image feature matrix to be processed and the original weight matrix based on the size of the PE array to obtain the image feature matrix to be processed and the weight matrix.
3. The data processing method according to claim 1 or 2, wherein the target feature elements respectively corresponding to the plurality of processing cycles comprise at least one image feature element in the image feature matrix to be processed;
and the target weight elements respectively corresponding to the processing periods comprise the weight elements corresponding to the positions of the target characteristic elements processed by the corresponding processing periods in at least part of the weight matrixes.
4. The data processing method according to claim 1, wherein the performing a predetermined operation on the first operand and the second operand stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the corresponding processing cycle comprises:
in the corresponding processing cycle, performing weighted summation on each column of target characteristic elements in the first operand and each column of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
5. The data processing method according to claim 1 or 2, wherein each column of the PE array comprises repeating feature elements;
in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data, including:
responding to any processing cycle, transmitting a number of target characteristic elements in the image characteristic matrix to be processed equal to the number of columns of the PE array to a row of PEs in the PE array, and copying the target characteristic elements in the row of PEs to the PEs in other rows as a first operand of the corresponding PE; and
transmitting the weight elements from different weight matrixes respectively corresponding to the target characteristic elements in each row of PEs to the PEs corresponding to the positions of the target characteristic elements in the PE array to serve as second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
6. The data processing method according to claim 5, wherein the performing a predetermined operation on the first operand and the second operand stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the corresponding processing cycle comprises:
in a corresponding processing cycle, performing weighted summation on each row of target characteristic elements in the first operand and each row of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
7. The data processing method of claim 3, wherein each PE of the PE array comprises a repeating feature element;
in response to any processing cycle coming, each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data, including:
responding to any processing period, transmitting a target feature element in the image feature matrix to be processed to one PE in the PE array, and copying the target image feature element in the one PE to other PEs as a first operand of the corresponding PE;
transmitting the weight elements of the weight matrix from all the PE numbers in the PE array corresponding to the target characteristic element to each PE in the PE array as a second operand of the corresponding PE;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
8. The data processing method according to claim 1 or 2, wherein the obtaining of result data for processing the to-be-processed image feature matrix based on intermediate processing data corresponding to a plurality of processing cycles, respectively, comprises:
accumulating intermediate subdata belonging to the same weight matrix in intermediate processing data respectively corresponding to a plurality of processing cycles to obtain sub-result data corresponding to each weight matrix;
and obtaining result data for processing the characteristic matrix of the image to be processed based on the sub-result data respectively corresponding to the plurality of weight matrixes.
9. The data processing method according to claim 1 or 2, wherein the predetermined operation corresponding to any processing cycle comprises: and performing sub-operation of full connection operation on the image feature matrix to be processed.
10. A data processing apparatus, comprising: a controller and a processing engine PE array;
the controller is used for determining target characteristic elements and target weight elements corresponding to a plurality of processing cycles from the characteristic matrix and the weight matrix of the image to be processed; the image feature matrix to be processed corresponds to a plurality of weight matrixes;
the PE array is used for responding to any processing period, and each PE in the PE array acquires a target characteristic element corresponding to the processing period and a corresponding target weight element and performs preset operation to obtain intermediate processing data; obtaining result data for processing the characteristic matrix of the image to be processed based on intermediate processing data respectively corresponding to a plurality of processing cycles; under the condition that each row of the PE array includes repeated feature elements, the PE array responds to any processing cycle coming, and each PE in the processing engine PE array acquires a target feature element corresponding to the processing cycle and a corresponding target weight element and performs preset operation, so as to obtain intermediate processing data, the processing engine PE array is configured to: responding to any processing cycle, transmitting the number of target characteristic elements of the PE array row in the image characteristic matrix to be processed to one column of PE in the PE array, and copying the target characteristic elements in the one column of PE to the PEs in other columns as a first operand of the corresponding PE; transmitting the weight elements from different weight matrixes respectively corresponding to the target characteristic elements in each row of PEs to the PEs corresponding to the positions of the target characteristic elements in the PE array to serve as second operands of the corresponding PEs; performing preset operation on a first operand and a second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to a corresponding processing period;
for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the repeated feature elements respectively correspond to target weight elements corresponding to the repeated feature elements in different weight matrices.
11. The data processing apparatus of claim 10, wherein the controller, prior to determining the target feature element and the target weight element corresponding to each of the plurality of processing cycles, is further configured to:
and carrying out size transformation on the original image feature matrix to be processed and the original weight matrix based on the size of the PE array to obtain the image feature matrix to be processed and the weight matrix.
12. The data processing apparatus according to claim 10 or 11, wherein the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the image feature matrix to be processed;
and the target weight elements respectively corresponding to the processing periods comprise the weight elements corresponding to the positions of the target characteristic elements processed by the corresponding processing periods in at least part of the weight matrixes.
13. The data processing apparatus as claimed in claim 10, wherein the PE array, when performing the predetermined operation on the first operand and the second operand stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the corresponding processing cycle, is configured to:
in the corresponding processing cycle, performing weighted summation on each column of target characteristic elements in the first operand and each column of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
14. The data processing apparatus according to claim 10 or 11, wherein each column of the PE array comprises repeating feature elements;
the PE array, in response to any processing cycle coming, when each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data, is configured to:
responding to any processing cycle, transmitting a number of target characteristic elements in the image characteristic matrix to be processed equal to the number of columns of the PE array to a row of PEs in the PE array, and copying the target characteristic elements in the row of PEs to the PEs in other rows as a first operand of the corresponding PE; and
transmitting the weight elements from different weight matrixes respectively corresponding to the target characteristic elements in each row of PEs to the PEs corresponding to the positions of the target characteristic elements in the PE array to serve as second operands of the corresponding PEs;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
15. The data processing apparatus according to claim 14, wherein the PE array, when performing the predetermined operation on the first operand and the second operand stored in the PE array by using the PE array to obtain the intermediate processing data corresponding to the corresponding processing cycle, is configured to:
in the corresponding processing cycle, performing weighted summation on each row of target characteristic elements in the first operand and each row of weight elements in the second operand to obtain intermediate subdata corresponding to the different weight matrixes;
and obtaining intermediate processing data corresponding to the corresponding processing period based on the intermediate data corresponding to the different weight matrixes.
16. The data processing apparatus according to claim 12, wherein each PE of the PE array comprises a repeating feature element;
the PE array, in response to any processing cycle coming, when each PE in the processing engine PE array obtains a target feature element corresponding to the processing cycle and a corresponding target weight element and performs a preset operation to obtain intermediate processing data, is configured to:
responding to any processing period, transmitting a target feature element in the image feature matrix to be processed to one PE in the PE array, and copying the target image feature element in the one PE to other PEs as a first operand of the corresponding PE;
transmitting the weight elements of the weight matrix from all the PE numbers in the PE array corresponding to the target characteristic element to each PE in the PE array as a second operand of the corresponding PE;
and performing preset operation on the first operand and the second operand stored in the PE array by using the PE array to obtain intermediate processing data corresponding to the corresponding processing period.
17. The data processing apparatus according to claim 10 or 11, wherein the PE array, when obtaining result data for processing the to-be-processed image feature matrix based on intermediate processing data corresponding to a plurality of the processing cycles, is configured to:
accumulating intermediate subdata belonging to the same weight matrix in intermediate processing data respectively corresponding to a plurality of processing cycles to obtain sub-result data corresponding to each weight matrix;
and obtaining result data for processing the characteristic matrix of the image to be processed based on the sub-result data respectively corresponding to the plurality of weight matrixes.
18. The data processing apparatus according to claim 10 or 11, wherein the predetermined operation corresponding to any processing cycle comprises: and performing sub-operation of full connection operation on the image feature matrix to be processed.
19. A computer device, comprising: processor, memory, and data processing apparatus according to any one of claims 10-18.
20. A computer-readable storage medium, having stored thereon a computer program for performing the steps of the data processing method according to any one of claims 1 to 9 when executed by a controller and the PE array.
CN202110221235.3A 2021-02-26 2021-02-26 Data processing method and device, computer equipment and storage medium Active CN112966729B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110221235.3A CN112966729B (en) 2021-02-26 2021-02-26 Data processing method and device, computer equipment and storage medium
PCT/CN2021/115789 WO2022179075A1 (en) 2021-02-26 2021-08-31 Data processing method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110221235.3A CN112966729B (en) 2021-02-26 2021-02-26 Data processing method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112966729A CN112966729A (en) 2021-06-15
CN112966729B true CN112966729B (en) 2023-01-31

Family

ID=76275794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110221235.3A Active CN112966729B (en) 2021-02-26 2021-02-26 Data processing method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112966729B (en)
WO (1) WO2022179075A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966729B (en) * 2021-02-26 2023-01-31 成都商汤科技有限公司 Data processing method and device, computer equipment and storage medium
CN113253336B (en) * 2021-07-02 2021-10-01 深圳市翩翩科技有限公司 Earthquake prediction method and system based on deep learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229645A (en) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 Convolution acceleration and computation processing method and device, electronic equipment and storage medium
CN108805275A (en) * 2017-06-16 2018-11-13 上海兆芯集成电路有限公司 Programmable device and its operating method and computer usable medium
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 Sparse convolutional neural network accelerator and implementation method
US10489479B1 (en) * 2016-09-12 2019-11-26 Habana Labs Ltd. Matrix multiplication engine
CN110705687A (en) * 2019-09-05 2020-01-17 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN111095241A (en) * 2017-07-24 2020-05-01 特斯拉公司 Accelerated math engine
CN111414994A (en) * 2020-03-03 2020-07-14 哈尔滨工业大学 FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN111582467A (en) * 2020-05-14 2020-08-25 上海商汤智能科技有限公司 Artificial intelligence accelerator and electronic equipment
CN111897579A (en) * 2020-08-18 2020-11-06 腾讯科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium
WO2020264264A1 (en) * 2019-06-28 2020-12-30 Amazon Technologies, Inc. Dilated convolution using systolic array
CN112214727A (en) * 2017-07-07 2021-01-12 华为技术有限公司 Operation accelerator

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355246B (en) * 2015-10-08 2019-02-15 上海兆芯集成电路有限公司 Three configuration neural network units
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 System for data reuse in cyclic convolution computation of convolutional neural networks
US10726330B2 (en) * 2016-10-11 2020-07-28 The Research Foundation For The State University Of New York System, method, and accelerator to process convolutional neural network layers
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability
CN108665059A (en) * 2018-05-22 2018-10-16 中国科学技术大学苏州研究院 Convolutional neural networks acceleration system based on field programmable gate array
CN110659445B (en) * 2018-06-29 2022-12-30 龙芯中科技术股份有限公司 Arithmetic device and processing method thereof
CN109740115A (en) * 2019-01-08 2019-05-10 郑州云海信息技术有限公司 Method, device and equipment for realizing matrix multiplication operation
CN112149047A (en) * 2019-06-27 2020-12-29 深圳市中兴微电子技术有限公司 Data processing method and device, storage medium and electronic device
CN111967582B (en) * 2020-08-07 2022-07-08 苏州浪潮智能科技有限公司 CNN convolutional layer operation method and CNN convolutional layer operation accelerator
CN112966729B (en) * 2021-02-26 2023-01-31 成都商汤科技有限公司 Data processing method and device, computer equipment and storage medium

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
"A Survey of Accelerator Architectures for Deep Neural Networks";Yiran Chen et al.;《Engineering》;20200331;Vol. 6 (No. 3);264-274 *
"High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands";Dibakar Gope et al.;《Machine Learning》;20200803;1-6 *
"Search-free Inference Acceleration for Sparse Convolutional Neural Networks";Bosheng Liu et al.;《IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems》;20210101;1-6 *
"Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs";Caiwen Ding et al.;《arXiv:1804.11239v1》;20180528;1-6 *
"Systolic Array Based Accelerator and Algorithm Mapping for Deep Learning Algorithms";Yang, Z et al.;《NPC 2018: Network and Parallel Computing》;20181230;Vol. 11276;153–158 *
"Research on a Segmented High-Precision Method for Measuring Luminance Outside Tunnel Portals";常亮 et al.;《电子产品可靠性与环境试验》;20190820;Vol. 37 (No. 04);65-72 *
"Implementation and Optimization of a Convolutional Neural Network Accelerator";孙凡;《中国优秀硕士学位论文全文数据库 (信息科技辑)》;20190115;No. 01 (2019);I138-1895 *
"Network Attack Detection System Based on the Spark Platform";龚剑敏 et al.;《电脑知识与技术》;20210205;Vol. 17 (No. 04);44-45 *
"Research on Hardware Acceleration of Deep Neural Networks";张祖扬;《中国优秀硕士学位论文全文数据库 (信息科技辑)》;20200115;No. 01 (2020);I137-85 *

Also Published As

Publication number Publication date
CN112966729A (en) 2021-06-15
WO2022179075A1 (en) 2022-09-01

Similar Documents

Publication Publication Date Title
CN109003132B (en) Advertisement recommendation method and related product
US11341399B2 (en) Reducing power consumption in a neural network processor by skipping processing operations
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
CN111310050B (en) Recommendation method based on multilayer attention
EP2877905A2 (en) Neural processing engine and architecture using the same
CN112966729B (en) Data processing method and device, computer equipment and storage medium
CN107944545B (en) Computing method and computing device applied to neural network
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN110738324A (en) Deep learning system and method for processing data for deep learning system
KR20190107766A (en) Computing device and method
JP2022541721A (en) Systems and methods that support alternate number formats for efficient multiplication
US11120328B1 (en) Systems and methods for reducing power consumption of convolution operations for artificial neural networks
CN111353598A (en) Neural network compression method, electronic device and computer readable medium
WO2021218037A1 (en) Target detection method and apparatus, computer device and storage medium
CN112967172A (en) Data processing device, method, computer equipment and storage medium
CN110490317B (en) Neural network operation device and operation method
CN114003198B (en) Inner product processing unit, arbitrary precision calculation device, method, and readable storage medium
JP7251354B2 (en) Information processing device, information processing program, and information processing method
CN114298329A (en) Model training method, device, equipment and storage medium
US20200409663A1 (en) Neural processing element with single instruction multiple data (simd) compute lanes
KR20210014897A (en) Matrix operator and matrix operation method for artificial neural network
CN112784206A (en) Winograd convolution operation method, device, equipment and storage medium
CN114692847B (en) Data processing circuit, data processing method and related products
CN111047024A (en) Computing device and related product
CN113469365B (en) Reasoning and compiling method based on neural network model and related products thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40049195; Country of ref document: HK)
GR01 Patent grant