WO2022179075A1 - Data processing method and apparatus, computer device and storage medium - Google Patents

Data processing method and apparatus, computer device and storage medium

Info

Publication number
WO2022179075A1
WO2022179075A1 PCT/CN2021/115789 CN2021115789W WO2022179075A1 WO 2022179075 A1 WO2022179075 A1 WO 2022179075A1 CN 2021115789 W CN2021115789 W CN 2021115789W WO 2022179075 A1 WO2022179075 A1 WO 2022179075A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing
array
data
weight
elements
Prior art date
Application number
PCT/CN2021/115789
Other languages
French (fr)
Chinese (zh)
Inventor
周军
常亮
周亮
吴飞
Original Assignee
成都商汤科技有限公司
电子科技大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都商汤科技有限公司, 电子科技大学
Publication of WO2022179075A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Definitions

  • the present disclosure relates to the field of computer technology, and in particular, to a data processing method, apparatus, computer device, and storage medium.
  • before determining the target feature elements and target weight elements respectively corresponding to the multiple processing cycles, the method further includes: performing, based on the size of the PE array, size transformation on the original to-be-processed image feature matrix and the original weight matrices to obtain the to-be-processed image feature matrix and the weight matrices.
  • using the PE array to perform a preset operation on the first operand and the second operand stored in the PE array to obtain intermediate processing data corresponding to the processing cycle includes: in the processing cycle, performing weighted summation on each row of target feature elements in the first operand and each row of weight elements in the second operand to obtain intermediate data corresponding to the different weight matrices; and obtaining, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
  • the original to-be-processed image feature matrix may be size-transformed, or not.
  • if the size of the feature map is smaller than the size of the PE array, only some of the PEs in the PE array are used during processing, not all of them.
  • the fourth column a1, a2, a3, and a4 are weighted and summed to obtain the intermediate data O4 1 corresponding to the weight matrix W4.

Abstract

A data processing method and apparatus, a computer device and a storage medium. The method comprises: determining, from an image feature matrix to be processed and weight matrices, target feature elements and target weight elements corresponding to a plurality of processing cycles, wherein said image feature matrix corresponds to a plurality of weight matrices; in response to the arrival of any processing cycle, each processing engine (PE) in a PE array acquiring a corresponding target feature element and a corresponding target weight element of the processing cycle, and performing a preset operation, so as to obtain intermediate processing data, wherein for any processing cycle, target feature elements in the PE array comprise repeated feature elements, and target weight elements corresponding to the repeated feature elements are weight elements, corresponding to the repeated feature elements, in different weight matrices; and on the basis of intermediate processing data corresponding to the plurality of processing cycles, obtaining result data of the processing of said image feature matrix.

Description

A data processing method, apparatus, computer device, and storage medium
Cross-Reference to Related Applications
The present disclosure claims priority to Chinese patent application No. 202110221235.3, filed on February 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a data processing method, apparatus, computer device, and storage medium.
Background
Convolutional neural networks (CNNs), an important class of deep learning models, are widely used in image recognition, natural language processing, and other fields. A convolutional neural network contains several kinds of network layers, such as convolutional layers, pooling layers, activation layers, and fully connected layers.
A fully connected layer consists of a large number of nodes. When the nodes of a fully connected layer are computed, the volume of data to be processed, such as input data and related parameters, is large, so each computation spends a long time transferring the required data to the operation unit, which results in low processing efficiency.
Summary
Embodiments of the present disclosure provide at least a data processing method, apparatus, computer device, and storage medium.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: determining, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to multiple processing cycles, where the to-be-processed image feature matrix corresponds to multiple weight matrices; in response to the arrival of any processing cycle, acquiring, by each processing engine (PE) in a PE array, the target feature element and the corresponding target weight element for that processing cycle and performing a preset operation to obtain intermediate processing data, where, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements that correspond to that repeated feature element in different weight matrices; and obtaining, based on the intermediate processing data respectively corresponding to the multiple processing cycles, result data of processing the to-be-processed image feature matrix.
In this way, the feature elements transferred to the PE array are reused within each processing cycle, which reduces the amount of data that must be read into the PE array in each processing cycle, reduces the time needed to read data into the PE array, and improves the processing efficiency of the PE array.
In a possible implementation, before determining the target feature elements and target weight elements respectively corresponding to the multiple processing cycles, the method further includes: performing, based on the size of the PE array, size transformation on an original to-be-processed image feature matrix and original weight matrices to obtain the to-be-processed image feature matrix and the weight matrices.
In this way, the sizes of the original to-be-processed image feature matrix and the original weight matrices can be transformed to match the PE array, which makes the processing logic in subsequent processing simpler.
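One way to picture the size transformation is the following minimal Python sketch. It rests on assumptions that the text does not state: the matrix is flattened row by row, regrouped into chunks whose length equals the number of PE rows, and padded with zeros (zeros leave a weighted sum unchanged). The function name `reshape_for_pe` and the parameter `pe_rows` are hypothetical.

```python
def reshape_for_pe(matrix, pe_rows):
    """Flatten a 2-D matrix row by row and regroup it into chunks of pe_rows elements.

    Assumption: the tail is padded with zeros so that every chunk is full.
    """
    if pe_rows <= 0:
        raise ValueError("pe_rows must be positive")
    flat = [value for row in matrix for value in row]
    padding = (-len(flat)) % pe_rows  # elements needed to reach a multiple of pe_rows
    flat.extend([0] * padding)
    return [flat[i:i + pe_rows] for i in range(0, len(flat), pe_rows)]


# A 2x3 feature matrix regrouped for a PE array with 4 rows.
chunks = reshape_for_pe([[1, 2, 3], [4, 5, 6]], pe_rows=4)
```

The same transformation would have to be applied to every weight matrix so that feature and weight positions stay aligned.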
In a possible implementation, the target feature elements respectively corresponding to the multiple processing cycles include at least one image feature element of the to-be-processed image feature matrix; the position of a target feature element in the to-be-processed image feature matrix is the same as the position of the corresponding target weight element in the respective weight matrix.
In a possible implementation, each row of the PE array includes repeated feature elements. In response to the arrival of any processing cycle, each PE in the PE array acquiring the target feature element and the corresponding target weight element for that processing cycle and performing a preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transferring a number of target feature elements of the to-be-processed image feature matrix equal to the number of rows of the PE array into one column of PEs in the PE array, and copying the target feature elements in that column of PEs into the PEs of the other columns, as the first operand of the corresponding PEs; transferring target weight elements that come from different weight matrices and respectively correspond to the target feature elements in each column of PEs into the PEs at the positions corresponding to those target feature elements, as the second operand of the corresponding PEs; and performing, by the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In this way, a column of feature elements of the to-be-processed image feature data is reused within the PE array, which reduces the amount of feature data that must be transferred to the PE array and thereby improves data processing efficiency.
In a possible implementation, performing, by the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle includes: in the processing cycle, performing weighted summation on each column of target feature elements in the first operand and the corresponding column of weight elements in the second operand to obtain intermediate data corresponding to the different weight matrices; and obtaining, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
In this way, the processing tasks corresponding to multiple sets of weight data are processed in parallel within one processing cycle and are completed over multiple processing cycles, so that the elements of the to-be-processed image feature matrix are reused in every cycle, which improves data processing efficiency.
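The column-broadcast cycle described above can be sketched in plain Python as follows. This is only an illustration under assumptions: the PE array is modeled as nested lists rather than hardware, and the name `column_broadcast_cycle` and its argument layout are hypothetical.

```python
def column_broadcast_cycle(features, weights_per_matrix):
    """Simulate one processing cycle of the column-broadcast scheme.

    features: m feature elements loaded into one PE column (m = number of PE rows).
    weights_per_matrix: n lists, one per weight matrix (one per PE column),
        each holding the m weights at the same positions as the features.
    Returns n partial sums, one per weight matrix.
    """
    m = len(features)
    n = len(weights_per_matrix)
    if any(len(w) != m for w in weights_per_matrix):
        raise ValueError("each weight list must match the number of features")
    # First operand: the loaded column is copied to every other column,
    # so each row of the m x n grid repeats a single feature element.
    first = [[features[r]] * n for r in range(m)]
    # Second operand: column c holds the weights taken from weight matrix c.
    second = [[weights_per_matrix[c][r] for c in range(n)] for r in range(m)]
    # Each PE multiplies its two operands; each column is then summed.
    return [sum(first[r][c] * second[r][c] for r in range(m)) for c in range(n)]


# Two feature elements, three weight matrices (a 2x3 PE array).
partial_sums = column_broadcast_cycle([1, 2], [[1, 1], [2, 0], [0, 3]])
```

Only the m feature elements are read from outside the array in this sketch; the other columns receive copies, which is the reuse the paragraph above refers to.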
In a possible implementation, each column of the PE array includes repeated feature elements. In response to the arrival of any processing cycle, each PE in the PE array acquiring the target feature element and the corresponding target weight element for that processing cycle and performing a preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transferring a number of target feature elements of the to-be-processed image feature matrix equal to the number of columns of the PE array into one row of PEs in the PE array, and copying the target feature elements in that row of PEs into the PEs of the other rows, as the first operand of the corresponding PEs; transferring target weight elements that come from different weight matrices and respectively correspond to the target feature elements in each row of PEs into the PEs at the positions corresponding to those target feature elements, as the second operand of the corresponding PEs; and performing, by the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In this way, a row of feature elements of the to-be-processed image feature data is reused within the PE array, which reduces the amount of feature data that must be transferred to the PE array and thereby improves data processing efficiency.
In a possible implementation, performing, by the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle includes: in the processing cycle, performing weighted summation on each row of target feature elements in the first operand and the corresponding row of weight elements in the second operand to obtain intermediate data corresponding to the different weight matrices; and obtaining, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
In a possible implementation, every PE of the PE array holds a repeated feature element. In response to the arrival of any processing cycle, each PE in the PE array acquiring the target feature element and the corresponding target weight element for that processing cycle and performing a preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transferring one target feature element of the to-be-processed image feature matrix into one PE of the PE array, and copying the target feature element in that PE into the other PEs, as the first operand of the corresponding PEs; transferring the target weight elements that correspond to this target feature element and come from as many different weight matrices as there are PEs in the PE array into the respective PEs of the PE array, as the second operand of the corresponding PEs; and performing, by the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
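The single-element variant above can be sketched as follows. The sketch assumes that the preset operation in each PE is a multiplication and that the weights arrive as a flat list in row-major PE order; both points, and the name `scalar_broadcast_cycle`, are illustrative assumptions rather than details taken from the text.

```python
def scalar_broadcast_cycle(feature, weights, rows, cols):
    """Simulate one cycle in which a single feature element is copied to every PE.

    weights: rows * cols weight elements, one from each weight matrix, all taken
        at the position of `feature` (assumed row-major PE order).
    Returns a rows x cols grid of products, one intermediate value per weight matrix.
    """
    if len(weights) != rows * cols:
        raise ValueError("need exactly one weight per PE")
    return [
        [feature * weights[r * cols + c] for c in range(cols)]
        for r in range(rows)
    ]


# One feature element shared by a 2x2 PE array serving four weight matrices.
products = scalar_broadcast_cycle(2, [1, 2, 3, 4], rows=2, cols=2)
```

In this scheme each PE would keep a running total for its own weight matrix across cycles, since every cycle contributes a single product per weight matrix.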
In a possible implementation, obtaining, based on the intermediate processing data respectively corresponding to the multiple processing cycles, the result data of processing the to-be-processed image feature matrix includes: accumulating, among the intermediate processing data respectively corresponding to the multiple processing cycles, the intermediate data belonging to the same weight matrix to obtain a result value corresponding to each weight matrix; and obtaining, based on the result values respectively corresponding to the multiple weight matrices, the result data of processing the to-be-processed image feature matrix.
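The accumulation across processing cycles can be illustrated with a short Python sketch. It assumes that the feature matrix and every weight matrix have already been flattened in the same order, that each cycle consumes `pe_rows` feature elements, and that there is one PE column per weight matrix; the function name `fully_connected_by_cycles` is hypothetical.

```python
def fully_connected_by_cycles(features, weight_matrices, pe_rows):
    """Accumulate per-cycle partial sums into one result value per weight matrix.

    features: flattened feature elements.
    weight_matrices: list of flattened weight matrices, each as long as `features`.
    pe_rows: number of feature elements consumed in one processing cycle.
    """
    if pe_rows <= 0:
        raise ValueError("pe_rows must be positive")
    if any(len(w) != len(features) for w in weight_matrices):
        raise ValueError("every weight matrix must match the feature count")
    results = [0] * len(weight_matrices)
    for start in range(0, len(features), pe_rows):  # one iteration per cycle
        chunk = features[start:start + pe_rows]
        for j, weights in enumerate(weight_matrices):
            # Intermediate data of this cycle for weight matrix j.
            partial = sum(x * w for x, w in zip(chunk, weights[start:start + pe_rows]))
            results[j] += partial  # accumulate values belonging to the same matrix
    return results


outputs = fully_connected_by_cycles([1, 2, 3, 4], [[1, 0, 0, 1], [1, 1, 1, 1]], pe_rows=2)
```

The final values equal the plain dot product of the features with each weight matrix, so splitting the work into cycles changes the schedule but not the result.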
In a possible implementation, the preset operation corresponding to any processing cycle includes a sub-operation of a fully connected operation performed on the to-be-processed image feature matrix.
In this way, fully connected processing of the to-be-processed image feature matrix is achieved with higher efficiency, which increases the processing speed of a neural network that performs fully connected processing in this manner.
In a second aspect, an embodiment of the present disclosure provides a data processing apparatus, including a controller and a processing engine (PE) array. The controller is configured to determine, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to multiple processing cycles, where the to-be-processed image feature matrix corresponds to multiple weight matrices. The PE array is configured such that, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, and such that result data of processing the to-be-processed image feature matrix is obtained based on the intermediate processing data respectively corresponding to the multiple processing cycles. For any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements that correspond to that repeated feature element in different weight matrices.
In a possible implementation, before determining the target feature elements and target weight elements respectively corresponding to the multiple processing cycles, the controller is further configured to: perform, based on the size of the PE array, size transformation on an original to-be-processed image feature matrix and original weight matrices to obtain the to-be-processed image feature matrix and the weight matrices.
In a possible implementation, the target feature elements respectively corresponding to the multiple processing cycles include at least one image feature element of the to-be-processed image feature matrix; the position of a target feature element in the to-be-processed image feature matrix is the same as the position of the corresponding target weight element in the respective weight matrix.
In a possible implementation, each row of the PE array includes repeated feature elements. When, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array is configured to: in response to the arrival of any processing cycle, transfer a number of target feature elements of the to-be-processed image feature matrix equal to the number of rows of the PE array into one column of PEs in the PE array, and copy the target feature elements in that column of PEs into the PEs of the other columns, as the first operand of the corresponding PEs; transfer target weight elements that come from different weight matrices and respectively correspond to the target feature elements in each column of PEs into the PEs at the positions corresponding to those target feature elements, as the second operand of the corresponding PEs; and perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when performing the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array is configured to: in the processing cycle, perform weighted summation on each column of target feature elements in the first operand and the corresponding column of weight elements in the second operand to obtain intermediate data corresponding to the different weight matrices; and obtain, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
In a possible implementation, each column of the PE array includes repeated feature elements. When, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array is configured to: in response to the arrival of any processing cycle, transfer a number of target feature elements of the to-be-processed image feature matrix equal to the number of columns of the PE array into one row of PEs in the PE array, and copy the target feature elements in that row of PEs into the PEs of the other rows, as the first operand of the corresponding PEs; transfer target weight elements that come from different weight matrices and respectively correspond to the target feature elements in each row of PEs into the PEs at the positions corresponding to those target feature elements, as the second operand of the corresponding PEs; and perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when performing the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array is configured to: in the processing cycle, perform weighted summation on each row of target feature elements in the first operand and the corresponding row of weight elements in the second operand to obtain intermediate data corresponding to the different weight matrices; and obtain, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
In a possible implementation, every PE of the PE array holds a repeated feature element. When, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array is configured to: in response to the arrival of any processing cycle, transfer one target feature element of the to-be-processed image feature matrix into one PE of the PE array, and copy the target feature element in that PE into the other PEs, as the first operand of the corresponding PEs; transfer the target weight elements that correspond to this target feature element and come from as many different weight matrices as there are PEs in the PE array into the respective PEs of the PE array, as the second operand of the corresponding PEs; and perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when obtaining, based on the intermediate processing data respectively corresponding to the multiple processing cycles, the result data of processing the to-be-processed image feature matrix, the PE array is configured to: accumulate, among the intermediate processing data respectively corresponding to the multiple processing cycles, the intermediate data belonging to the same weight matrix to obtain a result value corresponding to each weight matrix; and obtain, based on the result values respectively corresponding to the multiple weight matrices, the result data of processing the to-be-processed image feature matrix.
In a possible implementation, the preset operation corresponding to any processing cycle includes a sub-operation of a fully connected operation performed on the to-be-processed image feature matrix.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor, a memory, and the data processing apparatus according to the second aspect.
In a fourth aspect, an optional implementation of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run, performs the steps of the first aspect or of any possible implementation of the first aspect.
For a description of the effects of the above data processing apparatus, computer device, and computer-readable storage medium, refer to the description of the above data processing method; details are not repeated here.
To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they show embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 shows a flowchart of a data processing method provided by an embodiment of the present disclosure;
FIG. 2 shows an example of processing target feature elements with the PE array in the data processing method provided by an embodiment of the present disclosure;
FIG. 3 shows another example of processing target feature elements with the PE array in the data processing method provided by an embodiment of the present disclosure;
FIG. 4 shows a further example of processing target feature elements with the PE array in the data processing method provided by an embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 6 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a variety of different configurations. The following detailed description of the embodiments is therefore not intended to limit the scope of the claimed disclosure but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Research has found that commonly used artificial intelligence (AI) accelerator hardware architectures mainly include a storage unit, a computing unit, a control unit, and the like, where the core computing unit generally consists of a processing engine (PE) array and a register array (local register file). In fully connected processing in a neural network, multiple groups of fully connected weights are usually used to perform weighted summation on the feature values of the feature points in a feature map, so as to obtain the fully connected processing result of the feature map. When an AI accelerator performs fully connected processing on image data, in each of multiple processing cycles at least some of the image feature elements in the feature map need to be read into the PE array, and the weight elements of one fully connected weight that correspond to those image feature elements also need to be read into the PE array; the PE array then performs weighted summation on the image feature elements and weight elements that were read in. After multiple processing cycles, the result of fully connected processing of the feature map is obtained. However, because the bandwidth for reading data into the PE array is limited, a PE array of size m*n needs to read in m*n image feature elements and the corresponding m*n weight elements in every processing cycle. Data read-in is therefore inefficient, which increases the time required for fully connected processing of the feature map and in turn leads to low processing efficiency of the PE array.
Based on the above research, the present disclosure provides a data processing method that reuses, in each processing cycle, the image feature elements transmitted to the PE array. This reduces the amount of data that must be read into the PE array in each processing cycle, reduces the time required to read data into the PE array, and improves the processing efficiency of the PE array.
The defects of the above solutions are all results obtained by the inventors through practice and careful study. Therefore, the process of discovering the above problems, and the solutions to those problems proposed below in the present disclosure, should both be regarded as contributions made by the inventors to the present disclosure.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures.
The data processing method provided by the embodiments of the present disclosure is described below.
Referring to FIG. 1, which is a flowchart of a data processing method provided by an embodiment of the present disclosure, the method includes steps S101 to S103, wherein:
S101: determining, from a to-be-processed image feature matrix and weight matrices, target feature elements and target weight elements respectively corresponding to multiple processing cycles, wherein the to-be-processed image feature matrix corresponds to multiple weight matrices;
S102: in response to the arrival of any processing cycle, each PE in the processing engine (PE) array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data;
wherein, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements that correspond to that repeated feature element in different weight matrices;
S103: obtaining, based on the intermediate processing data respectively corresponding to the multiple processing cycles, result data of processing the to-be-processed image feature matrix.
In the embodiments of the present disclosure, target feature elements and target weight elements respectively corresponding to multiple processing cycles are determined from the to-be-processed image feature matrix and the weight matrices. In response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation. Result data of processing the to-be-processed image feature matrix are then obtained based on the intermediate processing data respectively corresponding to the multiple processing cycles. In this process, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements that correspond to it in different weight matrices. Reusing the image feature elements transmitted to the PE array in each processing cycle reduces the amount of data that must be read into the PE array per cycle, reduces the time required to read data into the PE array, and improves the processing efficiency of the PE array.
Steps S101 to S103 are described in detail below.
For S101, when determining the target feature elements and target weight elements respectively corresponding to the multiple processing cycles from the to-be-processed image feature matrix and the weight matrices, the number of processing cycles required may, for example, first be determined based on the to-be-processed image feature matrix and the number of weight matrices; the target feature elements and target weight elements are then determined for each processing cycle.
Exemplarily, the number of target feature elements determined for each processing cycle is smaller than the number of PEs in the PE array. The target feature elements determined for each processing cycle are transferred from external memory to some of the PEs in the PE array. The data required by the other PEs in the PE array are repeated feature elements that reuse the target feature elements already transferred to the PE array; they therefore do not need to be transferred from outside and can simply be copied from the PEs that store those target feature elements.
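The saving described above can be illustrated with a rough element count. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it counts elements rather than bits, it assumes the layout in which one column of PEs receives the externally transferred feature elements, and it ignores the on-chip cost of copying between PEs.

```python
def external_reads_without_reuse(rows: int, cols: int) -> int:
    """Elements fetched per cycle when every PE gets its own feature and weight element."""
    feature_reads = rows * cols
    weight_reads = rows * cols
    return feature_reads + weight_reads


def external_reads_with_reuse(rows: int, cols: int) -> int:
    """Elements fetched per cycle when one PE column is loaded and then copied on-chip."""
    feature_reads = rows          # one feature element per PE row, copied to the other columns
    weight_reads = rows * cols    # every PE still needs its own weight element
    return feature_reads + weight_reads


baseline = external_reads_without_reuse(4, 4)
reused = external_reads_with_reuse(4, 4)
print(baseline, reused)  # 32 20
```

For a 4*4 PE array, this count suggests that only the feature-element traffic shrinks; the weight traffic is unchanged because each PE still works with a distinct weight element.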
In practical applications, the feature maps that require the fully connected operation are usually the feature maps output by a convolutional layer, and there may be more than one of them. The to-be-processed image feature matrix here may be one of the multiple feature maps, or a combined feature map formed by combining multiple feature maps. When it is one of the multiple feature maps, the data processing method provided by the present disclosure may be executed separately for each feature map, and the data obtained from each execution are then combined to obtain the final fully connected operation result.
In a possible implementation, the target feature elements respectively corresponding to the multiple processing cycles include at least one image feature element of the to-be-processed image feature matrix; the position of a target feature element in the to-be-processed image feature matrix is consistent with the position of the corresponding target weight element in the corresponding weight matrix.
In the embodiments of the present disclosure, for ease of processing, the original to-be-processed image feature matrix and the original weight matrices may, for example, be size-transformed based on the size of the PE array before the target feature elements and target weight elements for each processing cycle are determined, so as to obtain the to-be-processed image feature matrix and the weight matrices.
Exemplarily, if the size of the original to-be-processed image feature matrix is M*N*S, the size of each corresponding original weight matrix is also M*N*S. If the size of the PE array is A*A, size transformation of the original to-be-processed image feature matrix yields a to-be-processed image feature matrix of size A*A*W, where W=(M*N*S)/(A*A). Each of the multiple weight matrices likewise has size A*A*W.
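The size transformation can be sketched as regrouping the flattened elements into W tiles of size A*A. This is a minimal sketch, not the disclosed implementation: the helper name is hypothetical, the row-major flattening order is an assumption, and the sketch simply rejects inputs whose element count is not a multiple of A*A, since the text does not say how such cases are handled.

```python
def reshape_to_pe_tiles(flat_elements, a):
    """Regroup a flat list of M*N*S elements into W tiles, each A rows by A columns."""
    tile_size = a * a
    if len(flat_elements) % tile_size != 0:
        # Padding behaviour is not specified in the text, so this sketch refuses the input.
        raise ValueError("element count must be a multiple of A*A")
    w = len(flat_elements) // tile_size  # W = (M*N*S)/(A*A)
    return [
        [flat_elements[t * tile_size + r * a: t * tile_size + (r + 1) * a] for r in range(a)]
        for t in range(w)
    ]


# Example: M=4, N=4, S=2 gives 32 elements; with A=4 this yields W=2 tiles.
tiles = reshape_to_pe_tiles(list(range(32)), 4)
print(len(tiles))   # 2
print(tiles[1][0])  # [16, 17, 18, 19]
```

The same regrouping would be applied to each weight matrix so that element positions stay aligned between features and weights.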
In addition, if the size of the feature map represented by the original to-be-processed image feature matrix is smaller than or equal to the size of the PE array, the original to-be-processed image feature matrix may or may not be size-transformed. When it is not size-transformed and the size of the feature map is smaller than the size of the PE array, only some of the PEs in the PE array are used during processing, rather than all of them.
After the target feature elements and target weight elements respectively corresponding to the processing cycles have been determined, preset processing can be performed on the target feature elements and target weight elements of any processing cycle once that cycle arrives, so as to obtain intermediate processing data.
Here, the preset processing performed on the target feature elements and target weight elements of the processing cycle includes, for example, a sub-operation of the fully connected operation performed on the to-be-processed image feature matrix.
Here, the sub-operations performed are, for example, the sub-operations respectively corresponding to the weight matrices.
Exemplarily, the to-be-processed image feature matrix ImgC is expressed as:
$$
\mathrm{ImgC} = \begin{bmatrix} a1 & a2 & a3 & a4 \\ a5 & a6 & a7 & a8 \\ a9 & a10 & a11 & a12 \\ a13 & a14 & a15 & a16 \end{bmatrix}
$$
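The weight matrix Wi is expressed as:
$$
\mathrm{Wi} = \begin{bmatrix} wi\_1 & wi\_2 & wi\_3 & wi\_4 \\ wi\_5 & wi\_6 & wi\_7 & wi\_8 \\ wi\_9 & wi\_10 & wi\_11 & wi\_12 \\ wi\_13 & wi\_14 & wi\_15 & wi\_16 \end{bmatrix}
$$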
Here, i denotes the i-th weight matrix. Taking the first group of weight parameters W1 as an example, it contains 16 weight values which, for ease of description, are denoted w1_1, w1_2, w1_3, ..., w1_16 and can form the weight matrix
$$
\mathrm{W1} = \begin{bmatrix} w1\_1 & w1\_2 & w1\_3 & w1\_4 \\ w1\_5 & w1\_6 & w1\_7 & w1\_8 \\ w1\_9 & w1\_10 & w1\_11 & w1\_12 \\ w1\_13 & w1\_14 & w1\_15 & w1\_16 \end{bmatrix}
$$
Then, when the weight matrix Wi is used to perform the fully connected operation on the to-be-processed image feature matrix ImgC, the fully connected operation corresponding to the i-th weight matrix can be expressed as:
Oi = a1×wi_1 + a2×wi_2 + a3×wi_3 + a4×wi_4
   + a5×wi_5 + a6×wi_6 + a7×wi_7 + a8×wi_8
   + a9×wi_9 + a10×wi_10 + a11×wi_11 + a12×wi_12
   + a13×wi_13 + a14×wi_14 + a15×wi_15 + a16×wi_16.
The sub-operations corresponding to this weight matrix include, for example:
Oi_1 = a1×wi_1 + a2×wi_2 + a3×wi_3 + a4×wi_4;
Oi_2 = a5×wi_5 + a6×wi_6 + a7×wi_7 + a8×wi_8;
Oi_3 = a9×wi_9 + a10×wi_10 + a11×wi_11 + a12×wi_12;
Oi_4 = a13×wi_13 + a14×wi_14 + a15×wi_15 + a16×wi_16,
where
Oi = Oi_1 + Oi_2 + Oi_3 + Oi_4.
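The decomposition above can be checked numerically. The following sketch is illustrative only: the function names and the sample values are hypothetical, and integer values are used to keep the comparison exact. It computes the four sub-operations as partial dot products over groups of four elements and confirms that they sum to the full weighted sum.

```python
def full_connection(features, weights):
    """Weighted sum over all 16 feature/weight pairs for one weight matrix."""
    return sum(a * w for a, w in zip(features, weights))


def sub_operations(features, weights, group=4):
    """Partial weighted sums, one per group of `group` consecutive elements."""
    return [
        sum(a * w for a, w in zip(features[s:s + group], weights[s:s + group]))
        for s in range(0, len(features), group)
    ]


features = list(range(1, 17))      # stands in for a1..a16
weights = [1, 0, -1, 2] * 4        # stands in for wi_1..wi_16
parts = sub_operations(features, weights)
total = full_connection(features, weights)
print(parts, total)  # [6, 14, 22, 30] 72
```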
For S102 and S103: when, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation, the procedure may be, for example, as follows. After the processing cycle arrives, the target feature elements and target weight elements corresponding to that processing cycle are read from external memory and stored in some of the PEs of the PE array; the target feature elements are then copied to the other PEs of the PE array to form the first operand; the target weight elements that were read in serve as the second operand; and the preset operation is performed on the first operand and the second operand to obtain the intermediate processing data corresponding to the processing cycle.
The embodiments of the present disclosure describe the processing in each processing cycle in detail, taking as an example the case in which the size of the to-be-processed image feature matrix is the same as the size of the PE array. The examples in (1) to (3) below are merely examples of determining the target feature elements and storing them in the PE array; the target feature elements may also be determined in other ways, and need not be determined in the order of the image feature elements. It is sufficient that every image feature element in the to-be-processed image feature matrix is processed with the corresponding weight element of each of the N weight matrices, and that the final processing result for each weight matrix is the weighted sum of all image feature elements in the to-be-processed image feature matrix with that weight matrix.
(1) Each row of the PE array includes repeated feature elements. In this case, the step in which, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs the preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transferring a number of target feature elements of the to-be-processed image feature matrix equal to the number of rows of the PE array to one column of PEs in the PE array, and copying the target feature elements in that column of PEs to the PEs of the other columns, as the first operand of the corresponding PEs; transferring the target weight elements from different weight matrices that respectively correspond to the target feature elements in each column of PEs to the PEs in the PE array at the positions of those target feature elements, as the second operand of the corresponding PEs; and performing, with the PE array, the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
When the PE array performs the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the following approach may be used, for example: in the processing cycle, each column of target feature elements in the first operand is weighted and summed with the corresponding column of weight elements in the second operand to obtain the intermediate data corresponding to the different weight matrices; the intermediate processing data corresponding to the processing cycle are then obtained based on the intermediate data corresponding to the different weight matrices.
In this case, the target feature elements determined for each of the multiple processing cycles include, for example, one row of image feature elements of the to-be-processed image feature matrix. Because the number of image feature elements in one row of the to-be-processed image feature matrix is the same as the number of columns of the PE array, a row of image feature elements may be read into one column of PEs in the PE array.
For example, continuing with the to-be-processed image feature matrix ImgC and the weight matrices Wi above, where i is an integer from 1 to 4:
In the first processing cycle, the image feature elements a1, a2, a3, and a4 in the first row of the to-be-processed image feature matrix are stored as target feature elements in the first column of PEs and copied to the other PE columns, giving the first operand of the PE array in the first processing cycle, which forms the matrix
$$
\begin{bmatrix} a1 & a1 & a1 & a1 \\ a2 & a2 & a2 & a2 \\ a3 & a3 & a3 & a3 \\ a4 & a4 & a4 & a4 \end{bmatrix}
$$
The corresponding weight data form the matrix
$$
\begin{bmatrix} w1\_1 & w2\_1 & w3\_1 & w4\_1 \\ w1\_2 & w2\_2 & w3\_2 & w4\_2 \\ w1\_3 & w2\_3 & w3\_3 & w4\_3 \\ w1\_4 & w2\_4 & w3\_4 & w4\_4 \end{bmatrix}
$$
which is the second operand.
When the first operand and the second operand are stored in the PE array, as shown in (a) of FIG. 2, PE1, PE2, PE3, and PE4 all store the image feature element a1 and respectively store the weight values in the weight matrices W1 to W4 that correspond to a1: w1_1, w2_1, w3_1, w4_1.
Similarly, PE5 to PE8 all store the image feature element a2 and respectively store the weight values in W1 to W4 that correspond to a2: w1_2, w2_2, w3_2, w4_2.
PE9 to PE12 all store the image feature element a3 and respectively store the weight values in W1 to W4 that correspond to a3: w1_3, w2_3, w3_3, w4_3.
PE13 to PE16 all store the image feature element a4 and respectively store the weight values in W1 to W4 that correspond to a4: w1_4, w2_4, w3_4, w4_4.
Then, with w1_1, w1_2, w1_3, and w1_4 as weights, a weighted sum is taken over a1, a2, a3, and a4 in the first column to obtain the intermediate data O1_1 corresponding to weight matrix W1.
With w2_1, w2_2, w2_3, and w2_4 as weights, a weighted sum is taken over a1, a2, a3, and a4 in the second column to obtain the intermediate data O2_1 corresponding to weight matrix W2.
With w3_1, w3_2, w3_3, and w3_4 as weights, a weighted sum is taken over a1, a2, a3, and a4 in the third column to obtain the intermediate data O3_1 corresponding to weight matrix W3.
With w4_1, w4_2, w4_3, and w4_4 as weights, a weighted sum is taken over a1, a2, a3, and a4 in the fourth column to obtain the intermediate data O4_1 corresponding to weight matrix W4.
The intermediate data O1_1, O2_1, O3_1, and O4_1 are then taken together as the intermediate processing data corresponding to the first processing cycle.
In addition, in the second processing cycle, the image feature elements a5, a6, a7, and a8 in the second row of the to-be-processed image feature matrix ImgC are stored as target feature elements in the first column of PEs and copied to the other PE columns, giving the first operand in the second processing cycle, which forms the matrix
$$
\begin{bmatrix} a5 & a5 & a5 & a5 \\ a6 & a6 & a6 & a6 \\ a7 & a7 & a7 & a7 \\ a8 & a8 & a8 & a8 \end{bmatrix}
$$
The corresponding weight data form the matrix
$$
\begin{bmatrix} w1\_5 & w2\_5 & w3\_5 & w4\_5 \\ w1\_6 & w2\_6 & w3\_6 & w4\_6 \\ w1\_7 & w2\_7 & w3\_7 & w4\_7 \\ w1\_8 & w2\_8 & w3\_8 & w4\_8 \end{bmatrix}
$$
which is the second operand.
When the first operand and the second operand are stored in the PE array, as shown in (b) of FIG. 2, PE1, PE2, PE3, and PE4 all store the image feature element a5 and respectively store the weight values in the weight matrices W1 to W4 that correspond to a5: w1_5, w2_5, w3_5, w4_5.
Similarly, PE5 to PE8 all store the image feature element a6 and respectively store the weight values in W1 to W4 that correspond to a6: w1_6, w2_6, w3_6, w4_6.
PE9 to PE12 all store the image feature element a7 and respectively store the weight values in W1 to W4 that correspond to a7: w1_7, w2_7, w3_7, w4_7.
PE13 to PE16 all store the image feature element a8 and respectively store the weight values in W1 to W4 that correspond to a8: w1_8, w2_8, w3_8, w4_8.
Then, with w1_5, w1_6, w1_7, and w1_8 as weights, a weighted sum is taken over a5, a6, a7, and a8 in the first column to obtain the intermediate data O1_2 corresponding to weight matrix W1.
With w2_5, w2_6, w2_7, and w2_8 as weights, a weighted sum is taken over a5, a6, a7, and a8 in the second column to obtain the intermediate data O2_2 corresponding to weight matrix W2.
With w3_5, w3_6, w3_7, and w3_8 as weights, a weighted sum is taken over a5, a6, a7, and a8 in the third column to obtain the intermediate data O3_2 corresponding to weight matrix W3.
With w4_5, w4_6, w4_7, and w4_8 as weights, a weighted sum is taken over a5, a6, a7, and a8 in the fourth column to obtain the intermediate data O4_2 corresponding to weight matrix W4.
The intermediate data O1_2, O2_2, O3_2, and O4_2 are then taken together as the intermediate processing data corresponding to this processing cycle.
In the third processing cycle, in a similar manner, the following are obtained from the image feature elements a9, a10, a11, and a12: the intermediate data O1_3 generated with w1_9, w1_10, w1_11, and w1_12 of weight matrix W1; the intermediate data O2_3 generated with w2_9, w2_10, w2_11, and w2_12 of weight matrix W2; the intermediate data O3_3 generated with w3_9, w3_10, w3_11, and w3_12 of weight matrix W3; and the intermediate data O4_3 generated with w4_9, w4_10, w4_11, and w4_12 of weight matrix W4.
In the fourth processing cycle, in a similar manner, the following are obtained from the image feature elements a13, a14, a15, and a16: the intermediate data O1_4 generated with w1_13, w1_14, w1_15, and w1_16 of weight matrix W1; the intermediate data O2_4 generated with w2_13, w2_14, w2_15, and w2_16 of weight matrix W2; the intermediate data O3_4 generated with w3_13, w3_14, w3_15, and w3_16 of weight matrix W3; and the intermediate data O4_4 generated with w4_13, w4_14, w4_15, and w4_16 of weight matrix W4.
After four processing cycles, the processing of the to-be-processed image feature matrix ImgC with the weight matrices W1, W2, W3, and W4 is complete. The intermediate data O1_1, O1_2, O1_3, and O1_4 are then added to obtain the result value O1 corresponding to weight matrix W1. The intermediate data O2_1, O2_2, O2_3, and O2_4 are added to obtain the result value O2 corresponding to weight matrix W2. In a similar manner, the result value O3 corresponding to weight matrix W3 and the result value O4 corresponding to weight matrix W4 are obtained. If there are further weight matrices besides W1, W2, W3, and W4, similar processing is performed with those weight matrices. Finally, the result values corresponding to all weight matrices are combined to obtain the result data of processing the to-be-processed image feature matrix, that is, the fully connected operation result.
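The four-cycle procedure of case (1) can be simulated in software. The sketch below is an illustration under stated assumptions, not the disclosed hardware: the function name and sample values are hypothetical, PEs are modelled as entries of nested lists, and the number of weight matrices is assumed to equal the array width A. In each cycle, one row of the feature matrix is placed down a PE column and copied across columns, PE column j receives the matching weights of weight matrix j, and each column is reduced to one partial sum.

```python
def run_column_broadcast(img, weight_mats):
    """Simulate case (1): each PE row holds one repeated feature element per cycle."""
    a = len(img)
    if len(weight_mats) != a:
        raise ValueError("this sketch assumes one weight matrix per PE column")
    results = [0] * a
    for cycle in range(a):
        row = img[cycle]  # target feature elements for this cycle
        # First operand: PE (k, j) holds row[k] for every column j (copied on-chip).
        first = [[row[k] for _ in range(a)] for k in range(a)]
        # Second operand: PE (k, j) holds the weight of matrix j at the position of row[k].
        second = [[weight_mats[j][cycle][k] for j in range(a)] for k in range(a)]
        for j in range(a):
            # Column-wise weighted sum gives the intermediate data for weight matrix j.
            results[j] += sum(first[k][j] * second[k][j] for k in range(a))
    return results


img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
weights = [[[j + r * 4 + c for c in range(4)] for r in range(4)] for j in range(4)]
outputs = run_column_broadcast(img, weights)
```

Each entry of `outputs` should equal the direct weighted sum of all 16 feature elements with the corresponding weight matrix, which is what the fully connected operation requires.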
Note that the intermediate data Oi_1, Oi_2, Oi_3, and Oi_4 corresponding to the weight matrix Wi may be added together after all processing cycles have been completed.
Alternatively, in each processing cycle other than the first, the intermediate data obtained in the current cycle for each weight matrix may be added to the running sum accumulated up to the previous cycle. The last processing cycle can then directly output the sum of the intermediate data Oi_1, Oi_2, Oi_3, and Oi_4 corresponding to the weight matrix Wi.
For example, taking weight matrix W2: after the first processing cycle ends, the PE stores the obtained intermediate data O2_1 in a register. After the intermediate data O2_2 is obtained in the second processing cycle, O2_1 is read from the register and added to O2_2 to obtain the running sum O2_1+O2_2, which is stored back in the register. After the intermediate data O2_3 is obtained in the third processing cycle, the running sum O2_1+O2_2 from the second processing cycle is read from the register and added to O2_3 to obtain the running sum O2_1+O2_2+O2_3 for the third processing cycle, and so on. In this way, the result value O2 corresponding to weight matrix W2 is obtained in the last processing cycle.
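The register-based accumulation can be sketched as a running sum. This is a minimal illustration with hypothetical names; a single Python variable stands in for the register, and the sample values stand in for the intermediate data O2_1 to O2_4.

```python
def accumulate_over_cycles(partials):
    """Return the register contents after each processing cycle."""
    register = 0
    history = []
    for cycle_index, partial in enumerate(partials):
        if cycle_index == 0:
            register = partial             # first cycle: store the intermediate data
        else:
            register = register + partial  # later cycles: read, add, store back
        history.append(register)
    return history


history = accumulate_over_cycles([6, 14, 22, 30])
print(history)  # [6, 20, 42, 72]
```

The last entry of `history` plays the role of the result value O2, available as soon as the final cycle completes.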
(2)PE阵列的每一列包括重复特征元素。在该种情况下,响应于任一处理周期到来,PE阵列中的每个PE获取该处理周期对应的目标特征元素以及对应的目标权重元素并进行预设运算,得到中间处理数据,包括:响应于任一处理周期到来,将所述待处理图像特征矩阵中所述PE阵列列数个目标特征元素传输至所述PE阵列的一行PE中,并将所述一行PE中的目标特征元素复制到其他行的PE中,作为对应PE的第一个操作数;并将与每一行PE中目标特征元素分别对应的来自不同权重矩阵的目标权重元素传输至所述PE阵列中与目标特征元素位置对应的PE中,作为对应PE的第二个操作数;利用所述PE阵列对所述PE阵列中存储的第一个操作数以及第二个操作数进行预设运算,得到所述处理周期对应的中间处理数据。(2) Each column of the PE array includes repeating feature elements. In this case, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element corresponding to the processing cycle and the corresponding target weight element and performs a preset operation to obtain intermediate processing data, including: responding When any processing cycle arrives, transfer the target feature elements of the PE array columns in the image feature matrix to be processed to a row of PEs in the PE array, and copy the target feature elements in the row of PEs to In the PEs of other rows, as the first operand of the corresponding PE; and transfer the target weight elements from different weight matrices corresponding to the target feature elements in each row of PEs to the PE array corresponding to the target feature element position. In the PE, as the second operand of the corresponding PE; use the PE array to perform a preset operation on the first operand and the second operand stored in the PE array to obtain the corresponding processing cycle Intermediate processing data.
在利用所述PE阵列对所述PE阵列中存储的第一个操作数以及第二个操作数进行预设运算,得到所述处理周期对应的中间处理数据时,例如可以采用下述方式:在所述处理周期中,将所述第一个操作数中的每行目标特征元素、和所述第二个操作数中的每行权重元素进行加权求和,得到所述不同权重矩阵对应的中间数据;基于所述不同权重矩阵对应的中间数据,得到所述处理周期对应的中间处理数据。When using the PE array to perform a preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, for example, the following method may be used: In the processing cycle, each row of target feature elements in the first operand and each row of weight elements in the second operand are weighted and summed to obtain the middle corresponding to the different weight matrices. data; based on the intermediate data corresponding to the different weight matrices, obtain the intermediate processing data corresponding to the processing period.
In this case, the target feature elements determined for each of the multiple processing cycles include, for example, one column of image feature elements of the image feature matrix to be processed. Because the number of image feature elements in one column of the image feature matrix to be processed is the same as the number of rows of the PE array, a column of image feature elements can be read into one row of PEs when it is read into the PE array.
For example, the description continues with the image feature matrix to be processed ImgC and the weight matrices Wi introduced above, where i is an integer from 1 to 4.
In the first processing cycle, the image feature elements a1, a5, a9, and a13 in the first column of the image feature matrix to be processed are taken as the target feature elements, stored in the first row of PEs, and copied to the other PE rows. This yields the first operand of the PE array in the first processing cycle, which can be written as the matrix
Figure PCTCN2021115789-appb-000008
The corresponding weight data can be written as the matrix
Figure PCTCN2021115789-appb-000009
which is the second operand.
When the first operand and the second operand are stored in the PE array, as shown in (a) of Figure 3, PE1, PE2, PE3, and PE4 store the image feature elements a1, a5, a9, and a13 respectively, together with the weight values in the weight matrix W1 that correspond to a1, a5, a9, and a13: w1_1, w1_5, w1_9, and w1_13.
Similarly, PE5 to PE8 store the image feature elements a1, a5, a9, and a13 respectively, together with the weight values in the weight matrix W2 that correspond to a1, a5, a9, and a13: w2_1, w2_5, w2_9, and w2_13.
PE9 to PE12 store the image feature elements a1, a5, a9, and a13 respectively, together with the weight values in the weight matrix W3 that correspond to a1, a5, a9, and a13: w3_1, w3_5, w3_9, and w3_13.
PE13 to PE16 store the image feature elements a1, a5, a9, and a13 respectively, together with the weight values in the weight matrix W4 that correspond to a1, a5, a9, and a13: w4_1, w4_5, w4_9, and w4_13.
Then, with w1_1, w1_5, w1_9, and w1_13 as weights, a weighted summation is performed on a1, a5, a9, and a13 in the first row, to obtain the intermediate data O1_1 corresponding to the weight matrix W1.
With w2_1, w2_5, w2_9, and w2_13 as weights, a weighted summation is performed on a1, a5, a9, and a13 in the second row, to obtain the intermediate data O2_1 corresponding to the weight matrix W2.
With w3_1, w3_5, w3_9, and w3_13 as weights, a weighted summation is performed on a1, a5, a9, and a13 in the third row, to obtain the intermediate data O3_1 corresponding to the weight matrix W3.
With w4_1, w4_5, w4_9, and w4_13 as weights, a weighted summation is performed on a1, a5, a9, and a13 in the fourth row, to obtain the intermediate data O4_1 corresponding to the weight matrix W4.
The intermediate data O1_1, O2_1, O3_1, and O4_1 are then combined as the intermediate processing data corresponding to the first processing cycle.
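The first-cycle computation described above can be written out compactly. The following is only a notational sketch: the symbols $F^{(1)}$ and $W^{(1)}$ for the two operands, the double-index form $w_{i,j}$ for the weight value wi_j, and the superscript $(1)$ for the cycle number are assumptions introduced here for readability, not notation taken from the original figures.

```latex
F^{(1)} =
\begin{bmatrix}
a_1 & a_5 & a_9 & a_{13} \\
a_1 & a_5 & a_9 & a_{13} \\
a_1 & a_5 & a_9 & a_{13} \\
a_1 & a_5 & a_9 & a_{13}
\end{bmatrix},
\qquad
W^{(1)} =
\begin{bmatrix}
w_{1,1} & w_{1,5} & w_{1,9} & w_{1,13} \\
w_{2,1} & w_{2,5} & w_{2,9} & w_{2,13} \\
w_{3,1} & w_{3,5} & w_{3,9} & w_{3,13} \\
w_{4,1} & w_{4,5} & w_{4,9} & w_{4,13}
\end{bmatrix}

O_i^{(1)} = w_{i,1}\,a_1 + w_{i,5}\,a_5 + w_{i,9}\,a_9 + w_{i,13}\,a_{13},
\qquad i = 1, 2, 3, 4
```

Each PE row $i$ thus contributes one partial sum for its own weight matrix, and the four partial sums together form the intermediate processing data of the first cycle.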
In the second processing cycle, the image feature elements a2, a6, a10, and a14 in the second column of the image feature matrix to be processed are taken as the target feature elements, stored in the first row of PEs, and copied to the other PE rows. This yields the first operand of the PE array in the second processing cycle, which can be written as the matrix
Figure PCTCN2021115789-appb-000010
The corresponding weight data can be written as the matrix
Figure PCTCN2021115789-appb-000011
When the first operand and the second operand are stored in the PE array, as shown in (b) of Figure 3, PE1, PE2, PE3, and PE4 store the image feature elements a2, a6, a10, and a14 respectively, together with the weight values in the weight matrix W1 that correspond to a2, a6, a10, and a14: w1_2, w1_6, w1_10, and w1_14.
Similarly, PE5 to PE8 store the image feature elements a2, a6, a10, and a14 respectively, together with the weight values in the weight matrix W2 that correspond to a2, a6, a10, and a14: w2_2, w2_6, w2_10, and w2_14.
PE9 to PE12 store the image feature elements a2, a6, a10, and a14 respectively, together with the weight values in the weight matrix W3 that correspond to a2, a6, a10, and a14: w3_2, w3_6, w3_10, and w3_14.
PE13 to PE16 store the image feature elements a2, a6, a10, and a14 respectively, together with the weight values in the weight matrix W4 that correspond to a2, a6, a10, and a14: w4_2, w4_6, w4_10, and w4_14.
Then, with w1_2, w1_6, w1_10, and w1_14 as weights, a weighted summation is performed on a2, a6, a10, and a14 in the first row, to obtain the intermediate data O1_2 corresponding to the weight matrix W1.
With w2_2, w2_6, w2_10, and w2_14 as weights, a weighted summation is performed on a2, a6, a10, and a14 in the second row, to obtain the intermediate data O2_2 corresponding to the weight matrix W2.
With w3_2, w3_6, w3_10, and w3_14 as weights, a weighted summation is performed on a2, a6, a10, and a14 in the third row, to obtain the intermediate data O3_2 corresponding to the weight matrix W3.
With w4_2, w4_6, w4_10, and w4_14 as weights, a weighted summation is performed on a2, a6, a10, and a14 in the fourth row, to obtain the intermediate data O4_2 corresponding to the weight matrix W4.
The intermediate data O1_2, O2_2, O3_2, and O4_2 are then combined as the intermediate processing data corresponding to the second processing cycle.
In the third processing cycle, the following are obtained in a similar manner: the intermediate data O1_3, generated from w1_3, w1_7, w1_11, and w1_15 in the weight matrix W1 and the corresponding image feature elements a3, a7, a11, and a15; the intermediate data O2_3, generated from w2_3, w2_7, w2_11, and w2_15 in the weight matrix W2 and the same image feature elements; the intermediate data O3_3, generated from w3_3, w3_7, w3_11, and w3_15 in the weight matrix W3 and the same image feature elements; and the intermediate data O4_3, generated from w4_3, w4_7, w4_11, and w4_15 in the weight matrix W4 and the same image feature elements.
In the fourth processing cycle, the following are obtained in a similar manner: the intermediate data O1_4, generated from w1_4, w1_8, w1_12, and w1_16 in the weight matrix W1 and the corresponding image feature elements a4, a8, a12, and a16; the intermediate data O2_4, generated from w2_4, w2_8, w2_12, and w2_16 in the weight matrix W2 and the same image feature elements; the intermediate data O3_4, generated from w3_4, w3_8, w3_12, and w3_16 in the weight matrix W3 and the same image feature elements; and the intermediate data O4_4, generated from w4_4, w4_8, w4_12, and w4_16 in the weight matrix W4 and the same image feature elements.
After four processing cycles, the processing of the image feature matrix to be processed ImgC with the weight matrices W1, W2, W3, and W4 is complete. The intermediate data O1_1, O1_2, O1_3, and O1_4 are then added to obtain the result value O1 corresponding to the weight matrix W1. The intermediate data O2_1, O2_2, O2_3, and O2_4 are added to obtain the result value O2 corresponding to the weight matrix W2. The result value O3 corresponding to the weight matrix W3 and the result value O4 corresponding to the weight matrix W4 are obtained in a similar manner. If W1, W2, W3, and W4 are the only weight matrices, the result values O1, O2, O3, and O4 are combined as the result of processing the image feature matrix to be processed. If there are further weight matrices besides W1, W2, W3, and W4, similar processing is performed with those other weight matrices. Finally, the result values corresponding to all the weight matrices are combined to obtain the result data of processing the image feature matrix to be processed, that is, the fully connected operation result. The process of adding the intermediate data that the same weight matrix produces in different processing cycles is similar to that in (1) above and is not repeated here.
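The four-cycle walk-through above can be mimicked in software. The sketch below is an illustrative simulation only, not the hardware implementation: the function name `column_broadcast_fc`, the row-major layout in which `img[r][c]` holds element a(4r+c+1), and the plain Python lists standing in for PE registers are all assumptions made for this example.

```python
def column_broadcast_fc(img, weights):
    """Simulate mode (2): one column of the feature matrix per processing cycle.

    img     -- feature matrix as a list of rows (hypothetical stand-in for ImgC)
    weights -- list of weight matrices, one per PE row (stand-ins for W1..W4)
    Returns one accumulated result value per weight matrix (O1..O4).
    """
    num_rows = len(img)
    num_cols = len(img[0])
    results = [0] * len(weights)

    for cycle in range(num_cols):
        # First operand: the current column is loaded into one PE row
        # and then copied to every other PE row.
        feature_row = [img[r][cycle] for r in range(num_rows)]
        first_operand = [list(feature_row) for _ in weights]

        # Second operand: PE row i receives the weights at the same
        # positions, taken from the i-th weight matrix.
        second_operand = [[w[r][cycle] for r in range(num_rows)] for w in weights]

        # Each PE row performs a weighted sum, giving that cycle's
        # intermediate data, which is accumulated per weight matrix.
        for i in range(len(weights)):
            partial = sum(f * w for f, w in zip(first_operand[i], second_operand[i]))
            results[i] += partial

    return results


if __name__ == "__main__":
    # Made-up values: a1..a16 = 1..16, and Wk filled with the constant k.
    img = [[4 * r + c + 1 for c in range(4)] for r in range(4)]
    weights = [[[k] * 4 for _ in range(4)] for k in (1, 2, 3, 4)]
    print(column_broadcast_fc(img, weights))
```

Accumulating the per-cycle partial sums gives, for each weight matrix, the same value as a single element-wise multiply-and-sum over the whole feature matrix, which is what the fully connected operation requires.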
(3) Every PE of the PE array holds the repeated feature element. In this case, the step in which, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data includes: in response to the arrival of any processing cycle, transmitting one target feature element of the image feature matrix to be processed to one PE in the PE array, and copying the target feature element in that PE to the other PEs, as the first operand of each corresponding PE; transmitting the target weight elements that correspond to that target feature element and come from Acl different weight matrices to the respective PEs of the PE array, as the second operand of each corresponding PE, where Acl denotes the total number of PEs in the PE array; and using the PE array to perform the preset operation on the first operand and the second operand stored in the PE array, to obtain the intermediate processing data corresponding to the processing cycle.
In this case, the target feature element determined for each of the multiple processing cycles includes, for example, one feature element of the image feature matrix to be processed. In the corresponding processing cycle, that feature element is read into one PE of the PE array and copied from that PE to the other PEs.
For example, the description continues with the image feature matrix to be processed ImgC and the weight matrices Wi introduced above, where i is an integer from 1 to 16.
In the first processing cycle, the image feature element a1 of the image feature matrix to be processed is taken as the target feature element, stored in the first PE, and copied to the other PEs. This yields the first operand of the PE array in the first processing cycle, which can be written as the matrix
Figure PCTCN2021115789-appb-000012
The corresponding weight data can be written as the matrix
Figure PCTCN2021115789-appb-000013
which is the second operand.
When the first operand and the second operand are stored in the PE array, as shown in (a) of Figure 4, PE1 to PE16 all store the image feature element a1, and respectively store the weight values in the weight matrices W1 to W16 that correspond to a1: w1_1 to w16_1.
The product of w1_1 and a1 is then calculated, to obtain the intermediate data O1_1 corresponding to the weight matrix W1;
the product of w2_1 and a1 is calculated, to obtain the intermediate data O2_1 corresponding to the weight matrix W2;
...
the product of w16_1 and a1 is calculated, to obtain the intermediate data O16_1 corresponding to the weight matrix W16.
In the second processing cycle, the image feature element a2 is taken as the target feature element, and the first operand of the PE array in the second processing cycle is:
Figure PCTCN2021115789-appb-000014
The corresponding weight data can be written as the matrix
Figure PCTCN2021115789-appb-000015
which is the second operand.
When the first operand and the second operand are stored in the PE array, PE1 to PE16 all store the image feature element a2, and respectively store the weight values in the weight matrices W1 to W16 that correspond to a2: w1_2 to w16_2.
The product of w1_2 and a2 is then calculated, to obtain the intermediate data O1_2 corresponding to the weight matrix W1;
the product of w2_2 and a2 is calculated, to obtain the intermediate data O2_2 corresponding to the weight matrix W2;
...
the product of w16_2 and a2 is calculated, to obtain the intermediate data O16_2 corresponding to the weight matrix W16.
By analogy, in the 16th processing cycle, the image feature element a16 is taken as the target feature element of the PE array, and the first operand of the 16th processing cycle is:
Figure PCTCN2021115789-appb-000016
The corresponding weight data can be written as the matrix
Figure PCTCN2021115789-appb-000017
which is the second operand.
As shown in (b) of Figure 4, the product of w1_16 and a16 is then calculated, to obtain the intermediate data O1_16 corresponding to the weight matrix W1;
the product of w2_16 and a16 is calculated, to obtain the intermediate data O2_16 corresponding to the weight matrix W2;
...
the product of w16_16 and a16 is calculated, to obtain the intermediate data O16_16 corresponding to the weight matrix W16.
After 16 processing cycles, the processing of the image feature matrix to be processed ImgC with the weight matrices W1, W2, W3, ..., W16 is complete. The intermediate data O1_1 to O1_16 are then added to obtain the result value O1 corresponding to the weight matrix W1; the intermediate data O2_1 to O2_16 are added to obtain the result value O2 corresponding to the weight matrix W2; ...; and the intermediate data O16_1 to O16_16 are added to obtain the result value O16 corresponding to the weight matrix W16. Finally, the result values O1 to O16 corresponding to the 16 weight matrices are combined to obtain the result data of processing the image feature matrix to be processed, that is, the fully connected operation result.
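The 16-cycle schedule above can likewise be simulated. This is a minimal sketch under stated assumptions: the function name `element_broadcast_fc` and the flattened list layout, in which `weights_flat[i][j]` stands for the weight value w(i+1)_(j+1), are hypothetical choices made for illustration and do not describe the actual circuit.

```python
def element_broadcast_fc(img_flat, weights_flat):
    """Simulate mode (3): one feature element per processing cycle.

    img_flat     -- feature elements a1..aN in order (stand-in for ImgC)
    weights_flat -- one flattened weight matrix per PE (stand-ins for W1..W16)
    Returns one accumulated result value per weight matrix (O1..O16).
    """
    num_pes = len(weights_flat)
    results = [0] * num_pes

    for cycle, feature in enumerate(img_flat):
        # First operand: the single target feature element is loaded
        # into one PE and copied to all the other PEs.
        first_operand = [feature] * num_pes

        # Second operand: each PE receives the weight at the same
        # position, taken from its own weight matrix.
        second_operand = [w[cycle] for w in weights_flat]

        # Each PE computes one product (the cycle's intermediate data),
        # which is accumulated per weight matrix.
        for pe in range(num_pes):
            results[pe] += first_operand[pe] * second_operand[pe]

    return results


if __name__ == "__main__":
    # Made-up values: a1..a16 = 1..16, and Wk filled with the constant k.
    img_flat = list(range(1, 17))
    weights_flat = [[k] * 16 for k in range(1, 17)]
    print(element_broadcast_fc(img_flat, weights_flat))
```

Compared with mode (2), each cycle here feeds only one feature element but serves as many weight matrices as there are PEs, so the number of cycles grows while the number of output values produced in parallel grows with it.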
Those skilled in the art will understand that, in the methods of the specific implementations above, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation in any way; the specific order in which the steps are executed should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a data processing apparatus corresponding to the data processing method. Because the principle by which the apparatus solves the problem is similar to that of the data processing method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated content is not described again.
Figure 5 is a schematic diagram of a data processing apparatus provided by an embodiment of the present disclosure. The apparatus includes a controller 51 and a PE array 52. The controller 51 is configured to determine, from the image feature matrix to be processed and the weight matrices, the target feature elements and target weight elements respectively corresponding to multiple processing cycles, where the image feature matrix to be processed corresponds to multiple weight matrices. The PE array 52 is configured so that, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, and so that the result data of processing the image feature matrix to be processed is obtained based on the intermediate processing data respectively corresponding to the multiple processing cycles. For any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements in different weight matrices that correspond to that repeated feature element.
The data processing apparatus provided by the embodiments of the present disclosure may include a chip, an AI chip, and the like. The computer device provided by the embodiments of the present disclosure may include a smart terminal such as a mobile phone, or may be another device, server, or the like that can be used for data processing, which is not limited here.
In a possible implementation, before determining the target feature elements and target weight elements respectively corresponding to the multiple processing cycles, the controller 51 is further configured to: perform a size transformation on an original image feature matrix to be processed and on original weight matrices based on the size of the PE array, to obtain the image feature matrix to be processed and the weight matrices.
In a possible implementation, the target feature elements respectively corresponding to the multiple processing cycles include at least one image feature element of the image feature matrix to be processed; the position of a target feature element in the image feature matrix to be processed is the same as the position of the corresponding target weight element in the respective weight matrix.
In a possible implementation, each row of the PE array includes repeated feature elements. When, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array 52 is configured to: in response to the arrival of any processing cycle, transmit a number of target feature elements of the image feature matrix to be processed, equal to the number of rows of the PE array, to one column of PEs in the PE array, and copy the target feature elements in that column of PEs to the PEs of the other columns, as the first operand of each corresponding PE; transmit the target weight elements from different weight matrices that respectively correspond to the target feature elements in each column of PEs to the PEs in the PE array whose positions correspond to those target feature elements, as the second operand of each corresponding PE; and use the PE array to perform the preset operation on the first operand and the second operand stored in the PE array, to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when using the PE array to perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array 52 is configured to: in the processing cycle, perform a weighted summation on each column of target feature elements in the first operand and each column of weight elements in the second operand, to obtain the intermediate data corresponding to the different weight matrices; and obtain the intermediate processing data corresponding to the processing cycle based on the intermediate data corresponding to the different weight matrices.
In a possible implementation, each column of the PE array includes repeated feature elements. When, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array 52 is configured to: in response to the arrival of any processing cycle, transmit a number of target feature elements of the image feature matrix to be processed, equal to the number of columns of the PE array, to one row of PEs in the PE array, and copy the target feature elements in that row of PEs to the PEs of the other rows, as the first operand of each corresponding PE; transmit the target weight elements from different weight matrices that respectively correspond to the target feature elements in each row of PEs to the PEs in the PE array whose positions correspond to those target feature elements, as the second operand of each corresponding PE; and use the PE array to perform the preset operation on the first operand and the second operand stored in the PE array, to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when using the PE array to perform the preset operation on the first operand and the second operand stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, the PE array 52 is configured to: in the processing cycle, perform a weighted summation on each row of target feature elements in the first operand and each row of weight elements in the second operand, to obtain the intermediate data corresponding to the different weight matrices; and obtain the intermediate processing data corresponding to the processing cycle based on the intermediate data corresponding to the different weight matrices.
In a possible implementation, every PE of the PE array holds the repeated feature element. When, in response to the arrival of any processing cycle, each PE in the PE array obtains the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, the PE array 52 is configured to: in response to the arrival of any processing cycle, transmit one target feature element of the image feature matrix to be processed to one PE in the PE array, and copy the target feature element in that PE to the other PEs, as the first operand of each corresponding PE; transmit the target weight elements that correspond to that target feature element and come from as many different weight matrices as there are PEs in the PE array to the respective PEs of the PE array, as the second operand of each corresponding PE; and use the PE array to perform the preset operation on the first operand and the second operand stored in the PE array, to obtain the intermediate processing data corresponding to the processing cycle.
In a possible implementation, when obtaining the result data of processing the image feature matrix to be processed based on the intermediate processing data respectively corresponding to the multiple processing cycles, the PE array 52 is configured to: accumulate, among the intermediate processing data respectively corresponding to the multiple processing cycles, the intermediate data belonging to the same weight matrix, to obtain the result value corresponding to each weight matrix; and obtain the result data of processing the image feature matrix to be processed based on the result values respectively corresponding to the multiple weight matrices.
In a possible implementation, the preset operation corresponding to any processing cycle includes a sub-operation of the fully connected operation performed on the image feature matrix to be processed.
For the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the method embodiments above, which are not detailed again here.
An embodiment of the present disclosure further provides a computer device. Figure 6 is a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, which includes a processor 61, a memory 62, and the data processing apparatus 63 provided by the present disclosure.
The memory 62 includes an internal memory 621 and an external memory 622. The internal memory 621 is used to temporarily store operation data of the processor 61 and data exchanged with the external memory 622, such as a hard disk; the processor 61 exchanges data with the external memory 622 through the internal memory 621.
For the specific execution process of the above instructions, reference may be made to the steps of the data processing method described in the embodiments of the present disclosure, which are not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when run by a processor, the computer program performs the steps of the data processing method described in the method embodiments above. The storage medium may be a volatile or non-volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program product carrying program code; the instructions included in the program code can be used to perform the steps of the data processing method described in the method embodiments above. For details, refer to the method embodiments above, which are not repeated here.
The computer program product may be implemented in hardware, software, or a combination of the two. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; as another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection of apparatuses or units through some communication interfaces, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the embodiments described above are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent replacements of some of their technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (22)

  1. A data processing method, comprising:
    determining, from an image feature matrix to be processed and a plurality of corresponding weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles;
    in response to the arrival of any processing cycle, acquiring, by each processing engine (PE) in a PE array, the target feature element and the corresponding target weight element for that processing cycle and performing a preset operation to obtain intermediate processing data, wherein, for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements corresponding to that repeated feature element in different weight matrices; and
    obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the image feature matrix to be processed.
  2. The data processing method according to claim 1, wherein before determining the target feature elements and target weight elements respectively corresponding to the plurality of processing cycles, the method further comprises:
    performing, based on the size of the PE array, size transformation on an original image feature matrix to be processed and original weight matrices to obtain the image feature matrix to be processed and the weight matrices.
  3. The data processing method according to claim 1 or 2, wherein:
    the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the image feature matrix to be processed; and
    the position of a target feature element in the image feature matrix to be processed is consistent with the position of the corresponding target weight element in the corresponding weight matrix.
  4. The data processing method according to any one of claims 1 to 3, wherein, in response to the arrival of any processing cycle, acquiring, by each PE in the PE array, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data comprises:
    in response to the arrival of any processing cycle, transmitting Ar target feature elements in the image feature matrix to be processed to one column of PEs in the PE array, and copying the target feature elements in that column of PEs to the PEs of the other columns as the first operand of each corresponding PE, where Ar denotes the number of rows of the PE array;
    transmitting target weight elements, which respectively correspond to the target feature elements in each column of PEs and come from different weight matrices, to the PEs in the PE array at positions corresponding to the target feature elements, as the second operand of each corresponding PE; and
    performing, by the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  5. The data processing method according to claim 4, wherein performing, by the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle comprises:
    in the processing cycle, performing a weighted summation of each column of target feature elements in the first operands with each column of weight elements in the second operands to obtain intermediate data corresponding to the different weight matrices; and
    obtaining, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
  6. The data processing method according to any one of claims 1 to 3, wherein, in response to the arrival of any processing cycle, acquiring, by each PE in the PE array, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data comprises:
    in response to the arrival of any processing cycle, transmitting Ac target feature elements in the image feature matrix to be processed to one row of PEs in the PE array, and copying the target feature elements in that row of PEs to the PEs of the other rows as the first operand of each corresponding PE, where Ac denotes the number of columns of the PE array;
    transmitting target weight elements, which respectively correspond to the target feature elements in each row of PEs and come from different weight matrices, to the PEs in the PE array at positions corresponding to the target feature elements, as the second operand of each corresponding PE; and
    performing, by the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  7. The data processing method according to claim 6, wherein performing, by the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle comprises:
    in the processing cycle, performing a weighted summation of each row of target feature elements in the first operands with each row of weight elements in the second operands to obtain intermediate data corresponding to the different weight matrices; and
    obtaining, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
  8. The data processing method according to claim 3, wherein, in response to the arrival of any processing cycle, acquiring, by each PE in the PE array, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data comprises:
    in response to the arrival of any processing cycle, transmitting one target feature element in the image feature matrix to be processed to one PE of the PE array, and copying the target feature element in that PE to the other PEs as the first operand of each corresponding PE;
    transmitting target weight elements, which correspond to that one target feature element and come from Acl different weight matrices, to the respective PEs of the PE array as the second operand of each corresponding PE, where Acl denotes the total number of PEs in the PE array; and
    performing, by the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  9. The data processing method according to any one of claims 1 to 8, wherein obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, the result data of processing the image feature matrix to be processed comprises:
    accumulating, among the intermediate processing data respectively corresponding to the plurality of processing cycles, the intermediate data belonging to the same weight matrix to obtain a result value corresponding to each weight matrix; and
    obtaining, based on the result values respectively corresponding to the plurality of weight matrices, the result data of processing the image feature matrix to be processed.
  10. The data processing method according to any one of claims 1 to 9, wherein the preset operation comprises a sub-operation of a fully connected operation performed on the image feature matrix to be processed.
  11. A data processing apparatus, comprising a controller and a processing engine (PE) array, wherein:
    the controller is configured to determine, from an image feature matrix to be processed and a plurality of corresponding weight matrices, target feature elements and target weight elements respectively corresponding to a plurality of processing cycles;
    the PE array is configured such that, in response to the arrival of any processing cycle, each PE in the PE array acquires the target feature element and the corresponding target weight element for that processing cycle and performs a preset operation to obtain intermediate processing data, and to obtain, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, result data of processing the image feature matrix to be processed; and
    for any processing cycle, the target feature elements in the PE array include repeated feature elements, and the target weight elements corresponding to a repeated feature element are the weight elements corresponding to that repeated feature element in different weight matrices.
  12. The data processing apparatus according to claim 11, wherein the controller, before determining the target feature elements and target weight elements respectively corresponding to the plurality of processing cycles, is further configured to:
    perform, based on the size of the PE array, size transformation on an original image feature matrix to be processed and original weight matrices to obtain the image feature matrix to be processed and the weight matrices.
  13. The data processing apparatus according to claim 11 or 12, wherein the target feature elements respectively corresponding to the plurality of processing cycles include at least one image feature element in the image feature matrix to be processed; and
    the position of a target feature element in the image feature matrix to be processed is consistent with the position of the corresponding target weight element in the corresponding weight matrix.
  14. The data processing apparatus according to any one of claims 11 to 13, wherein each PE in the PE array, when acquiring, in response to the arrival of any processing cycle, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data, is configured to:
    in response to the arrival of any processing cycle, transmit Ar target feature elements in the image feature matrix to be processed to one column of PEs in the PE array, and copy the target feature elements in that column of PEs to the PEs of the other columns as the first operand of each corresponding PE, where Ar denotes the number of rows of the PE array;
    transmit target weight elements, which respectively correspond to the target feature elements in each column of PEs and come from different weight matrices, to the PEs in the PE array at positions corresponding to the target feature elements, as the second operand of each corresponding PE; and
    perform, using the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  15. The data processing apparatus according to claim 14, wherein the PE array, when performing the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, is configured to:
    in the processing cycle, perform a weighted summation of each column of target feature elements in the first operands with each column of weight elements in the second operands to obtain intermediate data corresponding to the different weight matrices; and
    obtain, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
  16. The data processing apparatus according to any one of claims 11 to 13, wherein each PE in the PE array, when acquiring, in response to the arrival of any processing cycle, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data, is configured to:
    in response to the arrival of any processing cycle, transmit Ac target feature elements in the image feature matrix to be processed to one row of PEs in the PE array, and copy the target feature elements in that row of PEs to the PEs of the other rows as the first operand of each corresponding PE, where Ac denotes the number of columns of the PE array;
    transmit target weight elements, which respectively correspond to the target feature elements in each row of PEs and come from different weight matrices, to the PEs in the PE array at positions corresponding to the target feature elements, as the second operand of each corresponding PE; and
    perform, using the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  17. The data processing apparatus according to claim 16, wherein the PE array, when performing the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle, is configured to:
    in the processing cycle, perform a weighted summation of each row of target feature elements in the first operands with each row of weight elements in the second operands to obtain intermediate data corresponding to the different weight matrices; and
    obtain, based on the intermediate data corresponding to the different weight matrices, the intermediate processing data corresponding to the processing cycle.
  18. The data processing apparatus according to claim 13, wherein each PE in the PE array, when acquiring, in response to the arrival of any processing cycle, the target feature element and the corresponding target weight element for that processing cycle and performing the preset operation to obtain the intermediate processing data, is configured to:
    in response to the arrival of any processing cycle, transmit one target feature element in the image feature matrix to be processed to one PE of the PE array, and copy the target feature element in that PE to the other PEs as the first operand of each corresponding PE;
    transmit target weight elements, which correspond to that one target feature element and come from Acl different weight matrices, to the respective PEs of the PE array as the second operand of each corresponding PE, where Acl denotes the total number of PEs in the PE array; and
    perform, using the PE array, the preset operation on the first operands and second operands stored in the PE array to obtain the intermediate processing data corresponding to the processing cycle.
  19. The data processing apparatus according to any one of claims 11 to 18, wherein the PE array, when obtaining, based on the intermediate processing data respectively corresponding to the plurality of processing cycles, the result data of processing the image feature matrix to be processed, is configured to:
    accumulate, among the intermediate processing data respectively corresponding to the plurality of processing cycles, the intermediate data belonging to the same weight matrix to obtain a result value corresponding to each weight matrix; and
    obtain, based on the result values respectively corresponding to the plurality of weight matrices, the result data of processing the image feature matrix to be processed.
  20. The data processing apparatus according to any one of claims 11 to 19, wherein the preset operation comprises a sub-operation of a fully connected operation performed on the image feature matrix to be processed.
  21. A computer device, comprising a processor, a memory, and the data processing apparatus according to any one of claims 11 to 20.
  22. A computer-readable storage medium storing a computer program which, when run by the controller and the PE array, performs the steps of the data processing method according to any one of claims 1 to 10.
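
To make the element-broadcast mode of claims 8 and 18 more concrete, here is a minimal Python sketch in which every PE receives the same feature element in a given cycle, multiplies it by a weight from its own weight matrix, and keeps a running sum across cycles. The function name `element_broadcast`, the flat-list layout, and the choice of one accumulator per PE are assumptions made for illustration; the claims do not prescribe them.

```python
def element_broadcast(features, weight_matrices):
    """Simulate the mode in which all PEs share one feature element per cycle.

    features:        flat list of feature elements, one consumed per cycle.
    weight_matrices: Acl flat weight lists (one per PE), each as long as
                     `features`.
    Returns the accumulated result of each PE, i.e. of each weight matrix.
    """
    accumulators = [0] * len(weight_matrices)  # one accumulator per PE
    for cycle, x in enumerate(features):
        for pe, weights in enumerate(weight_matrices):
            # First operand: the shared feature element.
            # Second operand: this PE's weight at the same position.
            accumulators[pe] += x * weights[cycle]
    return accumulators


# Hypothetical 2 x 2 PE array (Acl = 4) and three feature elements.
weights = [[1, 1, 1], [0, 1, 0], [2, 0, 0], [0, 0, -1]]
print(element_broadcast([1, 2, 3], weights))  # [6, 2, 2, -3]
```

In this mode the number of cycles equals the number of feature elements, while the number of outputs computed in parallel equals the number of PEs.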
PCT/CN2021/115789 2021-02-26 2021-08-31 Data processing method and apparatus, computer device and storage medium WO2022179075A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110221235.3A CN112966729B (en) 2021-02-26 2021-02-26 Data processing method and device, computer equipment and storage medium
CN202110221235.3 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022179075A1 true WO2022179075A1 (en) 2022-09-01

Family

ID=76275794

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/115789 WO2022179075A1 (en) 2021-02-26 2021-08-31 Data processing method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112966729B (en)
WO (1) WO2022179075A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966729B (en) * 2021-02-26 2023-01-31 成都商汤科技有限公司 Data processing method and device, computer equipment and storage medium
CN113253336B (en) * 2021-07-02 2021-10-01 深圳市翩翩科技有限公司 Earthquake prediction method and system based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
US20190220734A1 (en) * 2016-10-11 2019-07-18 The Research Foundation For The State University Of New York System, Method, and Accelerator to Process Convolutional Neural Network Layers
CN112149047A (en) * 2019-06-27 2020-12-29 深圳市中兴微电子技术有限公司 Data processing method and device, storage medium and electronic device
CN112966729A (en) * 2021-02-26 2021-06-15 成都商汤科技有限公司 Data processing method and device, computer equipment and storage medium

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503796B (en) * 2015-10-08 2019-02-12 上海兆芯集成电路有限公司 Multioperation neural network unit
US10489479B1 (en) * 2016-09-12 2019-11-26 Habana Labs Ltd. Matrix multiplication engine
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability
CN108229645B (en) * 2017-04-28 2021-08-06 北京市商汤科技开发有限公司 Convolution acceleration and calculation processing method and device, electronic equipment and storage medium
CN108805275B (en) * 2017-06-16 2021-01-22 上海兆芯集成电路有限公司 Programmable device, method of operation thereof, and computer usable medium
CN109213962B (en) * 2017-07-07 2020-10-09 华为技术有限公司 Operation accelerator
US10671349B2 (en) * 2017-07-24 2020-06-02 Tesla, Inc. Accelerated mathematical engine
CN108665059A (en) * 2018-05-22 2018-10-16 中国科学技术大学苏州研究院 Convolutional neural networks acceleration system based on field programmable gate array
CN110659445B (en) * 2018-06-29 2022-12-30 龙芯中科技术股份有限公司 Arithmetic device and processing method thereof
CN109635944B (en) * 2018-12-24 2020-10-27 西安交通大学 Sparse convolution neural network accelerator and implementation method
CN109740115A (en) * 2019-01-08 2019-05-10 郑州云海信息技术有限公司 A kind of method, device and equipment for realizing matrix multiplication operation
US11379555B2 (en) * 2019-06-28 2022-07-05 Amazon Technologies, Inc. Dilated convolution using systolic array
CN110705687B (en) * 2019-09-05 2020-11-03 北京三快在线科技有限公司 Convolution neural network hardware computing device and method
CN111414994B (en) * 2020-03-03 2022-07-12 哈尔滨工业大学 FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN111582467B (en) * 2020-05-14 2023-12-22 上海商汤智能科技有限公司 Artificial intelligence accelerator and electronic equipment
CN111967582B (en) * 2020-08-07 2022-07-08 苏州浪潮智能科技有限公司 CNN convolutional layer operation method and CNN convolutional layer operation accelerator
CN111897579B (en) * 2020-08-18 2024-01-30 腾讯科技(深圳)有限公司 Image data processing method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112966729A (en) 2021-06-15
CN112966729B (en) 2023-01-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927502

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE