WO2021147196A1 - Convolution operation method, apparatus and device, and storage medium - Google Patents

Convolution operation method, apparatus and device, and storage medium

Info

Publication number
WO2021147196A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
convolution
sample data
intermediate matrix
generate
Prior art date
Application number
PCT/CN2020/087105
Other languages
French (fr)
Chinese (zh)
Inventor
董刚
赵雅倩
李仁刚
杨宏斌
刘海威
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2021147196A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G06F17/153: Multidimensional correlation or convolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Definitions

  • The present invention relates to the field of deep learning, and in particular to a convolution operation method, apparatus, device and storage medium.
  • Deep learning refers to learning the inherent patterns and levels of representation of sample data. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images and sound. Extracting features from sample data by means of convolution operations is currently an important way of realizing deep learning.
  • Taking deep learning on images as an example, the operation of taking the inner product of a convolution kernel with the sample data in different data windows of an image is called convolution, and the calculation process is also called filtering.
  • In essence, it extracts the features of different frequency bands of the image.
  • The convolution kernel, also called a filter, is a neuron containing a set of fixed weights, usually in the form of a square two-dimensional matrix. The matrix stores the coefficients used to process the data in the receptive field.
  • Filtering with a convolution kernel can be used to extract specific features, for example the contours of objects in the image or the depth of color. Because the sample data acquired in a data window usually has more matrix elements than the convolution kernel, and the two element counts differ considerably, it is difficult to ensure the overall efficiency with which the convolution kernel performs convolution operations on the sample data.
  • The purpose of the present invention is to provide a convolution operation method, apparatus, device and storage medium that relatively ensure the overall efficiency of the convolution operation process.
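  • To solve the above technical problem, the present invention provides a convolution operation method, including:
  • reading, from a memory, a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix;
  • performing an expansion operation on the sample data matrix to generate a first intermediate matrix, and performing an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
  • performing a convolution operation on the first intermediate matrix through the second intermediate matrix, and generating a convolution result.
  • Preferably, when the number of sample data matrices is greater than 1, performing a convolution operation on the first intermediate matrix through the second intermediate matrix includes:
  • performing a matrix multiplication operation on each first intermediate matrix through the second intermediate matrix and generating a corresponding result matrix;
  • performing an accumulation operation on the result matrices.
  • Preferably, reading the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix in the memory includes:
  • reading the sample data matrix from a DDR memory, and reading the convolution kernel matrix corresponding to the sample data matrix from an HBM2 memory.
  • Preferably, performing a convolution operation on the first intermediate matrix through the second intermediate matrix includes:
  • performing, by a DSP arithmetic array, the convolution operation on the first intermediate matrix through the second intermediate matrix.
  • Preferably, after the convolution result is generated, the method further includes:
  • storing the convolution result in the storage location of the memory that corresponds to the sample data matrix.
  • Preferably, performing an expansion operation on the sample data matrix to generate the first intermediate matrix includes:
  • combining the corresponding first transposed data columns into the first intermediate matrix.
  • Preferably, performing an expansion operation on the convolution kernel matrix to generate the second intermediate matrix includes:
  • combining a plurality of second transposed data columns into the second intermediate matrix.
  • Preferably, performing an expansion operation on the sample data matrix to generate the first intermediate matrix includes:
  • performing the expansion operation in sequence to generate the first intermediate matrix.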
  • The present invention also provides a convolution operation apparatus, including:
  • a matrix reading module, configured to read, from a memory, a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix;
  • a preprocessing module, configured to perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and to perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
  • a convolution execution module, configured to perform a convolution operation on the first intermediate matrix through the second intermediate matrix and generate a convolution result.
  • Preferably, the convolution execution module includes:
  • a matrix product module, configured to perform a matrix multiplication operation on each first intermediate matrix through the second intermediate matrix and generate a corresponding result matrix;
  • an accumulation module, configured to perform an accumulation operation on the result matrices.
  • Preferably, the matrix reading module includes:
  • a memory reading module, configured to read the sample data matrix from a DDR memory, and to read the convolution kernel matrix corresponding to the sample data matrix from an HBM2 memory.
  • The present invention also provides a convolution operation device, including:
  • a memory, configured to store a computer program;
  • a processor, configured to implement the steps of the above convolution operation method when executing the computer program.
  • The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the convolution operation method described above are implemented.
  • The convolution operation method provided by the present invention first reads, from the memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix, where the number of rows or columns of the sample data matrix is equal to the number of rows of the convolution kernel matrix.
  • It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate the first intermediate matrix and the second intermediate matrix respectively, the two intermediate matrices having the same number of rows and the same number of columns.
  • Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix performs a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result.
  • Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix through the second intermediate matrix is equivalent to performing a convolution operation on the sample data matrix with the convolution kernel matrix. It also increases the amount of data convolved between the two matrices per unit time, thereby relatively ensuring the overall efficiency of the convolution operation process.
  • The present invention also provides a convolution operation apparatus, device and storage medium, with the same beneficial effects as described above.
  • FIG. 1 is a flowchart of a convolution operation method disclosed in an embodiment of the present invention;
  • FIG. 2.a is a schematic diagram of the expansion operation of a sample data matrix in a specific application scenario disclosed in an embodiment of the present invention;
  • FIG. 2.b is a schematic diagram of the expansion operation of a convolution kernel matrix in a specific application scenario disclosed in an embodiment of the present invention;
  • FIG. 3 is a flowchart of a specific convolution operation method disclosed in an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the composition structure of a convolution operation apparatus disclosed in an embodiment of the present invention.
  • The core of the present invention is to provide a convolution operation method that relatively ensures the overall efficiency of the convolution operation process.
  • An embodiment of the present invention discloses a convolution operation method, including:
  • Step S10: Read, from a memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix.
  • The sample data matrix read in this step can be a data matrix extracted from samples including, but not limited to, pictures, audio and text.
  • The convolution kernel matrix corresponding to the sample data matrix is the matrix used to extract features from the sample data matrix.
  • The elements of the convolution kernel matrix are set according to the specific type of feature to be extracted from the sample data matrix. By performing a convolution operation on the sample data matrix, the convolution kernel matrix generates a feature image, that is, the convolution result.
  • The feature image reflects how features of the corresponding type are distributed in the sample data matrix.
  • When the sample data matrix and its corresponding convolution kernel matrix are read from the memory, the two matrices can be obtained from the same memory, or from two independent memories.
  • Step S11: Perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, where the first intermediate matrix and the second intermediate matrix have the same number of rows and the same number of columns.
  • The focus of this embodiment is that, after the sample data matrix and its corresponding convolution kernel matrix are obtained and before the convolution kernel matrix is used to perform the convolution operation on the sample data matrix, the two matrices are first preprocessed, that is, an expansion operation is performed on each of them.
  • The purpose of the expansion operation is to obtain a first intermediate matrix and a second intermediate matrix with the same number of rows and columns, where the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix.
  • The expansion operation in this step may specifically be an expansion by row.
  • Step S12: Perform a convolution operation on the first intermediate matrix through the second intermediate matrix, and generate a convolution result.
  • After the first intermediate matrix and the second intermediate matrix are obtained, a convolution operation is further performed on the first intermediate matrix through the second intermediate matrix to generate the corresponding convolution result.
  • The convolution operation method provided by the present invention first reads, from the memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix, where the number of rows or columns of the sample data matrix is equal to the number of rows of the convolution kernel matrix.
  • It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate the first intermediate matrix and the second intermediate matrix respectively, the two intermediate matrices having the same number of rows and the same number of columns.
  • Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix performs a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result.
  • Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix through the second intermediate matrix is equivalent to performing a convolution operation on the sample data matrix with the convolution kernel matrix. It also increases the amount of data convolved between the two matrices per unit time, thereby relatively ensuring the overall efficiency of the convolution operation process.
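The equivalence stated above can be written out under one assumed indexing. The symbols X, K, A, B, R, W, C and y are introduced here purely for illustration and do not appear in the original text. The sketch covers the case where the kernel has as many rows as the sample data matrix, and it assumes that the two intermediate matrices are combined by an element-wise product summed down each column:

```latex
% Assumed notation: X is the R x W sample data matrix, K is the R x C kernel matrix.
% A and B are the first and second intermediate matrices, both of size RC x (W - C + 1).
\begin{aligned}
A_{rC+k,\,j} &= X_{r,\,j+k}, \qquad B_{rC+k,\,j} = K_{r,\,k},
  \qquad 0 \le r < R,\; 0 \le k < C,\; 0 \le j \le W - C, \\
y_j &= \sum_{i=0}^{RC-1} A_{i,j}\, B_{i,j}
     = \sum_{r=0}^{R-1} \sum_{k=0}^{C-1} X_{r,\,j+k}\, K_{r,\,k}.
\end{aligned}
```

With R = 3, W = 11 and C = 3, both intermediate matrices are 9x9 and y has 9 entries, which is the sliding-window result of the 3x3 kernel over the 3x11 sample data.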
  • performing an expansion operation on the sample data matrix to generate a first intermediate matrix includes:
  • the corresponding first transposed data columns are combined into a first intermediate matrix.
  • Because this embodiment adopts a matrix transposition and splicing method, the first intermediate matrix obtained by the transformation can be computed row-first, and the amount of intermediate result data generated during the calculation is small, which reduces hardware resource overhead.
  • performing an expansion operation on the convolution kernel matrix to generate a second intermediate matrix includes:
  • a second intermediate matrix is combined based on a plurality of second transposed data columns.
  • this embodiment can generate the second intermediate matrix relatively efficiently based on the row and column size of the first intermediate matrix, which improves the overall efficiency of the convolution operation.
  • In a specific application scenario, the sample data matrix is a 3x11 matrix and the first intermediate matrix produced by the expansion operation is a 9x9 arrangement.
  • The upper three rows of the 9x9 arrangement are obtained by taking three segments of 9 data items each from the first row of the 3x11 matrix.
  • The three segments start at the first, second and third data items of that row respectively.
  • The lower six rows of the 9x9 arrangement are obtained in the same way from the second and third rows of the 3x11 matrix.
  • The convolution kernel matrix is a 3x3 matrix.
  • Its expansion operation arranges the 3x3 data into a single column in row order, and then expands that column into 9 columns.
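The 3x11 and 3x3 expansion described above can be sketched in plain Python. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the kernel values are arbitrary, and the step that combines the two 9x9 matrices (element-wise product summed down each column) is an assumption, since the text only says that a convolution operation is performed between them.

```python
# Illustrative sketch only: function names and the multiply-accumulate step
# are assumptions, not taken from the patent text.

def expand_sample(sample, kernel_cols):
    """Expand an R x W sample matrix into (R * kernel_cols) rows of W - kernel_cols + 1 items.

    Row r * kernel_cols + k holds the segment of sample row r that starts at offset k.
    """
    out_cols = len(sample[0]) - kernel_cols + 1
    return [row[k:k + out_cols] for row in sample for k in range(kernel_cols)]


def expand_kernel(kernel, out_cols):
    """Flatten the kernel in row order into one column, then repeat it out_cols times."""
    flat = [w for row in kernel for w in row]
    return [[w] * out_cols for w in flat]


def multiply_accumulate(first, second):
    """Assumed combination step: element-wise product, summed down each column."""
    return [sum(a[j] * b[j] for a, b in zip(first, second)) for j in range(len(first[0]))]


def direct_convolution(sample, kernel):
    """Reference sliding-window result, used only to check the expansion."""
    rows, k_cols = len(kernel), len(kernel[0])
    out_cols = len(sample[0]) - k_cols + 1
    return [
        sum(sample[r][j + k] * kernel[r][k] for r in range(rows) for k in range(k_cols))
        for j in range(out_cols)
    ]


sample = [[r * 11 + c for c in range(11)] for r in range(3)]  # 3x11 sample data
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]                 # 3x3 kernel, arbitrary values

first = expand_sample(sample, kernel_cols=3)   # 9x9 first intermediate matrix
second = expand_kernel(kernel, out_cols=9)     # 9x9 second intermediate matrix
result = multiply_accumulate(first, second)    # 9 output values
```

Under these assumptions, `result` matches `direct_convolution(sample, kernel)`, which is the sense in which the expanded matrices stand in for the original sample data and kernel.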
  • the present invention also provides the following series of preferred embodiments.
  • An embodiment of the present invention discloses a convolution operation method, including:
  • Step S20: Read, from a memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix.
  • Step S21: Perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, where the first intermediate matrix and the second intermediate matrix have the same number of rows and the same number of columns.
  • Step S22: Perform a matrix multiplication operation on each first intermediate matrix through the second intermediate matrix, and generate a corresponding result matrix for each.
  • Step S23: Perform an accumulation operation on the result matrices and generate a convolution result.
  • When the number of sample data matrices is greater than 1, the number of first intermediate matrices generated by expanding them is also greater than 1, so the second intermediate matrix corresponding to the convolution kernel matrix needs to perform a matrix multiplication with every first intermediate matrix, generating a corresponding result matrix each time. The result matrices are then accumulated to generate the convolution result for all of the sample data matrices.
  • This embodiment can relatively ensure the overall accuracy of the convolution operation performed on the sample data matrices when their number is greater than 1.
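Steps S22 and S23 can be sketched as follows for two sample data matrices that share one kernel. This is a hedged illustration: the helper names are hypothetical, the sample values are arbitrary, and reading "matrix multiplication" as the transposed second intermediate matrix multiplied by each first intermediate matrix is an assumption made for this sketch, not something the text specifies.

```python
# Illustrative sketch only: helper names and the transposed-product reading of
# "matrix multiplication" are assumptions, not taken from the patent text.

def expand_sample(sample, kernel_cols):
    """Row r * kernel_cols + k is the segment of sample row r starting at offset k."""
    out_cols = len(sample[0]) - kernel_cols + 1
    return [row[k:k + out_cols] for row in sample for k in range(kernel_cols)]


def expand_kernel(kernel, out_cols):
    """Flatten the kernel in row order into a column and repeat it out_cols times."""
    return [[w] * out_cols for row in kernel for w in row]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def accumulate(matrices):
    """Element-wise sum of equally sized matrices."""
    total = [[0] * len(matrices[0][0]) for _ in matrices[0]]
    for m in matrices:
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                total[i][j] += value
    return total


kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]   # 3x3 kernel, arbitrary values
samples = [
    [[1] * 11 for _ in range(3)],            # first 3x11 sample data matrix
    [[2] * 11 for _ in range(3)],            # second 3x11 sample data matrix
]

second_t = transpose(expand_kernel(kernel, out_cols=9))
# Step S22: one result matrix per first intermediate matrix.
result_matrices = [matmul(second_t, expand_sample(s, kernel_cols=3)) for s in samples]
# Step S23: accumulate the result matrices into the convolution result.
convolution_result = accumulate(result_matrices)
```

Because every column of the second intermediate matrix is the same flattened kernel, each row of a result matrix in this reading carries the same 9 output values, so a hardware design would only need one of them; the full product is kept here to mirror the wording of the steps.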
  • reading the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix in the memory includes:
  • In this embodiment, the sample data matrix and the convolution kernel matrix are obtained from two different memories: the sample data matrix is read from the DDR memory, and the convolution kernel matrix corresponding to the sample data matrix is read from the HBM2 memory. The DDR memory and the HBM2 memory can belong to the same arithmetic chip.
  • For example, when the DDR memory and the HBM2 memory belong to an FPGA chip, the FPGA chip obtains the sample data matrix from the local DDR memory and the convolution kernel matrix from the local HBM2 memory, and performs the convolution operation of the convolution kernel matrix on the sample data matrix within the FPGA chip.
  • Both the DDR memory and the HBM2 memory in this embodiment can achieve a higher data transmission rate than SDRAM memory at the same bus frequency, so this embodiment can further improve the overall efficiency of the convolution operation.
  • performing a convolution operation on the first intermediate matrix through the second intermediate matrix includes:
  • performing, by a DSP arithmetic array, the convolution operation on the first intermediate matrix through the second intermediate matrix.
  • The DSP of the DSP arithmetic array, also called a digital signal processor, is a microprocessor with a special structure.
  • Internally, a DSP chip keeps program and data separate, has a hardware multiplier, and makes extensive use of pipelined operation.
  • The DSP instructions it provides can be used to implement various digital signal processing algorithms quickly.
  • Therefore, in this embodiment, using the DSP arithmetic array to have the second intermediate matrix perform the convolution operation on the first intermediate matrix can relatively improve the overall efficiency of that operation.
  • the method further includes:
  • the convolution result is stored in the storage location corresponding to the sample data matrix in the memory.
  • In this embodiment, after the convolution result is generated, it is further stored in the storage location of the memory that corresponds to the sample data matrix.
  • The purpose is to overwrite the original sample data matrix in the memory with the convolution result.
  • This keeps memory space available and avoids wasting it, which reduces the storage pressure on the memory and helps ensure the overall stability of the convolution operation.
  • performing an expansion operation on the sample data matrix to generate the first intermediate matrix includes:
  • an expansion operation is sequentially performed to generate a first intermediate matrix.
  • In this embodiment, the expansion operation is performed in sequence on each element of the target dimension of the sample data to generate the first intermediate matrix, and the second intermediate matrix then performs the convolution operation on that first intermediate matrix.
  • The convolution operation between the second intermediate matrix and the first intermediate matrix can therefore be carried out sequentially, one element of the target dimension at a time. This relatively reduces the amount of intermediate data generated when the second intermediate matrix performs the convolution operation on the first intermediate matrix corresponding to a given element of the target dimension, thereby reducing hardware resource overhead.
  • The present invention also provides a convolution operation apparatus.
  • FIG. 4 shows a schematic diagram of the composition structure of an embodiment of a convolution operation apparatus. The apparatus includes:
  • a matrix reading module 10, configured to read, from a memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix;
  • a preprocessing module 11, configured to perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and to perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
  • a convolution execution module 12, configured to perform a convolution operation on the first intermediate matrix through the second intermediate matrix, and generate a convolution result.
  • The convolution operation apparatus provided by the present invention first reads, from the memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix, where the number of rows or columns of the sample data matrix is equal to the number of rows of the convolution kernel matrix.
  • It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate the first intermediate matrix and the second intermediate matrix respectively, the two intermediate matrices having the same number of rows and the same number of columns.
  • Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix performs a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result.
  • Of the two matrices the apparatus generates by expansion, the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix.
  • The convolution execution module includes:
  • a matrix product module, configured to perform a matrix multiplication operation on each first intermediate matrix through the second intermediate matrix and generate a corresponding result matrix;
  • an accumulation module, configured to perform an accumulation operation on the result matrices.
  • The matrix reading module includes:
  • a memory reading module, configured to read the sample data matrix from the DDR memory, and to read the convolution kernel matrix corresponding to the sample data matrix from the HBM2 memory.
  • The present invention also provides a convolution operation device, including:
  • a memory, configured to store a computer program;
  • a processor, configured to implement the steps of the above convolution operation method when executing the computer program.
  • The convolution operation device provided by the present invention first reads, from the memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix, where the number of rows or columns of the sample data matrix is equal to the number of rows of the convolution kernel matrix.
  • It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate the first intermediate matrix and the second intermediate matrix respectively, the two intermediate matrices having the same number of rows and the same number of columns.
  • Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix performs a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result.
  • Of the two matrices the device generates by expansion, the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix.
  • The present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the convolution operation method described above are implemented.
  • With the computer-readable storage medium provided by the present invention, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix are first read from the memory, where the number of rows or columns of the sample data matrix is consistent with the number of rows of the convolution kernel matrix. An expansion operation is then performed on the sample data matrix and on the convolution kernel matrix to generate the first intermediate matrix and the second intermediate matrix respectively, the two intermediate matrices having the same number of rows and the same number of columns.
  • Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix performs a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result.
  • Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix through the second intermediate matrix is equivalent to performing a convolution operation on the sample data matrix with the convolution kernel matrix. It also increases the amount of data convolved between the two matrices per unit time, thereby relatively ensuring the overall efficiency of the convolution operation process.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

A convolution operation method, apparatus and device, and a storage medium. The method comprises the steps of: reading, from a memory, a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix; executing an expansion operation on the sample data matrix to generate a first intermediate matrix, and executing an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, wherein the number of rows and the number of columns between the first intermediate matrix and the second intermediate matrix are consistent; and executing a convolution operation on the first intermediate matrix by means of the second intermediate matrix, and generating a convolution result. In the method, executing the convolution operation on the first intermediate matrix by means of the second intermediate matrix is equivalent to executing a convolution operation on the sample data matrix by means of the convolution kernel matrix; and the data amount of a convolution between the two matrices in a unit of time can be increased, thereby relatively ensuring the overall efficiency of a convolution operation process. In addition, a convolution operation apparatus and device, and a storage medium are further provided in the present invention, and the beneficial effects thereof are the same as those described above.

Description

一种卷积运算方法、装置、设备及存储介质Convolution operation method, device, equipment and storage medium
本申请要求于2020年1月20日提交中国专利局、申请号为202010065274.4、发明名称为“一种卷积运算方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on January 20, 2020, the application number is 202010065274.4, and the invention title is "a convolution operation method, device, equipment, and storage medium". The entire content of the application is approved The reference is incorporated in this application.
技术领域Technical field
本发明涉及深度学习领域,特别是涉及一种卷积运算方法、装置、设备及存储介质。The present invention relates to the field of deep learning, in particular to a convolution operation method, device, equipment and storage medium.
背景技术Background technique
Deep learning refers to learning the inherent regularities and levels of representation of sample data. Its ultimate goal is to give machines a human-like ability to analyze and learn, so that they can recognize data such as text, images, and sound. Extracting features from sample data by means of convolution operations is currently an important means of realizing deep learning.
Taking deep learning on images as an example, the operation of taking the inner product of a convolution kernel with the sample data in different data windows of an image is called convolution. The computation is also called filtering, and in essence it extracts features of the image in different frequency bands. A convolution kernel, also called a filter, is a neuron containing a set of fixed weights, usually a square two-dimensional matrix whose entries are the coefficients applied to the data in the receptive field. Filtering with one convolution kernel can extract a specific feature, for example object contours or color intensity in an image. Because the sample data obtained in a data window usually has more matrix elements than the convolution kernel, and the difference in element count is large, it is difficult to ensure the overall efficiency of the convolution operation that the kernel performs on the sample data.
It follows that providing a convolution operation method that relatively ensures the overall efficiency of the convolution operation is a problem to be solved by those skilled in the art.
Summary of the Invention
The purpose of the present invention is to provide a convolution operation method, apparatus, device, and storage medium that relatively ensure the overall efficiency of the convolution operation.
To solve the above technical problem, the present invention provides a convolution operation method, including:
reading a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix from memory;
performing an expansion operation on the sample data matrix to generate a first intermediate matrix, and performing an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
performing a convolution operation on the first intermediate matrix with the second intermediate matrix, and generating a convolution result.
Preferably, when there is more than one sample data matrix, performing the convolution operation on the first intermediate matrix with the second intermediate matrix includes:
performing a matrix multiplication on each first intermediate matrix with the second intermediate matrix to generate a corresponding result matrix;
performing an accumulation operation on the result matrices.
Preferably, reading the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix from memory includes:
reading the sample data matrix from DDR memory, and reading the convolution kernel matrix corresponding to the sample data matrix from HBM2 memory.
Preferably, performing the convolution operation on the first intermediate matrix with the second intermediate matrix includes:
performing, in a DSP operation array, the convolution operation on the first intermediate matrix with the second intermediate matrix.
Preferably, after the convolution result is generated, the method further includes:
storing the convolution result at the storage location in the memory that corresponds to the sample data matrix.
Preferably, performing the expansion operation on the sample data matrix to generate the first intermediate matrix includes:
sequentially extracting, from the sample matrix, process matrices of the same size as the convolution kernel matrix;
performing a transposition operation on each row of data of a process matrix and concatenating the results in row order into a first transposed data column;
combining the first transposed data columns into the first intermediate matrix according to the adjacency of the corresponding process matrices.
Preferably, performing the expansion operation on the convolution kernel matrix to generate the second intermediate matrix includes:
performing a transposition operation on each row of data of the convolution kernel matrix and concatenating the results in row order into a second transposed data column;
combining a plurality of second transposed data columns into the second intermediate matrix.
Preferably, when the sample data matrix has more than 2 dimensions, performing the expansion operation on the sample data matrix to generate the first intermediate matrix includes:
performing the expansion operation in turn on each element of a target dimension of the sample data to generate the first intermediate matrix.
In addition, the present invention also provides a convolution operation apparatus, including:
a matrix reading module, configured to read a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix from memory;
a preprocessing module, configured to perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and to perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
a convolution execution module, configured to perform a convolution operation on the first intermediate matrix with the second intermediate matrix and to generate a convolution result.
Preferably, the convolution execution module includes:
a matrix product module, configured to perform a matrix multiplication on each first intermediate matrix with the second intermediate matrix and to generate a corresponding result matrix;
an accumulation module, configured to perform an accumulation operation on the result matrices.
Preferably, the matrix reading module includes:
a memory reading module, configured to read the sample data matrix from DDR memory and to read the convolution kernel matrix corresponding to the sample data matrix from HBM2 memory.
In addition, the present invention also provides a convolution operation device, including:
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above convolution operation method when executing the computer program.
In addition, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above convolution operation method.
In the convolution operation method provided by the present invention, a sample data matrix and the convolution kernel matrix corresponding to it are first read from memory, the number of rows or columns of the sample data matrix being equal to the number of rows of the convolution kernel matrix. An expansion operation is then performed on the sample data matrix and on the convolution kernel matrix to generate a first intermediate matrix and a second intermediate matrix, respectively, which have the same number of rows and the same number of columns. Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix is used to perform a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result. Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix with the second intermediate matrix is equivalent to performing it on the sample data matrix with the convolution kernel matrix, while increasing the amount of data convolved between the two matrices per unit time, which relatively ensures the overall efficiency of the convolution operation. In addition, the present invention also provides a convolution operation apparatus, device, and storage medium, with the same beneficial effects as described above.
Brief Description of the Drawings
FIG. 1 is a flowchart of a convolution operation method disclosed in an embodiment of the present invention;
FIG. 2.a is a schematic diagram of the expansion operation on a sample data matrix in a specific application scenario disclosed in an embodiment of the present invention;
FIG. 2.b is a schematic diagram of the expansion operation on a convolution kernel matrix in a specific application scenario disclosed in an embodiment of the present invention;
FIG. 3 is a flowchart of a specific convolution operation method disclosed in an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a convolution operation apparatus disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Taking deep learning on images as an example, the operation of taking the inner product of a convolution kernel with the sample data in different data windows of an image is called convolution. The computation is also called filtering, and in essence it extracts features of the image in different frequency bands. A convolution kernel, also called a filter, is a neuron containing a set of fixed weights, usually a square two-dimensional matrix whose entries are the coefficients applied to the data in the receptive field. Filtering with one convolution kernel can extract a specific feature, for example object contours or color intensity in an image. Because the sample data obtained in a data window usually has more matrix elements than the convolution kernel, and the difference in element count is large, it is difficult to ensure the overall efficiency of the convolution operation that the kernel performs on the sample data.
To this end, the core of the present invention is to provide a convolution operation method that relatively ensures the overall efficiency of the convolution operation.
To enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to FIG. 1, an embodiment of the present invention discloses a convolution operation method, including:
Step S10: read a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix from memory.
It should be noted that the sample data matrix read in this step may be a data matrix extracted from samples of types including, but not limited to, pictures, audio, and text. The convolution kernel matrix corresponding to the sample data matrix is the matrix used to extract features from it, and its elements are set according to the specific type of feature to be extracted. By performing a convolution operation on the sample data matrix, the convolution kernel matrix produces a feature image, that is, the convolution result, which reflects how features of the corresponding type are distributed in the sample data matrix. In addition, the sample data matrix and its corresponding convolution kernel matrix may be read in this step from the same memory, or from two independent memories.
Step S11: perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns.
The key point of this embodiment is that, after the sample data matrix and its corresponding convolution kernel matrix are obtained and before the convolution kernel matrix is used to convolve the sample data matrix, both matrices are preprocessed, that is, an expansion operation is performed on each of them. The purpose of the expansion is to obtain a first intermediate matrix and a second intermediate matrix whose row counts and column counts match. The first intermediate matrix is equivalent to the sample data matrix, and the second intermediate matrix is equivalent to the convolution kernel matrix. Because the two intermediate matrices have the same number of rows and the same number of columns, a larger number of data convolutions can be performed between them per unit time in the subsequent convolution operation. In addition, the expansion in this step may specifically be a row-wise expansion.
Step S12: perform a convolution operation on the first intermediate matrix with the second intermediate matrix, and generate a convolution result.
In this step, after the first intermediate matrix and the second intermediate matrix are obtained, the second intermediate matrix is used to perform a convolution operation on the first intermediate matrix, generating the corresponding convolution result.
In the convolution operation method provided by the present invention, a sample data matrix and the convolution kernel matrix corresponding to it are first read from memory, the number of rows or columns of the sample data matrix being equal to the number of rows of the convolution kernel matrix. An expansion operation is then performed on the sample data matrix and on the convolution kernel matrix to generate a first intermediate matrix and a second intermediate matrix, respectively, which have the same number of rows and the same number of columns. Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix is used to perform a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result. Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix with the second intermediate matrix is equivalent to performing it on the sample data matrix with the convolution kernel matrix, while increasing the amount of data convolved between the two matrices per unit time, which relatively ensures the overall efficiency of the convolution operation.
On the basis of the above embodiment, as a preferred implementation, performing the expansion operation on the sample data matrix to generate the first intermediate matrix includes:
sequentially extracting, from the sample matrix, process matrices of the same size as the convolution kernel matrix;
performing a transposition operation on each row of data of a process matrix and concatenating the results in row order into a first transposed data column;
combining the first transposed data columns into the first intermediate matrix according to the adjacency of the corresponding process matrices.
It should be noted that, because this implementation transposes and concatenates matrix rows, the resulting first intermediate matrix lends itself to row-first computation, which produces only a small amount of intermediate result data during computation and therefore reduces hardware resource overhead.
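The three sub-steps above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function name `expand_sample` is hypothetical, the windows ("process matrices") are assumed to slide one element at a time with no padding, and matrices are plain Python lists of rows.

```python
def expand_sample(sample, k_rows, k_cols):
    """Unfold a 2-D sample matrix into a first intermediate matrix.

    Assumption: windows move one element at a time, left to right and
    top to bottom, and no padding is applied.
    """
    rows, cols = len(sample), len(sample[0])
    columns = []
    for top in range(rows - k_rows + 1):
        for left in range(cols - k_cols + 1):
            # Flatten one k_rows x k_cols process matrix row by row
            # into a single data column.
            column = []
            for r in range(k_rows):
                column.extend(sample[top + r][left:left + k_cols])
            columns.append(column)
    # Place the data columns side by side, in window order.
    return [list(row) for row in zip(*columns)]


sample = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12]]
first = expand_sample(sample, 3, 3)
second_window = [row[1] for row in first]
```

Here `first` has 9 rows and 2 columns, one column per 3x3 window, and `second_window` holds the flattened window that starts at the second column of `sample`.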
On the basis of the above implementation, as a preferred implementation, performing the expansion operation on the convolution kernel matrix to generate the second intermediate matrix includes:
performing a transposition operation on each row of data of the convolution kernel matrix and concatenating the results in row order into a second transposed data column;
combining a plurality of second transposed data columns into the second intermediate matrix.
It should be noted that this implementation can generate the second intermediate matrix relatively efficiently according to the row and column dimensions of the first intermediate matrix, which improves the overall efficiency of the convolution operation.
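A minimal sketch of the kernel expansion follows. It assumes that the "plurality of second transposed data columns" are copies of the same flattened kernel, repeated until the column count matches the first intermediate matrix; the text does not state this explicitly, and the name `expand_kernel` is hypothetical.

```python
def expand_kernel(kernel, n_columns):
    """Flatten the kernel row by row into one column, then repeat that
    column n_columns times (assumed to equal the column count of the
    first intermediate matrix)."""
    flat = [value for row in kernel for value in row]
    return [[value] * n_columns for value in flat]


kernel = [[1, 2],
          [3, 4]]
second = expand_kernel(kernel, 3)
```

With a 2x2 kernel and three columns requested, `second` has 4 rows and 3 columns, and every column reads 1, 2, 3, 4 from top to bottom.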
To aid understanding of the expansion operations in the above implementations, this embodiment illustrates them by example. In a specific application scenario, the expansion operations on the sample data matrix and on the convolution kernel matrix are shown in FIG. 2.a and FIG. 2.b, respectively.
As shown in FIG. 2.a, the sample data matrix is a 3x11 matrix, and the first intermediate matrix after the expansion operation is a 9x9 arrangement. The upper three rows of the 9x9 arrangement are obtained by reading the first row of the 3x11 matrix three times, taking 9 values each time, with the three reads starting at the first, second, and third values respectively. The lower six rows of the 9x9 arrangement are obtained in the same way from the second and third rows.
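The row-by-row description of FIG. 2.a can be reproduced in a few lines. The sample values below are placeholders chosen so that each entry encodes its own position (100 times the row index plus the column index); they are not taken from the figure.

```python
# Placeholder 3x11 sample matrix: entry = 100 * row + column.
sample = [[100 * r + c for c in range(11)] for r in range(3)]

# Row 3*i + j of the 9x9 arrangement is 9 values of sample row i,
# starting at position j (j = 0, 1, 2).
first = [sample[i][j:j + 9] for i in range(3) for j in range(3)]

# Reading a column top to bottom gives one flattened 3x3 window.
window0 = [row[0] for row in first]
```

`window0` contains the entries at rows 0 to 2 and columns 0 to 2 of `sample`, listed row by row, which is consistent with the process-matrix description given earlier.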
As shown in FIG. 2.b, the convolution kernel matrix is a 3x3 matrix. The expansion operation arranges the 3x3 data into a single column in row order and then extends that column to 9 columns.
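The following sketch combines both 9x9 matrices and checks the result against a direct sliding-window computation. Two assumptions are made that the text does not spell out: the 9 kernel columns are identical copies, and the two matrices are combined by multiplying matching entries and summing down each column. The kernel values are illustrative only.

```python
# Placeholder data: a 3x11 sample matrix and an illustrative 3x3 kernel.
sample = [[11 * r + c for c in range(11)] for r in range(3)]
kernel = [[1, 0, -1],
          [2, 0, -2],
          [1, 0, -1]]

# First intermediate matrix (9x9), built as described for FIG. 2.a.
first = [sample[i][j:j + 9] for i in range(3) for j in range(3)]

# Second intermediate matrix (9x9): flattened kernel repeated 9 times.
flat = [value for row in kernel for value in row]
second = [[value] * 9 for value in flat]

# Assumed combination rule: multiply-accumulate down each column.
unfolded = [sum(second[k][c] * first[k][c] for k in range(9))
            for c in range(9)]

# Reference: inner product of the kernel with each 3x3 window.
direct = [sum(kernel[i][j] * sample[i][c + j]
              for i in range(3) for j in range(3))
          for c in range(9)]
```

Under these assumptions `unfolded` and `direct` agree at all 9 output positions, which illustrates why the intermediate matrices can stand in for the original sample and kernel matrices.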
On the basis of the above embodiments, the present invention also provides the following series of preferred implementations.
When there is more than one sample data matrix, referring to FIG. 3, an embodiment of the present invention discloses a convolution operation method, including:
Step S20: read the sample data matrices and the convolution kernel matrix corresponding to the sample data matrices from memory.
Step S21: perform an expansion operation on each sample data matrix to generate a first intermediate matrix, and perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns.
Step S22: perform a matrix multiplication on each first intermediate matrix with the second intermediate matrix and generate a corresponding result matrix.
Step S23: perform an accumulation operation on the result matrices and generate the convolution result.
It can be understood that when there is more than one sample data matrix, the expansion operation also produces more than one first intermediate matrix. The second intermediate matrix corresponding to the convolution kernel matrix therefore needs to be multiplied with every first intermediate matrix to generate a corresponding result matrix, and the result matrices are then accumulated to produce a combined convolution result for all sample data matrices. This embodiment can relatively ensure the overall accuracy of the convolution operation when there is more than one sample data matrix.
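Steps S22 and S23 can be sketched as a multiply-then-accumulate loop. As an assumption made for brevity, the second intermediate matrix is represented here by a single flattened kernel row (a 1xK matrix), since repeated copies of the same kernel column would carry no additional information; the helper names `matmul` and `add` and all values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


kernel_row = [[1, 0, -1, 2]]          # flattened 2x2 kernel as a 1x4 matrix
first_matrices = [                    # one 4x2 first intermediate matrix per sample
    [[1, 2], [2, 3], [4, 5], [5, 6]],
    [[1, 1], [1, 1], [1, 1], [1, 1]],
]

# S22: one result matrix per first intermediate matrix.
results = [matmul(kernel_row, m) for m in first_matrices]

# S23: accumulate the result matrices into the convolution result.
total = results[0]
for partial in results[1:]:
    total = add(total, partial)
```

In this example the two result matrices are `[[7, 9]]` and `[[2, 2]]`, and their accumulated sum `total` is `[[9, 11]]`.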
On the basis of the above embodiment, as a preferred implementation, reading the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix from memory includes:
reading the sample data matrix from DDR memory, and reading the convolution kernel matrix corresponding to the sample data matrix from HBM2 memory.
It should be noted that, in this implementation, the sample data matrix and the convolution kernel matrix are obtained from two different memories: the sample data matrix is read from DDR memory, and the corresponding convolution kernel matrix is read from HBM2 memory. The DDR memory and the HBM2 memory may belong to the same computing chip, for example an FPGA chip. In that case, the FPGA chip obtains the sample data matrix from its local DDR chip, obtains the convolution kernel matrix from its local HBM2 memory, and performs the convolution of the sample data matrix with the convolution kernel matrix inside the FPGA chip.
Both the DDR memory and the HBM2 memory in this embodiment can achieve a higher data transfer rate than SDRAM memory at the same bus frequency, so this embodiment can further improve the overall efficiency of the convolution operation.
On the basis of the above embodiment, as a preferred implementation, performing the convolution operation on the first intermediate matrix with the second intermediate matrix includes:
performing, in a DSP operation array, the convolution operation on the first intermediate matrix with the second intermediate matrix.
It should be noted that a DSP operation array, also called a digital signal processor, is a microprocessor with a special structure. A DSP chip internally uses a structure in which program and data are separated, has a hardware multiplier, makes extensive use of pipelining, and provides DSP instructions that can be used to implement various digital signal processing algorithms quickly. Performing the convolution operation on the first intermediate matrix with the second intermediate matrix in a DSP operation array can therefore relatively improve the overall efficiency of that operation.
In addition, as a preferred implementation, after the convolution result is generated, the method further includes:
storing the convolution result at the storage location in the memory that corresponds to the sample data matrix.
It should be noted that, once the convolution kernel matrix has been used to convolve the sample data matrix, the sample data matrix has served its purpose in the convolution operation but would otherwise continue to occupy memory space and reduce the amount of memory available. In this implementation, therefore, the convolution result is stored at the storage location in the memory that corresponds to the sample data matrix after it is generated, so that the convolution result overwrites the original sample data matrix. This preserves the available memory space, avoids wasting memory, reduces storage pressure on the memory, and ensures the overall stability of the convolution operation.
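The overwrite can be pictured with a flat list standing in for the memory. The name `store_result`, the offset, and the values are hypothetical, and the sketch leaves untouched any sample values beyond the length of the result, a detail the text does not address.

```python
def store_result(memory, sample_offset, result):
    """Write the convolution result over the region that held the
    sample data, starting at the sample's offset."""
    memory[sample_offset:sample_offset + len(result)] = result
    return memory


memory = [0] * 12
sample_offset = 4
sample = [1, 2, 3, 4, 5, 6]
memory[sample_offset:sample_offset + len(sample)] = sample

result = [10, 20]
store_result(memory, sample_offset, result)
```

After the call, the first two cells of the former sample region hold the result, and the total size of `memory` is unchanged because no new region was allocated.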
In addition, on the basis of the above series of implementations, as a preferred implementation, when the sample data matrix has more than 2 dimensions, performing the expansion operation on the sample data matrix to generate the first intermediate matrix includes:
performing the expansion operation in turn on each element of a target dimension of the sample data to generate the first intermediate matrix.
It should be noted that, when the sample data matrix has more than 2 dimensions, this implementation performs the expansion operation in turn on each element of the target dimension of the sample data to generate a first intermediate matrix, and then performs the convolution operation on that first intermediate matrix with the second intermediate matrix. The convolution between the second intermediate matrix and the first intermediate matrix is thus carried out one element of the target dimension at a time, which relatively reduces the amount of intermediate data produced while convolving the matrices that correspond to a single element of the target dimension, and thereby reduces hardware resource overhead.
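One way to read this is to treat the target dimension as the outermost axis of a 3-D sample and expand one 2-D slice at a time. The sketch below makes that assumption and uses a generator so that only one first intermediate matrix exists at any moment; the names `expand_slice` and `expand_along` are hypothetical, and windows are assumed to move one element at a time without padding.

```python
def expand_slice(plane, k_rows, k_cols):
    """Unfold one 2-D slice: each window is flattened row by row
    into one column of the returned matrix."""
    rows, cols = len(plane), len(plane[0])
    columns = [
        [plane[top + r][left + c]
         for r in range(k_rows) for c in range(k_cols)]
        for top in range(rows - k_rows + 1)
        for left in range(cols - k_cols + 1)
    ]
    return [list(row) for row in zip(*columns)]


def expand_along(sample, k_rows, k_cols):
    """Yield one first intermediate matrix per element of the target
    dimension (assumed here to be the outermost axis)."""
    for plane in sample:
        yield expand_slice(plane, k_rows, k_cols)


sample = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
matrices = list(expand_along(sample, 2, 2))
shapes = [(len(m), len(m[0])) for m in matrices]
```

In a real pipeline each yielded matrix would be convolved and discarded before the next one is produced; collecting them into `matrices` here is only for inspection.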
In another aspect, the present invention also provides a convolution operation apparatus. FIG. 4 shows a schematic structural diagram of one embodiment of the convolution operation apparatus, which includes:
a matrix reading module 10, configured to read a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix from memory;
a preprocessing module 11, configured to perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and to perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, the first intermediate matrix and the second intermediate matrix having the same number of rows and the same number of columns;
a convolution execution module 12, configured to perform a convolution operation on the first intermediate matrix with the second intermediate matrix and to generate a convolution result.
The convolution operation apparatus provided by the present invention first reads a sample data matrix and the convolution kernel matrix corresponding to it from memory, the number of rows or columns of the sample data matrix being equal to the number of rows of the convolution kernel matrix. It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate a first intermediate matrix and a second intermediate matrix, respectively, which have the same number of rows and the same number of columns. Finally, the second intermediate matrix obtained by expanding the convolution kernel matrix is used to perform a convolution operation on the first intermediate matrix obtained by expanding the sample data matrix, generating the corresponding convolution result. Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix with the second intermediate matrix is equivalent to performing it on the sample data matrix with the convolution kernel matrix, while increasing the amount of data convolved between the two matrices per unit time, which relatively ensures the overall efficiency of the convolution operation.
In addition, as a preferred implementation, the convolution execution module includes:
a matrix product module, configured to perform a matrix multiplication on each first intermediate matrix using the second intermediate matrix and to generate a corresponding result matrix; and
an accumulation module, configured to accumulate the result matrices.
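The multiply-then-accumulate split between these two modules can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes several two-dimensional sample data matrices (for example, input channels), one kernel per sample matrix, stride 1, and no padding. As a simplification, the kernel is used as a single flattened row rather than as a matrix with the same shape as the first intermediate matrix. The names `unfold`, `matmul`, and `accumulate` are hypothetical.

```python
def unfold(sample, k):
    """Return a (k*k) x P matrix whose columns are the flattened k x k windows."""
    out_rows = len(sample) - k + 1
    out_cols = len(sample[0]) - k + 1
    columns = [[sample[i + u][j + v] for u in range(k) for v in range(k)]
               for i in range(out_rows) for j in range(out_cols)]
    return [list(row) for row in zip(*columns)]


def matmul(a, b):
    """Plain matrix product of a (m x n) and b (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def accumulate(matrices):
    """Element-wise sum of equally shaped result matrices."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*matrices)]


# Two hypothetical sample data matrices with one 2 x 2 kernel each.
channels = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]

# Matrix product step: one result matrix per sample data matrix.
results = [
    matmul([[value for row in kernel for value in row]], unfold(channel, 2))
    for channel, kernel in zip(channels, kernels)
]

# Accumulation step: sum the per-sample result matrices.
total = accumulate(results)
print(total)
```

Each entry of `total` corresponds to one output position, listed in row-major order of the sliding window.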
In addition, as a preferred implementation, the matrix reading module includes:
a memory reading module, configured to read the sample data matrix from a DDR memory and to read the convolution kernel matrix corresponding to the sample data matrix from an HBM2 memory.
In another aspect, the present invention further provides a convolution operation device, including:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of the convolution operation method described above when executing the computer program.
The convolution operation device provided by the present invention first reads, from a memory, a sample data matrix and the convolution kernel matrix corresponding to that sample data matrix, where the number of rows or the number of columns of the sample data matrix equals the number of rows of the convolution kernel matrix. It then performs an expansion operation on the sample data matrix and on the convolution kernel matrix to generate a first intermediate matrix and a second intermediate matrix, respectively, such that the two intermediate matrices have the same number of rows and the same number of columns. Finally, it performs a convolution operation on the first intermediate matrix (expanded from the sample data matrix) using the second intermediate matrix (expanded from the convolution kernel matrix) and generates the corresponding convolution result. Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix using the second intermediate matrix is equivalent to convolving the sample data matrix with the convolution kernel matrix. The expansion also increases the amount of data convolved between the two matrices per unit time, which helps ensure the overall efficiency of the convolution operation.
In another aspect, the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the convolution operation method described above.
With the computer-readable storage medium provided by the present invention, a sample data matrix and the convolution kernel matrix corresponding to that sample data matrix are first read from a memory, where the number of rows or the number of columns of the sample data matrix equals the number of rows of the convolution kernel matrix. An expansion operation is then performed on the sample data matrix and on the convolution kernel matrix to generate a first intermediate matrix and a second intermediate matrix, respectively, such that the two intermediate matrices have the same number of rows and the same number of columns. Finally, a convolution operation is performed on the first intermediate matrix (expanded from the sample data matrix) using the second intermediate matrix (expanded from the convolution kernel matrix), and the corresponding convolution result is generated. Because the first intermediate matrix is equivalent to the sample data matrix and the second intermediate matrix is equivalent to the convolution kernel matrix, performing the convolution operation on the first intermediate matrix using the second intermediate matrix is equivalent to convolving the sample data matrix with the convolution kernel matrix. The expansion also increases the amount of data convolved between the two matrices per unit time, which helps ensure the overall efficiency of the convolution operation.
The convolution operation method, apparatus, device, and storage medium provided by the present invention have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar across embodiments can be cross-referenced. Because the apparatus disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant details can be found in the description of the method. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

Claims (13)

  1. A convolution operation method, characterized by comprising:
    reading, from a memory, a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix;
    performing an expansion operation on the sample data matrix to generate a first intermediate matrix, and performing an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, wherein the first intermediate matrix and the second intermediate matrix have the same number of rows and the same number of columns; and
    performing a convolution operation on the first intermediate matrix using the second intermediate matrix, and generating a convolution result.
  2. The convolution operation method according to claim 1, characterized in that, when the number of sample data matrices is greater than 1, performing the convolution operation on the first intermediate matrix using the second intermediate matrix comprises:
    performing a matrix multiplication on each first intermediate matrix using the second intermediate matrix and generating a corresponding result matrix; and
    performing an accumulation operation on the result matrices.
  3. The convolution operation method according to claim 1, characterized in that reading, from the memory, the sample data matrix and the convolution kernel matrix corresponding to the sample data matrix comprises:
    reading the sample data matrix from a DDR memory, and reading the convolution kernel matrix corresponding to the sample data matrix from an HBM2 memory.
  4. The convolution operation method according to claim 1, characterized in that performing the convolution operation on the first intermediate matrix using the second intermediate matrix comprises:
    performing the convolution operation on the first intermediate matrix using the second intermediate matrix in a DSP operation array.
  5. The convolution operation method according to claim 1, characterized in that, after the convolution result is generated, the method further comprises:
    storing the convolution result in a storage location in the memory that corresponds to the sample data matrix.
  6. The convolution operation method according to claim 1, characterized in that performing the expansion operation on the sample data matrix to generate the first intermediate matrix comprises:
    sequentially extracting, from the sample matrix, process matrices having the same size as the convolution kernel matrix;
    performing a transposition operation on each row of data of each process matrix and concatenating the transposed rows, in row order, into a first transposed data column; and
    combining the first transposed data columns into the first intermediate matrix according to the adjacency relationship between the corresponding process matrices.
  7. The convolution operation method according to claim 6, characterized in that performing the expansion operation on the convolution kernel matrix to generate the second intermediate matrix comprises:
    performing a transposition operation on each row of data of the convolution kernel matrix and concatenating the transposed rows, in row order, into a second transposed data column; and
    combining a plurality of the second transposed data columns into the second intermediate matrix.
  8. The convolution operation method according to any one of claims 1 to 7, characterized in that, when the number of dimensions of the sample data matrix is greater than 2, performing the expansion operation on the sample data matrix to generate the first intermediate matrix comprises:
    performing the expansion operation successively on each element in a target dimension of the sample data to generate the first intermediate matrix.
  9. A convolution operation apparatus, characterized by comprising:
    a matrix reading module, configured to read, from a memory, a sample data matrix and a convolution kernel matrix corresponding to the sample data matrix;
    a preprocessing module, configured to perform an expansion operation on the sample data matrix to generate a first intermediate matrix, and to perform an expansion operation on the convolution kernel matrix to generate a second intermediate matrix, wherein the first intermediate matrix and the second intermediate matrix have the same number of rows and the same number of columns; and
    a convolution execution module, configured to perform a convolution operation on the first intermediate matrix using the second intermediate matrix and to generate a convolution result.
  10. The convolution operation apparatus according to claim 9, characterized in that the convolution execution module comprises:
    a matrix product module, configured to perform a matrix multiplication on each first intermediate matrix using the second intermediate matrix and to generate a corresponding result matrix; and
    an accumulation module, configured to perform an accumulation operation on the result matrices.
  11. The convolution operation apparatus according to claim 9, characterized in that the matrix reading module comprises:
    a memory reading module, configured to read the sample data matrix from a DDR memory and to read the convolution kernel matrix corresponding to the sample data matrix from an HBM2 memory.
  12. A convolution operation device, characterized by comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement the steps of the convolution operation method according to any one of claims 1 to 8 when executing the computer program.
  13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the convolution operation method according to any one of claims 1 to 8.
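
The expansion steps recited in claims 6 and 7 can be illustrated with a short sketch. It is one possible reading, not the claimed implementation, and it rests on assumptions the claims do not state: windows ("process matrices") are visited in row-major order with stride 1 and no padding, the second intermediate matrix is formed by repeating the flattened kernel column once per window, and the convolution result is taken as the column-wise dot product of the two intermediate matrices. All function names are hypothetical.

```python
def unfold_sample(sample, k_rows, k_cols):
    """First intermediate matrix: one column per kernel-sized window."""
    out_rows = len(sample) - k_rows + 1
    out_cols = len(sample[0]) - k_cols + 1
    columns = []
    for i in range(out_rows):          # windows visited in row-major order
        for j in range(out_cols):
            window = [row[j:j + k_cols] for row in sample[i:i + k_rows]]
            # Concatenate the window's rows in order to form one data column.
            columns.append([value for row in window for value in row])
    # Place the data columns side by side (rows = k_rows * k_cols).
    return [list(row) for row in zip(*columns)]


def unfold_kernel(kernel, n_columns):
    """Second intermediate matrix: the flattened kernel repeated per column."""
    column = [value for row in kernel for value in row]
    return [[value] * n_columns for value in column]


def convolve_unfolded(first, second):
    """Column-wise dot product of two equally shaped intermediate matrices."""
    return [sum(a * b for a, b in zip(col_a, col_b))
            for col_a, col_b in zip(zip(*first), zip(*second))]


def convolve_direct(sample, kernel):
    """Reference sliding-window convolution (no kernel flip), row-major output."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [sum(sample[i + u][j + v] * kernel[u][v]
                for u in range(k_rows) for v in range(k_cols))
            for i in range(len(sample) - k_rows + 1)
            for j in range(len(sample[0]) - k_cols + 1)]


sample = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]

first = unfold_sample(sample, 2, 2)
second = unfold_kernel(kernel, len(first[0]))
result = convolve_unfolded(first, second)
print(result)  # [6, 8, 12, 14]
```

For this 3 x 3 sample and 2 x 2 kernel, both intermediate matrices are 4 x 4, and `result` agrees with `convolve_direct(sample, kernel)`.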
PCT/CN2020/087105 2020-01-20 2020-04-27 Convolution operation method, apparatus and device, and storage medium WO2021147196A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010065274.4 2020-01-20
CN202010065274.4A CN111310891A (en) 2020-01-20 2020-01-20 Convolution operation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021147196A1 true WO2021147196A1 (en) 2021-07-29

Family

ID=71146891

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087105 WO2021147196A1 (en) 2020-01-20 2020-04-27 Convolution operation method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN111310891A (en)
WO (1) WO2021147196A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111882029A (en) * 2020-06-22 2020-11-03 华控清交信息科技(北京)有限公司 Data processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017049496A1 (en) * 2015-09-23 2017-03-30 Intel Corporation Apparatus and method for local quantization for convolutional neural networks (cnns)
CN107430537A (en) * 2015-03-27 2017-12-01 英特尔公司 From piece selective information is extracted in DRAM ECC
CN108122030A (en) * 2016-11-30 2018-06-05 华为技术有限公司 A kind of operation method of convolutional neural networks, device and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171327A (en) * 2017-12-25 2018-06-15 郑州云海信息技术有限公司 A kind of matrix method for transformation, device and medium based on convolution algorithm
CN109871510B (en) * 2019-01-08 2024-01-23 广东浪潮大数据研究有限公司 Two-dimensional convolution operation processing method, system, equipment and computer storage medium

Also Published As

Publication number Publication date
CN111310891A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
US11403838B2 (en) Image processing method, apparatus, equipment, and storage medium to obtain target image features
EP3637281A1 (en) Operational accelerator
JP6767660B2 (en) Processor, information processing device and how the processor operates
CN112801228B (en) Text recognition method, electronic equipment and storage medium thereof
CN109784372B (en) Target classification method based on convolutional neural network
WO2018214769A1 (en) Image processing method, device and system
WO2019076109A1 (en) Method and device for pooling image information, storage medium and processor
CN108304480A (en) A kind of text similarity determines method, apparatus and equipment
WO2021147196A1 (en) Convolution operation method, apparatus and device, and storage medium
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
JP2017532655A (en) Compress cascading style sheet files
CN113498521A (en) Text detection method and device and storage medium
CN111899185A (en) Training method and device of image noise reduction model, electronic equipment and storage medium
CN111144407A (en) Target detection method, system, device and readable storage medium
CN117252890A (en) Carotid plaque segmentation method, device, equipment and medium
CN110059563B (en) Text processing method and device
CN111027670B (en) Feature map processing method and device, electronic equipment and storage medium
US20240071066A1 (en) Object recognition method and apparatus, and device and medium
CN107862316A (en) Convolution operation method and device
US11288534B2 (en) Apparatus and method for image processing for machine learning
CN114220132A (en) Fingerprint image noise reduction method and device
CN110276332B (en) Video feature processing method and device
CN112766471B (en) Computing device and related product
CN114663320A (en) Image processing method, data set expansion method, storage medium, and electronic device
KR101841547B1 (en) Optimization method for the scale space construction on a mobile GPU

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20916109

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20916109

Country of ref document: EP

Kind code of ref document: A1
