WO2019136750A1 - Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal - Google Patents

Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal Download PDF

Info

Publication number
WO2019136750A1
WO2019136750A1 (PCT/CN2018/072662)
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
zero
artificial intelligence
processed
auxiliary processing
Prior art date
Application number
PCT/CN2018/072662
Other languages
French (fr)
Chinese (zh)
Inventor
肖梦秋
Original Assignee
深圳鲲云信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳鲲云信息科技有限公司 filed Critical 深圳鲲云信息科技有限公司
Priority to PCT/CN2018/072662 priority Critical patent/WO2019136750A1/en
Priority to CN201880002144.7A priority patent/CN109313663B/en
Publication of WO2019136750A1 publication Critical patent/WO2019136750A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of artificial intelligence, and in particular to an artificial intelligence computing auxiliary processing method, apparatus, readable computer storage medium, and terminal.
  • the original data matrix needs to be zero-padded.
  • However, in the prior art the zero-padding operation can usually only be performed in software; the amount of CPU computation involved is very large, which makes zero-padding very inefficient.
  • In view of the above shortcomings of the prior art, an object of the present invention is to provide an artificial intelligence calculation auxiliary processing method and apparatus, a computer-readable storage medium, and a terminal for solving the technical problems of the prior art, such as the low efficiency and heavy computational load of the zero-padding operation.
  • The present invention provides an artificial intelligence calculation auxiliary processing apparatus, comprising: a plurality of storage modules storing a data matrix to be processed; a memory provided with a zero matrix; and a control module for taking the data matrix to be processed out of the storage modules and placing it into the zero matrix in the memory, so that the data matrix to be processed can form, centered on any one of its first matrix elements, a to-be-convolved matrix of size W*W for the convolution kernel matrix to perform convolution calculation according to a preset step size, where W is the size of the convolution kernel matrix.
  • In an embodiment, the data matrix to be processed includes an n*m matrix and the zero matrix includes an N*M zero matrix, where N = n + W - 1 and M = m + W - 1.
  • Taking the data matrix to be processed out of the storage module and placing it into the zero matrix in the memory specifically includes: the control module takes the n*m matrix out of the storage module and places it into the N*M zero matrix of the memory to form a padding matrix, in which rows 1 to (W - 1)/2 and rows N - (W - 1)/2 + 1 to N are zero, columns 1 to (W - 1)/2 and columns M - (W - 1)/2 + 1 to M are zero, and the remaining region is the n*m matrix.
  • Taking the data matrix to be processed out of the storage module and placing it into the zero matrix may specifically proceed as follows: the n*m matrix is divided into a plurality of block matrices of the same size, and the control module takes out each block matrix in turn and fills it into the N*M zero matrix according to a preset placement condition.
  • The preset placement condition includes: the control module places each block matrix into the N*M zero matrix in turn according to a preset start address in the memory and the size of the block matrix.
  • the storage module includes a dual storage module.
  • In an embodiment, the artificial intelligence calculation auxiliary processing device includes: a multiplier for multiplying each to-be-convolved matrix with the convolution kernel to obtain a corresponding multiplication result matrix, each multiplication result matrix being aligned with a first matrix element; and an adder for adding the second matrix elements in each multiplication result matrix to obtain a corresponding convolution result value, the convolution result values being aligned with the first matrix elements so as to form a convolution result matrix of size n*m.
  • The present invention also provides an artificial intelligence calculation auxiliary processing method applied to a control module, the method comprising: taking an n*m matrix out of a storage module; and placing the n*m matrix into an N*M zero matrix in the memory, so that the n*m matrix can form, centered on any one of its first matrix elements, a to-be-convolved matrix of size W*W for convolution calculation with the convolution kernel, where N = n + W - 1, M = m + W - 1, and W is the order of the convolution kernel.
  • The control module taking the n*m matrix out of the storage module specifically includes: the n*m matrix is divided into a plurality of block matrices of the same size, and the control module takes out each block matrix in turn.
  • Placing the n*m matrix into the N*M zero matrix of the memory specifically includes: the control module fills each block matrix into the N*M zero matrix in turn according to the start address and the size of the block matrix.
  • the present invention provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the artificial intelligence calculation auxiliary processing method.
  • The present invention further provides an artificial intelligence calculation auxiliary processing terminal, comprising a processor and a memory; the memory is used to store a computer program, and the processor is configured to execute the computer program stored in the memory so as to cause the terminal to perform the artificial intelligence calculation auxiliary processing method.
  • As described above, the artificial intelligence calculation auxiliary processing method, apparatus, computer-readable storage medium, and terminal of the present invention have the following beneficial effects: a zero-padding operation system is built on a hardware structure, and a zero matrix is preset in the memory so that the zero-padding operation is accomplished simply by placing the data matrix to be processed into it, without computing the number or positions of padded zeros. This greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing. The present invention therefore effectively overcomes the shortcomings of the prior art and has high industrial value.
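The size-preserving effect described above can be checked with the standard convolution output-size formula. The helper below is an illustrative sketch (not part of the patent; the function name is invented) showing that padding (W - 1)/2 zeros on each side with step size 1 keeps an n*n input at size n*n.

```python
def conv_output_size(n, w, pad, stride=1):
    """Output length along one axis: floor((n + 2*pad - w) / stride) + 1."""
    return (n + 2 * pad - w) // stride + 1

# "Same" padding for an odd-order kernel: pad (W - 1) // 2 zeros per side.
n, w = 5, 3                     # the 5*5 matrix and 3*3 kernel of the embodiment
pad = (w - 1) // 2
assert conv_output_size(n, w, pad) == n   # the 5*5 input stays 5*5
```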
  • FIG. 1 is a schematic diagram of an artificial intelligence calculation auxiliary processing apparatus according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram showing an artificial intelligence calculation auxiliary processing procedure according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing an artificial intelligence calculation auxiliary processing procedure according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram showing an artificial intelligence calculation auxiliary processing procedure according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing an artificial intelligence calculation auxiliary processing procedure according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram showing an artificial intelligence calculation auxiliary processing procedure according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of an artificial intelligence calculation auxiliary processing device according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram showing an artificial intelligence calculation auxiliary processing method according to an embodiment of the present invention.
  • the invention provides an artificial intelligence calculation auxiliary processing device for performing a zero-padding operation on a data matrix to be processed.
  • the artificial intelligence calculation auxiliary processing device includes a plurality of storage modules 11, a memory 12, and a control module 13.
  • the plurality of storage modules 11 store an n*m matrix, and both n and m are natural numbers greater than or equal to 1.
  • An N*M zero matrix is stored in the memory 12; the control module 13 is configured to take out the n*m matrix and place it in the N*M zero matrix.
  • In an embodiment, the storage module 11 is a dual storage module, which specifically includes two storage modules.
  • The two storage modules always work in a state in which one is being scanned (read out) while the other is processing data, so that each frame of the final output appears to have undergone both processing and scan-out in a single pass. This is in fact achieved through the cooperation of the two memories, thereby roughly doubling data transfer and processing efficiency.
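The ping-pong behaviour of the dual storage module can be sketched in software as two buffers that swap roles every frame. The function below is a hedged illustration only (names and structure invented, not the patented hardware), showing how one buffer is filled while the previously filled one is processed.

```python
def process_frames(frames, process):
    """Model of double buffering: fill one buffer while processing the other."""
    buffers = [None, None]   # the two "storage modules"
    fill, work = 0, 1        # which buffer is being filled / being processed
    results = []
    buffers[fill] = frames[0]                   # prefill the first buffer
    for nxt in frames[1:]:
        fill, work = work, fill                 # swap roles each frame
        buffers[fill] = nxt                     # "scan" the next frame in
        results.append(process(buffers[work]))  # process the previous frame
    fill, work = work, fill
    results.append(process(buffers[work]))      # drain the last frame
    return results
```

With `process_frames([1, 2, 3], lambda x: x * 2)` the frames come out doubled and in order, while at every step the "scan" of the next frame overlaps the processing of the previous one.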
  • the control module implements data transmission by using a DMA method.
  • DMA stands for Direct Memory Access, a mechanism by which a controller can access data in memory directly, without going through the CPU. In DMA mode the CPU only needs to issue an instruction to the control module, which then handles the data transfer and reports back to the CPU once the data has been sent, greatly reducing CPU occupancy and saving system resources.
  • the N*M zero matrix is a matrix composed of (N*M) zero values.
  • the W is the order of the W*W order convolution kernel matrix.
  • the convolution kernel matrix is a weight matrix for performing weighted average calculation on the matrix data, and the effect is equivalent to the filter in the convolution calculation.
  • the order of the weight matrix is odd, which facilitates determining the position of the matrix by the center element of the odd-order matrix.
  • The control module takes the n*m matrix out of the storage module 11 and places it into the N*M zero matrix of the memory 12 to form a padding matrix, so that the n*m matrix can form, centered on any one of its matrix elements, a to-be-convolved matrix of size W*W.
  • the process of convolution calculation of the n*m matrix and the W*W order convolution kernel is described below in a specific embodiment.
  • Referring to FIG. 2-6, schematic diagrams of an artificial intelligence calculation auxiliary processing procedure in an embodiment of the present invention are shown, where:
  • Figure 2 shows the data matrix M1 to be processed and the convolution kernel matrix M2 in this embodiment.
  • the data matrix to be processed is a 5*5 order matrix
  • the convolution kernel matrix is a 3*3 order matrix
  • the values in each matrix are matrix elements of the matrix.
  • Figure 3 shows the zero matrix M3 in this embodiment.
  • Figure 4 shows the fill matrix M4 in this embodiment.
  • the padding matrix is formed after the to-be-processed data matrix M1 is placed in the zero matrix M3.
  • Rows 1 to (W - 1)/2 and rows N - (W - 1)/2 + 1 to N of the padding matrix are zero, and columns 1 to (W - 1)/2 and columns M - (W - 1)/2 + 1 to M are zero, the remaining region being the n*m matrix. In this embodiment the first and seventh rows and the first and seventh columns of the padding matrix M4 are all 0, the region from the second to the sixth row and from the second to the sixth column holds the to-be-processed data matrix M1, and the padding matrix M4 is in effect the result of the zero-padding operation on M1.
  • The dotted rectangle R1 in Fig. 4 marks a 3*3 to-be-convolved matrix M401 centered on the matrix element 18; M401 is shown on the right side of Fig. 4.
  • Moving the dotted rectangle R1 to the right with a step size of 1 yields, in turn, the to-be-convolved matrices M401 to M405 centered on the matrix elements of the first data row.
  • Performing the same operation row by row on the remaining rows finally yields 25 to-be-convolved matrices, M401 to M425.
  • The to-be-convolved matrices M401 to M425 are in one-to-one correspondence with the matrix elements of the data matrix M1 to be processed.
  • The step size of the dotted rectangle R1 in this embodiment is 1, i.e. it moves by one matrix element at a time; the present invention does not limit the step size.
  • FIG. 5 is a schematic diagram showing the multiplication of each of the to-be-convolved matrices and the convolution kernel matrix in this embodiment.
  • The artificial intelligence calculation auxiliary processing device includes a multiplier (not shown) for multiplying the convolution kernel matrix with each to-be-convolved matrix. Specifically, the matrix elements of each to-be-convolved matrix are multiplied element-wise with the matrix elements of the convolution kernel matrix to obtain the corresponding multiplication result matrices M501 to M525. It should be noted that the multiplication result matrices M501 to M525 are in one-to-one correspondence with the matrix elements of the data matrix M1 to be processed.
  • Fig. 6 is a diagram showing the addition of the multiplication result matrices in this embodiment.
  • The artificial intelligence calculation auxiliary processing device includes an adder (not shown) which, for each multiplication result matrix M501 to M525, adds up the matrix elements of the multiplication result matrix to obtain the corresponding convolution result value. For example, adding the matrix elements of the multiplication result matrix M501 yields the convolution result value 32; proceeding in the same way for all 25 result matrices finally yields the convolution result matrix M6.
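The whole pipeline of FIGS. 3-6 (preset zero matrix, placement, W*W windows, element-wise multiplication, and summation) can be modelled in a few lines of software. The sketch below is illustrative only: the concrete matrices of the figures are not reproduced here, so the sample input and the identity kernel are invented for the check.

```python
def conv2d_same(data, kernel):
    """Zero-pad an n*m matrix into an (n+W-1)*(m+W-1) preset zero matrix,
    then slide a W*W window with step 1, multiply element-wise, and sum."""
    n, m = len(data), len(data[0])
    w = len(kernel)                        # odd kernel order W
    p = (w - 1) // 2
    N, M = n + w - 1, m + w - 1
    pad = [[0] * M for _ in range(N)]      # the preset zero matrix
    for i in range(n):                     # place the data matrix; no zero
        for j in range(m):                 # positions need to be computed
            pad[i + p][j + p] = data[i][j]
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):                 # window centered on each data element
            out[i][j] = sum(pad[i + a][j + b] * kernel[a][b]
                            for a in range(w) for b in range(w))
    return out

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sample = [[r * 5 + c for c in range(5)] for r in range(5)]
assert conv2d_same(sample, identity) == sample  # identity kernel returns the input
```

As in the embodiment, the output has the same n*m size as the input, because the zero matrix adds exactly (W - 1)/2 rows and columns of zeros on each side.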
  • the artificial intelligence calculation auxiliary processing device provided by the present invention convolutes the n*m matrix in the storage module with the convolution kernel, and outputs a convolution result matrix of order n*m.
  • The artificial intelligence calculation auxiliary processing device builds a zero-padding operation system on a hardware structure; moreover, the present invention presets a zero matrix in the memory into which the data matrix to be processed is placed to accomplish the zero-padding operation, with no need to compute the number or positions of padded zeros.
  • Compared with computing the zero-padding operation in software on the CPU, the invention greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing.
  • In an embodiment, the n*m matrix is divided into a plurality of block matrices of the same size, and the control module takes the n*m matrix out of the storage module and places it into the N*M zero matrix of the memory by fetching each block matrix in turn and filling it into the N*M zero matrix according to a preset placement condition. This is described below with a specific embodiment.
  • FIG. 7 a schematic diagram of an artificial intelligence calculation auxiliary processing apparatus in an embodiment of the present invention is shown.
  • the processing device includes a storage module 71 with a data matrix M7 to be processed.
  • the data matrix M7 to be processed is a 4*4 matrix
  • Taking a 2*2 matrix as the block unit, the data matrix M7 can be divided into four block matrices.
  • a rectangular dotted frame R2 represents one of the block matrices.
  • the processing device includes a memory 72 having a zero matrix M8 disposed therein, the zero matrix M8 being a 6*6 matrix.
  • The region of the zero matrix M8 that stores the data matrix M7 to be processed starts from the matrix element in the second row and second column, whose storage address is 0x00220000.
  • The control module (not shown) uses this storage address as the start address and places the first block matrix of the data matrix M7 into the region marked by the dotted rectangle R3.
  • The control module then places each remaining block matrix in its corresponding position in the zero matrix M8 according to the start address and the size of the block matrix. For example, after placing the first block matrix into the zero matrix M8, the control module stores the second block matrix into the zero matrix M8 using the storage address 0x00220004 as its start address, and so on, until the entire data matrix to be processed has been placed into the zero matrix M8.
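The address-driven placement just described can be modelled with a flat array standing in for the memory. The sketch below is a software illustration only: addresses are counted in whole elements rather than bytes (the element width of the actual hardware is not specified here), using the 6*6 zero matrix M8 and the 2*2 blocks of the embodiment.

```python
BASE = 0x00220000                 # address of the element in row 2, column 2 of M8
COLS = 6                          # the zero matrix M8 is 6*6
ORIGIN = 1 * COLS + 1             # flat index of that element in the memory image

def place_block(memory, block, start_addr):
    """Copy one block matrix into the flat memory image at element address start_addr."""
    off_row, off_col = divmod(start_addr - BASE, COLS)
    for r, line in enumerate(block):
        for c, value in enumerate(line):
            memory[ORIGIN + (off_row + r) * COLS + (off_col + c)] = value

memory = [0] * (6 * 6)            # the preset 6*6 zero matrix, flattened row-major
place_block(memory, [[1, 2], [3, 4]], BASE)        # first block: rows 2-3, cols 2-3
place_block(memory, [[5, 6], [7, 8]], BASE + 2)    # second block: rows 2-3, cols 4-5
assert memory[1 * 6 + 1 : 1 * 6 + 5] == [1, 2, 5, 6]
assert memory[2 * 6 + 1 : 2 * 6 + 5] == [3, 4, 7, 8]
```

In the embodiment the second block starts at 0x00220004, reflecting the byte width of the hardware's elements; here the address simply steps by whole elements, which keeps the placement logic the same.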
  • The artificial intelligence calculation auxiliary processing method extracts data from the storage module through the control module and divides the data matrix to be processed into a plurality of block matrices of the same size, thereby greatly improving the efficiency of data extraction and speeding up the system's response.
  • The invention also provides an artificial intelligence calculation auxiliary processing method applied to the control module, which specifically includes: taking the data matrix to be processed out of the storage module; and placing it into the zero matrix in the memory, so that the data matrix to be processed can form, centered on any one of its first matrix elements, a to-be-convolved matrix of size W*W for the convolution kernel matrix to perform convolution calculation according to a preset step size, where W is the size of the convolution kernel matrix.
  • the implementation manner of the artificial intelligence calculation auxiliary processing method is similar to the implementation manner of the artificial intelligence calculation auxiliary processing device, and therefore will not be described again.
  • the aforementioned computer program can be stored in a computer readable storage medium.
  • When executed, the program performs the steps of the foregoing method embodiments; the foregoing storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • the invention also provides an artificial intelligence calculation auxiliary processing terminal, comprising: a processor and a memory.
  • The memory is used to store a computer program, and the processor executes the computer program stored in the memory so as to cause the terminal to perform the artificial intelligence calculation auxiliary processing method.
  • The memory mentioned above may include random access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; or a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • In summary, the artificial intelligence calculation auxiliary processing method and device build a zero-padding operation system on a hardware structure, with a zero matrix preset in the memory into which the data matrix to be processed is placed to realize the zero-padding operation, without computing parameters such as the number or positions of padded zeros. This greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing. Therefore, the present invention effectively overcomes the shortcomings of the prior art and has high industrial value.

Abstract

An artificial intelligence-based computer-aided processing device, comprising: a plurality of storage modules (11) storing a data matrix to be processed; a memory (12) provided with a zero matrix; and a control module (13) for taking the data matrix to be processed out of the storage modules (11) and placing it into the zero matrix in the memory, such that the data matrix to be processed can form, with any first matrix element thereof as a center, a to-be-convolved matrix of size W*W for the convolution kernel matrix to perform convolution computation according to a preset step size, wherein W is the size of the convolution kernel matrix. By building a zero-padding operation system on a hardware structure and presetting a zero matrix in the memory (12) into which the data matrix to be processed is placed, the zero-padding operation can be implemented without computing parameters such as the number or positions of padded zeros, so that the computational load of the system is greatly reduced, the efficiency of the zero-padding operation is improved, and the response speed of operations such as image processing is increased.

Description

Artificial intelligence calculation auxiliary processing device, method, storage medium, and terminal

Technical field
The present invention relates to the field of artificial intelligence, and in particular to an artificial intelligence calculation auxiliary processing method, apparatus, computer-readable storage medium, and terminal.
Background
Nowadays, with the development of the artificial intelligence industry, technologies in various fields of artificial intelligence have arisen; among them, convolutional neural networks have become a research hotspot in many of these fields.

As early as the 1960s, scientists studying the neurons responsible for local sensitivity and direction selectivity in the cat's cerebral cortex found that their unique network structure could effectively reduce the complexity of a feedback neural network, and thereupon proposed the convolutional neural network. Since then, many more researchers have joined the study of convolutional neural networks.
Generally, in order for the matrix of feature values extracted by convolution to have the same size as the original data matrix before convolution, the original data matrix needs to be zero-padded.

However, in the prior art the zero-padding operation can usually only be performed in software; the amount of CPU computation involved is very large, which makes zero-padding very inefficient.
Summary of the invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide an artificial intelligence calculation auxiliary processing method and apparatus, a computer-readable storage medium, and a terminal for solving the technical problems of the prior art, such as the low efficiency and heavy computational load of the zero-padding operation.
To achieve the above and other related objects, the present invention provides an artificial intelligence calculation auxiliary processing apparatus, comprising: a plurality of storage modules storing a data matrix to be processed; a memory provided with a zero matrix; and a control module for taking the data matrix to be processed out of the storage modules and placing it into the zero matrix in the memory, so that the data matrix to be processed can form, centered on any one of its first matrix elements, a to-be-convolved matrix of size W*W for the convolution kernel matrix to perform convolution calculation according to a preset step size, where W is the size of the convolution kernel matrix.
In an embodiment of the invention, the data matrix to be processed includes an n*m matrix and the zero matrix includes an N*M zero matrix, where N = n + W - 1 and M = m + W - 1.

In an embodiment of the invention, taking the data matrix to be processed out of the storage module and placing it into the zero matrix in the memory specifically includes: the control module takes the n*m matrix out of the storage module and places it into the N*M zero matrix of the memory to form a padding matrix, in which rows 1 to (W - 1)/2 and rows N - (W - 1)/2 + 1 to N are zero, columns 1 to (W - 1)/2 and columns M - (W - 1)/2 + 1 to M are zero, and the remaining region is the n*m matrix.
In an embodiment of the invention, taking the data matrix to be processed out of the storage module and placing it into the zero matrix specifically includes: the n*m matrix is divided into a plurality of block matrices of the same size, and the control module takes out each block matrix in turn and fills it into the N*M zero matrix according to a preset placement condition.
In an embodiment of the invention, the preset placement condition includes: the control module places each block matrix into the N*M zero matrix in turn according to a preset start address in the memory and the size of the block matrix.
In an embodiment of the invention, the storage module includes a dual storage module.
In an embodiment of the invention, the artificial intelligence calculation auxiliary processing device includes: a multiplier for multiplying each to-be-convolved matrix with the convolution kernel to obtain a corresponding multiplication result matrix, each multiplication result matrix being aligned with a first matrix element; and an adder for adding the second matrix elements in each multiplication result matrix to obtain a corresponding convolution result value, the convolution result values being aligned with the first matrix elements so as to form a convolution result matrix of size n*m.
To achieve the above and other related objects, the present invention provides an artificial intelligence calculation auxiliary processing method applied to a control module, the method comprising: taking an n*m matrix out of a storage module; and placing the n*m matrix into an N*M zero matrix in the memory, so that the n*m matrix can form, centered on any one of its first matrix elements, a to-be-convolved matrix of size W*W for convolution calculation with the convolution kernel, where N = n + W - 1, M = m + W - 1, and W is the order of the convolution kernel.
In an embodiment of the invention, the control module taking the n*m matrix out of the storage module specifically includes: the n*m matrix is divided into a plurality of block matrices of the same size, and the control module takes out each block matrix in turn.
In an embodiment of the invention, placing the n*m matrix into the N*M zero matrix of the memory specifically includes: the control module fills each block matrix into the N*M zero matrix in turn according to the start address and the size of the block matrix.
To achieve the above and other related objects, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence calculation auxiliary processing method.
To achieve the above and other related objects, the present invention provides an artificial intelligence calculation auxiliary processing terminal, comprising a processor and a memory; the memory is used to store a computer program, and the processor is configured to execute the computer program stored in the memory so as to cause the terminal to perform the artificial intelligence calculation auxiliary processing method.
As described above, the artificial intelligence calculation auxiliary processing method, apparatus, computer-readable storage medium, and terminal of the present invention have the following beneficial effects: a zero-padding operation system is built on a hardware structure, and a zero matrix is preset in the memory so that the zero-padding operation is accomplished simply by placing the data matrix to be processed into it, without computing parameters such as the number or positions of padded zeros. This greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing. The present invention therefore effectively overcomes the shortcomings of the prior art and has high industrial value.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an artificial intelligence computing auxiliary processing device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention.
FIG. 7 is a schematic diagram of an artificial intelligence computing auxiliary processing device according to an embodiment of the present invention.
FIG. 8 is a schematic diagram of an artificial intelligence computing auxiliary processing method according to an embodiment of the present invention.
DESCRIPTION OF REFERENCE NUMERALS
11          Storage module
12          Memory
13          Control module
M1          To-be-processed data matrix
M2          Convolution kernel matrix
M3          Zero matrix
M4          Padding matrix
M401~M425   Matrices to be convolved
M501~M525   Multiplication result matrices
M6          Convolution result matrix
M7          To-be-processed data matrix
M8          Zero matrix
71          Storage module
72          Memory
R1          Rectangular dashed box
R2          Rectangular dashed box
R3          Rectangular dashed box
S801~S802   Steps
DETAILED DESCRIPTION OF THE EMBODIMENTS
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present invention from the disclosure of this specification. The present invention may also be implemented or applied through other different specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, where there is no conflict, the following embodiments and the features in the embodiments may be combined with each other.
It should be noted that the figures provided in the following embodiments merely illustrate the basic concept of the present invention in a schematic manner; the figures therefore show only the components related to the present invention rather than the actual number, shapes, and sizes of components in a real implementation. In practice, the type, number, and proportion of each component may vary arbitrarily, and the component layout may also be more complicated.
The present invention provides an artificial intelligence computing auxiliary processing device for performing a zero-padding operation on a to-be-processed data matrix.
As shown in FIG. 1, an artificial intelligence computing auxiliary processing device according to an embodiment of the present invention includes a plurality of storage modules 11, a memory 12, and a control module 13. The plurality of storage modules 11 store an n*m matrix, where n and m are both natural numbers greater than or equal to 1. An N*M zero matrix is stored in the memory 12, and the control module 13 is configured to take out the n*m matrix and place it in the N*M zero matrix.
Preferably, the storage module 11 is a dual storage module comprising two storage modules: while one of them is scanning out data, the other is processing data; in the next cycle, the module that has finished processing begins to scan out, and the module that was scanning out begins to process. In other words, the two storage modules are always in a state where one is scanning and the other is processing, so that every frame of the final output appears to have gone through both the processing step and the scan-out step, while in fact the two memories complete the work cooperatively, thereby achieving the technical effect of multiplying data transmission and processing efficiency.
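The ping-pong behavior of the dual storage module described above can be sketched in software as follows (a minimal analogy only; the buffer layout and the toy "process" transform are illustrative assumptions, not part of the original disclosure):

```python
# Sketch of the dual ("ping-pong") storage module: in every cycle one
# buffer is processing the incoming frame while the other scans out the
# previously processed frame, and the two roles swap on the next cycle.
def run_ping_pong(frames):
    buffers = [None, None]      # the two storage modules
    outputs = []
    active = 0                  # index of the buffer currently processing
    for frame in frames:
        buffers[active] = [x * 2 for x in frame]   # "process" step (toy transform)
        other = 1 - active
        if buffers[other] is not None:             # "scan out" the previous frame
            outputs.append(buffers[other])
        active = other                             # swap roles for the next cycle
    if buffers[1 - active] is not None:            # flush the last processed frame
        outputs.append(buffers[1 - active])
    return outputs
```

Every output frame has gone through both steps, yet in any given cycle each physical buffer performs only one of them, which is where the throughput gain comes from.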
Preferably, the control module transfers data by DMA. Specifically, DMA stands for Direct Memory Access, a mechanism by which a controller can access data in memory directly without going through the CPU. In DMA mode, the CPU only needs to issue a command to the control module and let the control module handle the data transfer; after the data transfer is complete, the result is reported back to the CPU. This greatly reduces the CPU occupancy rate and saves substantial system resources.
The N*M zero matrix is a matrix composed of N*M zero-valued elements. Specifically,

N = n + W − 1,  M = m + W − 1,

where W is the order of the W*W convolution kernel matrix. The convolution kernel matrix is a weight matrix used to take a weighted average of the matrix data; it acts as the filter in a convolution calculation. The order of the weight matrix is usually taken to be odd, so that the position of the matrix can be determined by the center element of the odd-order matrix.
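Under this definition the padded dimensions follow directly from the input size and the kernel order; a quick sketch (plain Python; the function name is chosen here for illustration):

```python
def padded_shape(n, m, W):
    """Size of the preset zero matrix for an n*m input and a W*W kernel.

    Padding (W - 1) // 2 zeros on every side (W odd) gives
    N = n + W - 1 rows and M = m + W - 1 columns, so that a W*W
    window can be centered on any element of the original matrix.
    """
    assert W % 2 == 1, "the kernel order is taken to be odd"
    return n + W - 1, m + W - 1

# The 5*5 input with a 3*3 kernel from the embodiment yields a 7*7 zero matrix.
```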
Specifically, the control module takes the n*m matrix out of the storage module and places it in the N*M zero matrix of the memory to form a padding matrix. In the padding matrix, rows 1 through (W−1)/2 and rows n+(W−1)/2+1 through N are zero; columns 1 through (W−1)/2 and columns m+(W−1)/2+1 through M are zero; the remaining region holds the n*m matrix.
Taking the n*m matrix out of the storage module 11 and placing it into the N*M zero matrix in the memory 12 forms the padding matrix, so that a W*W matrix to be convolved can be formed centered on any matrix element of the n*m matrix. The following specific embodiment illustrates the process of convolving the n*m matrix with a W*W convolution kernel.
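The placement of the n*m matrix into the preset zero matrix can be sketched as follows (pure Python; the function name and the list-of-lists representation are illustrative assumptions):

```python
def place_into_zero_matrix(data, W):
    """Form the padding matrix by copying an n*m matrix into a preset
    (n+W-1)*(m+W-1) zero matrix, offset by (W-1)//2 in each direction."""
    n, m = len(data), len(data[0])
    pad = (W - 1) // 2
    N, M = n + W - 1, m + W - 1
    filled = [[0] * M for _ in range(N)]     # the preset zero matrix
    for i in range(n):                       # a straight copy: no zero counts
        for j in range(m):                   # or positions need to be computed
            filled[pad + i][pad + j] = data[i][j]
    return filled
```

Note that the padding itself costs nothing beyond the copy: the zeros are already in place in the preset matrix, which is the point of the hardware scheme.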
FIGS. 2 to 6 are schematic diagrams of an artificial intelligence computing auxiliary processing procedure according to an embodiment of the present invention, in which:
FIG. 2 shows the to-be-processed data matrix M1 and the convolution kernel matrix M2 of this embodiment. The to-be-processed data matrix is a 5*5 matrix and the convolution kernel matrix is a 3*3 matrix; the values in each matrix are its matrix elements.
FIG. 3 shows the zero matrix M3 of this embodiment. The zero matrix is set in the memory, and from N = n + W − 1 and M = m + W − 1 with n = m = 5 and W = 3, the zero matrix is a 7*7 matrix.
FIG. 4 shows the padding matrix M4 of this embodiment. The padding matrix is formed by placing the to-be-processed data matrix M1 into the zero matrix M3. Since rows 1 through (W−1)/2 and rows n+(W−1)/2+1 through N are zero, columns 1 through (W−1)/2 and columns m+(W−1)/2+1 through M are zero, and the remaining region holds the n*m matrix, it follows that the first row, the seventh row, the first column, and the seventh column of the padding matrix M4 are all 0, while the region of rows 2 to 6 and columns 2 to 6 holds the to-be-processed data matrix M1. The padding matrix M4 is thus the matrix obtained by applying the zero-padding operation to the to-be-processed data matrix M1.
The rectangular dashed box R1 in FIG. 4 represents the 3*3 matrix to be convolved M401, centered on the matrix element 18 and shown on the right side of FIG. 4. Moving the rectangular dashed box R1 to the right with a stride of 1 successively yields the matrices to be convolved M401 to M405, centered on the matrix elements of the first row of the to-be-processed data. Repeating the same operation row by row for the remaining rows finally yields a total of 25 matrices to be convolved, M401 to M425.
It should be noted that the matrices to be convolved M401 to M425 correspond one-to-one to the matrix elements of the to-be-processed data matrix M1. In addition, although the rectangular dashed box R1 moves with a stride of 1 in this embodiment, i.e., by one matrix element at a time, the present invention does not limit the stride with which the dashed box moves.
FIG. 5 is a schematic diagram of multiplying each matrix to be convolved by the convolution kernel matrix in this embodiment. The artificial intelligence computing auxiliary processing device includes a multiplier (not shown) configured to multiply the convolution kernel matrix by each of the matrices to be convolved. Specifically, each matrix element of a matrix to be convolved is multiplied by the corresponding matrix element of the convolution kernel matrix, yielding the corresponding multiplication result matrices M501 to M525. It should be noted that the multiplication result matrices M501 to M525 correspond one-to-one to the matrix elements of the to-be-processed data matrix M1.
FIG. 6 is a schematic diagram of the addition performed on each multiplication result matrix in this embodiment. The artificial intelligence computing auxiliary processing device includes an adder (not shown) configured to perform the following operation on each of the multiplication result matrices M501 to M525: the matrix elements of the multiplication result matrix are added together to obtain the corresponding convolution result value. For example, adding the matrix elements of the multiplication result matrix M501 gives the convolution result value 32; performing the addition on all 25 result matrices in the same way finally yields the convolution result matrix M6.
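Putting the window extraction, element-wise multiplication, and accumulation steps together, the whole procedure can be sketched as follows (pure Python, self-contained; the actual element values of M1 and M2 appear only in the figures, so the data used in testing are illustrative):

```python
def convolve_same(data, kernel):
    """'Same'-size convolution as in the embodiment: zero-pad, slide a W*W
    window with stride 1, multiply element-wise with the kernel, and sum
    each product window to get one convolution result value."""
    n, m, W = len(data), len(data[0]), len(kernel)
    pad = (W - 1) // 2
    N, M = n + W - 1, m + W - 1
    filled = [[0] * M for _ in range(N)]          # preset zero matrix
    for i in range(n):
        for j in range(m):
            filled[pad + i][pad + j] = data[i][j]
    result = [[0] * m for _ in range(n)]          # n*m convolution result matrix
    for i in range(n):
        for j in range(m):                        # window centered on element (i, j)
            result[i][j] = sum(
                filled[i + a][j + b] * kernel[a][b]
                for a in range(W) for b in range(W)
            )
    return result
```

Because the input is padded to (n+W−1)*(m+W−1), the output has the same n*m size as the input, matching the n*m convolution result matrix described above.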
As can be seen from the above specific embodiment, the artificial intelligence computing auxiliary processing device provided by the present invention convolves the n*m matrix in the storage module with the convolution kernel and outputs a convolution result matrix of order n*m.
It is worth noting that the artificial intelligence computing auxiliary processing device provided by the present invention builds the zero-padding operation in hardware; moreover, a zero matrix is preset in the memory, so that the zero-padding operation is completed simply by placing the to-be-processed data matrix into it, with no need to compute parameters such as the number or positions of the padded zeros. Compared with the prior-art approach of performing zero padding in software on the CPU, the present invention greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing.
Optionally, in an embodiment, the n*m matrix is divided into a plurality of block matrices of equal size, and the control module takes the n*m matrix out of the storage module and places it in the N*M zero matrix of the memory as follows: the control module takes out each block matrix in turn and fills each block matrix into the N*M zero matrix according to a preset placement condition. This is described below with a specific embodiment.
As shown in FIG. 7, a schematic diagram of an artificial intelligence computing auxiliary processing device according to an embodiment of the present invention, the processing device includes a storage module 71 holding a to-be-processed data matrix M7. The to-be-processed data matrix M7 is a 4*4 matrix; taking 2*2 matrices as block matrices, it can be divided into four block matrices, one of which is represented by the rectangular dashed box R2 in FIG. 7.
The processing device includes a memory 72 holding a zero matrix M8, which is a 6*6 matrix. The region of the zero matrix M8 used to store the to-be-processed data matrix M7 starts at the matrix element in the second row and second column, whose storage address is 0x00220000. A control module (not shown) takes this storage address as the start address and places the first block matrix of the to-be-processed data matrix M7 into the rectangular dashed box R3.
The control module places each block matrix at its corresponding position in the zero matrix M8 in turn, according to the start address and the size of the block matrices. For example, after placing the first block matrix into the zero matrix M8, the control module places the second block matrix into the zero matrix M8 using the storage address 0x00220004 as the start address, and so on, until the entire to-be-processed data matrix has been placed into the zero matrix M8.
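The block-wise placement can be sketched as follows (pure Python; two-dimensional indices stand in for the storage addresses, and the row-major traversal order of the 2*2 blocks is an assumption based on the embodiment):

```python
def place_blocks(data, W, block):
    """Copy a data matrix into the preset zero matrix block by block.

    Each block lands at a position derived from the fixed start offset
    and the block size, so no per-element placement needs computing."""
    n, m = len(data), len(data[0])
    pad = (W - 1) // 2                      # start offset: row 2 / column 2 here
    filled = [[0] * (m + W - 1) for _ in range(n + W - 1)]
    for bi in range(0, n, block):           # take out the block matrices in turn
        for bj in range(0, m, block):
            for i in range(block):
                for j in range(block):
                    filled[pad + bi + i][pad + bj + j] = data[bi + i][bj + j]
    return filled
```

For the 4*4 matrix M7 with 2*2 blocks and a 3*3 kernel, the result is the 6*6 matrix M8 with M7 occupying rows 2 to 5 and columns 2 to 5.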
With the artificial intelligence computing auxiliary processing method provided by the present invention, the control module takes the data out of the storage module, and the to-be-processed data matrix is divided into a plurality of block matrices of equal size, which greatly improves the efficiency of taking out the data and speeds up the response of the system.
The present invention further provides an artificial intelligence computing auxiliary processing method applied to a control module, which specifically includes:
S801: taking a to-be-processed data matrix out of a storage module;
S802: placing the to-be-processed data matrix in a zero matrix in a memory, so that the to-be-processed data matrix can form, centered on any one of its first matrix elements, a matrix to be convolved of size W*W, for a convolution kernel matrix to perform convolution calculation with a preset stride, where W is the size of the convolution kernel matrix.
The implementation of the artificial intelligence computing auxiliary processing method is similar to that of the artificial intelligence computing auxiliary processing device and is therefore not described again.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware related to a computer program. The computer program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
The present invention further provides an artificial intelligence computing auxiliary processing terminal, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to perform the artificial intelligence computing auxiliary processing method.
The above-mentioned memory may include a random access memory (RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory.
The above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In summary, the artificial intelligence computing auxiliary processing method and device provided by the present invention build the zero-padding operation in hardware, and a zero matrix is preset in the memory, so that the zero-padding operation is completed simply by placing the to-be-processed data matrix into it, with no need to compute parameters such as the number or positions of the padded zeros. This greatly reduces the computational load of the system, improves the efficiency of the zero-padding operation, and speeds up the response of operations such as image processing. The present invention therefore effectively overcomes various shortcomings of the prior art and has high industrial value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (13)

  1. An artificial intelligence computing auxiliary processing device, comprising:
    a plurality of storage modules storing a to-be-processed data matrix;
    a memory provided with a zero matrix; and
    a control module configured to take the to-be-processed data matrix out of the storage modules and place it in the zero matrix in the memory, so that the to-be-processed data matrix can form, centered on any one of its first matrix elements, a matrix to be convolved of size W*W, for a convolution kernel matrix to perform convolution calculation with a preset stride, where W is the size of the convolution kernel matrix.
  2. The artificial intelligence computing auxiliary processing device according to claim 1, wherein:
    the to-be-processed data matrix comprises an n*m matrix, and the zero matrix comprises an N*M zero matrix,
    where N = n + W − 1 and M = m + W − 1.
  3. The artificial intelligence computing auxiliary processing device according to claim 2, wherein taking the to-be-processed data matrix out of the storage modules and placing it in the zero matrix in the memory specifically comprises:
    the control module takes the n*m matrix out of the storage modules and places it in the N*M zero matrix of the memory to form a padding matrix, wherein in the padding matrix:
    rows 1 through (W−1)/2 and rows n+(W−1)/2+1 through N are zero; columns 1 through (W−1)/2 and columns m+(W−1)/2+1 through M are zero; and the remaining region holds the n*m matrix.
  4. The artificial intelligence computing auxiliary processing device according to claim 2, wherein taking the to-be-processed data matrix out of the storage modules and placing it in the zero matrix in the memory specifically comprises:
    the n*m matrix is divided into a plurality of block matrices of equal size; the control module takes out each block matrix in turn and fills each block matrix into the N*M zero matrix according to a preset placement condition.
  5. The artificial intelligence computing auxiliary processing device according to claim 4, wherein the preset placement condition comprises:
    the control module places each block matrix into the N*M zero matrix in turn according to a preset start address in the memory and the size of the block matrices.
  6. The artificial intelligence computing auxiliary processing device according to claim 1, wherein the storage modules comprise dual storage modules.
  7. The artificial intelligence computing auxiliary processing device according to claim 1, comprising:
    a multiplier configured to multiply each matrix to be convolved by the convolution kernel to obtain a corresponding multiplication result matrix, wherein each multiplication result matrix corresponds to one of the first matrix elements; and
    an adder configured to add the second matrix elements in each multiplication result matrix to obtain a corresponding convolution result value, wherein each convolution result value corresponds to one of the first matrix elements, so as to form a convolution result matrix of size n*m.
  8. An artificial intelligence computing auxiliary processing method applied to a control module, the method comprising:
    taking a to-be-processed data matrix out of a storage module; and
    placing the to-be-processed data matrix in a zero matrix in a memory, so that the to-be-processed data matrix can form, centered on any one of its first matrix elements, a matrix to be convolved of size W*W, for a convolution kernel matrix to perform convolution calculation with a preset stride, where W is the size of the convolution kernel matrix.
  9. The artificial intelligence computing auxiliary processing method according to claim 8, wherein:
    the to-be-processed data matrix comprises an n*m matrix, and the zero matrix comprises an N*M zero matrix,
    where N = n + W − 1 and M = m + W − 1.
  10. The artificial intelligence computing auxiliary processing method according to claim 8, wherein taking the to-be-processed data matrix out of the storage module and placing it in the zero matrix in the memory specifically comprises:
    the n*m matrix is divided into a plurality of block matrices of equal size, which are taken out by the control module in turn.
  11. The artificial intelligence computing auxiliary processing method according to claim 8, wherein taking the to-be-processed data matrix out of the storage module and placing it in the zero matrix in the memory specifically comprises:
    the control module fills each block matrix into the N*M zero matrix in turn according to a start address and the size of the block matrices.
  12. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the artificial intelligence computing auxiliary processing method according to any one of claims 8 to 11.
  13. An artificial intelligence computing auxiliary processing terminal, comprising a processor and a memory, wherein:
    the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the terminal to perform the artificial intelligence computing auxiliary processing method according to any one of claims 8 to 11.
PCT/CN2018/072662 2018-01-15 2018-01-15 Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal WO2019136750A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/072662 WO2019136750A1 (en) 2018-01-15 2018-01-15 Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal
CN201880002144.7A CN109313663B (en) 2018-01-15 2018-01-15 Artificial intelligence calculation auxiliary processing device, method, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/072662 WO2019136750A1 (en) 2018-01-15 2018-01-15 Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal

Publications (1)

Publication Number Publication Date
WO2019136750A1 true WO2019136750A1 (en) 2019-07-18

Family

ID=65221779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072662 WO2019136750A1 (en) 2018-01-15 2018-01-15 Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal

Country Status (2)

Country Link
CN (1) CN109313663B (en)
WO (1) WO2019136750A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111257913A (en) * 2019-11-29 2020-06-09 交通运输部长江通信管理局 Beidou satellite signal capturing method and device

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11481471B2 (en) * 2019-08-16 2022-10-25 Meta Platforms, Inc. Mapping convolution to a matrix processor unit
CN112825151A (en) * 2019-11-20 2021-05-21 上海商汤智能科技有限公司 Data processing method, device and equipment
CN114730331A (en) * 2019-12-18 2022-07-08 华为技术有限公司 Data processing apparatus and data processing method
CN111553224A (en) * 2020-04-21 2020-08-18 中国电子科技集团公司第五十四研究所 Large remote sensing image block distribution method
CN112561943B (en) * 2020-12-23 2022-11-22 清华大学 Image processing method based on data multiplexing of pulse array convolution operation
CN117574036B (en) * 2024-01-16 2024-04-12 北京壁仞科技开发有限公司 Computing device, method of operation, and machine-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6134571A (en) * 1998-04-29 2000-10-17 Hewlett-Packard Company Implicit DST-based filter operating in the DCT domain
CN1374759A * 2001-03-09 2002-10-16 华为技术有限公司 High-efficiency convolutional coding method
CN104574277A (en) * 2015-01-30 2015-04-29 京东方科技集团股份有限公司 Image interpolation method and image interpolation device
CN107301668A * 2017-06-14 2017-10-27 成都四方伟业软件股份有限公司 Image compression method based on sparse matrices and convolutional neural networks

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100425017C * 2005-12-08 2008-10-08 西安电子科技大学 Precoding-based encoder for parallel convolutional LDPC codes and fast encoding method therefor
CN101192833B * 2006-11-28 2011-12-07 华为技术有限公司 Device and method for parallel encoding of low-density parity-check (LDPC) codes
CN103038660B * 2010-03-23 2016-03-02 马克思-普朗克科学促进协会 Method and apparatus for reconstructing a sequence of MR images using a regularized nonlinear inverse reconstruction process
CN104104394A * 2014-06-13 2014-10-15 哈尔滨工业大学 Signal reconstruction method and system for acquiring the sensing matrix of a random demodulation system based on MLS sequences
CN105334542B * 2015-10-23 2017-07-07 中南大学 Fast, high-accuracy forward modeling method for the gravitational field of complex geological bodies with arbitrary density distribution
CN106447030B (en) * 2016-08-30 2021-09-21 深圳市诺比邻科技有限公司 Method and system for optimizing computing resources of convolutional neural network
CN107451654B * 2017-07-05 2021-05-18 深圳市自行科技有限公司 Method for accelerating convolutional neural network operations, server, and storage medium


Also Published As

Publication number Publication date
CN109313663A (en) 2019-02-05
CN109313663B (en) 2023-03-31

Similar Documents

Publication Publication Date Title
WO2019136750A1 (en) Artificial intelligence-based computer-aided processing device and method, storage medium, and terminal
US20200234124A1 (en) Winograd transform convolution operations for neural networks
CN107862650B Method for accelerating computation of CNN convolutions on two-dimensional images
CN111247527B (en) Method and device for determining characteristic images in convolutional neural network model
CN109416755B (en) Artificial intelligence parallel processing method and device, readable storage medium and terminal
GB2554711A (en) Buffer addressing for a convolutional neural network
WO2019136764A1 (en) Convolver and artificial intelligence processing device using the same
TW201942808A (en) Deep learning accelerator and method for accelerating deep learning operations
WO2019127517A1 (en) Data processing method and device, dma controller, and computer readable storage medium
CN110989920B (en) Energy efficient memory system and method
WO2020199476A1 (en) Neural network acceleration method and apparatus based on pulsation array, and computer device and storage medium
US11164032B2 (en) Method of performing data processing operation
EP3093757A2 (en) Multi-dimensional sliding window operation for a vector processor
WO2019184888A1 (en) Image processing method and apparatus based on convolutional neural network
CN109313723B (en) Artificial intelligence convolution processing method and device, readable storage medium and terminal
WO2024027039A1 (en) Data processing method and apparatus, and device and readable storage medium
JP2023541350A (en) Table convolution and acceleration
US11874898B2 (en) Streaming-based artificial intelligence convolution processing method and apparatus, readable storage medium and terminal
JP7332722B2 (en) Data processing method, device, storage medium and electronic equipment
CN110738317A (en) FPGA-based deformable convolution network operation method, device and system
CN112016522B (en) Video data processing method, system and related components
CN106909320B (en) Method, device and system for expanding and transmitting multidimensional data
US20220405349A1 (en) Data processing method and apparatus, and related product
KR102470027B1 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN114764615A (en) Convolution operation implementation method, data processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18899117

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18899117

Country of ref document: EP

Kind code of ref document: A1