WO2018077295A1 - Data processing method and apparatus for a convolutional neural network (一种卷积神经网络的数据处理方法和装置) - Google Patents


Info

Publication number
WO2018077295A1
WO2018077295A1 (PCT/CN2017/108468; CN2017108468W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
matrix
expanded
buffer space
preset
Prior art date
Application number
PCT/CN2017/108468
Other languages
English (en)
French (fr)
Inventor
张阳明
高剑林
章恒
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Publication of WO2018077295A1
Priority to US 16/250,204 (published as US11222240B2)
Priority to US 17/522,891 (published as US11593594B2)


Classifications

    • G06N 3/02: Neural networks (computing arrangements based on biological models)
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08: Learning methods
    • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 18/217: Validation; performance evaluation; active pattern learning techniques
    • G06F 18/24143: Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G06V 10/764: Image or video recognition or understanding using machine-learning classification, e.g. of video objects
    • G06V 10/454: Integrating biologically inspired filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/94: Hardware or software architectures specially adapted for image or video understanding

Definitions

  • the present invention relates to the field of neural network technologies, and in particular, to a data processing method and apparatus for a convolutional neural network.
  • Neural networks and deep learning algorithms have been applied with great success and are developing rapidly. The industry widely expects this new mode of computing to enable more general and more complex intelligent applications.
  • CNN: Convolutional Neural Network.
  • the convolution operation of convolutional neural networks is mainly concentrated in the convolutional layer.
  • the convolution operation of convolutional neural networks can be divided into two processes: data expansion and matrix multiplication.
  • some data is read repeatedly during the data expansion process of the convolutional neural network, which tends to increase the data bandwidth or the storage space required for the convolution operation and reduces the data processing capability of the convolutional neural network processing system.
  • Embodiments of the present invention provide a data processing method and apparatus for a convolutional neural network and a non-transitory computer readable storage medium, which can improve the data processing capability of a convolutional neural network processing system.
  • An embodiment of the present invention provides a data processing method for a convolutional neural network, which is applied to a computing device, including performing the following steps on a processor or a coprocessor of the computing device:
  • Embodiments of the present invention also provide a data processing apparatus for a convolutional neural network, including one or more processors and one or more non-volatile storage media, the one or more non-volatile storage media storing one or more computer readable instructions configured to be executed by the one or more processors; the one or more computer readable instructions comprising:
  • An obtaining unit configured to obtain a matrix parameter of the feature matrix
  • a reading unit configured to read corresponding data in the image data matrix from the first buffer space by using the first bus according to the matrix parameter, to obtain a data matrix to be expanded;
  • a saving unit configured to send the to-be-expanded data matrix to a second preset buffer space by using a second bus
  • a data expansion unit configured to read the data matrix to be expanded from the second preset buffer space by using a second bus, and perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain expanded data;
  • an updating unit configured to read, through the first bus, a corresponding quantity of unexpanded data in the image data matrix from the first buffer space, send the unexpanded data to the second preset buffer space through the second bus and save it, update the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data, and trigger the data expansion unit to again perform the step of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding it according to the matrix parameter.
  • Embodiments of the present invention provide a non-transitory computer readable storage medium storing computer readable instructions that enable at least one processor to perform a data processing method of a convolutional neural network as described above.
  • FIG. 1 is a flowchart of a data processing method of a convolutional neural network according to an embodiment of the present invention
  • FIGS. 2a to 2d are schematic diagrams showing data sliding expansion according to an embodiment of the present invention.
  • 3a to 3i are schematic diagrams of data expansion of a convolutional neural network
  • FIG. 4 is a schematic diagram of matrix multiplication of a convolutional neural network
  • FIG. 5 is a schematic structural diagram of a convolutional neural network processing system according to an embodiment of the present invention.
  • Figure 6a is a schematic diagram of data expansion of a convolutional neural network on a CPU
  • Figure 6b is a schematic diagram of data expansion of a convolutional neural network on an FPGA
  • FIG. 7a to 7c are schematic diagrams showing data expansion of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 8a is another flowchart of a data processing method of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 8b is a schematic diagram of reading and writing of a ring buffer according to an embodiment of the present invention.
  • FIG. 8c is a schematic structural diagram of a service scenario based on a convolutional neural network according to an embodiment of the present invention.
  • FIG. 9a is a schematic structural diagram of a data processing apparatus of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 9b is another schematic structural diagram of a data processing apparatus of a convolutional neural network according to an embodiment of the present invention.
  • FIG. 9c is a schematic structural diagram of a coprocessor according to an embodiment of the present invention.
  • Embodiments of the present invention provide a data processing method and apparatus for a convolutional neural network. The details are described below separately.
  • a data processing apparatus of a convolutional neural network, which may be integrated in a processor of a computing device, such as a CPU, or in a coprocessor, such as an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or a GPU (Graphics Processing Unit), etc.
  • a data processing method for a convolutional neural network: obtain a matrix parameter of a feature matrix; read the corresponding data in an image data matrix according to the matrix parameter to obtain a data matrix to be expanded; perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain the expanded data; read a preset number of unexpanded data in the image data matrix, update the to-be-expanded data matrix according to the unexpanded data, and return to perform the step of expanding the data matrix to be expanded according to the matrix parameter.
  • a data processing method of a convolutional neural network is applied to a computing device, and a specific process performed by a processor or a coprocessor of the computing device may be as follows:
  • Step 101 Obtain a matrix parameter of the feature matrix.
  • the feature matrix is a convolution kernel of a convolution operation, also called a weight matrix, which can be set according to actual needs.
  • the matrix parameter of the feature matrix may include the number of matrix rows and columns, which may be referred to as the size of the convolution kernel.
  • Step 102 Read, according to the matrix parameter, the corresponding data in the image data matrix from the first buffer space through the first bus to obtain a data matrix to be expanded, and send the to-be-expanded data matrix to the second preset buffer space through the second bus and save it.
  • the elements in the image data matrix are pixel data corresponding to the image pixels, such as the processed pixel values.
  • the number of rows and columns of the image data matrix represents the size of the image.
  • the image data matrix may be stored in an acceleration card of the convolutional neural network processing system, for example, in the DDR (Double Data Rate synchronous dynamic random access) memory of the acceleration card.
  • the first bus is the bus connected between the processor or coprocessor and the DDR. That is, "reading corresponding data in the image data matrix from the first buffer space through the first bus according to the matrix parameter" in step 102 may include: reading the corresponding data in the image data matrix from the DDR, according to the matrix parameter, through the bus connected between the processor or coprocessor and the DDR.
  • a matrix of the corresponding number of rows or columns in the image data matrix can be read according to the matrix parameter.
  • the number of rows read may correspond to the number of rows of the feature matrix, or the number of columns read may correspond to the number of columns of the feature matrix.
  • the K rows of data in the N*N image data matrix can be read to obtain a K*N data matrix to be expanded, where K and N are positive integers and K ≤ N.
  • the starting position of the read data can be set according to actual needs.
  • the K rows of data can be read starting from the first row of the image data matrix, or starting from the second row.
  • the M columns of data in the N*N image data matrix can be read to obtain an N*M data matrix to be expanded, where M is a positive integer and M ≤ N.
  • the data matrix to be expanded may be sent to the second preset buffer space through the second bus and saved.
  • the second preset buffer space may be a preset buffer.
  • the preset buffer may be a buffer in the coprocessor or the DDR, and the second bus is the bus connected between the processor or coprocessor and the preset buffer.
  • Step 103 Read the data matrix to be expanded from the second preset buffer space by using a second bus, and perform data expansion on the data matrix to be expanded according to the matrix parameter, to obtain expanded data.
  • data may be expanded according to the number of rows and columns of the feature matrix, and a plurality of data groups may be obtained after the expansion.
  • a data matrix may be formed from the data groups, that is, the expanded data matrix. Matrix multiplication of the expanded data matrix and the feature matrix then yields the corresponding output data, completing the convolution operation.
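The equivalence between data expansion followed by matrix multiplication and the convolution itself can be illustrated with a minimal NumPy sketch (illustrative only; the function name, row-major layout, and unit stride are assumptions, not part of the patent):

```python
import numpy as np

def expand_and_convolve(image, kernel):
    """Sketch: convolution as data expansion (each window flattened
    into one data group) followed by matrix multiplication with the
    flattened feature matrix."""
    n = image.shape[0]
    k = kernel.shape[0]
    out = n - k + 1
    # Data expansion: one flattened k*k window per output position.
    groups = np.array([
        image[r:r + k, c:c + k].ravel()
        for r in range(out) for c in range(out)
    ])
    # Matrix multiplication with the flattened feature matrix.
    return (groups @ kernel.ravel()).reshape(out, out)

# Cross-check against a direct window-sum convolution (all-ones kernel).
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
direct = np.array([[image[r:r + 3, c:c + 3].sum() for c in range(3)]
                   for r in range(3)])
assert np.allclose(expand_and_convolve(image, kernel), direct)
```

The same expanded-data matrix can be reused against many feature matrices, which is why the expansion step dominates the cost discussed in this document.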
  • the K*N data matrix to be expanded may be expanded according to the number of rows and columns of the feature matrix.
  • the “data expansion of the data matrix to be expanded according to the matrix parameter” in step 103 may include:
  • Data expansion is performed on the data matrix to be expanded according to the matrix parameter and the storage address of the data of the data matrix to be expanded in the second preset buffer space.
  • the K*N data matrix to be expanded is written into the second preset buffer space; then data expansion is performed on the K*N data matrix to be expanded according to the number of rows and columns of the K*K feature matrix and the storage addresses of the data of the K*N to-be-expanded data matrix in the second preset buffer space.
  • the data matrix to be expanded may be subjected to sliding data expansion. Specifically, a window is slid over the data matrix to be expanded, and the data in the window after each slide is expanded; several data groups are obtained after the expansion. That is, the step of "expanding the data matrix to be expanded according to the matrix parameter and the storage addresses of the data of the data matrix to be expanded in the second preset buffer space" may include:
  • a sliding window of a corresponding size may be determined according to the row and column data of the feature matrix. For example, when the feature matrix is K*K, a K*K sliding window may be determined. The sliding window is used to select corresponding data from the data matrix to be expanded for expansion.
  • the preset sliding direction may include: a row direction, a column direction, and the like of the image data matrix.
  • the preset sliding direction may correspond to the data reading mode of step 102.
  • the preset sliding direction when reading a plurality of rows of data of the image data matrix, the preset sliding direction may be a row direction of the image data matrix;
  • the preset sliding direction when reading a plurality of columns of data of the image data matrix, the preset sliding direction may be a column direction of the image data matrix.
  • the preset sliding step is the distance by which the window slides each time; it can be set according to actual data-expansion requirements and can be expressed as the number of data elements by which the window moves over the data matrix.
  • the preset sliding step size is, for example, 1, 2, or 3 data elements.
  • After the preset sliding step, the preset sliding direction, and the sliding window are obtained, the window can be slid over the data matrix to be expanded along the preset sliding direction with the preset sliding step. After each slide, the addresses of the data in the window in the second preset buffer space are acquired, and the corresponding data is read from the preset buffer according to these addresses and a preset reading order to complete the data expansion; that is, the data is read by address hopping.
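The address-hopping read described above can be sketched as follows: for a K-row buffer stored row-major, the offsets of a K*K window starting at a given column hop by one full buffer row between window rows (the helper name and row-major layout are assumptions for illustration):

```python
def window_addresses(buf_cols, k, col0):
    """Buffer offsets of a k*k window starting at column col0 inside a
    k-row, row-major buffer: consecutive within a row, hopping by
    buf_cols between rows (the 'jump address' reads)."""
    return [row * buf_cols + col0 + c
            for row in range(k) for c in range(k)]

# 3*5 buffer, window at column 0: addresses hop by buf_cols = 5 per row.
assert window_addresses(5, 3, 0) == [0, 1, 2, 5, 6, 7, 10, 11, 12]
```

Sliding the window one step to the right simply shifts every address by one, so no data needs to be moved in the buffer between slides.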
  • data sliding expansion is taken as an example.
  • the image data matrix is assumed to be a matrix of 5*5 and the feature matrix is 3*3.
  • three rows of data are read from the 5*5 image data matrix to obtain a 3*5 to-be-expanded data matrix, that is, the matrix in FIG. 2b to FIG. 2d, which is written into the second preset buffer space; the sliding window, that is, the dashed box in FIGS. 2b to 2d, is then determined according to the number of rows and columns of the 3*3 feature matrix.
  • the sliding window can be slid over the 3*5 to-be-expanded data matrix with a sliding step of one data element in the row direction, i.e., sliding the window from left to right.
  • the storage addresses of the data in the sliding window in the second preset buffer space are acquired; the corresponding data is then read from the second preset buffer space by address hopping according to these addresses, yielding the data group (11, 12, 13, 21, 22, 23, 31, 32, 33), called the first data group.
  • the window is then slid along the row direction by a step of one data element; the storage addresses of the data now in the window are acquired, and the corresponding data is read from the second preset buffer space by address hopping according to these addresses, yielding the data group (12, 13, 14, 22, 23, 24, 32, 33, 34), called the second data group.
  • the window is slid once more along the row direction by one data element; the storage addresses of the data in the window are acquired, and the corresponding data is read from the second preset buffer space by address hopping according to these addresses, yielding the data group (13, 14, 15, 23, 24, 25, 33, 34, 35), called the third data group. This completes the data expansion of the 3*5 data matrix to be expanded.
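The three data groups above can be reproduced with a short sketch of the sliding expansion (the element values follow the row/column labelling used in the figures, i.e. element 23 sits at row 2, column 3; the helper name is illustrative):

```python
# Rows 1-3 of the 5*5 image matrix, labelled as in the text (10*row + col).
to_expand = [[10 * r + c for c in range(1, 6)] for r in range(1, 4)]

def slide(matrix, k, step=1):
    """Slide a k*k window along the row direction with the given step,
    emitting one flattened data group per window position."""
    cols = len(matrix[0])
    return [[matrix[r][c0 + c] for r in range(k) for c in range(k)]
            for c0 in range(0, cols - k + 1, step)]

groups = slide(to_expand, 3)
assert groups[0] == [11, 12, 13, 21, 22, 23, 31, 32, 33]  # first data group
assert groups[1] == [12, 13, 14, 22, 23, 24, 32, 33, 34]  # second data group
assert groups[2] == [13, 14, 15, 23, 24, 25, 33, 34, 35]  # third data group
```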
  • the initial position of the sliding window on the data matrix to be expanded in this embodiment may be set according to actual requirements. For example, referring to FIG. 2b, sliding may start from the first column of the data matrix to be expanded; in other embodiments, it may also start from the second or third column.
  • when the data of the corresponding columns is read from the image data matrix to form the data matrix to be expanded, the sliding window may likewise be determined according to the number of rows and columns of the feature matrix and slid along the column direction of the data matrix to be expanded with the preset sliding step; after each slide, the storage addresses of the data in the window in the second preset buffer space are acquired, and the corresponding data is read from the second preset buffer space based on these addresses.
  • the data sliding expansion process is similar to that described in the foregoing embodiment; for details, refer to FIGS. 2a-2d, which are not described herein again.
  • Step 104 Read, by using the first bus, a preset number of unexpanded data in the image data matrix from the first buffer space, and send the unexpanded data to the second preset buffer space through the second bus and save the data. And updating the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data, and returning to step 103.
  • a preset number of unexpanded data in the image data matrix is read from the first buffer space through the first bus, and the read unexpanded data is sent to the second preset buffer space through the second bus and saved. And updating the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data.
  • the number of unexpanded data can be set according to actual needs, such as 1, 5, 1 row, 2 rows, or 1 column, 2 columns, and the like.
  • a preset amount of unexpanded data in the image data matrix may be read from the first buffer space through the first bus based on the convolution step.
  • the convolution step represents the number of rows or columns of unexpanded data that need to be read from the image data matrix after the expanded data matrix is expanded.
  • Take an N*N image data matrix and a K*K feature matrix as an example: a certain amount of data, such as the unexpanded data of the corresponding number of rows or columns, is read from the N*N image data matrix to update the data matrix to be expanded saved in the second preset buffer space. For example, the (K+1)-th row of data is read from the image data matrix, and the to-be-expanded data matrix stored in the second preset buffer space is updated according to the (K+1)-th row of data.
  • updating the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data in step 104 may include: reading the unexpanded data from the second preset buffer space through the second bus, selecting a preset number of target data from the unexpanded data, and overwriting the corresponding data in the to-be-expanded data matrix saved in the second preset buffer space with the target data. For example, when at least two rows or two columns of unexpanded data are read from the image data matrix, one row or one column may be selected from them to update the data matrix to be expanded.
  • alternatively, the data matrix to be expanded may be updated directly from the unexpanded data, for example when the preset amount of target data is one row: after one row of data is read from the image data matrix, such as the (K+1)-th row, the data matrix to be expanded is updated directly according to that row.
  • the manner of updating the data matrix to be expanded may include a data coverage manner, that is, covering corresponding data in the data matrix to be expanded according to the selected target data to complete the update.
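The row-update scheme above can be sketched as follows: only one new row is read per update, and it replaces the oldest row of the buffered to-be-expanded matrix (a simplified model; the `deque` stands in for the in-place overwrite described in the text, and the names are illustrative):

```python
from collections import deque

def row_windows(image_rows, k):
    """Yield successive k-row to-be-expanded matrices, reading each
    image row from the first buffer space only once (the data reuse
    described above)."""
    buf = deque(image_rows[:k], maxlen=k)   # initial to-be-expanded matrix
    yield list(buf)
    for new_row in image_rows[k:]:          # one unexpanded row per update
        buf.append(new_row)                 # displaces the oldest row
        yield list(buf)

rows = [[10 * r + c for c in range(1, 6)] for r in range(1, 6)]  # 5*5 image
windows = list(row_windows(rows, 3))
assert len(windows) == 3            # rows 1-3, then 2-4, then 3-5
assert windows[1][0][0] == 21       # after one update the window starts at row 2
```

Compared with re-reading all K rows per window, each update transfers only one row, which is the bandwidth saving claimed for the first and second buses.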
  • With the data processing method of this embodiment, since part of the data in the to-be-expanded data matrix already saved in the second preset buffer space can be reused, only the preset number of unexpanded data in the image data matrix, that is, the data not yet saved to the second preset buffer space, needs to be read from the first buffer space. This avoids repeated reading of data during data expansion and reduces the storage space required in the convolutional neural network processing system. Moreover, only the unexpanded data is sent to the second preset buffer space through the second bus, which reduces the amount of data transmission, saves the transmission bandwidth of the first bus and the second bus, and further increases the data processing capability of the processing system.
  • FIGS. 3a-3i show a process of data expansion using a data processing method. After the data is expanded, the expanded data matrix can be multiplied by the convolution kernel, with reference to FIG. 4, to complete the convolution operation.
  • the specific implementation process of the data expansion mode shown in FIG. 3a to FIG. 3i is illustrated by taking the convolutional neural network processing system shown in FIG. 5 as an example.
  • the processing system includes a coprocessor, the CPU and memory of the server, and DDR memory on the acceleration card; the CPU of the server and the coprocessor are generally connected through a PCI-e (Peripheral Component Interconnect Express) bus for data interaction and command interaction, such as command interaction through the CMD path (command path) and data interaction through the data path.
  • the coprocessor may be an FPGA or another auxiliary processor, and may include: a DDR controller, an InputBuf (input data buffer unit), an OutputBuf (output data buffer unit), and a PE (Processing Element, the processing unit).
  • the PE is the unit in the coprocessor that is used to complete the data convolution.
  • The data expansion in current convolution operations can be done in the CPU or the coprocessor of the processing system, as follows:
  • data expansion is accomplished by the CPU of the system.
  • the CPU data expansion scheme includes: the processing system CPU expands the data by using the method of FIG. 3a to FIG. 3i, and after the complete expansion, the expanded data is stored in the CPU memory and transmitted to the acceleration card through PCI-e DMA.
  • the coprocessor loads the data from the DDR RAM on the accelerator card to the PE processing unit through the load logic.
  • However, using the data processing method shown in FIGS. 3a-3i in the CPU of the system results in low data expansion efficiency and increased data transmission, so the required read bandwidth of PCI-e and DDR increases, reducing the processing capability of the system.
  • the data processing method shown in FIGS. 2b to 2d of the present embodiment is applied to the CPU of the system, the data expansion efficiency can be improved since it is not necessary to repeatedly read the data. If the data processing method shown in FIGS. 2b to 2d of this embodiment is applied to the coprocessor of the system, the amount of data transmission can be reduced, and the required transmission bandwidth of PCI-e and DDR can be reduced.
  • data expansion is accomplished by a coprocessor of the system.
  • the solution for data expansion in the coprocessor includes: storing the unexpanded data in the server memory, then in the acceleration card DDR memory, and then in the FPGA.
  • the FPGA expands the data in the manner of FIGS. 3a-3i.
  • this scheme performs data expansion in the manner shown in FIGS. 3a-3i; some data is read repeatedly, resulting in low data expansion efficiency, increased DDR data transmission, and the consumption of a large number of FPGA on-chip memory units.
  • the data processing device is integrated in the coprocessor.
  • the image data matrix is stored in the DDR memory of the accelerator card.
  • the specific data expansion process is as follows:
  • the sliding window may be determined according to the row and column data of the feature matrix; the window is then slid along the row direction of the data matrix to be expanded with a sliding step of one data element, and after each slide the corresponding data is read by address hopping in the memory based on the storage addresses of the data in the window, implementing the data expansion.
  • the data expansion scheme shown in FIGS. 7a-7c can improve the efficiency of data expansion; in addition, the scheme provided by the embodiment of the present invention performs data expansion in the coprocessor and expands data in a reuse-read manner, so the amount of read data can be reduced, which lowers the required PCI-e and DDR read bandwidth and improves the processing capability of the system.
  • the data expansion scheme shown in FIG. 7a to FIG. 7c can improve the efficiency of data expansion, save the storage space of the coprocessor, reduce the requirement for the data to expand the corresponding storage space, and improve the processing capability of the system.
  • the method of this embodiment can read a preset amount of data from the image data matrix according to a fixed data size; that is, reading the preset number of unexpanded data in the image data matrix in step 104 may include: reading the preset number of unexpanded data in the image data matrix according to a first predetermined data amount.
  • the first predetermined data amount may be set according to actual requirements, such as 8 Kbyte or 16 Kbyte, and the unexpanded data of the first predetermined data amount may be referred to as a data packet.
  • the first preset data amount may be set based on the row data of the image data matrix or the data amount of the column data, for example, may be an integral multiple of the data amount of the row data or the column data.
  • The embodiment of the present invention can perform data reading and loading when the remaining space of the preset buffer space is sufficient to load a new packet; that is, the step of "reading the preset number of unexpanded data in the image data matrix according to the first predetermined data amount" may include: when the remaining space of the preset buffer space is sufficient for a new packet, reading a preset number of unexpanded data in the image data matrix according to the first predetermined data amount.
  • each loaded packet is larger than the data amount of one row or one column of the image data matrix; therefore, after loading a new packet, a certain amount of target data can be selected from the packet to update the data matrix to be expanded. That is, the step of "updating the data matrix to be expanded according to the unexpanded data" may include:
  • Specifically, data belonging to the same row or the same column of the image data matrix can be selected. For example, when the first predetermined data amount is the data amount of 8 data, that is, when one packet contains 8 data elements of the image data matrix, 8 unexpanded data can be read from the image data matrix, assumed to be [41, 42, 43, 44, 45, 51, 52, 53] in FIG. 7a; then the target data in the same row, namely [41, 42, 43, 44, 45], is selected from the 8 unexpanded data, and the data matrix to be expanded is updated according to the target data.
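A minimal sketch of that selection step, assuming the packet happens to start on a row boundary (`split_packet` is an illustrative name, not part of the patent):

```python
def split_packet(packet, row_len):
    """From a freshly loaded packet, take the first complete row
    (row_len values) as the target data used to update the matrix
    to be expanded; the remainder stays buffered for later updates."""
    return packet[:row_len], packet[row_len:]

# An 8-element packet read from the 5-column image data matrix of FIG. 7a.
targets, leftover = split_packet([41, 42, 43, 44, 45, 51, 52, 53], 5)
# targets == [41, 42, 43, 44, 45]; leftover == [51, 52, 53]
```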
  • To increase the speed of data expansion, the method in this embodiment can perform data expansion as soon as the data currently buffered in the preset buffer space is sufficient for expansion; that is, the step of "expanding the data matrix to be expanded according to the matrix parameter and the storage addresses of the data of the data matrix to be expanded in the preset buffer space" may include: obtaining the amount of data currently buffered in the preset buffer space, and, when the buffered data amount is greater than or equal to a second predetermined data amount, performing data expansion on the data matrix to be expanded according to the matrix parameter and the storage addresses of the data of the data matrix to be expanded in the preset buffer space.
  • The second predetermined data amount may be determined according to the numbers of rows and columns of the feature matrix and the image data matrix. For example, taking an N*N image data matrix and a K*K feature matrix as an example, the second predetermined data amount may be the data amount of K*N data.
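The two buffer-occupancy gates described above (expand only when enough data is buffered; load only when enough space remains for a new packet) can be written, for illustration, as:

```python
def can_expand(buffered_count, k, n):
    """Expansion needs at least one full K*N matrix-to-expand buffered
    (the second predetermined data amount)."""
    return buffered_count >= k * n

def can_load(remaining_capacity, packet_size):
    """Loading needs room for one more packet of the first
    predetermined data amount."""
    return remaining_capacity >= packet_size

# For a 5*5 image matrix and a 3*3 feature matrix (K=3, N=5),
# expansion becomes possible once 15 elements are buffered.
```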
  • The embodiment of the present invention obtains the matrix parameter of the feature matrix, reads the corresponding data in the image data matrix according to the matrix parameter to obtain a data matrix to be expanded, performs data expansion on the data matrix to be expanded according to the matrix parameter to obtain the expanded data, reads a preset number of unexpanded data in the image data matrix, updates the data matrix to be expanded according to the unexpanded data, and returns to the step of expanding the data matrix to be expanded according to the matrix parameter.
  • This scheme can reuse the read image data to realize data expansion, avoiding repeated reading of some data and reducing the data bandwidth or storage space required for convolutional neural network data expansion, thereby improving the data processing capability and data expansion efficiency of the convolutional neural network processing system.
  • the data processing device of the convolutional neural network is integrated into the coprocessor of the computing device, and the system architecture shown in FIG. 5 is taken as an example for description.
  • the coprocessor can be an FPGA, an ASIC, or other type of coprocessor.
  • the image data matrix is stored in the processing system DDR memory.
  • a data processing method for a convolutional neural network may be as follows:
  • Step 201 The coprocessor acquires system parameters, where the system parameters include matrix parameters of the feature matrix.
  • the matrix parameter can include the number of rows and columns of the feature matrix.
  • The system parameters in this embodiment may further include the number of rows and columns of the image data matrix, the predetermined data amount B, the predetermined data amount A, the sliding direction, the sliding step, and the like.
  • Step 202 The coprocessor reads the data of the corresponding row number from the DDR memory according to the matrix parameter of the feature matrix, and obtains the data matrix Q to be expanded.
  • For example, K rows of data of the N*N image data matrix are read from the DDR memory to obtain the K*N data matrix Q to be expanded.
  • For example, rows 1 to K of the N*N image data matrix can be read.
  • For example, the FPGA can read the data of rows 1 to 3 from the 5*5 image data matrix to form a 3*5 data matrix Q to be expanded.
  • Step 203 The coprocessor writes the data matrix Q to be expanded into the buffer of the coprocessor.
  • the FPGA writes the 3*5 data matrix Q to be expanded into a buffer in the FPGA.
  • Step 204 When the data volume currently buffered by the coprocessor is greater than the predetermined data amount A, the coprocessor performs data sliding expansion on the data matrix Q according to the matrix parameter of the feature matrix to obtain the expanded data.
  • the predetermined data amount A may be a data amount of 3*5 data, and may be specifically set according to actual needs.
  • the buffer may be a ring buffer.
  • The ring buffer has two indicators: one is LenBufSpaceReady, which indicates the remaining space of the ring buffer, i.e. its remaining available capacity; the other is LenBufDataValid, which indicates the amount of data currently buffered in the ring buffer.
  • After data is written, LenBufSpaceReady is decremented by 1 and LenBufDataValid is incremented by 1; when data is read out for expansion, LenBufSpaceReady is incremented by 1 and LenBufDataValid is decremented by 1.
  • the load writing and the unfolding reading of data in this embodiment can be performed in parallel to improve data expansion efficiency.
  • When the coprocessor determines that LenBufDataValid is greater than the predetermined data amount A, the data matrix Q is slid and expanded according to the matrix parameter of the feature matrix; otherwise, data sliding expansion is not performed.
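As a minimal software model of the ring buffer's two indicators (a sketch, not the hardware logic; the class and method names are illustrative):

```python
class RingBuffer:
    """Toy model of the ring buffer: `space_ready` (LenBufSpaceReady)
    tracks remaining capacity, `data_valid` (LenBufDataValid) tracks
    data written but not yet read out for expansion."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # next write slot
        self.tail = 0  # next read slot
        self.space_ready = capacity
        self.data_valid = 0

    def write(self, item):
        assert self.space_ready > 0, "ring buffer full"
        self.buf[self.head] = item
        self.head = (self.head + 1) % len(self.buf)
        self.space_ready -= 1  # one less free slot
        self.data_valid += 1   # one more valid element

    def read(self):
        assert self.data_valid > 0, "ring buffer empty"
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        self.space_ready += 1  # slot freed for reloading
        self.data_valid -= 1
        return item
```

In the patent's scheme the load (write) side and the expansion (read) side run in parallel, each consulting only its own indicator.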
  • For example, the FPGA can perform data sliding expansion on the 3*5 data matrix Q to be expanded in the manner shown in FIG. 2b to FIG. 2d, obtaining the expanded data groups (11, 12, 13, 21, 22, 23, 31, 32, 33), (12, 13, 14, 22, 23, 24, 32, 33, 34), (13, 14, 15, 23, 24, 25, 33, 34, 35).
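The sliding expansion above can be sketched as follows (a simplified software model of the address-skipping window reads; `sliding_expand` is an illustrative name, not part of the patent):

```python
def sliding_expand(matrix, k, step=1):
    """Slide a k*k window along the row direction of `matrix`
    (a list of rows) and flatten each window into one data group."""
    n_cols = len(matrix[0])
    groups = []
    for start in range(0, n_cols - k + 1, step):
        group = [matrix[r][c] for r in range(k) for c in range(start, start + k)]
        groups.append(group)
    return groups

# The 3*5 data matrix Q to be expanded (element ij denotes row i, column j).
q = [[11, 12, 13, 14, 15],
     [21, 22, 23, 24, 25],
     [31, 32, 33, 34, 35]]
groups = sliding_expand(q, 3)
# groups == [[11, 12, 13, 21, 22, 23, 31, 32, 33],
#            [12, 13, 14, 22, 23, 24, 32, 33, 34],
#            [13, 14, 15, 23, 24, 25, 33, 34, 35]]
```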
  • Step 205 When the remaining available capacity of the buffer is greater than the predetermined data amount B, the coprocessor reads a corresponding amount of unexpanded data from the DDR memory according to the predetermined data amount B, and writes the buffer.
  • When the coprocessor determines that LenBufSpaceReady is larger than the predetermined data amount B, the unexpanded data of the predetermined data amount B is read from the DDR memory and written into the buffer.
  • the predetermined data amount B is a fixed data amount, that is, a fixed data size, and may be set according to actual needs.
  • For example, the predetermined data amount B may be 8 Kbyte or the like; the predetermined data amount B may also be set according to the data amount of one row or one column of the image data matrix.
  • the 4th row of unexpanded data [41, 42, 43, 44, 45] can be read and written to the buffer.
  • For example, the coprocessor can read one row or one column of unexpanded data of the image data matrix from the DDR memory; for instance, the unexpanded data of the (K+1)-th row, that is, N unexpanded data, can be read.
  • In other embodiments, the number of image data elements corresponding to the predetermined data amount B may be greater than N and not an integer multiple of N, for example N+1, and so on; it may also be an integer multiple of N.
  • For example, seven unexpanded data [41, 42, 43, 44, 45, 51, 52] can be read into the buffer according to the predetermined data amount B.
  • Step 206 The coprocessor updates the data matrix Q to be expanded according to the written unexpanded data, and returns to step 204.
  • The data matrix Q to be expanded may be updated based on the unexpanded data of the (K+1)-th row, for example, by covering one row of the matrix Q with the data of the (K+1)-th row.
  • That is, the unexpanded data of the (K+1)-th row may be used to update one row of the matrix Q.
  • When more than N unexpanded data, for example N+1 unexpanded data, have been read, N target data belonging to the same row may be selected, and the first row of data of the matrix Q is then covered with the selected N unexpanded data.
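Logically, the update replaces the oldest row of Q with the newly read row (in the buffer, the new row simply overwrites the oldest row's storage in place); a sketch with an illustrative function name:

```python
def update_q(q, new_row):
    """Drop the earliest-stored row of the matrix to be expanded and
    append the newly read row, keeping K rows in image order."""
    return q[1:] + [new_row]

q = [[11, 12, 13, 14, 15],
     [21, 22, 23, 24, 25],
     [31, 32, 33, 34, 35]]
q = update_q(q, [41, 42, 43, 44, 45])
# q == [[21, 22, 23, 24, 25], [31, 32, 33, 34, 35], [41, 42, 43, 44, 45]]
```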
  • The data processing method of the embodiment of the present invention is applicable to all services that can be implemented by a heterogeneous processing system with an FPGA as a coprocessor, or by a pure CPU processing system; for example, the method can be applied to a business scenario of pornographic image detection and filtering.
  • Referring to FIG. 8c, such a service is generally implemented on an open-source deep learning platform such as Caffe or TensorFlow.
  • The learning platform calls the BLAS (Basic Linear Algebra Subprograms) library for matrix operations; in a pure CPU processing system these matrix operations are computed by the CPU, while in a heterogeneous processing system they can be offloaded to the FPGA for computation (typically through PCI-e interaction).
  • the CPU and FPGA interact with each other by sharing DDR RAM.
  • The embodiment of the present invention uses a coprocessor to obtain the matrix parameter of the feature matrix, read the corresponding data in the image data matrix according to the matrix parameter to obtain a data matrix to be expanded, perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain the expanded data, read a corresponding number of unexpanded data in the image data matrix, update the data matrix to be expanded according to the unexpanded data, and return to the step of expanding the data matrix to be expanded according to the matrix parameter.
  • During the convolution process this scheme can multiplex the read image data to realize data expansion, avoiding repeated reading of certain data and reducing the data bandwidth or storage space required for convolutional neural network data expansion; therefore, the data processing capability and data expansion efficiency of the convolutional neural network processing system can be improved.
  • The embodiment of the present invention further provides a data processing device for a convolutional neural network, which may be integrated in a processor of a computing device; the processor may be a CPU, or an FPGA, ASIC, GPU, or other type of coprocessor.
  • the data processing apparatus of the convolutional neural network may include an obtaining unit 301, a reading unit 302, a data expanding unit 303, and an updating unit 304, as follows:
  • the obtaining unit 301 is configured to acquire a matrix parameter of the feature matrix.
  • the feature matrix is a convolution kernel of a convolution operation, also called a weight matrix, which can be set according to actual needs.
  • the matrix parameter of the feature matrix may include the number of matrix rows and columns, which may be referred to as the size of the convolution kernel.
  • the reading unit 302 is configured to read corresponding data in the image data matrix from the first buffer space through the first bus according to the matrix parameter, to obtain a data matrix to be expanded.
  • the elements in the image data matrix are pixel data corresponding to the image pixels, such as the processed pixel values.
  • the number of rows and columns of the image data matrix represents the size of the image.
  • the reading unit 302 can be configured to read a matrix of corresponding row numbers or column numbers in the image data matrix from the first buffer space by using the first bus according to the matrix parameter.
  • the number of rows read may correspond to the number of rows of the feature matrix, or the number of columns read may correspond to the number of columns of the feature matrix.
  • The saving unit 305 is configured to send the data matrix to be expanded to the second preset buffer space through the second bus and save it, so that the data expansion unit 303 can perform data expansion on it.
  • The data expansion unit 303 is configured to read the data matrix to be expanded from the second preset buffer space through the second bus, and perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain the expanded data.
  • the data expansion unit 303 is configured to perform data sliding expansion on the data matrix to be expanded according to the matrix parameter.
  • the data expansion unit is configured to perform data expansion on the data matrix to be expanded according to the matrix parameter and the storage address of the data of the data matrix to be expanded in the second preset buffer space.
  • The data expansion unit 303 may include:
  • a determining subunit configured to determine a sliding window of a corresponding size according to the matrix parameter
  • a sliding subunit configured to slide the sliding window on the data matrix to be expanded according to the preset sliding direction and the preset sliding step
  • An address obtaining subunit configured to acquire, after each sliding, a storage address of the data in the sliding window in the second preset buffer space
  • the reading subunit is configured to read out corresponding data from the second preset buffer space according to the storage address to complete data expansion.
  • the determining subunit may be used to determine a sliding window of a corresponding size according to the row and column data of the feature matrix. For example, when the feature matrix is a matrix of K*K, a sliding window of K*K may be determined. The sliding window is used to select corresponding data from the data matrix to be expanded for expansion.
  • the preset sliding direction may include: a row direction, a column direction, and the like of the image data matrix.
  • The preset sliding step is the distance to slide each time; it can be set according to actual data expansion requirements and can be represented by the number of data elements to slide over on the data matrix.
  • For example, the preset sliding step may be 1, 2, or 3 data, and the like.
  • the sliding subunit may be specifically configured to slide the window on the data matrix to be expanded with a preset sliding step along a preset sliding direction.
  • The initial position of the sliding window on the data matrix to be expanded may be set according to actual requirements. For example, referring to FIG. 2b, sliding may start from the first column of the data matrix to be expanded; in other embodiments, sliding may start from the second or third column of the data matrix to be expanded.
  • The updating unit 304 is configured to read a preset number of unexpanded data in the image data matrix from the first buffer space through the first bus, send the unexpanded data to the second preset buffer space through the second bus for saving, update the data matrix to be expanded saved in the second preset buffer space according to the unexpanded data, and trigger the data expansion unit 303 to perform the step of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding the data matrix to be expanded according to the matrix parameter.
  • the update unit 304 can include:
  • a reading subunit configured to read a preset number of unexpanded data in the image data matrix from the first buffer space through the first bus according to the first predetermined data amount, and send the unexpanded data to the second preset buffer space through the second bus for saving
  • An update subunit configured to update the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data
  • a triggering subunit configured to, after the update subunit updates the data matrix to be expanded, trigger the data expansion unit 303 to perform the step of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding it according to the matrix parameter
  • reading subunit is specifically used for:
  • the update subunit is specifically used to:
  • the data expansion unit 303 can be specifically configured to:
  • The foregoing units may each be implemented as a separate entity, or may be combined arbitrarily and implemented as one or several entities.
  • For details, reference may be made to the foregoing method embodiments; they are not described herein again.
  • In actual applications, the function of the obtaining unit 301 may be implemented by the data expansion controller; the function of the reading unit 302 by the data expansion controller and the DDR read data controller; the function of the data expansion unit 303 by the data expansion controller, the data scan controller and the address generator; and the function of the update unit 304 by the data expansion controller and the DDR read data controller.
  • the embodiment further provides a coprocessor, including: a data expansion controller 401, a DDR read data controller 402, a data buffer unit 403, a data scan controller 404, an address generator 405, and a processing unit ( PE) 406.
  • The data expansion controller 401 is configured to obtain the matrix parameter of the feature matrix, control the DDR read data controller 402 to read the corresponding data in the image data matrix according to the matrix parameter to obtain a data matrix to be expanded, and write the data matrix to be expanded into the data buffer unit 403.
  • The data expansion controller 401 is further configured to control the data scan controller 404 and the address generator 405 to perform data expansion on the data matrix to be expanded according to system parameters (such as the matrix parameters of the feature matrix) to obtain expanded data; to control the DDR read data controller 402 to read a preset number of unexpanded data in the image data matrix and update the data matrix to be expanded according to the unexpanded data; and to trigger the data scan controller 404 and the address generator 405 to expand the updated data matrix to be expanded.
  • the data expansion controller 401 can control the data scan controller 404 and the address generator 405 to expand the data matrix according to system parameters (such as matrix parameters of the feature matrix) and the state of the data cache unit 403 (such as the amount of data currently cached). Perform data expansion.
  • the data expansion controller 401 can also control the DDR read data controller 402 to read a predetermined amount of undeployed data in the image data matrix based on the state of the data cache unit 403 (e.g., remaining available capacity).
  • The DDR read data controller 402 is configured to, under the control of the data expansion controller 401, read the corresponding data in the image data matrix to obtain the data matrix to be expanded, read a preset number of unexpanded data in the image data matrix, update the data matrix to be expanded according to the unexpanded data, and write the read data into the data buffer unit 403.
  • the data buffer unit 403 is configured to buffer the data read by the DDR read data controller 402 and output the expanded data to the processing unit.
  • The data scan controller 404 and the address generator 405 are configured to perform data expansion on the data matrix to be expanded under the control of the data expansion controller 401.
  • a processing unit (PE) 406 is configured to perform multiplication on the expanded data and the feature matrix to implement a convolution operation.
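The PE's step amounts to a dot product of each expanded data group with the flattened feature matrix; a sketch (illustrative names, with a made-up all-ones kernel used only for the example):

```python
def pe_multiply(groups, kernel):
    """Dot-product each expanded data group with the flattened
    convolution kernel, producing one convolution output per group."""
    flat_kernel = [w for row in kernel for w in row]
    return [sum(a * w for a, w in zip(group, flat_kernel)) for group in groups]

# First expanded group from FIG. 2b with a 3*3 all-ones kernel:
out = pe_multiply([[11, 12, 13, 21, 22, 23, 31, 32, 33]],
                  [[1, 1, 1], [1, 1, 1], [1, 1, 1]])
# out == [198]
```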
  • The data processing device of the convolutional neural network in this embodiment may be specifically integrated in a CPU, or in a coprocessor device such as an FPGA, ASIC, or GPU.
  • Embodiments of the present invention also provide a data processing apparatus for a convolutional neural network, the apparatus comprising one or more processors and a storage medium.
  • The processor may be a CPU or a coprocessor device such as an FPGA, ASIC, or GPU; the storage medium may be a non-transitory computer-readable storage medium for storing one or more computer-readable instructions.
  • the one or more computer readable instructions include an acquisition unit, a read unit, a data expansion unit, and an update unit.
  • the one or more computer readable instructions further comprise a saving unit.
  • The processor is configured to read the one or more computer-readable instructions stored in the storage medium to implement the steps of the data processing method of the convolutional neural network in the above embodiments and the functions of the units of the data processing device of the convolutional neural network.
  • In the above device, the obtaining unit 301 acquires the matrix parameter of the feature matrix; the reading unit 302 reads the corresponding data in the image data matrix according to the matrix parameter to obtain the data matrix to be expanded; the data expansion unit 303 performs data expansion on the data matrix to be expanded according to the matrix parameter to obtain the expanded data; and the updating unit 304 reads a corresponding number of unexpanded data in the image data matrix, updates the data matrix to be expanded according to the unexpanded data, and returns to the step of expanding the data matrix to be expanded according to the matrix parameter.
  • This scheme can reuse the read image data to realize data expansion, avoiding repeated reading of some data and reducing the data bandwidth or storage space required for convolutional neural network data expansion, thereby improving the data processing capability and data expansion efficiency of the convolutional neural network processing system.
  • the program may be stored in a computer readable storage medium, and the storage medium may include: Read Only Memory (ROM), Random Access Memory (RAM), disk or optical disk.


Abstract

A data processing method and device for a convolutional neural network. The method is applied to a computing device and includes performing the following steps on a processor or coprocessor of the computing device: obtaining a matrix parameter of a feature matrix (101); reading corresponding data in an image data matrix from a first buffer space through a first bus according to the matrix parameter to obtain a data matrix to be expanded, and sending the data matrix to be expanded to a second preset buffer space through a second bus for saving (102); reading the data matrix to be expanded from the second preset buffer space through the second bus, and performing data expansion on the data matrix to be expanded according to the matrix parameter to obtain expanded data (103); reading a preset number of unexpanded data in the image data matrix from the first buffer space through the first bus, sending the unexpanded data to the second preset buffer space through the second bus for saving, and updating the data matrix to be expanded saved in the second preset buffer space according to the unexpanded data (104); and returning to the step (103) of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding the data matrix to be expanded according to the matrix parameter.

Description

Data processing method and device for a convolutional neural network
This application claims priority to Chinese Patent Application No. 201610933471.7, filed with the Chinese Patent Office on October 31, 2016 and entitled "Data processing method and device for a convolutional neural network", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of neural networks, and in particular to a data processing method and device for a convolutional neural network.
Background of the Invention
Neural networks and deep learning algorithms have been applied with great success and are developing rapidly. It is widely expected that this new way of computing will help realize more general and more complex intelligent applications.
Among them, the convolutional neural network (CNN), owing to its outstanding results in the image field, occupies an important position in deep learning and is one of the most widely used neural networks.
The convolution operations of a convolutional neural network are mainly concentrated in the convolutional layers, and can be divided into two processes: data expansion and matrix multiplication. However, during the data expansion of a convolutional neural network some data are read repeatedly, which tends to increase the data bandwidth or the storage space required by the convolution operation and reduces the data processing capability of the convolutional neural network processing system.
Summary of the Invention
Embodiments of the present invention provide a data processing method and device for a convolutional neural network and a non-volatile computer-readable storage medium, which can improve the data processing capability of a convolutional neural network processing system.
An embodiment of the present invention provides a data processing method for a convolutional neural network, applied to a computing device, including performing the following steps on a processor or coprocessor of the computing device:
obtaining a matrix parameter of a feature matrix;
reading corresponding data in an image data matrix from a first buffer space through a first bus according to the matrix parameter to obtain a data matrix to be expanded, and sending the data matrix to be expanded to a second preset buffer space through a second bus for saving;
reading the data matrix to be expanded from the second preset buffer space through the second bus, and performing data expansion on the data matrix to be expanded according to the matrix parameter to obtain expanded data;
reading a preset number of unexpanded data in the image data matrix from the first buffer space through the first bus, sending the unexpanded data to the second preset buffer space through the second bus for saving, and updating the data matrix to be expanded saved in the second preset buffer space according to the unexpanded data;
returning to the step of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding the data matrix to be expanded according to the matrix parameter.
An embodiment of the present invention further provides a data processing device for a convolutional neural network, including one or more processors and one or more non-volatile storage media, the one or more non-volatile storage media storing one or more computer-readable instructions configured to be executed by the one or more processors; the one or more computer-readable instructions include:
an obtaining unit configured to obtain a matrix parameter of a feature matrix;
a reading unit configured to read corresponding data in an image data matrix from a first buffer space through a first bus according to the matrix parameter to obtain a data matrix to be expanded;
a saving unit configured to send the data matrix to be expanded to a second preset buffer space through a second bus for saving;
a data expansion unit configured to read the data matrix to be expanded from the second preset buffer space through the second bus and perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain expanded data;
an updating unit configured to read a corresponding number of unexpanded data in the image data matrix from the first buffer space through the first bus, send the unexpanded data to the second preset buffer space through the second bus for saving, update the data matrix to be expanded saved in the second preset buffer space according to the unexpanded data, and trigger the data expansion unit to perform the step of reading the data matrix to be expanded from the second preset buffer space through the second bus and expanding the data matrix to be expanded according to the matrix parameter.
An embodiment of the present invention provides a non-volatile computer-readable storage medium storing computer-readable instructions that can cause at least one processor to perform the data processing method of a convolutional neural network described above.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the accompanying drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a data processing method for a convolutional neural network according to an embodiment of the present invention;
FIG. 2a to FIG. 2d are schematic diagrams of data sliding expansion according to an embodiment of the present invention;
FIG. 3a to FIG. 3i are schematic diagrams of data expansion of a convolutional neural network;
FIG. 4 is a schematic diagram of matrix multiplication of a convolutional neural network;
FIG. 5 is a schematic architectural diagram of a convolutional neural network processing system according to an embodiment of the present invention;
FIG. 6a is a schematic diagram of data expansion of a convolutional neural network performed on a CPU;
FIG. 6b is a schematic diagram of data expansion of a convolutional neural network performed on an FPGA;
FIG. 7a to FIG. 7c are schematic diagrams of data expansion of a convolutional neural network according to an embodiment of the present invention;
FIG. 8a is another flowchart of a data processing method for a convolutional neural network according to an embodiment of the present invention;
FIG. 8b is a schematic diagram of reading and writing of a ring buffer according to an embodiment of the present invention;
FIG. 8c is a schematic architectural diagram of a convolutional-neural-network-based business scenario according to an embodiment of the present invention;
FIG. 9a is a schematic structural diagram of a data processing device for a convolutional neural network according to an embodiment of the present invention;
FIG. 9b is another schematic structural diagram of a data processing device for a convolutional neural network according to an embodiment of the present invention;
FIG. 9c is a schematic structural diagram of a coprocessor according to an embodiment of the present invention.
Description of Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiments of the present invention provide a data processing method and device for a convolutional neural network, which are described in detail below.
This embodiment is described from the perspective of the data processing device of a convolutional neural network. The data processing device may be integrated in a processor of a computing device; the processor may be a CPU, or a coprocessor such as an FPGA (Field Programmable Gate Array), ASIC (Application Specific Integrated Circuit), or GPU (Graphics Processing Unit).
In the data processing method for a convolutional neural network, a matrix parameter of a feature matrix is obtained; corresponding data in an image data matrix is read according to the matrix parameter to obtain a data matrix to be expanded; data expansion is performed on the data matrix to be expanded according to the matrix parameter to obtain expanded data; a preset number of unexpanded data in the image data matrix is read, and the data matrix to be expanded is updated according to the unexpanded data; and the step of expanding the data matrix to be expanded according to the matrix parameter is performed again.
As shown in FIG. 1, a data processing method for a convolutional neural network is applied to a computing device; a specific procedure performed by a processor or coprocessor of the computing device may be as follows:
Step 101: Obtain a matrix parameter of a feature matrix.
The feature matrix is the convolution kernel of the convolution operation, also called the weight matrix, and can be set according to actual requirements. The matrix parameter of the feature matrix may include the numbers of matrix rows and columns, which may be called the size of the convolution kernel.
Step 102: Read corresponding data in an image data matrix from a first buffer space through a first bus according to the matrix parameter to obtain a data matrix to be expanded, and send the data matrix to be expanded to a second preset buffer space through a second bus for saving.
The elements in the image data matrix are pixel data corresponding to image pixels, such as processed pixel values. The numbers of rows and columns of the image data matrix represent the size of the image.
The image data matrix may be stored in an accelerator card of the convolutional neural network processing system, for example in the DDR (Double Data Rate synchronous dynamic random-access) memory of the accelerator card. If the image data matrix is stored in the DDR, the first bus is the bus connecting the processor or coprocessor and the DDR. That is, "reading corresponding data in the image data matrix from the first buffer space through the first bus according to the matrix parameter" in step 102 may include: reading the corresponding data in the image data matrix from the DDR through the bus connecting the processor or coprocessor and the DDR according to the matrix parameter.
In this embodiment, a matrix of a corresponding number of rows or columns in the image data matrix can be read according to the matrix parameter.
When the matrix parameter includes the numbers of rows and columns of the feature matrix, the number of rows read may correspond to the number of rows of the feature matrix, or the number of columns read may correspond to the number of columns of the feature matrix.
For example, when the image data matrix is an N*N matrix and the feature matrix is a K*K matrix, K rows of data of the N*N image data matrix may be read to obtain a K*N data matrix to be expanded, where K and N are positive integers and K ≤ N.
The starting position of the read can be set according to actual requirements; for example, K rows of data may be read starting from the first row of the image data matrix, or starting from the second row, and so on.
As another example, when the image data matrix is an N*N matrix and the feature matrix is a K*M matrix, M columns of data of the N*N image data matrix may be read to obtain an N*M data matrix to be expanded, where M is a positive integer and M ≤ N.
After the data matrix to be expanded is obtained, it may be sent to the second preset buffer space through the second bus for saving. The second preset buffer space may be a preset buffer; for example, the preset buffer may be a buffer in the coprocessor, or the DDR, and the second bus is the bus connecting the processor or coprocessor and the preset buffer.
Step 103: Read the data matrix to be expanded from the second preset buffer space through the second bus, and perform data expansion on the data matrix to be expanded according to the matrix parameter to obtain expanded data.
Specifically, data expansion may be performed on the data matrix to be expanded according to the numbers of rows and columns of the feature matrix; several data groups are obtained after the expansion. After the data expansion of the image data matrix is completed, a data matrix, i.e. the expanded data matrix, can be formed from these data groups. Subsequently, matrix multiplication can be performed on the expanded data matrix and the feature matrix to obtain the corresponding data and complete the convolution operation.
For example, after the K*N data matrix to be expanded is obtained, it can be expanded according to the numbers of rows and columns of the feature matrix.
In this case, "performing data expansion on the data matrix to be expanded according to the matrix parameter" in step 103 may include:
performing data expansion on the data matrix to be expanded according to the matrix parameter and the storage addresses of the data of the data matrix to be expanded in the second preset buffer space.
For example, the K*N data matrix to be expanded is written into the second preset buffer space, and the K*N data matrix to be expanded is then expanded according to the numbers of rows and columns of the K*K feature matrix and the storage addresses, in the second preset buffer space, of the data of the K*N data matrix to be expanded.
In this embodiment, sliding data expansion can be performed on the data matrix to be expanded. Specifically, a window is slid on the data matrix to be expanded, and the data in the window after each slide is expanded; several data groups are obtained after the expansion. That is, the step of "performing data expansion on the data matrix to be expanded according to the matrix parameter and the storage addresses of the data of the data matrix to be expanded in the second preset buffer space" may include:
determining a sliding window according to the matrix parameter;
sliding the sliding window on the data matrix to be expanded according to a preset sliding direction and a preset sliding step;
after each slide, obtaining the storage addresses, in the second preset buffer space, of the data in the sliding window;
reading out the corresponding data from the second preset buffer space according to the storage addresses to complete the data expansion.
Specifically, a sliding window of a corresponding size can be determined from the row and column numbers of the feature matrix; for example, when the feature matrix is a K*K matrix, a K*K sliding window can be determined. The sliding window is used to select corresponding data from the data matrix to be expanded for expansion.
The preset sliding direction may include the row direction, the column direction, etc. of the image data matrix. In practical applications, the preset sliding direction may correspond to the data reading manner of step 102; for example, when several rows of the image data matrix are read, the preset sliding direction may be the row direction of the image data matrix; when several columns of the image data matrix are read, the preset sliding direction may be the column direction of the image data matrix.
The preset sliding step is the distance to slide, which can be set according to actual data expansion requirements and can be represented by the number of data elements to slide over on the data matrix, for example 1, 2, or 3 data, and the like.
After the preset sliding step, the preset sliding direction and the sliding window are obtained, the window can be slid on the data matrix to be expanded along the preset sliding direction with the preset sliding step. After each slide of the window, the addresses of the data in the window in the second preset buffer space can be obtained, and the corresponding data are then read from the preset buffer according to these addresses and a preset reading order to complete the data expansion; that is, data expansion is realized by reading data in an address-skipping manner.
FIG. 2a to FIG. 2d are schematic diagrams of data sliding expansion, taking reading the corresponding rows of data from the image data matrix as an example. Referring to FIG. 2a, assume the image data matrix is a 5*5 matrix and the feature matrix is a 3*3 matrix. First, 3 rows of data are read from the 5*5 image data matrix to obtain a 3*5 data matrix to be expanded, i.e. the matrix in FIG. 2b to FIG. 2d, which is written into the second preset buffer space. Then a sliding window, i.e. the dashed box in FIG. 2b to FIG. 2d, is determined according to the row and column numbers of the 3*3 feature matrix. Referring to FIG. 2b to FIG. 2d, the sliding window can be slid on the 3*5 data matrix to be expanded along the row direction with a sliding step of one data element, i.e. from left to right.
Referring to FIG. 2b, at the initial sliding position, i.e. after the 0th slide, the storage addresses of the data in the sliding window in the second preset buffer space can be obtained, and the corresponding data are then read from the second preset buffer space by skipping addresses according to these storage addresses, yielding the data group (11, 12, 13, 21, 22, 23, 31, 32, 33), called the first data group. Referring to FIG. 2c, after the first data group is obtained, the window is slid along the row direction by one data element; the storage addresses of the data in the sliding window in the second preset buffer space are then obtained, and the corresponding data are read from the second preset buffer space by skipping addresses according to these storage addresses, yielding the data group (12, 13, 14, 22, 23, 24, 32, 33, 34), called the second data group. Referring to FIG. 2d, after the second data group is obtained, the window continues to slide along the row direction by one data element; the storage addresses of the data in the sliding window in the second preset buffer space are then obtained, and the corresponding data are read from the second preset buffer space by skipping addresses according to these storage addresses, yielding the data group (13, 14, 15, 23, 24, 25, 33, 34, 35), called the third data group. The data expansion of the 3*5 data matrix to be expanded is thus completed.
In this embodiment, the initial position of the sliding window on the data matrix to be expanded can be set according to actual requirements; for example, referring to FIG. 2b, sliding can start from the first column of the data matrix to be expanded, and in other embodiments sliding can also start from the second or third column of the data matrix to be expanded.
Similarly, when the corresponding columns of data are read from the image data matrix to form the data matrix to be expanded, the sliding window can also be determined according to the row and column numbers of the feature matrix; the window is then slid along the column direction of the data matrix to be expanded with the predetermined sliding step, and after each slide the storage addresses of the data in the window in the second preset buffer space are obtained and the corresponding data are read from the second preset buffer space based on these storage addresses. The sliding expansion process is similar to that described in the above embodiment; refer to FIG. 2a to FIG. 2d, which is not repeated here.
步骤104、通过第一总线从所述第一缓冲空间中读取图像数据矩阵中预设数量的未展开数据,通过第二总线将所述未展开数据发送至第二预设缓冲空间并保存,并根据该未展开数据更新第二预设缓冲空间中保存的该待展开数据矩阵,返回执行步骤103。
具体地,通过第一总线从第一缓冲空间中读取图像数据矩阵中预设数量的未展开数据,通过第二总线将将读取的未展开数据发送至第二预设缓冲空间内并保存,并根据该未展开数据更新第二预设缓冲空间中保存的该待展开数据矩阵。
其中,未展开数据的数量可以根据实际需求设定,如1个、5个、1行、2行、或者1列、2列等等。
具体地,可以基于卷积步进通过第一总线从第一缓冲空间中读取图像数据矩阵中预设数量的未展开数据。该卷积步进表示在对待展开数据矩阵展开后需要从图像数据矩阵中读取的未展开数据的行数或列数。
以图像数据矩阵为N*N的矩阵,特征矩阵为K*K的矩阵为例,在对K*N待展开数据矩阵进行数据展开之后,可以基于卷积步进从第一缓冲空间中保存的N*N图像数据矩阵中读取若干数量的数据,如相应行数或者列数的未展开数据。比如当卷积步进S=1时,可以根据卷积步进从N*N图像数据矩阵中读取一行或者一列未展开数据,然后,根据读取的未展开数据更新第二预设缓冲空间中保存的该待展开数据矩阵。
具体地,以从图像数据矩阵读取行数据组成待展开数据矩阵为例;当卷积步进S=1时,在对待展开数据矩阵进行数据展开之后,可以从第一缓冲空间中保存的图像数据矩阵中读取第K+1行数据,并根据该第K+1行数据更新第二预设缓冲空间中保存的待展开数据矩阵。在更新完待展开数据矩阵之后返回步骤102通过第二总线从第二预设缓冲空间中读取更新后的待展开数据矩阵,并对更新后的待展示数据矩阵进行数据展开,在展开完成之后,再次从第一缓冲空间中保存的图像数据矩阵中读取第K+2行数据,并根据该K+2行数据更新当前待展开数据矩阵,返回步骤102通过第二总线从第二预设缓冲空间中读取更新后的待展开数据矩阵,并对更新后的待展示数据矩阵进行数据展开。在展开完成之后再次从第一缓冲空间中保存的图像数据矩阵中读取第K+3行数据……以此类推,直到读取完图像数据矩阵内所有的数据为止。
其中,步骤104中的“根据该未展开数据更新第二预设缓冲空间中保存的该待展开数据矩阵”可以包括:通过第二总线从所述第二预设缓冲空 间中读取所述未展开数据,从该未展开数据中选取预设数量的目标数据,根据该目标数据覆盖第二预设缓冲空间中保存的该待展开数据矩阵内对应的数据。比如,在读取图像数据矩阵中至少两行或者两列未展开数据时,可以从两行数据中选取一行或者一列数据来更新待展开数据矩阵。
在其他实施例中,如果读取的未展开数据的数量与目标数据对应的预设数量相等时,可以直接根据该未展开数据更新待展开数据矩阵,比如,目标数据对应的预设数量为一行数量的数据时,可以在读取图像数据矩阵中一行数据之后,直接根据该行数据更新待展开数据矩阵。例如,在读取第K+1行数据后,直接根据该第K+1行数据更新待展开数据矩阵。
本实施例中,更新待展开数据矩阵的方式可以包括数据覆盖方式,即根据选取的目标数据覆盖待展开数据矩阵中相应的数据,以完成更新。
在本实施例数据处理方法中,因为已经保存至第二预设缓冲空间的待展开数据矩阵中的一些数据可以复用,因此只需要从第一缓冲空间中读取图像数据矩阵中预设数量的未展开数据,即未被保存到第二预设缓冲空间的数据,从而避免了在数据展开时对某些数据重复读取,减小了卷积神经网络处理系统内存储空间。而且,因为只需要通过第一总线读取图像数据矩阵中预设数量的未展开数据,通过第二总线将未展开数据发送至第二预设缓冲空间,因此可以减少数据传输量,节省第一总线和第二总线的传输带宽,进而提升了处理系统的数据处理能力。下面将介绍本发明实施例的几种卷积运算中数据处理方法的具体实现过程。
以图2a所示的图像数据矩阵和特征矩阵为例,图3a-3i为采用一数据处理方法进行数据展开的过程。在数据展开之后,参考图4可以作展开后数据矩阵和卷积核乘法,以完成卷积运算。
由图3a-图3i可知,该数据处理方法在数据展开时会有些数据会被重复读取多次,容易造成数据带宽的增加和存储空间的增大,降低了处理系 统的处理能力。
实际情况中,如果在卷积神经网络处理系统采用图3a-图3i所示的数据展开方式,会导致数据展开所需的数据传输带宽增加以及存储空间增大,降低了处理系统的数据处理能力。
以图5所示的卷积神经网络处理系统为例来说明采用图3a-图3i所示的数据展开方式的具体实现过程。该处理系统包括协处理器,服务器的CPU和记忆单元(memory),以及加速卡上的DDR存储器;服务器的CPU和协处理器之间一般通过PCI-e(Peripheral Component Interconnect Express,总线和接口标准)总线进行连接,进行数据交互和命令交互,如通过CMD path(命令路径)进行命令交互,通过data path(数据路径)进行数据交互。
该协处理器可以为FPGA或者其他辅助处理器,该协处理器可以包括:DDR控制器、InputBuf(input buffer,输入数据缓存单元),OutputBuf(output buffer,输出数据缓存单元),PE(Processing Element,处理单元)。该PE为协处理器中用于完成数据卷积的单元。
目前卷积运算中数据展开可以在处理系统的CPU、或者协处理器完成。如下:
在本发明一实施例中,通过系统的CPU完成数据展开。
参考图6a,CPU数据展开的方案包括:处理系统CPU采用图3a-图3i的方式对数据进行展开,完整展开后将展开后的数据存放在CPU内存中,通过PCI-e DMA传送到加速卡上的DDR RAM中,协处理器通过加载逻辑再从加速卡上的DDR RAM将数据加载到PE处理单元。由图6a可知,在系统的CPU中采用图3a-图3i所示的数据处理方法会导致数据展开效率比较低以及数据传输量增大,这样所需的PCI-e和DDR的读取带宽将会增大,降低了系统的处理能力。
然而,如果在系统的CPU上应用本实施例图2b-图2d所示的数据处理方法,由于不需要重复读取数据,可以提升数据展开效率。如果在系统的协处理器应用本实施例图2b-图2d所示的数据处理方法,可以减小数据传输量,降低所需的PCI-e和DDR的传输带宽。
在本发明另一实施例中,通过系统的协处理器完成数据展开。
参考图6b,在协处理器进行数据展开的方案包括:将未展开的数据存放在server(服务器)内存、加速卡DDR存储器和FPGA中,FPGA采用图3a-图3i的方式对数据进行展开。由图6b可知,该方案由于采用了图3a-图3i所示的方式进行数据展开,会重复读取某些数据,导致数据展开的效率较低、DDR数据传输量增大、需要消耗大量FPGA片上的存储单元。
以下将以图5所示的系统以及图2a所示的图像数据矩阵、特征矩阵为例,来介绍本发明实施例提供的数据处理方法,其中,假设卷积步进S=1,本实施例数据处理装置集成在协处理器中。
参考图7a-图7c,图像数据矩阵保存在加速卡的DDR存储器。具体地数据展开过程如下:
(1) Read K=3 rows of the image data matrix, [11,12,13,14,15], [21,22,23,24,25], [31,32,33,34,35], to obtain the to-be-expanded data matrix, and load it into the coprocessor's memory, as shown in FIG. 7a.
(2) Perform sliding expansion on the to-be-expanded data matrix in memory to obtain the expanded data [11,12,13,21,22,23,31,32,33], [12,13,14,22,23,24,32,33,34], [13,14,15,23,24,25,33,34,35], as shown in FIG. 7a.
Specifically, a sliding window may be determined from the row and column counts of the feature matrix; the window is then slid along the row direction of the to-be-expanded data matrix with a stride of one data element, and after each slide the corresponding data is read from memory by jumping to the storage addresses of the data inside the window, thereby completing the expansion.
(3) Load the 4th row of the image data matrix, [41,42,43,44,45], into memory, overwriting the first row [11,12,13,14,15] of the current to-be-expanded matrix (i.e., the row stored earliest), thereby updating the matrix, as shown in FIG. 7b.
(4) Perform sliding expansion on the updated matrix to obtain the expanded data [21,22,23,31,32,33,41,42,43], [22,23,24,32,33,34,42,43,44], [23,24,25,33,34,35,43,44,45], as shown in FIG. 7b.
(5) Load the 5th row of the image data matrix, [51,52,53,54,55], into memory, overwriting the first row [21,22,23,24,25] of the current to-be-expanded matrix, thereby updating it, as shown in FIG. 7c.
(6) Perform sliding expansion on the updated matrix to obtain the expanded data [31,32,33,41,42,43,51,52,53], [32,33,34,42,43,44,52,53,54], [33,34,35,43,44,45,53,54,55], as shown in FIG. 7c.
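As an illustrative sketch only (not part of the claimed implementation), steps (1)-(6) above can be modeled in Python. The function and variable names here are hypothetical; the sketch assumes a 5*5 image matrix, a 3*3 kernel, and stride 1, and shows how only one new row is loaded per update while the other K-1 rows are reused:

```python
def expand_rows(rows, K):
    """Slide a K x K window along the row direction (stride 1) and
    read out each window position as one flattened expanded vector."""
    N = len(rows[0])
    out = []
    for j in range(N - K + 1):            # window start column
        window = []
        for r in range(K):                # jump-address read per row
            window.extend(rows[r][j:j + K])
        out.append(window)
    return out

# 5x5 image matrix of FIG. 2a; element value encodes row/column, e.g. 23 = row 2, col 3
image = [[r * 10 + c for c in range(1, 6)] for r in range(1, 6)]

buf = [image[0], image[1], image[2]]      # step (1): load first K=3 rows
result = expand_rows(buf, 3)              # step (2): first three expanded vectors

for new_row in (image[3], image[4]):      # steps (3)-(6): reuse K-1 rows each time
    buf = buf[1:] + [new_row]             # overwrite only the oldest row
    result += expand_rows(buf, 3)
```

Note the design point the embodiment relies on: each update loads only N new elements instead of K*N, because the remaining rows are already in the buffer.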
Based on the above description, the expansion scheme of FIGS. 7a-7c improves expansion efficiency. Moreover, because the scheme provided by this embodiment expands data inside the coprocessor and expands it with reuse-based reads, it also reduces the amount of data read, lowering the demand on PCI-e and DDR read bandwidth and improving the system's processing capability.
Therefore, the expansion scheme of FIGS. 7a-7c improves expansion efficiency, saves storage space in the coprocessor, reduces the storage required for expansion, and improves the system's processing capability.
To improve the efficiency of reading data from DDR and of data expansion, the method of this embodiment may read the preset amount of data from the image data matrix in fixed-size chunks; that is, "reading a preset amount of unexpanded data in the image data matrix" in step 104 may include:
reading the preset amount of unexpanded data in the image data matrix according to a first predetermined data amount; and
saving the unexpanded data into the preset buffer space.
The first predetermined data amount may be set according to actual needs, e.g., 8 Kbyte or 16 Kbyte; a chunk of unexpanded data of this size may be called a packet. The first predetermined data amount may be set based on the data amount of a row or column of the image data matrix, for example as an integer multiple of that amount.
To improve expansion efficiency and buffer utilization, embodiments of the present invention may load new data only when the remaining buffer space is sufficient for a new packet; that is, the step "reading a preset amount of unexpanded data in the image data matrix according to a first predetermined data amount" may include:
obtaining the remaining available capacity of the preset buffer space; and
when the remaining available capacity of the preset buffer space is greater than or equal to the first predetermined data amount, reading the preset amount of unexpanded data in the image data matrix according to the first predetermined data amount.
Because the first predetermined data amount is set according to actual needs, each loaded packet may be larger than one row or column of the image data matrix. Therefore, after a new packet is loaded, a certain amount of target data may be selected from it to update the to-be-expanded matrix; that is, the step "updating the to-be-expanded data matrix according to the unexpanded data" may include:
selecting a preset amount of target data from the unexpanded data; and
updating the to-be-expanded data matrix according to the target data.
Specifically, data belonging to the same row or column of the image data matrix may be selected. For example, if the first predetermined data amount is 8 data elements — i.e., one packet holds 8 elements of the image data matrix — then after the current to-be-expanded matrix has been expanded, 8 unexpanded elements may be read from the image data matrix, say [41,42,43,44,45,51,52,53] in FIG. 7a; the target data lying in the same row, [41,42,43,44,45], is then selected and used to update the to-be-expanded matrix. In addition, the selection of target data must also take into account its relationship to the last row or column of the to-be-expanded matrix.
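A minimal sketch of this packet-based update, with hypothetical names and the 8-element packet example above: the first N packet elements form one complete image row (the target data), the oldest row is overwritten, and the remaining elements are carried over for the next update.

```python
def update_with_packet(matrix, packet, N):
    """Select the first N packet elements as one full image row (target data),
    overwrite the oldest row of the to-be-expanded matrix with it, and return
    the leftover elements, which belong to the next row."""
    target, leftover = packet[:N], packet[N:]
    matrix = matrix[1:] + [target]        # oldest row is overwritten
    return matrix, leftover

matrix = [[11, 12, 13, 14, 15],
          [21, 22, 23, 24, 25],
          [31, 32, 33, 34, 35]]
packet = [41, 42, 43, 44, 45, 51, 52, 53]   # 8-element packet from FIG. 7a
matrix, leftover = update_with_packet(matrix, packet, 5)
```

The leftover elements [51,52,53] would be combined with the next packet, which is one way to honor the relationship between the packet contents and the last row of the to-be-expanded matrix.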
To speed up expansion, the method of this embodiment may start expanding as soon as the data currently cached in the preset buffer space is sufficient for expansion; that is, the step "expanding the to-be-expanded data matrix according to the matrix parameters and the storage addresses of its data in the preset buffer space" may include:
obtaining the amount of data currently cached in the preset buffer space; and
when the cached data amount is greater than or equal to a second predetermined data amount, expanding the to-be-expanded data matrix according to the matrix parameters and the storage addresses of its data in the preset buffer space.
The second predetermined data amount may be determined from the row and column counts of the feature matrix and the image data matrix; for example, with an N*N image data matrix and a K*K feature matrix, it may be the data amount of K*N elements.
As can be seen from the above, embodiments of the present invention obtain the matrix parameters of the feature matrix; read the corresponding data of the image data matrix according to those parameters to obtain a to-be-expanded data matrix; expand that matrix according to the parameters to obtain expanded data; read a preset amount of unexpanded data of the image data matrix and update the to-be-expanded matrix with it; and return to the step of expanding the matrix according to the parameters. Because this scheme reuses already-read image data to perform expansion during convolution, it avoids repeatedly reading certain data and lowers the bandwidth and storage requirements of CNN data expansion, thereby improving the processing capability of the CNN processing system and the efficiency of data expansion.
Based on the method described in the embodiment of FIG. 1, the data processing method for a convolutional neural network is further detailed below by example.
In this embodiment, the data processing apparatus of the convolutional neural network is integrated in the coprocessor of a computing device, taking the system architecture of FIG. 5 as an example. The coprocessor may be an FPGA, an ASIC, or another type of coprocessor.
In this embodiment, the image data matrix is stored in the DDR memory of the processing system.
As shown in FIG. 8a, a data processing method for a convolutional neural network may proceed as follows:
Step 201: the coprocessor obtains system parameters, which include the matrix parameters of the feature matrix.
The matrix parameters may include the row and column counts of the feature matrix. In this embodiment the system parameters may also include the row and column counts of the image data matrix, predetermined data amount B, predetermined data amount A, the sliding direction, the sliding stride, and so on.
Step 202: the coprocessor reads the corresponding number of rows from DDR memory according to the matrix parameters of the feature matrix to obtain a to-be-expanded data matrix Q.
For example, K rows of the N*N image data matrix are read from DDR memory to obtain a K*N matrix Q; specifically, rows 1 through K may be read.
Taking the 5*5 image data matrix and the 3*3 feature matrix of FIG. 2a as an example, the FPGA may read rows 1-3 of the 5*5 image data matrix to form a 3*5 matrix Q.
Step 203: the coprocessor writes the to-be-expanded data matrix Q into a buffer inside the coprocessor.
For example, the FPGA writes the 3*5 matrix Q into a buffer inside the FPGA.
Step 204: when the amount of data currently cached in the buffer exceeds predetermined data amount A, the coprocessor performs sliding expansion on the matrix Q according to the matrix parameters of the feature matrix to obtain the expanded data.
The predetermined data amount A may be the data amount of 3*5 elements and can be set according to actual needs.
In this embodiment the buffer may be a ring buffer. Referring to FIG. 8b, the ring buffer maintains two indicators: LenBufSpaceReady, indicating the remaining space (remaining available capacity) of the ring buffer; and LenBufDataValid, indicating the amount of data currently cached.
When data is written, LenBufSpaceReady decreases by 1 and LenBufDataValid increases by 1; when data is read out for expansion, LenBufSpaceReady increases by 1 and LenBufDataValid decreases by 1. In this embodiment, data loading/writing and expansion/reading can proceed in parallel to improve expansion efficiency.
When the coprocessor determines that LenBufDataValid exceeds predetermined data amount A, it performs sliding expansion on the matrix Q according to the matrix parameters of the feature matrix; otherwise it does not.
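The two-counter bookkeeping above can be sketched as follows. This is an illustrative model only, with hypothetical names beyond the two indicators named in the text; a per-element granularity is assumed, and in practice the write side would also gate on LenBufSpaceReady being at least a packet (amount B) and the expansion side on LenBufDataValid being at least amount A:

```python
class RingBuffer:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.wr = 0                         # write pointer
        self.rd = 0                         # read pointer
        self.LenBufSpaceReady = capacity    # free slots remaining
        self.LenBufDataValid = 0            # slots currently cached

    def write(self, value):
        assert self.LenBufSpaceReady > 0, "no room for new data"
        self.buf[self.wr] = value
        self.wr = (self.wr + 1) % len(self.buf)
        self.LenBufSpaceReady -= 1          # write: SpaceReady-1, DataValid+1
        self.LenBufDataValid += 1

    def read(self):
        assert self.LenBufDataValid > 0, "nothing cached to expand"
        value = self.buf[self.rd]
        self.rd = (self.rd + 1) % len(self.buf)
        self.LenBufSpaceReady += 1          # read-out: SpaceReady+1, DataValid-1
        self.LenBufDataValid -= 1
        return value

rb = RingBuffer(16)
for v in [11, 12, 13]:
    rb.write(v)
```

Because the two counters are updated symmetrically, the loading path and the expansion path can run concurrently, each consulting only the counter that gates it.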
Taking the 5*5 image data matrix and the 3*3 feature matrix of FIG. 2a as an example, the FPGA may perform sliding expansion on the 3*5 matrix Q in the manner of FIGS. 2b-2d to obtain the expanded data (11,12,13,21,22,23,31,32,33), (12,13,14,22,23,24,32,33,34), (13,14,15,23,24,25,33,34,35).
Step 205: when the buffer's remaining available capacity exceeds predetermined data amount B, the coprocessor reads the corresponding amount of unexpanded data from DDR memory according to B and writes it into the buffer.
For example, upon determining that LenBufSpaceReady exceeds B, the coprocessor reads unexpanded data of amount B from DDR memory and writes it into the buffer.
The predetermined data amount B is a fixed data amount, i.e., a fixed data size, settable according to actual needs, e.g., 8 Kbyte; it may be set according to the data amount of one row or column of the image data matrix.
For example, referring to FIGS. 7a-7b, after sliding expansion of the 3*5 to-be-expanded matrix, the 4th row of unexpanded data [41,42,43,44,45] may be read and written into the buffer.
In this embodiment, when B equals the data amount of one row or column of the image data matrix — i.e., the number of image elements corresponding to B equals the matrix's column or row count — the coprocessor may read one row or column of unexpanded data from DDR memory, e.g., the (K+1)-th row, i.e., N unexpanded elements.
In other embodiments, the number of image elements corresponding to B may be greater than N without being an integer multiple of N, e.g., N+1 or more. For example, after sliding expansion of the 3*5 to-be-expanded matrix, 7 unexpanded elements [41,42,43,44,45,51,52] may be read according to B and written into the buffer.
Step 206: the coprocessor updates the to-be-expanded matrix Q with the newly written unexpanded data, then returns to step 204.
For example, after the (K+1)-th row of unexpanded data is written, the matrix Q may be updated based on that row, e.g., by overwriting the first row of Q with it.
As another example, after N+1 unexpanded elements are written, the appropriate N elements may be selected and used to overwrite the first row of Q.
The data processing method of embodiments of the present invention applies to all services that can run on heterogeneous processing systems using an FPGA as coprocessor or on CPU-only processing systems; for example, it may be applied in business scenarios targeting the detection and filtering of pornographic images. Referring to FIG. 8c, such services are generally implemented via open-source deep learning platforms such as Caffe and TensorFlow. When implementing a convolutional neural network model (e.g., AlexNet, GoogLeNet, VGG), the platform calls a BLAS (Basic Linear Algebra Subprograms) library for matrix operations. In a CPU-only system these matrix operations are computed by the CPU; in a heterogeneous system they can be offloaded to the FPGA for computation (generally interacting via PCI-e). During computation, the CPU and FPGA exchange data through shared DDR RAM.
As can be seen from the above, in embodiments of the present invention the coprocessor obtains the matrix parameters of the feature matrix; reads the corresponding data of the image data matrix according to those parameters to obtain a to-be-expanded data matrix; expands that matrix according to the parameters to obtain expanded data; reads a corresponding amount of unexpanded data of the image data matrix and updates the to-be-expanded matrix with it; and returns to the step of expanding the matrix according to the parameters. Because this scheme reuses already-read image data to perform expansion during convolution, it avoids repeatedly reading certain data and lowers the bandwidth and storage requirements of CNN data expansion, thereby improving the processing capability of the CNN processing system and the efficiency of data expansion.
To better implement the above method, embodiments of the present invention further provide a data processing apparatus for a convolutional neural network, which may be integrated in a processor of a computing device; the processor may be an FPGA, ASIC, GPU, or another type of coprocessor. As shown in FIG. 9a, the apparatus may include an obtaining unit 301, a reading unit 302, a data expansion unit 303, and an updating unit 304, as follows:
The obtaining unit 301 is configured to obtain matrix parameters of a feature matrix.
The feature matrix is the convolution kernel of the convolution operation, also called a weight matrix, and may be set according to actual needs. Its matrix parameters may include the matrix's row and column counts, which may be called the kernel size.
The reading unit 302 is configured to read, according to the matrix parameters, corresponding data of an image data matrix from a first buffer space over a first bus to obtain a to-be-expanded data matrix.
The elements of the image data matrix are pixel data corresponding to image pixels, e.g., processed pixel values. The row and column counts of the image data matrix represent the size of the image.
For example, the reading unit 302 may read, over the first bus from the first buffer space, the corresponding number of rows or columns of the image data matrix according to the matrix parameters.
When the matrix parameters include the row and column counts of the feature matrix, the number of rows read may correspond to the feature matrix's row count, or the number of columns read may correspond to its column count.
The saving unit 305 is configured to, after the reading unit 302 obtains the to-be-expanded data matrix and before the data expansion unit 303 expands it, send the to-be-expanded data matrix to a second preset buffer space over a second bus and save it there.
The data expansion unit 303 is configured to read the to-be-expanded data matrix from the second preset buffer space over the second bus and perform data expansion on it according to the matrix parameters to obtain expanded data.
For example, the data expansion unit 303 performs sliding expansion on the to-be-expanded matrix according to the matrix parameters.
Specifically, the data expansion unit expands the to-be-expanded matrix according to the matrix parameters and the storage addresses of its data within the second preset buffer space.
Specifically, the data expansion unit 303 may include:
a determining subunit configured to determine a sliding window according to the matrix parameters;
a sliding subunit configured to slide the sliding window over the to-be-expanded matrix according to a preset sliding direction and a preset sliding stride;
an address obtaining subunit configured to obtain, after each slide, the storage addresses within the second preset buffer space of the data inside the sliding window; and
a readout subunit configured to read the corresponding data from the second preset buffer space according to those storage addresses to complete the expansion.
The determining subunit may determine a sliding window of corresponding size from the feature matrix's row and column counts; for example, for a K*K feature matrix, a K*K sliding window may be determined. The sliding window selects the data to be expanded from the to-be-expanded matrix.
The preset sliding direction may include the row direction, the column direction, etc. of the image data matrix. The preset sliding stride is the distance to slide, settable according to actual expansion needs, and can be expressed as the number of data elements to slide over on the data matrix, e.g., 1, 2, or 3 elements.
The sliding subunit may slide the window over the to-be-expanded matrix along the preset direction with the preset stride. The window's initial position on the matrix can be set according to actual needs; e.g., referring to FIG. 2b, sliding may start from the first column of the matrix; in other embodiments it may start from the second or third column.
The updating unit 304 is configured to read a preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus, send the unexpanded data to the second preset buffer space over the second bus and save it, update the to-be-expanded matrix saved in the second preset buffer space according to the unexpanded data, and trigger the data expansion unit 303 to perform the step of reading the to-be-expanded matrix from the second preset buffer space over the second bus and expanding it according to the matrix parameters.
For example, the updating unit 304 may include:
a reading subunit configured to read, according to a first predetermined data amount, the preset amount of unexpanded data of the image data matrix over the first bus from the first buffer space, and send it over the second bus to the second preset buffer space for saving;
an updating subunit configured to update the to-be-expanded matrix saved in the second preset buffer space according to the unexpanded data; and
a triggering subunit configured to, after the updating subunit updates the to-be-expanded matrix, trigger the data expansion unit 303 to perform the step of reading the to-be-expanded matrix from the second preset buffer space over the second bus and expanding it according to the matrix parameters.
The reading subunit is specifically configured to:
obtain the remaining available capacity of the second preset buffer space; and
when the remaining available capacity of the second preset buffer space is greater than or equal to the first predetermined data amount, read the preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus according to the first predetermined data amount.
The updating subunit is specifically configured to:
read the unexpanded data from the second preset buffer space over the second bus and select a preset amount of target data from the unexpanded data; and
update the to-be-expanded matrix saved in the second preset buffer space according to the target data.
In this embodiment, the data expansion unit 303 may be specifically configured to:
obtain the amount of data currently cached in the second preset buffer space; and
when the cached data amount is greater than or equal to the second predetermined data amount, expand the to-be-expanded matrix according to the matrix parameters and the storage addresses of its data within the second preset buffer space.
In specific implementations, the above units may be implemented as independent entities or combined in any manner into one or several entities; for their specific implementation, refer to the preceding method embodiments, which are not repeated here.
For example, in practice the function of the obtaining unit 301 may be implemented by a data expansion controller; the function of the reading unit 302 by the data expansion controller and a DDR read controller; the function of the data expansion unit 303 by the data expansion controller, a data scan controller, and an address generator; and the function of the updating unit 304 by the data expansion controller and the DDR read controller.
As shown in FIG. 9c, this embodiment further provides a coprocessor comprising a data expansion controller 401, a DDR read controller 402, a data cache unit 403, a data scan controller 404, an address generator 405, and a processing element (PE) 406.
The data expansion controller 401 is configured to obtain the matrix parameters of the feature matrix, control the DDR read controller 402 according to those parameters to read the corresponding data of the image data matrix to obtain a to-be-expanded data matrix, and write that matrix into the data cache unit 403.
The data expansion controller 401 is further configured to control, according to system parameters (e.g., the matrix parameters of the feature matrix), the data scan controller 404 and the address generator 405 to expand the to-be-expanded matrix to obtain expanded data; to control the DDR read controller 402 to read a preset amount of unexpanded data of the image data matrix and to update the to-be-expanded matrix with that data; and to trigger the data scan controller 404 and the address generator 405 to expand the updated matrix.
For example, the data expansion controller 401 may control the data scan controller 404 and the address generator 405 to expand the to-be-expanded matrix based on the system parameters and the state of the data cache unit 403 (e.g., the amount of data currently cached). It may also control the DDR read controller 402 to read the preset amount of unexpanded data based on the state of the data cache unit 403 (e.g., its remaining available capacity).
The DDR read controller 402 is configured to, under control of the data expansion controller 401, read the corresponding data of the image data matrix to obtain the to-be-expanded matrix, read the preset amount of unexpanded data, update the to-be-expanded matrix according to that data, and write the read data into the data cache unit 403.
The data cache unit 403 is configured to cache the data read by the DDR read controller 402 and to output the expanded data to the processing element.
The data scan controller 404 and the address generator 405 are configured to expand the to-be-expanded matrix under control of the data expansion controller 401.
The processing element (PE) 406 is configured to multiply the expanded data with the feature matrix to perform the convolution operation.
The data processing apparatus of the convolutional neural network of this embodiment may be integrated in a CPU or in a coprocessor device such as an FPGA, ASIC, or GPU.
Embodiments of the present invention further provide a data processing apparatus for a convolutional neural network, comprising one or more processors and a storage medium. The processor includes a CPU or a coprocessor device such as an FPGA, ASIC, or GPU; the storage medium may be a non-volatile computer-readable storage medium storing one or more computer-readable instructions. The one or more computer-readable instructions comprise an obtaining unit, a reading unit, a data expansion unit, and an updating unit; in another embodiment they further comprise a saving unit. The processor reads the one or more computer-readable instructions stored in the storage medium to implement the steps of the data processing method of the above embodiments and the functions of the units of the data processing apparatus.
As can be seen from the above, in embodiments of the present invention the obtaining unit 301 obtains the matrix parameters of the feature matrix; the reading unit 302 reads the corresponding data of the image data matrix according to those parameters to obtain a to-be-expanded data matrix; the data expansion unit 303 expands that matrix according to the parameters to obtain expanded data; and the updating unit 304 reads a corresponding amount of unexpanded data of the image data matrix, updates the to-be-expanded matrix with it, and returns to the step of expanding the matrix according to the parameters. Because this scheme reuses already-read image data to perform expansion during convolution, it avoids repeatedly reading certain data and lowers the bandwidth and storage requirements of CNN data expansion, thereby improving the processing capability of the CNN processing system and the efficiency of data expansion.
Those of ordinary skill in the art will understand that all or part of the steps of the various methods of the above embodiments can be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The data processing method and apparatus for a convolutional neural network provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention; the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (15)

  1. A data processing method for a convolutional neural network, applied to a computing device and comprising performing, on a processor or coprocessor of the computing device, the following steps:
    obtaining matrix parameters of a feature matrix;
    reading, according to the matrix parameters, corresponding data of an image data matrix from a first buffer space over a first bus to obtain a to-be-expanded data matrix, and sending the to-be-expanded data matrix to a second preset buffer space over a second bus for saving;
    reading the to-be-expanded data matrix from the second preset buffer space over the second bus, and performing data expansion on the to-be-expanded data matrix according to the matrix parameters to obtain expanded data;
    reading a preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus, sending the unexpanded data to the second preset buffer space over the second bus for saving, and updating the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data; and
    returning to the step of reading the to-be-expanded data matrix from the second preset buffer space over the second bus and expanding the to-be-expanded data matrix according to the matrix parameters.
  2. The data processing method according to claim 1, wherein performing data expansion on the to-be-expanded data matrix according to the matrix parameters comprises:
    performing data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix.
  3. The data processing method according to claim 2, wherein performing data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix comprises:
    determining a sliding window according to the matrix parameters;
    sliding the sliding window over the to-be-expanded data matrix according to a preset sliding direction and a preset sliding stride;
    obtaining, after each slide, the storage addresses within the second preset buffer space of the data inside the sliding window; and
    reading the corresponding data from the second preset buffer space according to the storage addresses to complete the data expansion.
  4. The data processing method according to claim 2, wherein reading the preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus comprises:
    reading the preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus according to a first predetermined data amount.
  5. The data processing method according to claim 4, wherein reading the preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus according to the first predetermined data amount comprises:
    obtaining the remaining available capacity of the second preset buffer space; and
    when the remaining available capacity of the second preset buffer space is greater than or equal to the first predetermined data amount, reading the preset amount of unexpanded data of the image data matrix from the first buffer space over the first bus according to the first predetermined data amount.
  6. The data processing method according to claim 4, wherein updating the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data comprises:
    reading the unexpanded data from the second preset buffer space over the second bus, and selecting a preset amount of target data from the unexpanded data; and
    updating the to-be-expanded data matrix saved in the second preset buffer space according to the target data.
  7. The data processing method according to claim 5, wherein performing data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix comprises:
    obtaining the amount of data currently cached in the second preset buffer space; and
    when the cached data amount is greater than or equal to a second predetermined data amount, performing data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix.
  8. A data processing apparatus for a convolutional neural network, comprising one or more processors and one or more non-volatile storage media, the one or more non-volatile storage media storing one or more computer-readable instructions configured to be executed by the one or more processors, the one or more computer-readable instructions comprising:
    an obtaining unit configured to obtain matrix parameters of a feature matrix;
    a reading unit configured to read, according to the matrix parameters, corresponding data of an image data matrix from a first buffer space over a first bus to obtain a to-be-expanded data matrix;
    a saving unit configured to send the to-be-expanded data matrix to a second preset buffer space over a second bus for saving;
    a data expansion unit configured to read the to-be-expanded data matrix from the second preset buffer space over the second bus, and to perform data expansion on the to-be-expanded data matrix according to the matrix parameters to obtain expanded data; and
    an updating unit configured to read a corresponding amount of unexpanded data of the image data matrix from the first buffer space over the first bus, send the unexpanded data to the second preset buffer space over the second bus for saving, update the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data, and trigger the data expansion unit to perform the step of reading the to-be-expanded data matrix from the second preset buffer space over the second bus and expanding the to-be-expanded data matrix according to the matrix parameters.
  9. The data processing apparatus according to claim 8, wherein the data expansion unit is specifically configured to perform data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix.
  10. The data processing apparatus according to claim 9, wherein the data expansion unit comprises:
    a determining subunit configured to determine a sliding window according to the matrix parameters;
    a sliding subunit configured to slide the sliding window over the to-be-expanded data matrix according to a preset sliding direction and a preset sliding stride;
    an address obtaining subunit configured to obtain, after each slide, the storage addresses within the second preset buffer space of the data inside the sliding window; and
    a readout subunit configured to read the corresponding data from the second preset buffer space according to the storage addresses to complete the data expansion.
  11. The data processing apparatus according to claim 9, wherein the updating unit comprises:
    a reading subunit configured to read, according to a first predetermined data amount, the corresponding amount of unexpanded data of the image data matrix from the first buffer space over the first bus, and to send the unexpanded data to the second preset buffer space over the second bus for saving;
    an updating subunit configured to update the to-be-expanded data matrix saved in the second preset buffer space according to the unexpanded data; and
    a triggering subunit configured to, after the updating subunit updates the to-be-expanded data matrix, trigger the data expansion unit to perform the step of reading the to-be-expanded data matrix from the second preset buffer space over the second bus and expanding the to-be-expanded data matrix according to the matrix parameters.
  12. The data processing apparatus according to claim 11, wherein the reading subunit is specifically configured to:
    obtain the remaining available capacity of the second preset buffer space; and
    when the remaining available capacity of the second preset buffer space is greater than or equal to the first predetermined data amount, read the corresponding amount of unexpanded data of the image data matrix from the first buffer space over the first bus according to the first predetermined data amount.
  13. The data processing apparatus according to claim 11, wherein the updating subunit is specifically configured to:
    read the unexpanded data from the second preset buffer space over the second bus, and select a corresponding amount of target data from the unexpanded data; and
    update the to-be-expanded data matrix saved in the second preset buffer space according to the target data.
  14. The data processing apparatus according to claim 12, wherein the data expansion unit is specifically configured to:
    obtain the amount of data currently cached in the second preset buffer space; and
    when the cached data amount is greater than or equal to a second predetermined data amount, perform data expansion on the to-be-expanded data matrix according to the matrix parameters and the storage addresses, within the second preset buffer space, of the data of the to-be-expanded data matrix.
  15. A non-volatile computer-readable storage medium storing computer-readable instructions that cause at least one processor to perform the method according to any one of claims 1-7.
PCT/CN2017/108468 2016-10-31 2017-10-31 Data processing method and apparatus for convolutional neural network WO2018077295A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/250,204 US11222240B2 (en) 2016-10-31 2019-01-17 Data processing method and apparatus for convolutional neural network
US17/522,891 US11593594B2 (en) 2016-10-31 2021-11-09 Data processing method and apparatus for convolutional neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610933471.7A CN107742150B (zh) 2016-10-31 2016-10-31 一种卷积神经网络的数据处理方法和装置
CN201610933471.7 2016-10-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/250,204 Continuation US11222240B2 (en) 2016-10-31 2019-01-17 Data processing method and apparatus for convolutional neural network

Publications (1)

Publication Number Publication Date
WO2018077295A1 true WO2018077295A1 (zh) 2018-05-03


Also Published As

Publication number Publication date
US11593594B2 (en) 2023-02-28
CN107742150A (zh) 2018-02-27
US20220067447A1 (en) 2022-03-03
US20190147299A1 (en) 2019-05-16
US11222240B2 (en) 2022-01-11
CN107742150B (zh) 2020-05-12

