WO2020042771A9 - Image recognition processing method and device - Google Patents

Image recognition processing method and device

Info

Publication number
WO2020042771A9
WO2020042771A9 (application PCT/CN2019/095449)
Authority
WO
WIPO (PCT)
Prior art keywords
parameters
unit
neural network
convolutional neural
convolution
Prior art date
Application number
PCT/CN2019/095449
Other languages
English (en)
French (fr)
Other versions
WO2020042771A1 (zh)
Inventor
刘敏丽 (Liu Minli)
张楠赓 (Zhang Nangeng)
Original Assignee
嘉楠明芯(北京)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 嘉楠明芯(北京)科技有限公司
Priority to US17/272,557 (granted as US12026105B2)
Publication of WO2020042771A1
Publication of WO2020042771A9

Classifications

    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 13/1668 Details of memory controller
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the invention belongs to the field of data processing, and in particular relates to an image recognition processing method and device.
  • CNN is short for Convolutional Neural Network.
  • a CNN is a feedforward artificial neural network in which individual neurons are tiled in such a way that they respond to overlapping regions in the visual field.
  • CNN is inspired by biological optic nerve behavior.
  • CNN uses multiple layers of neuron connections to process image data to achieve high accuracy in image recognition.
  • a single processor is limited in computing power, so other computing configurations need to be explored in order to meet the computational demands of supporting CNNs.
  • dedicated CNN accelerators have been implemented using general-purpose computing on graphics processing units (GPUs), multi-core processors, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). It should be noted that because software alone cannot meet the speed requirements of image data processing, CNN accelerators in this field are implemented in hardware.
  • the CNN operation needs to perform multiple operations on the image data, and when the operation process is performed in a serial manner it has the technical problems of low operation efficiency and poor real-time performance.
  • the present invention provides an image recognition processing method and device to implement parallel processing of convolutional neural network operations and improve the real-time performance of image recognition processing.
  • an embodiment of the present invention provides an image recognition processing method, including:
  • the original image data includes M pixel data, where M is a positive integer;
  • the convolutional neural network operation module performs convolutional neural network operations on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters.
  • the convolutional neural network operation module includes N operation components arranged in parallel; each operation component includes a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution operation, batch processing operation, and activation operation on the N pixel data in the original image data at the same time, where N is a positive integer less than or equal to M.
  • obtaining the original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters includes obtaining them through a data transmission bus, which specifically includes:
  • obtaining the original image data includes obtaining the original image data through the first interface, and writing the original image data into the first storage unit;
  • Obtaining convolutional neural network configuration parameters includes obtaining parameter configuration instructions through the first interface, and sending the parameter configuration instructions to the parameter distribution module, where the parameter configuration instructions include convolutional neural network configuration parameters;
  • obtaining the convolutional neural network operation parameters includes obtaining the convolution operation parameters, batch processing operation parameters, and activation operation parameters through the second interface, writing the convolution operation parameters and batch processing operation parameters into the second storage unit, and sending the activation operation parameters to the parameter distribution module.
  • it also includes:
  • the operation control module obtains the control configuration parameters in the convolutional neural network configuration parameters from the parameter distribution module;
  • the operation control module controls the acquisition of the original image data from the first interface, the parameter configuration instructions from the first interface, and the convolution operation parameters, batch operation parameters, and activation operation parameters from the second interface according to the control configuration parameters;
  • the operation control module sends the control configuration parameters in the convolutional neural network configuration parameters to the convolutional neural network operation module.
  • the convolutional neural network operation module further includes an operation control unit, and the method further includes:
  • the operation control unit receives the control configuration parameters in the convolutional neural network configuration parameters.
  • the control configuration parameters include the input or output original image size and the number of input or output channels of each layer of the convolutional neural network; according to the control configuration parameters, the operation control unit controls reading the original image data from the first storage unit and reading the convolution operation parameters and batch processing operation parameters from the second storage unit, and sends the original image data, convolution operation parameters, and batch processing operation parameters to the convolution operation unit.
  • the method also includes:
  • the operation control module obtains the activation operation parameters and the operation configuration parameters in the convolutional neural network configuration parameters from the parameter distribution module, where the operation configuration parameters include the convolution operation configuration parameters, the convolution kernel size, and the pooling mode;
  • the operation control module sends the activation operation parameters and the operation configuration parameters in the convolutional neural network configuration parameters to the operation control unit of the convolutional neural network operation module;
  • the operation control unit sends the activation operation parameters to the activation operation unit, sends the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and sends the pooling mode to the pooling unit; or,
  • the parameter distribution module directly sends the activation operation parameters to the activation operation unit, sends the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and sends the pooling mode to the pooling unit.
  • the method also includes:
  • the second storage unit includes a first memory, a second memory, and a third memory.
  • writing the convolution operation parameters and batch processing operation parameters to the second storage unit, and reading the convolution operation parameters and batch processing operation parameters from the second storage unit, includes:
  • the convolution operation parameters are written into the first memory or the second memory and read from the other in a ping-pong manner;
  • the batch processing operation parameters are written into the third memory, and the batch processing operation parameters are read from the third memory.
  • an embodiment of the present invention also provides an image recognition processing device, including:
  • the parameter acquisition module is used to acquire original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters.
  • the original image data includes M pixel data, and M is a positive integer;
  • the convolutional neural network operation module, connected to the parameter acquisition module, is used to perform convolutional neural network operations on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters.
  • the convolutional neural network operation module includes N operation components arranged in parallel; each operation component includes a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components simultaneously perform the convolution operation, batch processing operation, and activation operation on N pixel data in the original image data.
  • N is a positive integer less than or equal to M.
  • the parameter acquisition module includes a data transmission bus, a first interface, a second interface, a first storage unit, and a second storage unit;
  • the data transmission bus is used to transmit original image data, convolutional neural network configuration parameters and convolutional neural network operation parameters;
  • the first end of the first interface is connected to the data transmission bus, and the second end of the first interface is respectively connected to the parameter distribution module and the first storage unit.
  • the first interface is used to obtain the original image data from the data transmission bus and write it into the first storage unit, and to obtain parameter configuration instructions from the data transmission bus and send them to the parameter distribution module, where the parameter configuration instructions include the convolutional neural network configuration parameters;
  • the first end of the second interface is connected to the data transmission bus
  • the second end of the second interface is respectively connected to the parameter distribution module and the second storage unit
  • the second interface is used to obtain the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the data transmission bus, write the convolution operation parameters and batch processing operation parameters into the second storage unit, and send the activation operation parameters to the parameter distribution module.
  • it also includes an operation control module, which is respectively connected to the parameter distribution module and the convolutional neural network operation module;
  • the operation control module is used to obtain the control configuration parameters in the convolutional neural network configuration parameters from the parameter distribution module; to control, according to the control configuration parameters, the acquisition of the original image data from the first interface, the acquisition of the parameter configuration instructions from the first interface, and the acquisition of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface; and to send the control configuration parameters in the convolutional neural network configuration parameters to the convolutional neural network operation module.
  • the convolutional neural network operation module further includes an operation control unit; the parameter input end of the operation control unit is connected to the operation control module, and the control end of the operation control unit is respectively connected to the convolution operation unit, the batch processing operation unit, and the activation operation unit;
  • the operation control unit is used to receive the control configuration parameters in the convolutional neural network configuration parameters.
  • the control configuration parameters include the input or output original image size and the number of input or output channels of each layer of the convolutional neural network. According to the control configuration parameters, the operation control unit controls reading the original image data from the first storage unit and sending it to the convolution operation unit, and controls reading the convolution operation parameters and batch processing operation parameters from the second storage unit and sending them to the convolution operation unit; it is also used to send the activation operation parameters to the activation operation unit.
  • the convolutional neural network operation module further includes a pooling unit and a write-back unit, which are respectively connected to the control end of the operation control unit; the operation control module is also used to obtain, from the parameter distribution module, the activation operation parameters and the operation configuration parameters in the convolutional neural network configuration parameters, where the operation configuration parameters include the convolution operation configuration parameters, the convolution kernel size, and the pooling mode, and to send the activation operation parameters and the operation configuration parameters to the operation control unit of the convolutional neural network operation module; the operation control unit is also used to send the activation operation parameters to the activation operation unit, send the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and send the pooling mode to the pooling unit; or,
  • the parameter distribution module is directly connected to the activation operation unit, the convolution operation unit, and the pooling unit, and is configured to directly send the activation operation parameters to the activation operation unit, send the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and send the pooling mode to the pooling unit.
  • the convolutional neural network operation module also includes:
  • the image preprocessing unit is arranged between the first storage unit and the convolution operation unit, and is used to perform image filling processing on the original image data and send it to the convolution operation unit;
  • the parameter preprocessing unit is arranged between the second storage unit and the convolution operation unit, and is used for accumulating and summing the convolution operation parameters and sending them to the convolution operation unit.
  • the parameter acquisition module further includes a data reading and writing unit, which is respectively connected to the first interface and the first storage unit, respectively connected to the second interface and the second storage unit, respectively connected to the image preprocessing unit and the parameter preprocessing unit, and connected to the write-back unit;
  • the data reading and writing unit is used to obtain the original image data from the first interface and write it into the first storage unit, and read the original image data from the first storage unit and send it to the image preprocessing unit;
  • the data reading and writing unit is also used to obtain the convolution operation parameters and batch processing operation parameters from the second interface and write them to the second storage unit, and to read the convolution operation parameters and batch processing operation parameters from the second storage unit and send them to the parameter preprocessing unit;
  • the data reading and writing unit is also used for writing the image data after the pooling operation sent by the writing back unit to the first storage unit.
  • the second storage unit includes a first memory, a second memory, and a third memory;
  • the data reading and writing unit is specifically configured to write the convolution operation parameters into the first memory or the second memory and to read the convolution operation parameters from the first memory or the second memory, such that when the convolution operation parameters are written to the first memory, the convolution operation parameters are read from the second memory, or, when the convolution operation parameters are written to the second memory, the convolution operation parameters are read from the first memory;
  • the batch processing operation parameters are written to the third memory, and the batch processing operation parameters are read from the third memory.
  • it also includes:
  • the data temporary storage unit is connected to the convolution operation unit of each operation component, and is used to store the convolution operation result of each input channel after the operation of the convolution operation unit.
  • the data transmission bus is an Advanced eXtensible Interface (AXI) bus.
  • when performing CNN operations, the convolutional neural network operation module includes N operation components arranged in parallel, and each operation component includes a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence; the N operation components simultaneously perform the convolution operation, batch processing operation, and activation operation on N pixel data in the original image data, where N is a positive integer less than or equal to M.
  • this means that the operation components can process N pixel data at the same time, achieving parallel operation on the original image data. Compared with the prior art, in which the operations are performed serially on each pixel datum in turn, the CNN operation gains computational efficiency, improving the real-time performance of the image recognition process.
  • FIG. 1 is a schematic flowchart of an image recognition processing method provided by an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a convolutional neural network operation module provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an image recognition processing device provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of another image recognition processing device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of yet another image recognition processing device provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of storage in a memory used by a first storage unit according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a convolution operation provided by an embodiment of the invention.
  • FIG. 8 is a schematic structural diagram of a second storage unit provided by an embodiment of the present invention.
  • when performing image recognition processing, the convolutional neural network operation module needs to perform multiple operations on the original image data; in the prior art, the operation process computes each pixel point serially, one at a time, which causes technical problems such as low computational efficiency and poor real-time performance.
  • FIG. 1 is a schematic flowchart of an image recognition processing method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step 101: Obtain original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters.
  • the original image data includes M pixel data, and M is a positive integer;
  • convolutional neural network operations can include multi-layer operations, and the above configuration parameters and operation parameters are the process control parameters and operation parameters required by each layer of the convolutional neural network operation, for example the operation parameters used in the convolution operation, batch processing operation, and activation operation.
  • a dedicated parameter acquisition module can be provided to acquire the above-mentioned original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters.
  • Step 102: the convolutional neural network operation module performs the convolutional neural network operation on the original image data according to the above-mentioned convolutional neural network configuration parameters and convolutional neural network operation parameters, where the convolutional neural network operation module includes N operation components arranged in parallel.
  • Each arithmetic component includes a convolution operation unit, a batch processing operation unit and an activation operation unit connected in sequence.
  • the N operation components respectively perform the convolution operation, batch processing operation, and activation operation on the N pixel data in the original image data at the same time, where N is a positive integer less than or equal to M.
  • FIG. 2 is a schematic structural diagram of a convolutional neural network operation module provided by an embodiment of the present invention. As shown in FIG. 2, the module includes N operation components 21 arranged in parallel, and each operation component includes a convolution operation unit 22, a batch processing operation unit 23, and an activation operation unit 24 connected in sequence. When performing operations, each pixel datum is input into one operation component, so the N operation components can process N pixel data at the same time.
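  • to make this structure concrete, the following is a minimal software sketch of one operation component and the parallel dispatch of FIG. 2, assuming a fixed-point batch step and a ReLU activation; the function names and the activation choice are illustrative assumptions, not disclosed by the patent.

```python
import numpy as np

def operation_component(window, kernel, bn_mult, bn_shift, bn_offset):
    """One operation component: convolution -> batch processing -> activation.

    `window` is the KxK neighborhood of one output pixel; the fixed-point
    batch step and the ReLU activation are illustrative choices.
    """
    conv = int(np.sum(window * kernel))                 # convolution operation unit
    batch = ((conv * bn_mult) >> bn_shift) + bn_offset  # batch processing operation unit
    return max(batch, 0)                                # activation operation unit (ReLU assumed)

def parallel_layer(image, kernel, bn_mult, bn_shift, bn_offset, n_components=64):
    """Dispatch N pixel windows to N operation components; in hardware the
    inner loop over `col` runs concurrently, one component per pixel."""
    k = kernel.shape[0]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((h, w), dtype=np.int64)
    for row in range(h):
        for col0 in range(0, w, n_components):
            for col in range(col0, min(col0 + n_components, w)):
                out[row, col] = operation_component(
                    image[row:row + k, col:col + k], kernel,
                    bn_mult, bn_shift, bn_offset)
    return out
```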
  • the obtaining of original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters in step 101 may specifically be: obtaining the original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters through a data transmission bus. That is, not only the original image data but also the configuration parameters and operation parameters can be acquired from the data transmission bus in real time.
  • in this way, the obtained configuration parameters and operation parameters can be adjusted in real time according to actual operation requirements, realizing the programmability and configurability of convolutional neural network operations and supporting convolutional neural network architectures of different scales.
  • the aforementioned data transmission bus may be an Advanced eXtensible Interface (AXI) bus, that is, the aforementioned original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters can be obtained through the AXI bus.
  • the AXI bus may be a data transmission bus that conforms to the ARM architecture, and the technical solution provided in this embodiment can be compatible with the existing ARM architecture.
  • the AXI bus is also a high-performance, high-bandwidth, low-latency on-chip bus, which can meet the design requirements of ultra-high-performance, complex systems-on-chip and meets the real-time operation requirements of the hardware-implemented convolutional neural network in the embodiments of the present invention.
  • a dedicated interface can be provided to cooperate with the above-mentioned data transmission bus, so as to obtain the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters from the data transmission bus; further, corresponding storage units can be provided for the original image data and the convolutional neural network operation parameters, respectively.
  • two interfaces can be provided, namely a first interface and a second interface. The original image data is obtained through the first interface and written into the first storage unit, and the parameter configuration instructions are also obtained through the first interface and sent to the parameter distribution module.
  • the parameter configuration instructions include the convolutional neural network configuration parameters; specifically, multiple (for example, 12) parameter configuration instructions can be matched to each layer of the convolutional neural network.
  • the first interface supports receiving the parameter configuration instructions of at most several layers (for example, 13 layers) at a time; after the parameter configuration instructions of one convolutional neural network layer have been used, it can receive the parameter configuration instructions of a new network layer.
  • the above-mentioned convolutional neural network configuration parameters may include, but are not limited to, the input or output original image size, the number of input or output channels of each layer of the convolutional neural network, the convolution kernel size, the pooling mode, and so on; they may also include the multiplication coefficients, shift coefficients, and addition coefficients used in the convolution operation process, which are relatively fixed for each layer of convolutional neural network operation and can be called the convolution operation configuration parameters.
  • each parameter configuration instruction can have a fixed length of 64 bits, with different bits carrying different meanings.
  • the parameter distribution module can parse each parameter configuration instruction to obtain the configuration parameters therein and distribute them to the other modules for use, for example as sketched below.
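  • as an illustration, a 64-bit fixed-length instruction could be unpacked as follows; the bit-field layout is entirely hypothetical, since the patent discloses only the 64-bit fixed length and the kinds of fields, not the actual bit assignments.

```python
def parse_config_word(word: int) -> dict:
    """Unpack one 64-bit parameter configuration instruction.

    Field widths and positions are hypothetical placeholders; only the
    64-bit fixed length and the kinds of fields come from the text.
    """
    return {
        "image_width":  (word >> 0) & 0xFFFF,   # input/output original image size
        "image_height": (word >> 16) & 0xFFFF,
        "in_channels":  (word >> 32) & 0x3FF,   # channels of this network layer
        "out_channels": (word >> 42) & 0x3FF,
        "kernel_size":  (word >> 52) & 0xF,     # e.g. 3 for a 3x3 kernel
        "pooling_mode": (word >> 56) & 0xF,     # e.g. none / max / average
    }

# the parameter distribution module would parse and distribute these fields
cfg = parse_config_word(0x0123_4567_89AB_CDEF)
```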
  • the convolution operation parameters, batch processing operation parameters, and activation operation parameters can also be obtained through the second interface; the convolution operation parameters and batch processing operation parameters can be written into the second storage unit, and the activation operation parameters can be sent to the parameter distribution module.
  • the above two types of data can be stored separately; when the convolutional neural network operation is to be executed, the original image data can be read from the first storage unit, the convolutional neural network operation parameters can be read from the second storage unit, and both can be sent to the convolutional neural network operation module.
  • the image recognition processing device may further include an arithmetic control module, and the above method may further include:
  • the calculation control module obtains the control configuration parameters in the convolutional neural network configuration parameters from the parameter distribution module.
  • the operation control module can also control the entire flow of the image recognition processing method involved in the embodiment of the present invention according to the above control configuration parameters; for example, it can control the acquisition of the original image data from the first interface, the acquisition of the parameter configuration instructions from the first interface, and the acquisition of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface.
  • the operation control module sends the control configuration parameters in the convolutional neural network configuration parameters to the convolutional neural network operation module, which can control when to start the convolutional neural network operation.
  • the convolutional neural network operation module may also include an operation control unit.
  • the method provided in the embodiment of the present invention further includes: the operation control unit receives the control configuration parameters in the convolutional neural network configuration parameters, and may control the operation process of the convolutional neural network based on these control configuration parameters.
  • the control configuration parameters include, but are not limited to, the input or output original image size and the number of input or output channels of each layer of the convolutional neural network. Specifically, the control includes reading the original image data from the first storage unit according to the control configuration parameters, reading the convolution operation parameters and batch processing operation parameters from the second storage unit, and sending the original image data, convolution operation parameters, and batch processing operation parameters to the convolution operation unit.
  • the convolution operation unit and the batch processing operation unit in each operation component can be combined into the same unit.
  • the operation configuration parameters and activation operation parameters in the convolutional neural network configuration parameters can be sent to each operation unit in two ways: the first is forwarding by the operation control module and the operation control unit of the convolutional neural network operation module in turn, and the second is direct distribution by the parameter distribution module.
  • the first method further includes:
  • the operation control module obtains the operation configuration parameters in the activation operation parameters and the convolutional neural network configuration parameters from the parameter distribution module.
  • the operation configuration parameters include the convolution operation configuration parameters, the convolution kernel size, and the pooling mode;
  • the operation control module sends the operation configuration parameters in the activation operation parameters and the convolutional neural network configuration parameters to the operation control unit of the convolutional neural network operation module;
  • the operation control unit sends the activation operation parameters to the activation operation unit, and sends the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and sends the pooling mode to the pooling unit.
  • the second method further includes: the parameter distribution module directly sends the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, sends the activation operation parameters to the activation operation unit, and sends the pooling mode to the pooling unit.
  • after receiving the above parameters, the convolution operation unit can perform the convolution operation according to the convolution operation configuration parameters and the convolution kernel size, the activation operation unit can perform the activation operation according to the activation operation parameters, and the pooling unit can perform the pooling operation according to the pooling mode. That is, the convolutional neural network operation may further include a pooling step and a write-back step: the pooling unit performs a pooling operation on the image data processed by the activation operation according to the pooling mode, and the image data after the pooling operation is written back to the first storage unit.
  • the image data processed by the operation of the layer can be written back to the first storage unit to perform the convolutional neural network operation of the next layer.
  • the above-mentioned N arithmetic components can also be used for parallel processing to improve the operation efficiency of the convolutional neural network.
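  • as a hedged sketch of this pooling and write-back step, the snippet below applies 2x2 max or average pooling selected by the pooling mode and appends the result back to a modeled first storage unit; the 2x2 window size and the mode encoding are assumptions.

```python
import numpy as np

def pool_and_write_back(feature, mode, first_storage):
    """Pool the activated image data per the pooling mode, then write the
    result back to the first storage unit (modeled here as a list)."""
    h, w = feature.shape[0] // 2 * 2, feature.shape[1] // 2 * 2
    blocks = feature[:h, :w].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":                      # mode encoding is assumed
        pooled = blocks.max(axis=(1, 3))
    else:                                  # "average"
        pooled = blocks.mean(axis=(1, 3))
    first_storage.append(pooled)           # input for the next network layer
    return pooled
```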
  • the convolutional neural network operation may further include a preprocessing step. This includes performing image filling processing on the original image data, that is, the original image data read from the first storage unit is padded before being sent onward; the image filling can pad the upper, lower, left, and right borders of the image so that the image data meets the requirements of the convolution kernel size during the convolution operation. The preprocessing also includes accumulating and summing the convolution operation parameters obtained from the second storage unit and sending the result to the convolution operation unit; the originally acquired convolution operation parameters may also be sent to the convolution operation unit. A padding sketch is given below.
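  • a minimal sketch of the image filling step, assuming zero padding (the patent says only that the four borders are filled so that a KxK window exists around every pixel):

```python
import numpy as np

def pad_for_kernel(image: np.ndarray, kernel_size: int, fill: int = 0) -> np.ndarray:
    """Fill the upper, lower, left, and right borders so that a KxK sliding
    window fits around every original pixel (zero fill assumed)."""
    pad = kernel_size // 2  # e.g. 1 for a 3x3 kernel
    return np.pad(image, pad, mode="constant", constant_values=fill)
```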
  • the convolution operation parameters and batch processing operation parameters are stored in the second storage unit, and the second storage unit may include a first memory, a second memory, and a third memory.
  • Writing the convolution operation parameters and batch processing operation parameters to the second storage unit, and reading the convolution operation parameters and batch processing operation parameters from the second storage unit may specifically include:
  • writing the convolution operation parameters into the first memory or the second memory, and reading the convolution operation parameters from the first memory or the second memory, such that when the convolution operation parameters are written to the first memory they are read from the second memory, or, when the convolution operation parameters are written to the second memory, they are read from the first memory. That is, a ping-pong operation method is adopted: the convolutional neural network operation parameters are alternately written to the first memory and the second memory and alternately read from them, which improves parameter read/write efficiency and thus the real-time performance of the entire image recognition process, as sketched below.
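  • a minimal sketch of this ping-pong scheme, assuming two parameter memories whose writer and reader roles swap between layers; the class and method names are illustrative.

```python
class PingPongParams:
    """Double-buffered convolution parameter storage (first/second memory).

    While the reader consumes one bank, the writer fills the other;
    swap() flips the roles once both sides finish the current layer.
    """

    def __init__(self, depth: int):
        self.banks = [bytearray(depth), bytearray(depth)]
        self.write_bank = 0  # bank currently being filled by the second interface

    def write(self, offset: int, data: bytes) -> None:
        self.banks[self.write_bank][offset:offset + len(data)] = data

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self.banks[1 - self.write_bank][offset:offset + size])

    def swap(self) -> None:
        self.write_bank = 1 - self.write_bank
```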
  • the batch processing operation parameters may be written into the third memory, and the batch processing operation parameters may be read from the third memory.
  • FIG. 3 is a schematic structural diagram of an image recognition processing device provided by an embodiment of the present invention.
  • the image recognition processing device includes a parameter acquisition module 1 and a convolutional neural network operation module 2.
  • the above-mentioned parameter acquisition module 1 is used to acquire original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters.
  • the original image data includes M pixel data, where M is a positive integer. The convolutional neural network operation module 2 is connected to the parameter acquisition module 1 and is used to perform convolutional neural network operations on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters. As shown in FIG. 2 above, the convolutional neural network operation module 2 includes N operation components 21 arranged in parallel, and each operation component 21 includes a convolution operation unit 22, a batch processing operation unit 23, and an activation operation unit 24 connected in sequence.
  • the N operation components 21 are used to simultaneously perform the convolution operation, batch processing operation, and activation operation on N pixel data in the original image data, where N is a positive integer less than or equal to M.
  • the convolutional neural network operation module includes N operation components arranged in parallel.
  • each pixel datum is input into one operation component, and the N operation components can process N pixel data simultaneously, so that N pixel data undergo the convolution operation, batch processing operation, and activation operation in their corresponding operation components at the same time. This achieves parallel operation, improves the efficiency of the convolutional neural network operation, and improves the real-time performance of the image recognition process.
  • FIG. 4 is a schematic structural diagram of another image recognition processing device provided by an embodiment of the present invention.
  • the device further includes a parameter distribution module 3, and the parameter acquisition module 1 includes a data transmission bus 11, a first interface 12, a second interface 13, a first storage unit 14, and a second storage unit 15.
  • the data transmission bus 11 is used to transmit the original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters; the first end of the first interface 12 is connected to the data transmission bus 11, and the second end of the first interface 12 is respectively connected to the parameter distribution module 3 and the first storage unit 14.
  • the first interface 12 is used to obtain the original image data from the data transmission bus 11 and write it to the first storage unit 14, and to obtain the parameter configuration instructions from the data transmission bus 11 and send them to the parameter distribution module 3, where the parameter configuration instructions include the convolutional neural network configuration parameters. The first end of the second interface 13 is connected to the data transmission bus 11, and the second end of the second interface 13 is respectively connected to the parameter distribution module 3 and the second storage unit 15; the second interface 13 is used to obtain the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the data transmission bus 11, write the convolution operation parameters and batch processing operation parameters into the second storage unit 15, and send the activation operation parameters to the parameter distribution module 3.
  • both the first storage unit 14 and the second storage unit 15 may use random access memories.
  • the aforementioned data transmission bus may be an Advanced eXtensible Interface (AXI) bus.
  • the AXI bus may be a data transmission bus that conforms to the ARM architecture, and the technical solution provided in this embodiment can be compatible with the existing ARM architecture.
  • the AXI bus is also a high-performance, high-bandwidth, low-latency on-chip bus, which can meet the design requirements of ultra-high-performance, complex systems-on-chip and meets the real-time operation requirements of the hardware-implemented convolutional neural network in the embodiments of the present invention.
  • the device may further include an operation control module 4, which is respectively connected to the parameter distribution module 3 and the convolutional neural network operation module 2;
  • the operation control module 4 is used to obtain the control configuration parameters in the convolutional neural network configuration parameters from the parameter distribution module 3, and can control the entire flow of the image recognition processing involved in the embodiment of the present invention according to the control configuration parameters.
  • the operation control module 4 is used to control the acquisition of the original image data from the first interface, the acquisition of the parameter configuration instructions from the first interface, and the acquisition of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface, and to send the control configuration parameters in the convolutional neural network configuration parameters to the convolutional neural network operation module 2.
  • the operation control module 4 sends the control configuration parameters in the convolutional neural network configuration parameters to the convolutional neural network operation module 2, which can control the convolutional neural network operation process and control when to start the convolutional neural network operation .
  • the convolutional neural network operation module 2 further includes an operation control unit 25.
  • the parameter input terminal of the operation control unit 25 is connected to the operation control module 4, and the control terminals of the operation control unit 25 are respectively connected to the convolution operation unit 22, the batch processing operation unit 23, and the activation operation unit 24. The operation control unit 25 is used to receive the control configuration parameters in the convolutional neural network configuration parameters, where the control configuration parameters include the input or output original image size and the number of input or output channels of each layer of the convolutional neural network. According to these control configuration parameters, the operation control unit 25 controls reading the original image data from the first storage unit 14 and sending it to the convolution operation unit 22, and controls reading the convolution operation parameters and batch processing operation parameters from the second storage unit 15 and sending them to the convolution operation unit 22.
  • the convolution operation unit 22 and the batch processing operation unit 23 in each operation component can be combined into the same unit.
  • the convolutional neural network operation module 2 further includes a pooling unit 26 and a write-back unit 27, which are respectively connected to the control end of the operation control unit 25.
  • the activation operation parameters and the operation configuration parameters in the convolutional neural network configuration parameters can be distributed to each operation unit in two ways.
  • the first method is distribution by the operation control module 4 and the operation control unit 25 of the convolutional neural network operation module 2 in turn. That is, the operation control module 4 is also used to obtain, from the parameter distribution module 3, the activation operation parameters and the operation configuration parameters in the convolutional neural network configuration parameters, where the operation configuration parameters include the convolution operation configuration parameters, the convolution kernel size, and the pooling mode, and to send the activation operation parameters and the operation configuration parameters to the operation control unit 25 of the convolutional neural network operation module 2.
  • the operation control unit 25 is then used to send the activation operation parameters to the activation operation unit 24, send the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit 22, and send the pooling mode to the pooling unit 26.
  • the second method is to distribute directly by the parameter distribution module, that is, the parameter distribution module 3 can also be directly connected to the activation operation unit 24, the convolution operation unit 22, and the pooling unit 26.
  • the parameter distribution module 3 is used to directly send the activation operation parameters to the activation operation unit 24, send the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit 22, and send the pooling mode to the pooling unit 26.
  • after receiving the above parameters, the convolution operation unit 22 can perform the convolution operation according to the convolution operation configuration parameters and the convolution kernel size, the activation operation unit 24 can perform the activation operation according to the activation operation parameters, and the pooling unit 26 can perform the pooling operation according to the pooling mode.
  • the pooling unit 26 is configured to perform a pooling operation on the image data processed by the activation operation according to the pooling mode in the convolutional neural network configuration parameters, and control the write-back unit 27 to write back the image data after the pooling operation.
  • the image data processed by the operation of the layer can be written back to the first storage unit to perform the operation of the convolutional neural network of the next layer.
  • the above-mentioned N operation components can also be used for parallel operation during the operation of the convolutional neural network.
  • FIG. 5 is a schematic structural diagram of another image recognition processing device provided by an embodiment of the present invention.
  • the convolutional neural network operation module 2 of the device further includes an image preprocessing unit 28 and a parameter preprocessing unit 29.
  • the image preprocessing unit 28 is arranged between the first storage unit 14 and the convolution operation unit 22, and is used to perform image filling processing on the original image data before sending it to the convolution operation unit 22; the image filling processing can pad the upper, lower, left, and right boundaries of the image to meet the requirements of the convolution kernel size for the image data during the convolution operation;
  • the parameter preprocessing unit 29 is arranged between the second storage unit 15 and the convolution operation unit 22, and is used to accumulate and sum the convolution operation parameters before sending them to the convolution operation unit 22; specifically, the originally acquired convolution operation parameters may also be sent to the convolution operation unit.
  • the parameter acquisition module 1 may also include a data reading and writing unit 16.
  • the data reading and writing unit 16 is respectively connected to the first interface 12 and the first storage unit 14, respectively connected to the second interface 13 and the second storage unit 15, respectively connected to the image preprocessing unit 28 and the parameter preprocessing unit 29, and connected to the write-back unit 27.
  • the data reading and writing unit 16 is used to obtain the original image data from the first interface 12 and write it to the first storage unit 14, and to read the original image data from the first storage unit 14 and send it to the image preprocessing unit 28. The data reading and writing unit 16 is also used to obtain the convolution operation parameters and batch processing operation parameters from the second interface 13 and write them to the second storage unit 15, to read the convolution operation parameters and batch processing operation parameters from the second storage unit 15 and send them to the parameter preprocessing unit 29, and to write the image data after the pooling operation, sent by the write-back unit 27, to the first storage unit 14.
  • the data reading and writing unit 16 in the embodiment of the present invention can classify the read and write commands for the first storage unit 14 and the second storage unit 15: the original image data received from the first interface 12 is written to the first storage unit 14 through the data reading and writing unit 16, and the image preprocessing unit 28 reads the first storage unit 14 through the data reading and writing unit 16; likewise, the convolution operation parameters and batch processing operation parameters received by the second interface 13 are written into the second storage unit 15 through the data reading and writing unit 16, and the parameter preprocessing unit 29 reads the second storage unit 15 through the data reading and writing unit 16.
  • the write-back unit 27 can also write the image data of each output channel after the calculation is completed into the first storage unit 14 through the data read-write unit 16.
  • the first interface may be an active-write type interface, that is, the data transmission bus actively writes the original image data and convolutional neural network configuration parameters through the first interface; the second interface may be a passive-read type interface, that is, the convolution operation parameters, batch processing operation parameters, and activation operation parameters need to be read out over the data transmission bus through the second interface.
  • the first storage unit 14 may store original image data and image data after operation.
  • the depth of the memory used by the first storage unit 14 can be 32K, and the bit width can be 64 bytes, which means each row of the memory used by the first storage unit 14 can store 64 pixel data. If the width of the image exceeds 64 pixels, one line of the original image data will be stored across multiple rows of the memory used by the first storage unit 14.
  • the 64 pixel data stored in the first storage unit can participate in the operation in the convolutional neural network operation module at the same time, that is to say, the convolutional neural network operation module includes 64 operation components.
  • the 64 pixel data in one row of the memory used by the first storage unit 14 will participate in the convolution operation, batch processing operation, and activation operation of their respective operation components.
  • the memory bit width used by the first storage unit 14 can be expanded from 64 bytes to 66 bytes, with the 66 bytes defined as B0, B1, B2, ..., B65, where B1-B64 still store the 64 bytes of original image data of each row, B0 stores B64 of the previous row in the memory used by the first storage unit 14, and B65 stores B1 of the next row in the memory used by the first storage unit 14, so that the pixels at the edges of each 64-pixel row segment have their neighboring pixels available to the convolution sliding window.
  • the schematic diagram of storage in the memory used by the first storage unit 14 is shown in FIG. 6, where D_idx represents a pixel datum, and the meaning is the same as B0 to B65.
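  • the 66-byte row layout can be sketched as follows: each 64-pixel memory row is extended with one byte from the previous row and one from the next, under the rule stated above; the zero fill at line boundaries is an assumption.

```python
def build_rows_66(line: bytes) -> list:
    """Split one image line into 64-byte memory rows, then extend each row
    to 66 bytes: B0 = previous row's B64, B1..B64 = the row's own pixels,
    B65 = next row's B1 (zero at the boundaries, by assumption)."""
    chunks = [line[i:i + 64] for i in range(0, len(line), 64)]
    rows = []
    for i, chunk in enumerate(chunks):
        b0 = chunks[i - 1][63:64] if i > 0 else b"\x00"               # previous row's B64
        b65 = chunks[i + 1][0:1] if i + 1 < len(chunks) else b"\x00"  # next row's B1
        rows.append(b0 + chunk.ljust(64, b"\x00") + b65)
    return rows
```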
  • Figure 7 is a schematic diagram of a convolution operation provided by an embodiment of the invention.
  • taking a convolution kernel size of 3x3 as an example, the sliding window size in the convolution operation is 3x3, and the convolution operation takes one 3x3 sliding window as its unit: the pixels in the sliding window and the convolution operation parameters are multiplied one by one and then accumulated and summed, after which the remaining steps of the convolution operation are performed.
  • FIG. 8 is a schematic structural diagram of a second storage unit provided by an embodiment of the present invention.
  • the second storage unit 15 in an embodiment of the present invention may include a first memory 151, a second memory 152, and a third memory 153. The data reading and writing unit 16 is specifically configured to write the convolution operation parameters into the first memory 151 or the second memory 152 and to read the convolution operation parameters from the first memory 151 or the second memory 152, such that when the convolution operation parameters are written to the first memory 151 they are read from the second memory 152, or, when the convolution operation parameters are written to the second memory 152, they are read from the first memory 151. That is, a ping-pong operation method is adopted: the convolutional neural network operation parameters are alternately written to, and alternately read from, the first memory 151 and the second memory 152.
  • the aforementioned data reading and writing unit 16 is also used to write the batch processing operation parameters to the third memory 153 and to read them from the third memory 153.
  • during operation, the image preprocessing unit 28 first issues a data read request to the data reading and writing unit 16, which forwards it to the first storage unit 14. The image preprocessing unit 28 performs image filling on the original image data fed back and sends it to the convolution operation unit 22, so as to meet the convolution kernel size's requirement on the image data during the convolution operation, for example a 3x3 kernel; a padding sketch follows below.
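For illustration, a minimal sketch of such an image-filling step, assuming zero padding (the patent does not fix the fill value):

```python
def pad_image(img, kernel=3, fill=0):
    """Pad a 2-D image (list of rows) by kernel//2 on every side."""
    p = kernel // 2
    width = len(img[0]) + 2 * p
    blank = [fill] * width
    body = [[fill] * p + row + [fill] * p for row in img]
    return [blank] * p + body + [blank] * p

padded = pad_image([[1, 2], [3, 4]])   # 2x2 image -> 4x4, ready for a 3x3 window
```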
  • the parameter preprocessing unit 29 issues a parameter read request to the data reading and writing unit 16, which forwards it to the second storage unit 15 to obtain the convolution operation parameters. After accumulating and summing the convolution operation parameters, the parameter preprocessing unit 29 sends the accumulated sum together with the convolution operation parameters as read to the convolution operation unit 22. The batch processing operation parameters are also obtained and likewise sent to the convolution operation unit 22.
  • after the convolution operation unit 22 receives the preprocessed image data and the convolution operation parameters, and obtains the multiplication coefficients, shift coefficients, and addition coefficients used in the convolution from the operation control unit 25 or the parameter distribution module 3, the convolution operation can be performed. Specifically: the pixel data in the convolution sliding window are multiplied one by one by the convolution operation parameters at the corresponding positions, and the accumulated sum sum0 is computed over all the products; the sliding window size equals the convolution kernel size (refer to FIG. 7).
  • in addition, the accumulated sum of the sliding-window pixel data and the accumulated sum of the convolution operation parameters are each put through a multiplication and a shift to obtain the accumulated sums sum1 and sum2; finally sum0, sum1, sum2, and the addition-coefficient offset are summed to give the convolution result. The multiplication, shift, and addition coefficients are obtained from the operation control unit 25 or the parameter distribution module 3 as described above; a sketch of this per-window arithmetic follows below.
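A behavioural sketch of the per-window arithmetic just described. Only the sum0 + sum1 + sum2 + offset structure is taken from the text; the coefficient names (mul1, shift1, mul2, shift2, bias) and the example values are illustrative assumptions.

```python
def conv_window(pixels, weights, mul1, shift1, mul2, shift2, bias):
    """pixels, weights: the flattened k*k window and kernel (same length)."""
    sum0 = sum(p * w for p, w in zip(pixels, weights))  # element-wise multiply-accumulate
    sum1 = (sum(pixels) * mul1) >> shift1               # pixel-sum term: multiply then shift
    sum2 = (sum(weights) * mul2) >> shift2              # weight-sum term: multiply then shift
    return sum0 + sum1 + sum2 + bias                    # close with the addition-coefficient offset

# One 3x3 window with illustrative coefficients
out = conv_window(list(range(9)), [1] * 9, mul1=2, shift1=1, mul2=0, shift2=0, bias=5)
```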
  • the batch processing operation unit 23 normalizes the convolution result to zero mean and unit variance. The concrete implementation is to multiply the convolution result by the batch multiplication coefficient, shift it, and accumulate it with the addition offset to obtain the batch processing result; the multiplication, shift, and addition coefficients are all obtained by reading the data transmission bus through the second interface and are stored in the second storage unit 15 (a sketch follows below). Since the batch processing operation directly processes the convolution result, the convolution operation unit 22 and the batch processing operation unit 23 can be merged into the same functional unit.
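A minimal fixed-point sketch of this multiply-shift-add step; the example coefficient values are illustrative only, chosen so the multiply-and-shift plays the role of 1/sigma and the offset absorbs -mu/sigma.

```python
def batch_norm_fixed(conv_result, mul_coef, shift_coef, add_offset):
    """Fold zero-mean / unit-variance normalisation into multiply, shift, add."""
    return ((conv_result * mul_coef) >> shift_coef) + add_offset

y = batch_norm_fixed(conv_result=200, mul_coef=205, shift_coef=8, add_offset=-3)
# 205 / 2**8 ~ 0.80 approximates 1/sigma; -3 approximates -mu/sigma
```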
  • the activation operation unit 24 can perform the activation operation. This unit implements a segmented non-linear mapping of the input data: the input is compared with several (for example, 16) activation parameters to find the one closest to the input, the difference between the two is taken, and the difference then undergoes a multiplication, a shift, and an addition in sequence (a sketch follows below). The 16 comparison coefficients, multiplication coefficients, shift coefficients, and addition coefficients are all obtained by reading the data transmission bus through the second interface and sent to the parameter distribution module 3, which distributes them directly, or they are finally sent to the activation operation unit by the operation control unit 25.
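A sketch of the segmented non-linear mapping, shortened to 4 segments instead of the 16 in the text; all table values are illustrative. A piecewise-linear approximation of an activation such as sigmoid can be encoded this way.

```python
def activate(x, cmp_params, mul_coefs, shift_coefs, add_coefs):
    """Map x through the segment whose comparison parameter is closest to x."""
    seg = min(range(len(cmp_params)), key=lambda i: abs(x - cmp_params[i]))
    diff = x - cmp_params[seg]
    # multiply, shift, then add, using that segment's coefficients
    return ((diff * mul_coefs[seg]) >> shift_coefs[seg]) + add_coefs[seg]

y = activate(
    x=37,
    cmp_params=[0, 32, 64, 96],   # segment anchor points
    mul_coefs=[3, 2, 1, 1],       # per-segment slope numerators
    shift_coefs=[2, 2, 2, 3],     # per-segment slope shifts (powers of 2)
    add_coefs=[0, 24, 40, 48],    # per-segment base values
)
```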
  • the convolution operation unit 22, batch processing operation unit 23, and activation operation unit 24 connected in sequence constitute one operation component; N operation components in total can be provided, so that N pixels of data undergo the convolutional neural network operations at the same time, achieving the technical effect of parallel operation and improving the operational efficiency of the image recognition process.
  • the pooling unit 26 can perform the pooling operation, shrinking the input image while retaining the important information. It can compute the maximum or average value within a 2x2 or 4x4 pixel region, and which pooling type each convolutional neural network layer requires is determined by the configuration parameters. Specifically, the pooling types may include, but are not limited to, the average, maximum, upper-left, and upper-right values within a 2x2 pixel region, and the average, maximum, and upper-left values within a 4x4 pixel region; a pooling sketch follows below.
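A sketch of the 2x2 pooling options; the integer average stands in for whatever rounding the hardware applies, which the text does not specify.

```python
def pool2x2(img, r, c, mode):
    """Reduce the 2x2 block with top-left corner (r, c) by the selected mode."""
    block = [img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1]]
    if mode == "max":
        return max(block)
    if mode == "avg":
        return sum(block) // len(block)   # integer mean, divider-free stand-in
    if mode == "upper_left":
        return block[0]
    if mode == "upper_right":
        return block[1]
    raise ValueError(mode)

img = [[1, 5], [3, 2]]
assert pool2x2(img, 0, 0, "max") == 5 and pool2x2(img, 0, 0, "upper_left") == 1
```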
  • the data temporary storage unit 20 is connected to each convolution operation unit 22 and can temporarily store the convolution result of each input channel; that is, after the convolution for the current input channel completes, the convolution operation unit 22 can read the previous convolution result from the data temporary storage unit 20, accumulate it with the current input channel's convolution result, and send the sum back to the data temporary storage unit 20 for temporary storage, as sketched below.
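A sketch of the accumulate-across-input-channels pattern the data temporary storage unit 20 supports; the flat-list representation of partial sums is an illustrative simplification.

```python
def accumulate_channels(channel_results):
    """channel_results: per-input-channel convolution outputs (flat lists)."""
    temp_store = [0] * len(channel_results[0])      # models the data temporary storage unit
    for result in channel_results:                  # one pass per input channel
        # read back the parked partial sums, add the current channel, park again
        temp_store = [acc + cur for acc, cur in zip(temp_store, result)]
    return temp_store

assert accumulate_channels([[1, 2], [10, 20], [100, 200]]) == [111, 222]
```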
  • the write-back unit 27 can write the image data of each output channel, after the convolution, batch processing, activation, and pooling processing, back to the first storage unit 14; the write request is first sent to the data reading and writing unit 16 and then forwarded to the first storage unit 14. In addition, the write-back unit 27 also writes the image data of each output channel onto the data transmission bus, which is then read in direct memory access mode.
  • in the technical solution provided by the embodiments, besides providing N operation components for parallel operation, the first and second memories operated in ping-pong mode to store the convolution operation parameters significantly improve operational efficiency and the real-time performance of the image recognition processing method.
  • in the architecture of the image recognition processing apparatus, the first interface and the second interface connected to the data transmission bus are separated from the internal convolutional neural network operation module, which makes it convenient to upgrade and optimize the convolutional neural network operation module.
  • when designing each module, for example the first interface, the first storage unit, and the operation control module, the requirements of different original image sizes, numbers of convolutional neural network layers, and convolution kernel sizes can be fully considered, so that convolutional neural network architectures of different scales can be supported.
  • each block in the flowcharts or block diagrams may represent a module, program segment, or part of code, which contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • in a hardware-based implementation, the addition and multiplication operations can be realized by hardware such as adders and multipliers, and some logic controllers can be added to implement the basic logic control.
  • the units or modules described in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, and in some cases the names of these units or modules do not constitute a limitation on the units or modules themselves.
  • the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solution of the embodiments, which those of ordinary skill in the art can understand and implement without creative work.
  • each implementation can be realized by software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes a number of instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) execute the methods described in each embodiment or in certain parts of an embodiment.

Abstract

An image recognition processing method and apparatus. The method comprises: acquiring original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data (101); performing, by a convolutional neural network operation module (2), a convolutional neural network operation on the original image data according to the convolutional neural network configuration parameters and operation parameters, wherein the convolutional neural network operation module (2) comprises N operation components (21) arranged in parallel, each operation component (21) comprises a convolution operation unit (22), a batch processing operation unit (23), and an activation operation unit (24) connected in sequence, and the N operation components (21) respectively perform the convolution operation, the batch processing operation, and the activation operation on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M (102). The method improves the real-time performance of image recognition processing.

Description

Image recognition processing method and apparatus. Technical field
The present invention belongs to the field of data processing, and specifically relates to an image recognition processing method and apparatus.
Background art
The convolutional neural network (CNN) was first proposed by Yann Lecun, was applied to handwritten digit recognition, and has kept its dominant position in that field ever since. In recent years CNNs have made sustained progress in many directions, with breakthroughs in speech recognition, face recognition, general object recognition, motion analysis, natural language processing, and even brain-wave analysis. CNNs can be scaled up and configured to support labeling of the data sets used for learning, and under these conditions CNNs have been found successful at learning complex and robust image features.
A CNN is a feed-forward artificial neural network in which individual neurons are tiled in such a way that they respond to overlapping regions of the visual field. CNNs are inspired by the behavior of the biological optic nerve. A CNN processes image data through multiple layers of connected neurons, achieving high accuracy in image recognition.
A single processor is limited in computing power, so other computing configurations must be explored to meet the demands of supporting CNNs. Among the fields explored are hardware-specialized CNN accelerators implemented in the form of general-purpose computing on graphics processing units (GPUs), multi-core processors, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). It should be noted that, because software cannot meet the speed requirements of image data processing, CNN accelerators in the field of image data processing are implemented in hardware.
When the prior art performs image recognition processing, the CNN operation requires multiple operations on the image data, and the operation process is carried out serially, which presents the technical problems of low operational efficiency and poor real-time performance.
Summary of the invention
Addressing the poor real-time performance of prior-art image recognition processing methods, the present invention provides an image recognition processing method and apparatus that realize parallel processing of the convolutional neural network operations and improve the real-time performance of image recognition processing.
To achieve the above objective, the technical solution of the present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides an image recognition processing method, comprising:
acquiring original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer;
performing, by a convolutional neural network operation module, a convolutional neural network operation on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters, wherein the convolutional neural network operation module comprises N operation components arranged in parallel, each operation component comprises a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution operation, the batch processing operation, and the activation operation on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
Optionally, acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters comprises:
acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through a data transmission bus.
Optionally, acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through the data transmission bus comprises:
acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through an Advanced eXtensible Interface (AXI) bus.
Optionally, acquiring the original image data comprises acquiring the original image data through a first interface and writing the original image data into a first storage unit;
acquiring the convolutional neural network configuration parameters comprises acquiring parameter configuration instructions through the first interface and sending the parameter configuration instructions to a parameter distribution module, the parameter configuration instructions comprising the convolutional neural network configuration parameters;
acquiring the convolutional neural network operation parameters comprises acquiring convolution operation parameters, batch processing operation parameters, and activation operation parameters through a second interface, writing the convolution operation parameters and the batch processing operation parameters into a second storage unit, and sending the activation operation parameters to the parameter distribution module.
Optionally, the method further comprises:
an operation control module acquiring control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module;
the operation control module controlling, according to the control-type configuration parameters, the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface;
the operation control module sending the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module.
Optionally, the convolutional neural network operation module further comprises an operation control unit, and the method further comprises:
the operation control unit receiving the control-type configuration parameters of the convolutional neural network configuration parameters, the control-type configuration parameters comprising the input or output original image size and the number of input or output channels of each convolutional neural network layer, and, according to the control-type configuration parameters, controlling the reading of the original image data from the first storage unit and of the convolution operation parameters and batch processing operation parameters from the second storage unit, and sending the original image data, the convolution operation parameters, and the batch processing operation parameters to the convolution operation unit.
Optionally, the method further comprises:
the operation control module acquiring, from the parameter distribution module, the activation operation parameters and operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters comprising convolution operation configuration parameters, the convolution kernel size, and the pooling mode;
the operation control module sending the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters to the operation control unit of the convolutional neural network operation module;
the operation control unit sending the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to a pooling unit; or,
the parameter distribution module directly sending the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit.
Optionally, the method further comprises:
performing image filling processing on the original image data read from the first storage unit before sending it to the convolution operation unit; and performing accumulation-and-summation processing on the convolution operation parameters acquired from the second storage unit before sending them to the convolution operation unit.
Optionally, the second storage unit comprises a first memory, a second memory, and a third memory, and writing the convolution operation parameters and the batch processing operation parameters into the second storage unit and reading the convolution operation parameters and the batch processing operation parameters from the second storage unit comprises:
writing the convolution operation parameters into the first memory or the second memory and reading the convolution operation parameters from the first memory or the second memory, wherein while the convolution operation parameters are written into the first memory they are read from the second memory, or while they are written into the second memory they are read from the first memory;
writing the batch processing operation parameters into the third memory and reading the batch processing operation parameters from the third memory.
In a second aspect, an embodiment of the present invention further provides an image recognition processing apparatus, comprising:
a parameter acquisition module, configured to acquire original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer;
a convolutional neural network operation module, connected to the parameter acquisition module and configured to perform a convolutional neural network operation on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters, wherein the convolutional neural network operation module comprises N operation components arranged in parallel, each operation component comprises a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution operation, the batch processing operation, and the activation operation on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
Optionally, the apparatus further comprises a parameter distribution module, and the parameter acquisition module comprises a data transmission bus, a first interface, a second interface, a first storage unit, and a second storage unit;
the data transmission bus is used to transmit the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters;
a first end of the first interface is connected to the data transmission bus, a second end of the first interface is connected to the parameter distribution module and the first storage unit respectively, and the first interface is used to acquire the original image data from the data transmission bus and write it into the first storage unit, and to acquire parameter configuration instructions from the data transmission bus and send them to the parameter distribution module, the parameter configuration instructions comprising the convolutional neural network configuration parameters;
a first end of the second interface is connected to the data transmission bus, a second end of the second interface is connected to the parameter distribution module and the second storage unit respectively, and the second interface is used to acquire the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the data transmission bus, write the convolution operation parameters and batch processing operation parameters into the second storage unit, and send the activation operation parameters to the parameter distribution module.
Optionally, the apparatus further comprises an operation control module connected to the parameter distribution module and the convolutional neural network operation module respectively;
the operation control module is used to acquire the control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module; to control, according to the control-type configuration parameters, the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface; and to send the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module.
Optionally, the convolutional neural network operation module further comprises an operation control unit, a parameter input end of the operation control unit is connected to the control module, and a control end of the operation control unit is connected to the convolution operation unit, the batch processing operation unit, and the activation operation unit respectively;
the operation control unit is used to receive the control-type configuration parameters of the convolutional neural network configuration parameters, the control-type configuration parameters comprising the input or output original image size and the number of input or output channels of each convolutional neural network layer; to control, according to the control-type configuration parameters, the reading of the original image data from the first storage unit and send that original image data to the convolution operation unit; to control the reading of the convolution operation parameters and batch processing operation parameters from the second storage unit and send the convolution operation parameters and batch processing operation parameters to the convolution operation unit; and to send the activation operation parameters to the activation operation unit.
Optionally, the convolutional neural network operation module further comprises a pooling unit and a write-back unit, each connected to the control end of the operation control unit; the operation control module is further used to acquire, from the parameter distribution module, the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters comprising the convolution operation configuration parameters, the convolution kernel size, and the pooling mode, and to send the activation operation parameters and the operation-type configuration parameters to the operation control unit of the convolutional neural network operation module; the operation control unit is further used to send the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit; or,
the parameter distribution module is directly connected to the activation operation unit, the convolution operation unit, and the pooling unit, and the parameter distribution module is used to directly send the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit.
Optionally, the convolutional neural network operation module further comprises:
an image preprocessing unit, disposed between the first storage unit and the convolution operation unit and used to perform image filling processing on the original image data before sending it to the convolution operation unit;
a parameter preprocessing unit, disposed between the second storage unit and the convolution operation unit and used to accumulate and sum the convolution operation parameters before sending them to the convolution operation unit.
Optionally, the parameter acquisition module further comprises a data reading and writing unit, connected to the first interface and the first storage unit respectively, to the second interface and the second storage unit respectively, to the image preprocessing unit and the parameter preprocessing unit respectively, and to the write-back unit;
the data reading and writing unit is used to acquire the original image data from the first interface and write it into the first storage unit, and to read the original image data from the first storage unit and send it to the image preprocessing unit;
the data reading and writing unit is further used to acquire the convolution operation parameters and batch processing operation parameters from the second interface and write them into the second storage unit, and to read the convolution operation parameters and batch processing operation parameters from the second storage unit and send them to the parameter preprocessing unit;
the data reading and writing unit is further used to write the pooled image data sent by the write-back unit into the first storage unit.
Optionally, the second storage unit comprises a first memory, a second memory, and a third memory, and the parameter reading unit is specifically used to write the convolution operation parameters into the first memory or the second memory and to read the convolution operation parameters from the first memory or the second memory, wherein while the convolution operation parameters are written into the first memory they are read from the second memory, or while they are written into the second memory they are read from the first memory;
and to write the batch processing operation parameters into the third memory and read the batch processing operation parameters from the third memory.
Optionally, the apparatus further comprises:
a data temporary storage unit, connected to the convolution operation unit of each operation component and used to store the convolution result of each input channel after operation by the convolution operation unit.
Optionally, the data transmission bus is an Advanced eXtensible Interface (AXI) bus.
In the technical solution provided by the embodiments of the present invention, during image recognition processing, when the CNN operation is performed, the convolutional neural network operation module comprises N operation components arranged in parallel, each comprising a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution, batch processing, and activation operations on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M. The operation components can thus perform the above operations on N pixels of data simultaneously, achieving the technical effect of parallel operation on the original image data. In contrast to the prior art, which performs the CNN operation on each pixel's data in turn in a serial manner, this improves operational efficiency and the real-time performance of the image recognition process.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments taken in conjunction with the accompanying drawings.
In the drawings:
FIG. 1 is a schematic flowchart of an image recognition processing method provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a convolutional neural network operation module provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an image recognition processing apparatus provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another image recognition processing apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of yet another image recognition processing apparatus provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of storage in the memory used by the first storage unit, provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a convolution operation provided by an embodiment of the invention;
FIG. 8 is a schematic structural diagram of the second storage unit provided by an embodiment of the present invention.
Detailed description
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings, so that those skilled in the art can easily implement them. For clarity, parts irrelevant to the description of the exemplary embodiments are omitted from the drawings.
In the present invention, it should be understood that terms such as "comprising" or "having" are intended to indicate the presence of the features, numbers, steps, acts, components, parts, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, parts, or combinations thereof exist or are added.
It should also be noted that, where there is no conflict, the embodiments of the present invention and the features in the embodiments may be combined with one another. The present invention will be described in detail below with reference to the drawings and in conjunction with the embodiments.
As stated in the background art, when the prior art performs image recognition processing, its convolutional neural network operation module must operate on the original image data multiple times, and the operation process operates on each pixel separately in a serial manner, presenting the technical problems of low operational efficiency and poor real-time performance.
Addressing the above technical problems of the convolutional neural network operation in prior-art image recognition, the present invention provides an image recognition processing method. FIG. 1 is a schematic flowchart of the image recognition processing method provided by an embodiment of the present invention; as shown in FIG. 1, the method comprises the following steps:
Step 101: acquire original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer.
Specifically, in this step, when a convolutional neural network is used for image recognition to recognize feature information such as people, animals, or vehicles in an image, the original image data must first be acquired, followed by the convolutional neural network configuration parameters and operation parameters. The convolutional neural network operation may comprise multiple layers of operations, and the above configuration and operation parameters are the flow-control or operation parameters needed for each layer's operation, for example the operation parameters used in the convolution, batch processing, and activation operations. Specifically, a dedicated parameter acquisition module may be provided in this step to acquire the above original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters.
Step 102: the convolutional neural network operation module performs the convolutional neural network operation on the original image data according to the above configuration parameters and operation parameters, wherein the convolutional neural network operation module comprises N operation components arranged in parallel, each operation component comprises a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution, batch processing, and activation operations on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
Specifically, on the basis of the original image data, configuration parameters, and operation parameters acquired in step 101, the original image data is input into the convolutional neural network operation module for operation. Referring to FIG. 2, a schematic structural diagram of the convolutional neural network operation module provided by an embodiment of the present invention, the module comprises N operation components 21 arranged in parallel, each comprising a convolution operation unit 22, a batch processing operation unit 23, and an activation operation unit 24 connected in sequence. During operation, each pixel's data is input into one operation component, so the N operation components can process N pixels of data at the same time, with each pixel's data undergoing the convolution, batch processing, and activation operations simultaneously in its corresponding component. This achieves the effect of parallel operation, improves the operational efficiency of the convolutional neural network operation, and improves the real-time performance of the image recognition process.
In the embodiment shown in FIG. 1, acquiring the original image data, configuration parameters, and operation parameters in step 101 may specifically be: acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through a data transmission bus. That is, in addition to acquiring the original image data, the configuration and operation parameters can be acquired from the data transmission bus in real time; parameters acquired in this way can be adjusted in real time according to the actual operation requirements, realizing the programmability and configurability of the convolutional neural network operation while also supporting convolutional neural network architectures of different scales.
Further, the above data transmission bus may be an Advanced eXtensible Interface (AXI) bus; that is, the original image data, configuration parameters, and operation parameters can be acquired through the AXI bus. Specifically, the AXI bus may be a data transmission bus conforming to the ARM architecture, so the technical solution of this embodiment is compatible with existing ARM architectures. The AXI bus is also an on-chip bus oriented to high performance, high bandwidth, and low latency, able to satisfy ultra-high-performance and complex system-on-chip design requirements, which matches the need in the embodiments of the present invention to realize real-time convolutional neural network operation with a hardware system.
When the above data transmission bus (for example, the AXI bus) transmits the original image data and the convolutional neural network operation parameters, dedicated interfaces may be provided to cooperate with the bus so as to acquire the original image data, configuration parameters, and operation parameters from it; further, corresponding storage units may be provided for the original image data and the convolutional neural network operation parameters respectively.
Specifically, two interfaces may be provided, a first interface and a second interface. The original image data is acquired through the first interface and written into the first storage unit; parameter configuration instructions, which include the convolutional neural network configuration parameters, are also acquired through the first interface and sent to the parameter distribution module. Specifically, each convolutional neural network layer may be matched with multiple (for example, 12) parameter configuration instructions, and the first interface supports receiving the parameter configuration instructions of at most several layers (for example, 13 layers) of the network at a time; when one layer's parameter configuration instructions have been used up, the instructions for a new layer can be received. The convolutional neural network configuration parameters may include, but are not limited to, the input or output original image size, the number of input or output channels of each layer, the convolution kernel size, and the pooling mode; they may also include the multiplication, shift, and addition coefficients used in the convolution operation that are relatively fixed for each layer's operation, which may be called the convolution operation configuration parameters. Each parameter configuration instruction may be of fixed 64-bit length, with different bits carrying different meanings; the parameter distribution module can parse each instruction to obtain the configuration parameters in it and distribute them to the other modules for use, as sketched below. In addition, the convolution operation parameters, batch processing operation parameters, and activation operation parameters can be acquired through the second interface; the convolution operation parameters and batch processing operation parameters are written into the second storage unit, and the activation operation parameters are sent to the parameter distribution module.
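As an illustration only, the following sketch parses one such 64-bit instruction into fields. The field layout (opcode, layer id, image size, channel counts) is purely hypothetical, since the text states only that different bits carry different meanings.

```python
def parse_config_instruction(word):
    """Unpack a fixed-length 64-bit parameter configuration instruction."""
    assert 0 <= word < 1 << 64
    return {
        "opcode":       (word >> 60) & 0xF,     # 4 bits  (hypothetical field)
        "layer_id":     (word >> 52) & 0xFF,    # 8 bits
        "image_width":  (word >> 40) & 0xFFF,   # 12 bits
        "image_height": (word >> 28) & 0xFFF,   # 12 bits
        "in_channels":  (word >> 18) & 0x3FF,   # 10 bits
        "out_channels": (word >> 8) & 0x3FF,    # 10 bits; low 8 bits left unused here
    }

fields = parse_config_instruction((0x1 << 60) | (3 << 52) | (640 << 40) | (480 << 28))
```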
In the above steps, by storing the original image data in the first storage unit and the convolutional neural network operation parameters in the second storage unit, the two types of data can be stored separately. When the convolutional neural network operation module executes the operation steps, the original image data can be read from the first storage unit, the convolutional neural network operation parameters can be read from the second storage unit, and both types of data can be sent to the convolutional neural network operation module to execute the operation steps.
Further, the image recognition processing apparatus may also include an operation control module, and the above method may further comprise:
the operation control module acquires the control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module, and can control the whole flow of the image recognition processing method of the embodiments of the present invention according to these control-type configuration parameters, for example controlling the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface. In addition, the operation control module sends the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module and can thereby control when the convolutional neural network operation starts.
Further, the convolutional neural network operation module may also include an operation control unit, and the method provided by the embodiments further comprises: the operation control unit receives the control-type configuration parameters of the convolutional neural network configuration parameters and can control the convolutional neural network operation flow on their basis; the control-type configuration parameters include, but are not limited to, the input or output original image size and the number of input or output channels of each convolutional neural network layer. Specifically, this includes controlling, according to the control-type configuration parameters, the reading of the original image data from the first storage unit and of the convolution operation parameters and batch processing operation parameters from the second storage unit, and sending the original image data, convolution operation parameters, and batch processing operation parameters to the convolution operation unit. In the embodiments of the present invention, the convolution operation unit and the batch processing operation unit of each operation component can be merged into the same unit.
In addition, the operation-type configuration parameters of the convolutional neural network configuration parameters and the activation operation parameters can be sent to the individual operation units in two ways: the first is forwarding by the operation control module and the operation control unit of the convolutional neural network operation module, and the second is direct distribution by the parameter distribution module. Specifically, the first way further comprises:
the operation control module acquires, from the parameter distribution module, the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters including the convolution operation configuration parameters, the convolution kernel size, and the pooling mode;
the operation control module sends the activation operation parameters and the operation-type configuration parameters to the operation control unit of the convolutional neural network operation module;
the operation control unit sends the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit.
The second way further comprises: the parameter distribution module directly sends the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, the activation operation parameters to the activation operation unit, and the pooling mode to the pooling unit.
After the activation operation parameters and the operation-type configuration parameters have been distributed in either of the above two ways, the convolution operation unit can perform the convolution operation according to the convolution operation configuration parameters and the convolution kernel size, the activation operation unit can perform the activation operation according to the activation operation parameters, and the pooling unit can perform the pooling operation according to the pooling mode. That is, the convolutional neural network operation may further include a pooling step and a write-back step, i.e. the above method may further comprise: the pooling unit performs the pooling operation on the activation-processed image data according to the pooling mode and writes the pooled image data back into the first storage unit. Specifically, after each convolutional neural network layer's operation completes, the image data processed by that layer can be written back into the first storage unit for the next layer's convolutional neural network operation; the next layer's operation can also use the above N operation components for parallel processing, improving the efficiency of the convolutional neural network operation.
Further, the convolutional neural network operation may also include a preprocessing step. The preprocessing step comprises performing image filling on the original image data, i.e. the original image data read from the first storage unit undergoes image filling before being sent to the convolution operation unit; the image filling in this step may fill the top, bottom, left, and right borders of the image to meet the convolution kernel size's requirement on the image data during the convolution. It also comprises accumulating and summing the convolution operation parameters acquired from the second storage unit before sending them to the convolution operation unit; specifically, the initially acquired convolution operation parameters may also be sent to the convolution operation unit.
In the above embodiments of the present invention, the convolution operation parameters and batch processing operation parameters are stored in the second storage unit, which may include a first memory, a second memory, and a third memory; writing the convolution operation parameters and batch processing operation parameters into the second storage unit and reading them from the second storage unit may then specifically comprise:
writing the convolution operation parameters into the first memory or the second memory and reading the convolution operation parameters from the first memory or the second memory, wherein while the convolution operation parameters are written into the first memory they are read from the second memory, or while they are written into the second memory they are read from the first memory. That is, a ping-pong scheme is adopted: the convolutional neural network operation parameters are written alternately into the first and second memories and read alternately from them, improving parameter read/write efficiency and the real-time performance of the whole image recognition process. As for the batch processing operation parameters, they may be written into the third memory and read from the third memory.
Corresponding to the above method embodiments, an embodiment of the present invention also provides an image recognition processing apparatus that can execute the above image recognition processing method and achieve the same technical effects. FIG. 3 is a schematic structural diagram of an image recognition processing apparatus provided by an embodiment of the present invention; as shown in FIG. 3, the image recognition processing apparatus comprises a parameter acquisition module 1 and a convolutional neural network operation module 2. Specifically, the parameter acquisition module 1 is used to acquire the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer; the convolutional neural network operation module 2 is connected to the parameter acquisition module 1 and is used to perform the convolutional neural network operation on the original image data according to the configuration and operation parameters. Referring to FIG. 2 above, the convolutional neural network operation module 2 comprises N operation components 21 arranged in parallel, each comprising a convolution operation unit 22, a batch processing operation unit 23, and an activation operation unit 24 connected in sequence; the N operation components 21 are used to respectively perform the convolution, batch processing, and activation operations on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
In the image recognition processing apparatus provided by this embodiment, the convolutional neural network operation module comprises N operation components arranged in parallel. During operation, each pixel's data is input into one operation component, so the N operation components can process N pixels of data at the same time, with the N pixels of data undergoing the convolution, batch processing, and activation operations simultaneously in their corresponding components. This achieves the effect of parallel operation, improves the operational efficiency of the convolutional neural network operation, and improves the real-time performance of the image recognition process.
FIG. 4 is a schematic structural diagram of another image recognition processing apparatus provided by an embodiment of the present invention. As shown in FIG. 4, on the basis of the embodiment of FIG. 3 the apparatus further comprises a parameter distribution module 3, and the parameter acquisition module 1 comprises a data transmission bus 11, a first interface 12, a second interface 13, a first storage unit 14, and a second storage unit 15. The data transmission bus 11 is used to transmit the original image data, configuration parameters, and operation parameters. A first end of the first interface 12 is connected to the data transmission bus 11, and a second end of the first interface 12 is connected to the parameter distribution module 3 and the first storage unit 14 respectively; the first interface 12 is used to acquire the original data from the data transmission bus 11 and write it into the first storage unit 14, and to acquire the parameter configuration instructions, which include the convolutional neural network configuration parameters, from the data transmission bus 11 and send them to the parameter distribution module 3. A first end of the second interface 13 is connected to the data transmission bus 11, and a second end of the second interface 13 is connected to the parameter distribution module 3 and the second storage unit 15 respectively; the second interface 13 is used to acquire the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the data transmission bus 11, write the convolution operation parameters and batch processing operation parameters into the second storage unit 15, and send the activation operation parameters to the parameter distribution module 3. In the embodiments of the present invention, both the first storage unit 14 and the second storage unit 15 may use random-access memory.
In this embodiment, by providing the data transmission bus and using it to transmit the original image data, configuration parameters, and operation parameters, the configuration and operation parameters can be acquired from the bus in real time during each layer's operation, and parameters acquired in this way can be adjusted in real time according to actual requirements, thereby realizing the programmability and configurability of the convolutional neural network operation. Further, the above data transmission bus may be an Advanced eXtensible Interface (AXI) bus. Specifically, the AXI bus may be a data transmission bus conforming to the ARM architecture, so the technical solution of this embodiment is compatible with existing ARM architectures. The AXI bus is also an on-chip bus oriented to high performance, high bandwidth, and low latency, able to satisfy ultra-high-performance and complex system-on-chip design requirements, matching the need in the embodiments of the present invention to realize real-time convolutional neural network operation with a hardware system.
In the embodiments of the present invention, still as shown in FIG. 4 above, an operation control module 4 may further be included, connected to the parameter distribution module 3 and the convolutional neural network operation module 2 respectively;
the operation control module 4 is used to acquire the control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module 3 and can control the whole flow of the image recognition processing of the embodiments of the present invention according to them. Specifically, the operation control module 4 is used to control the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface, and to send the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module 2. In addition, by sending the control-type configuration parameters to the convolutional neural network operation module 2, the operation control module 4 can control the convolutional neural network operation flow and control when the operation starts.
Further, still as shown in FIG. 4 above, the convolutional neural network operation module 2 also comprises an operation control unit 25, whose parameter input end is connected to the operation control module 4 and whose control end is connected to the convolution operation unit 22, the batch processing operation unit 23, and the activation operation unit 24 respectively. The operation control unit 25 is used to receive the control-type configuration parameters of the convolutional neural network configuration parameters, which include the input or output original image size and the number of input or output channels of each convolutional neural network layer; to control, according to the control-type configuration parameters, the reading of the original image data from the first storage unit 14 and send that data to the convolution operation unit 22; and to control the reading of the convolution operation parameters and batch processing operation parameters from the second storage unit 15 and send them to the convolution operation unit 22. In the embodiments of the present invention, the convolution operation unit 22 and the batch processing operation unit 23 of each operation component can be merged into the same unit, which may be called the convolution-and-batch-processing operation unit, to execute the functions of both the convolution operation and the batch processing operation.
Further, the convolutional neural network operation module 2 also comprises a pooling unit 26 and a write-back unit 27, each connected to the control end of the operation control unit 25.
Specifically, the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters can be distributed to the individual operation units in two ways.
The first way is distribution in turn by the operation control module 4 and the operation control unit 25 of the convolutional neural network operation module 2: the operation control module 4 is also used to acquire, from the parameter distribution module 3, the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters including the convolution operation configuration parameters, the convolution kernel size, and the pooling mode, and to send the activation operation parameters and the operation-type configuration parameters to the operation control unit 25 of the convolutional neural network operation module 2; the operation control unit 25 is also used to send the activation operation parameters to the activation operation unit 24, the convolution operation configuration parameters and convolution kernel size to the convolution operation unit 22, and the pooling mode to the pooling unit 26.
The second way is direct distribution by the parameter distribution module: the parameter distribution module 3 can also be directly connected to the activation operation unit 24, the convolution operation unit 22, and the pooling unit 26, and is used to directly send the activation operation parameters to the activation operation unit 24, the convolution operation configuration parameters and convolution kernel size to the convolution operation unit 22, and the pooling mode to the pooling unit 26.
After the activation operation parameters and the operation-type configuration parameters have been distributed in either of the above two ways, the convolution operation unit 22 can perform the convolution operation according to the convolution operation configuration parameters and the convolution kernel size, the activation operation unit 24 can perform the activation operation according to the activation operation parameters, and the pooling unit 26 can perform the pooling operation according to the pooling mode.
Specifically, the pooling unit 26 is used to perform the pooling operation on the activation-processed image data according to the pooling mode in the convolutional neural network configuration parameters and to control the write-back unit 27 to write the pooled image data back into the first storage unit 14. Specifically, after each convolutional neural network layer's operation completes, the image data processed by that layer can be written back into the first storage unit for the next layer's convolutional neural network operation; the next layer's operation can also use the above N operation components in parallel.
FIG. 5 is a schematic structural diagram of yet another image recognition processing apparatus provided by an embodiment of the present invention. As shown in FIG. 5, the convolutional neural network operation module 2 of this apparatus further comprises an image preprocessing unit 28 and a parameter preprocessing unit 29. The image preprocessing unit 28 is disposed between the first storage unit 14 and the convolution operation unit 22 and is used to perform image filling on the original image data before sending it to the convolution operation unit 22; the image filling may fill the top, bottom, left, and right borders of the image to meet the convolution kernel size's requirement on the image data during the convolution. The parameter preprocessing unit 29 is disposed between the second storage unit 15 and the convolution operation unit 22 and is used to accumulate and sum the convolution operation parameters before sending them to the convolution operation unit 22; specifically, the initially acquired convolution operation parameters may also be sent to the convolution operation unit.
Optionally, still as shown in FIG. 5 above, the parameter acquisition module 1 may also comprise a data reading and writing unit 16, connected to the first interface 12 and the first storage unit 14 respectively, to the second interface 13 and the second storage unit 15 respectively, to the image preprocessing unit 28 and the parameter preprocessing unit 29 respectively, and to the write-back unit 27. The data reading and writing unit 16 is used to acquire the original image data from the first interface 12 and write it into the first storage unit 14, and to read the original image data from the first storage unit 14 and send it to the image preprocessing unit 28; the data reading and writing unit 16 is also used to acquire the convolution operation parameters and batch processing operation parameters from the second interface 13 and write them into the second storage unit 15, and to read the convolution operation parameters and batch processing operation parameters from the second storage unit 15 and send them to the parameter preprocessing unit 29; the data reading and writing unit 16 is also used to write the pooled image data sent by the write-back unit 27 into the first storage unit 14.
Specifically, the data reading and writing unit 16 in the embodiments of the present invention can classify the read and write commands for the first storage unit 14 and the second storage unit 15. The original image data received from the first interface 12 is written into the first storage unit 14 through the data reading and writing unit 16, and the image preprocessing unit 28 can read the first storage unit 14 through the data reading and writing unit 16; the convolution operation parameters and batch processing operation parameters received on the second interface are likewise written into the second storage unit 15 through the data reading and writing unit 16, and the parameter preprocessing unit 29 can also read the second storage unit 15 through the data reading and writing unit 16. In addition, the write-back unit 27 can also write the image data of each output channel after the operations finish into the first storage unit 14 through the data reading and writing unit 16. In the embodiments of the present invention, the first interface may be an active-write interface, that is, the data transmission bus actively writes the original image data and the convolutional neural network configuration parameters through the first interface; the second interface may be a passive-read interface, that is, the convolution operation parameters, batch processing operation parameters, and activation operation parameters need to be read through the second interface.
The first storage unit 14 can store the original image data and the post-operation image data. For example, the depth of the memory used by the first storage unit 14 may be 32K and its bit width 64 bytes; that is, each row of this memory can store 64 pixels of data, and if the width of the original image exceeds 64 pixels, one line of the original image data is stored across multiple rows of the memory used by the first storage unit 14. According to the description in the above embodiments, the 64 pixels of data stored in the first storage unit can participate in the operations of the convolutional neural network operation module at the same time; that is, the convolutional neural network operation module includes 64 operation components, and the 64 pixels of data in the memory used by the first storage unit 14 respectively participate in the convolution, batch processing, and activation operations of each operation component. To support a 3x3 convolution kernel, the bit width of the memory used by the first storage unit 14 can be expanded from 64 bytes to 66 bytes, defined as B0, B1, B2, ..., B65, where B1-B64 still store the 64 bytes of each row of original image data, B0 stores B64 of the previous row in this memory, and B65 stores B1 of the next row. The storage layout of the memory used by the first storage unit 14 is shown in FIG. 6, where D idx represents one pixel, with the same meaning as B0 to B65. FIG. 7 is a schematic diagram of the convolution operation provided by an embodiment of the invention; as shown in FIG. 7, a 3x3 convolution kernel is taken as the example, i.e. the sliding window in the convolution is 3x3. The convolution takes each 3x3 sliding window as a unit, multiplies the pixels in the window by the convolution operation parameters one by one, accumulates and sums the products, and then performs the remaining steps of the convolution operation.
Optionally, FIG. 8 is a schematic structural diagram of the second storage unit provided by an embodiment of the present invention. As shown in FIG. 8, the second storage unit 15 in the embodiments of the present invention may comprise a first memory 151, a second memory 152, and a third memory 153. The data reading and writing unit 16 is specifically used to write the convolution operation parameters into the first memory 151 or the second memory 152 and read them from the first memory 151 or the second memory 152, such that while the convolution operation parameters are written into the first memory 151 they are read from the second memory 152, or while they are written into the second memory 152 they are read from the first memory 151. That is, a ping-pong scheme is adopted: the convolutional neural network operation parameters are written alternately into the first memory 151 and the second memory 152, and read alternately from them, improving parameter read/write efficiency and the real-time performance of the whole image recognition process. In addition, the above data reading and writing unit 16 is also used to write the batch processing operation parameters into the third memory 153 and read them from the third memory 153.
The process of the convolutional neural network operation is described in detail below in terms of the individual functional units of the convolutional neural network operation module of the embodiments of the present invention; reference can still be made to FIG. 5, and the description specifically involves the image preprocessing unit 28, the parameter preprocessing unit 29, the convolution operation unit 22, the batch processing operation unit 23, the activation operation unit 24, the pooling unit 26, and the write-back unit 27.
Specifically, during the operation the image preprocessing unit 28 first issues a data read request to the data reading and writing unit 16, which forwards it to the first storage unit 14; the image preprocessing unit 28 performs image filling on the original image data fed back and sends it to the convolution operation unit 22, to meet the convolution kernel size's requirement on the image data during the convolution, for example a 3x3 kernel. The parameter preprocessing unit 29 issues a parameter read request to the data reading and writing unit 16, which forwards it to the second storage unit 15, to obtain the convolution operation parameters; after accumulating and summing the convolution operation parameters, the parameter preprocessing unit 29 sends the accumulated sum and the read convolution operation parameters to the convolution operation unit 22 at the same time. The batch processing operation parameters are also obtained and likewise sent to the convolution operation unit 22; specifically, since the batch processing operation directly processes the convolution result, the batch processing operation unit 23 and the convolution operation unit 22 can be merged into the same operation unit.
After the convolution operation unit 22 has received the preprocessed image data and the convolution operation parameters, and has obtained the multiplication, shift, and addition coefficients used in the convolution from the operation control unit 25 or the parameter distribution module 3, the convolution operation can be performed. The specific operation process comprises: multiplying the pixel data in the convolution sliding window by the convolution operation parameters at the corresponding positions, then accumulating and summing. The concrete implementation is as follows: the pixel data in the convolution sliding window are multiplied one by one by the convolution operation parameters at the corresponding positions, and the accumulated sum sum0 is computed over all the products; the sliding window size is the convolution kernel size, see FIG. 7.
In addition, the accumulated sum of the sliding-window pixel data and the accumulated sum of the convolution operation parameters are each put through a multiplication and a shift to obtain the accumulated sums sum1 and sum2; finally sum0, sum1, sum2, and the addition-coefficient offset are summed to give the convolution result. The multiplication, shift, and addition coefficients here can be obtained from the operation control unit 25 or the parameter distribution module 3 as described above.
The batch processing operation unit 23 normalizes the convolution result to zero mean and unit variance. The concrete implementation is to multiply the convolution result by the batch multiplication coefficient, shift it, and accumulate it with the addition offset to obtain the batch processing result; the multiplication, shift, and addition coefficients are all obtained by reading the data transmission bus through the second interface and are stored in the second storage unit 15. Since the batch processing operation directly processes the convolution result, the convolution operation unit 22 and the batch processing operation unit 23 can be merged into the same functional unit.
The activation operation unit 24 can perform the activation operation. This unit can implement a segmented non-linear mapping of the input data: the input data is compared with several (for example, 16) activation parameters to find the activation parameter closest to the input, the difference between the two is taken, and the difference then undergoes a multiplication, a shift, and an addition in sequence. The 16 comparison, multiplication, shift, and addition coefficients here are all obtained by reading the data transmission bus through the second interface and sent to the parameter distribution module 3, which distributes them directly, or they are finally sent to the activation operation unit by the operation control unit 25. In addition, in the embodiments of the present invention the convolution operation unit 22, batch processing operation unit 23, and activation operation unit 24 connected in sequence can constitute one operation component, and N operation components in total can be provided to perform the convolutional neural network operation on N pixels of data at the same time, achieving the technical effect of parallel operation and improving the operational efficiency of the image recognition process.
The pooling unit 26 can perform the pooling operation, shrinking the input image while retaining the important information; it can compute the maximum or average value within a 2x2 or 4x4 pixel region, and which pooling type each convolutional neural network layer requires is determined by the configuration parameters. Specifically, the pooling types may include, but are not limited to, the average, maximum, upper-left, and upper-right values within a 2x2 pixel region, and the average, maximum, and upper-left values within a 4x4 pixel region.
The data temporary storage unit 20 is connected to each convolution operation unit 22 and can temporarily store the convolution result of each input channel; that is, after the convolution for the current input channel completes, the convolution operation unit 22 can read the previous convolution result from the data temporary storage unit 20, accumulate it with the current input channel's convolution result, and send the sum back to the data temporary storage unit 20 for temporary storage.
The write-back unit 27 can write the image data of each output channel, after the convolution, batch processing, activation, and pooling processing, back to the first storage unit 14; the write request is first sent to the data reading and writing unit 16 and then to the first storage unit 14. In addition, the write-back unit 27 also writes the image data of each output channel onto the data transmission bus, which is then read in direct memory access mode.
In the technical solution provided by the embodiments of the present invention, besides providing N operation components for parallel operation and providing the first and second memories in ping-pong mode to store the convolution operation parameters, which significantly improve operational efficiency and the real-time performance of the image recognition processing method, the architecture of the image recognition processing apparatus provided by the embodiments separates the first interface and second interface connected to the data transmission bus from the internal convolutional neural network operation module, making it convenient to upgrade and optimize the convolutional neural network operation module. Moreover, in the architecture provided by the embodiments, when designing each module, for example the first interface, the first storage unit, and the operation control module, the requirements of different original image sizes, numbers of convolutional neural network layers, and convolution kernel sizes can be fully considered, so that convolutional neural network architectures of different scales can be supported.
The flowcharts and block diagrams in the drawings of the embodiments of the present invention illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to the various implementations of the embodiments. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code containing one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions. In a hardware-based system implementation, the addition and multiplication operations can be realized by hardware such as adders and multipliers, and some logic controllers can also be added to implement the basic logic control.
The units or modules involved in the implementations described in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, and in some cases the names of these units or modules do not constitute a limitation on the units or modules themselves.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative work.
From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc, and includes a number of instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) execute the methods described in each embodiment or in certain parts of an embodiment.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of the technical features therein, and such modifications or substitutions do not remove the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (19)

  1. An image recognition processing method, characterized by comprising:
    acquiring original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer;
    performing, by a convolutional neural network operation module, a convolutional neural network operation on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters, wherein the convolutional neural network operation module comprises N operation components arranged in parallel, each operation component comprises a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution operation, the batch processing operation, and the activation operation on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
  2. The image recognition processing method according to claim 1, characterized in that acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters comprises:
    acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through a data transmission bus.
  3. The image recognition processing method according to claim 2, characterized in that acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through the data transmission bus comprises:
    acquiring the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters through an Advanced eXtensible Interface (AXI) bus.
  4. The image recognition processing method according to claim 2, characterized in that acquiring the original image data comprises acquiring the original image data through a first interface and writing the original image data into a first storage unit;
    acquiring the convolutional neural network configuration parameters comprises acquiring parameter configuration instructions through the first interface and sending the parameter configuration instructions to a parameter distribution module, the parameter configuration instructions comprising the convolutional neural network configuration parameters;
    acquiring the convolutional neural network operation parameters comprises acquiring convolution operation parameters, batch processing operation parameters, and activation operation parameters through a second interface, writing the convolution operation parameters and batch processing operation parameters into a second storage unit, and sending the activation operation parameters to the parameter distribution module.
  5. The image recognition processing method according to claim 4, characterized by further comprising:
    an operation control module acquiring control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module;
    the operation control module controlling, according to the control-type configuration parameters, the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface;
    the operation control module sending the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module.
  6. The image recognition processing method according to claim 5, characterized in that the convolutional neural network operation module further comprises an operation control unit, and the method further comprises:
    the operation control unit receiving the control-type configuration parameters of the convolutional neural network configuration parameters, the control-type configuration parameters comprising the input or output original image size and the number of input or output channels of each convolutional neural network layer;
    the operation control unit controlling, according to the control-type configuration parameters, the reading of the original image data from the first storage unit and of the convolution operation parameters and batch processing operation parameters from the second storage unit, and sending the original image data, the convolution operation parameters, and the batch processing operation parameters to the convolution operation unit.
  7. The image recognition processing method according to claim 6, characterized in that the method further comprises:
    the operation control module acquiring, from the parameter distribution module, the activation operation parameters and operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters comprising convolution operation configuration parameters, the convolution kernel size, and the pooling mode;
    the operation control module sending the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters to the operation control unit of the convolutional neural network operation module;
    the operation control unit sending the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to a pooling unit; or,
    the parameter distribution module directly sending the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit.
  8. The image recognition processing method according to claim 6, characterized in that the method further comprises:
    performing image filling processing on the original image data read from the first storage unit before sending it to the convolution operation unit; and performing accumulation-and-summation processing on the convolution operation parameters acquired from the second storage unit before sending them to the convolution operation unit.
  9. The image recognition processing method according to claim 6, characterized in that the second storage unit comprises a first memory, a second memory, and a third memory, and writing the convolution operation parameters and batch processing operation parameters into the second storage unit and reading the convolution operation parameters and batch processing operation parameters from the second storage unit comprises:
    writing the convolution operation parameters into the first memory or the second memory and reading the convolution operation parameters from the first memory or the second memory, wherein while the convolution operation parameters are written into the first memory they are read from the second memory, or while they are written into the second memory they are read from the first memory;
    writing the batch processing operation parameters into the third memory and reading the batch processing operation parameters from the third memory.
  10. An image recognition processing apparatus, characterized by comprising:
    a parameter acquisition module, configured to acquire original image data, convolutional neural network configuration parameters, and convolutional neural network operation parameters, the original image data comprising M pixels of data, M being a positive integer;
    a convolutional neural network operation module, connected to the parameter acquisition module and configured to perform a convolutional neural network operation on the original image data according to the convolutional neural network configuration parameters and the convolutional neural network operation parameters, wherein the convolutional neural network operation module comprises N operation components arranged in parallel, each operation component comprises a convolution operation unit, a batch processing operation unit, and an activation operation unit connected in sequence, and the N operation components respectively perform the convolution operation, the batch processing operation, and the activation operation on N pixels of data of the original image data at the same time, N being a positive integer less than or equal to M.
  11. The image recognition processing apparatus according to claim 10, characterized by further comprising a parameter distribution module, wherein the parameter acquisition module comprises a data transmission bus, a first interface, a second interface, a first storage unit, and a second storage unit;
    the data transmission bus is used to transmit the original image data, the convolutional neural network configuration parameters, and the convolutional neural network operation parameters;
    a first end of the first interface is connected to the data transmission bus, a second end of the first interface is connected to the parameter distribution module and the first storage unit respectively, and the first interface is used to acquire the original data from the data transmission bus and write it into the first storage unit, and to acquire parameter configuration instructions from the data transmission bus and send them to the parameter distribution module, the parameter configuration instructions comprising the convolutional neural network configuration parameters;
    a first end of the second interface is connected to the data transmission bus, a second end of the second interface is connected to the parameter distribution module and the second storage unit respectively, and the second interface is used to acquire the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the data transmission bus, write the convolution operation parameters and batch processing operation parameters into the second storage unit, and send the activation operation parameters to the parameter distribution module.
  12. The image recognition processing apparatus according to claim 11, characterized by further comprising an operation control module connected to the parameter distribution module and the convolutional neural network operation module respectively;
    the operation control module is used to acquire the control-type configuration parameters of the convolutional neural network configuration parameters from the parameter distribution module; to control, according to the control-type configuration parameters, the acquisition of the original image data from the first interface, of the parameter configuration instructions from the first interface, and of the convolution operation parameters, batch processing operation parameters, and activation operation parameters from the second interface; and to send the control-type configuration parameters of the convolutional neural network configuration parameters to the convolutional neural network operation module.
  13. The image recognition processing apparatus according to claim 12, characterized in that the convolutional neural network operation module further comprises an operation control unit, a parameter input end of the operation control unit is connected to the control module, and a control end of the operation control unit is connected to the convolution operation unit, the batch processing operation unit, and the activation operation unit respectively;
    the operation control unit is used to receive the control-type configuration parameters of the convolutional neural network configuration parameters, the control-type configuration parameters comprising the input or output original image size and the number of input or output channels of each convolutional neural network layer; to control, according to the control-type configuration parameters, the reading of the original image data from the first storage unit and send that original image data to the convolution operation unit; to control the reading of the convolution operation parameters and batch processing operation parameters from the second storage unit and send the convolution operation parameters and batch processing operation parameters to the convolution operation unit; and to send the activation operation parameters to the activation operation unit.
  14. The image recognition processing apparatus according to claim 13, characterized in that the convolutional neural network operation module further comprises a pooling unit and a write-back unit, each connected to the control end of the operation control unit;
    the operation control module is further used to acquire, from the parameter distribution module, the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters, the operation-type configuration parameters comprising the convolution operation configuration parameters, the convolution kernel size, and the pooling mode, and to send the activation operation parameters and the operation-type configuration parameters of the convolutional neural network configuration parameters to the operation control unit of the convolutional neural network operation module; the operation control unit is further used to send the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit; or,
    the parameter distribution module is directly connected to the activation operation unit, the convolution operation unit, and the pooling unit, and the parameter distribution module is used to directly send the activation operation parameters to the activation operation unit, the convolution operation configuration parameters and the convolution kernel size to the convolution operation unit, and the pooling mode to the pooling unit.
  15. The image recognition processing apparatus according to claim 13, characterized in that the convolutional neural network operation module further comprises:
    an image preprocessing unit, disposed between the first storage unit and the convolution operation unit and used to perform image filling processing on the original image data before sending it to the convolution operation unit;
    a parameter preprocessing unit, disposed between the second storage unit and the convolution operation unit and used to accumulate and sum the convolution operation parameters before sending them to the convolution operation unit.
  16. The image recognition processing apparatus according to claim 15, characterized in that the parameter acquisition module further comprises a data reading and writing unit, connected to the first interface and the first storage unit respectively, to the second interface and the second storage unit respectively, to the image preprocessing unit and the parameter preprocessing unit respectively, and to the write-back unit;
    the data reading and writing unit is used to acquire the original image data from the first interface and write it into the first storage unit, and to read the original image data from the first storage unit and send it to the image preprocessing unit;
    the data reading and writing unit is further used to acquire the convolution operation parameters and batch processing operation parameters from the second interface and write them into the second storage unit, and to read the convolution operation parameters and batch processing operation parameters from the second storage unit and send them to the parameter preprocessing unit;
    the data reading and writing unit is further used to write the pooled image data sent by the write-back unit into the first storage unit.
  17. The image recognition processing apparatus according to claim 16, characterized in that the second storage unit comprises a first memory, a second memory, and a third memory, and the parameter reading unit is specifically used to write the convolution operation parameters into the first memory or the second memory and to read the convolution operation parameters from the first memory or the second memory, wherein while the convolution operation parameters are written into the first memory they are read from the second memory, or while they are written into the second memory they are read from the first memory;
    and to write the batch processing operation parameters into the third memory and read the batch processing operation parameters from the third memory.
  18. The image recognition processing apparatus according to claim 10, further comprising:
    a data temporary storage unit, connected to the convolution operation unit of each of the operation components and used to store the convolution result of each input channel after operation by the convolution operation unit.
  19. The image recognition processing apparatus according to claim 10, characterized in that the data transmission bus is an Advanced eXtensible Interface (AXI) bus.
PCT/CN2019/095449 2018-08-31 2019-07-10 Image recognition processing method and apparatus WO2020042771A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/272,557 US12026105B2 (en) 2018-08-31 2019-07-10 Image recognition processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811010061.0 2018-08-31
CN201811010061.0A CN110874605B (zh) Image recognition processing method and apparatus

Publications (2)

Publication Number Publication Date
WO2020042771A1 WO2020042771A1 (zh) 2020-03-05
WO2020042771A9 (zh) 2021-05-20

Family

ID=69644683

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/095449 WO2020042771A1 (zh) 2018-08-31 2019-07-10 Image recognition processing method and apparatus

Country Status (2)

Country Link
CN (1) CN110874605B (zh)
WO (1) WO2020042771A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154624A (zh) * 2021-12-07 2022-03-08 广州小鹏自动驾驶科技有限公司 Data processing method, apparatus, and device based on a convolutional neural network
CN116151352B (zh) * 2023-04-13 2024-06-04 中浙信科技咨询有限公司 Convolutional recurrent neural network diagnosis method based on a brain information pathway integration mechanism

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035750A (zh) * 2014-06-11 2014-09-10 西安电子科技大学 FPGA-based real-time template convolution implementation method
US9886377B2 (en) * 2015-10-05 2018-02-06 Intel Corporation Pipelined convolutional operations for processing clusters
CN109086877B (zh) * 2016-04-29 2020-05-08 中科寒武纪科技股份有限公司 Apparatus and method for performing the forward operation of a convolutional neural network
CN106355244B (zh) * 2016-08-30 2019-08-13 深圳市诺比邻科技有限公司 Construction method and system of a convolutional neural network
CN107292334A (zh) * 2017-06-08 2017-10-24 北京深瞐科技有限公司 Image recognition method and apparatus
CN107590535A (zh) * 2017-09-08 2018-01-16 西安电子科技大学 Programmable neural network processor
CN107657581B (zh) * 2017-09-28 2020-12-22 中国人民解放军国防科技大学 Convolutional neural network (CNN) hardware accelerator and acceleration method
CN107992486A (zh) * 2017-10-30 2018-05-04 上海寒武纪信息科技有限公司 Information processing method and related products
CN108416422B (zh) * 2017-12-29 2024-03-01 国民技术股份有限公司 FPGA-based convolutional neural network implementation method and apparatus

Also Published As

Publication number Publication date
WO2020042771A1 (zh) 2020-03-05
CN110874605B (zh) 2024-05-03
US20210326619A1 (en) 2021-10-21
CN110874605A (zh) 2020-03-10

Similar Documents

Publication Publication Date Title
EP3407266B1 (en) Artificial neural network calculating device and method for sparse connection
US10846591B2 (en) Configurable and programmable multi-core architecture with a specialized instruction set for embedded application based on neural networks
WO2020073211A1 (zh) Operation accelerator, processing method, and related device
CN109993707B (zh) Image denoising method and apparatus
CN107292352B (zh) Image classification method and apparatus based on a convolutional neural network
CN107341547A (zh) Apparatus and method for performing convolutional neural network training
WO2022067508A1 (zh) Neural network accelerator, acceleration method, and apparatus
CN112163601B (zh) Image classification method, system, computer device, and storage medium
CN110766127B (zh) Dedicated circuit for neural network computation and related computing platform and implementation method
CN110991630A (zh) Convolutional neural network processor for edge computing
CN111931901A (zh) Neural network construction method and apparatus
WO2020042771A9 (zh) Image recognition processing method and apparatus
JP2020042774A (ja) Artificial intelligence inference arithmetic device
WO2020042770A9 (zh) Image recognition processing method and apparatus
CN108334944A (zh) Apparatus and method for artificial neural network operation
DE102021107510A1 (de) Training a neural network under memory constraints
CN110009644B (zh) Method and apparatus for row-pixel segmentation of a feature map
WO2022156475A1 (zh) Neural network model training method, data processing method, and apparatus
CN109359542A (zh) Neural-network-based vehicle damage level determination method and terminal device
WO2022227024A1 (zh) Operation method, training method, and apparatus for a neural network model
US12026105B2 (en) Image recognition processing method and apparatus
US12033379B2 (en) Image recognition method and apparatus
CN114730331A (zh) Data processing apparatus and data processing method
CN109144470B (zh) Computing apparatus and method
JP2023043188A (ja) GRU network model based on ordinary differential equations and feature extraction method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19856389

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19856389

Country of ref document: EP

Kind code of ref document: A1