CN109871939B - Image processing method and image processing device - Google Patents

Image processing method and image processing device

Info

Publication number
CN109871939B
Authority
CN
China
Prior art keywords
fpga
image
processing
neural network
feature
Prior art date
Legal status
Active
Application number
CN201910083846.9A
Other languages
Chinese (zh)
Other versions
CN109871939A (en
Inventor
陈海波
Current Assignee
DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Original Assignee
DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority date
Filing date
Publication date
Application filed by DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Priority to CN201910083846.9A
Publication of CN109871939A
Application granted
Publication of CN109871939B

Abstract

The invention discloses an image processing method and an image processing apparatus. An FPGA receives a first instruction sent by an ARM chip, which instructs the FPGA to process an acquired image using a convolutional neural network. While the FPGA processes the image, the configuration parameters are stored in the ARM chip; the FPGA actively accesses the ARM chip to acquire the configuration parameters of the convolutional neural network, and then processes the image according to these parameters to obtain a first feature of the image.

Description

Image processing method and image processing device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method and an image processing apparatus.
Background
Convolutional neural networks are widely applied in the field of computer vision, and show particularly good application prospects in target detection and image recognition. Since the image processing algorithms corresponding to convolutional neural networks are very computationally intensive, a graphics processing unit (GPU) is generally used to accelerate processing when a convolutional neural network is applied to an image. However, because GPUs are expensive and consume considerable power, they cannot be used in application scenarios requiring real-time performance and low power consumption; in such scenarios, a field-programmable gate array (FPGA) is used to accelerate the processing instead.
How to increase the speed of processing an image by using a convolutional neural network in an FPGA is a technical problem to be solved urgently at present.
Disclosure of Invention
Embodiments of the present invention provide an image processing method and an image processing apparatus, so as to solve the technical problem, urgently in need of a solution in the prior art, of how to increase the speed of processing an image with a convolutional neural network in an FPGA.
In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
the method comprises the steps that a Field Programmable Gate Array (FPGA) receives a first instruction sent by an advanced reduced instruction set processor (ARM) chip, wherein the first instruction is used for indicating the FPGA to process an acquired image by using a convolutional neural network;
the FPGA acquires configuration parameters of the convolutional neural network by accessing the ARM chip;
and the FPGA processes the image according to the configuration parameters of the convolutional neural network to obtain the first characteristic of the image.
In this embodiment, the FPGA receives the first instruction sent by the ARM chip, which causes the FPGA to process the acquired image using the convolutional neural network; that is, the ARM chip controls the FPGA to process the image. While the FPGA processes the image, the configuration parameters are stored in the ARM chip; the FPGA actively accesses the ARM chip to acquire the configuration parameters of the convolutional neural network, and then processes the image according to these parameters to obtain the first feature of the image.
Optionally, the configuration parameters at least include the number of layers of the convolutional neural network, a processing type required for each layer of the convolutional neural network, a weight parameter corresponding to each processing type, and related parameters of an image that can be processed by the convolutional neural network, where the processing types include convolutional processing, pooling processing, and sampling processing, the weight parameters include a weight parameter for convolutional processing, a step parameter for pooling processing, and a step parameter for sampling processing, and the related parameters include a size parameter of the image and a storage location of the image in a double data rate synchronous dynamic random access memory (DDR) connected to the FPGA.
In this embodiment, the configuration parameters include the number of layers of the convolutional neural network, the processing type required by each layer, the weight parameters, and so on. Different configuration parameters correspond to different convolutional neural networks; by setting different configuration parameters, images can be processed with different convolutional neural networks on the same FPGA without changing the circuit structure of the FPGA. This makes the FPGA universal across different convolutional neural networks and achieves the technical effect of saving resources.
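As an illustration only, the configuration parameters described above can be sketched as a small data structure; none of these names appear in the patent, and every field here is a hypothetical stand-in for the parameters the text enumerates (number of layers, per-layer processing type, weight parameter, image size, and DDR storage location).

```python
from dataclasses import dataclass
from enum import Enum

class ProcessingType(Enum):
    CONVOLUTION = 0
    POOLING = 1
    SAMPLING = 2

@dataclass
class LayerConfig:
    processing_type: ProcessingType  # convolution, pooling, or sampling
    weight_param: int                # weight (or stride) parameter for this processing type
    image_width: int                 # width of the feature map this layer accepts (img W)
    image_height: int                # height of the feature map this layer accepts (img H)
    ddr_address: int                 # where this layer's input is stored in DDR

@dataclass
class NetworkConfig:
    layers: list  # one LayerConfig per layer of the convolutional neural network

    @property
    def num_layers(self) -> int:
        return len(self.layers)
```

Swapping in a different `NetworkConfig` is the mechanism the paragraph describes: a different network runs on the same circuit because only this parameter set changes.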
Optionally, when the FPGA processes the image by using the layer 0 of the convolutional neural network, the method includes the following steps:
when the FPGA is in an idle state, if an enabling signal of the 0 th layer is received, the FPGA enters a configuration parameter reading state, and the enabling signal of the 0 th layer is generated by the FPGA according to the first instruction;
when the FPGA is in the configuration parameter reading state, the ARM chip is accessed, and a first processing type required by the 0 th layer is obtained from the configuration parameters of the convolutional neural network;
after the first processing type is obtained, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, a first weight parameter corresponding to the first processing type is obtained from the ARM chip;
and when receiving information used for representing that the reading of the weight parameters is finished, the FPGA enters a calculation state, and when the FPGA is in the calculation state, the image indicated in the first instruction is processed according to the first processing type and the first weight parameters to acquire a second characteristic of the image, wherein the image is an image sent by a sensor connected with the FPGA.
The FPGA stores the second feature to the DDR;
and if information for representing the successful storage of the second feature is received, the FPGA enters a current layer ending state of the 0 th layer.
In this embodiment, when the FPGA processes an image using the layer 0 of the convolutional neural network, the FPGA only needs to go through 5 states, i.e., an idle state, a configuration parameter reading state, a weight parameter reading state, a calculation state, and a current layer end state, and the FPGA processes the image according to a jump between the states.
Optionally, when the FPGA processes the image by using the nth layer of the convolutional neural network, where N is a positive integer, the method includes the following steps:
when the FPGA is in an idle state or the FPGA uses the layer N-1 of the convolutional neural network to process the image, if an enabling signal of the layer N is received, the FPGA enters a configuration parameter reading state;
when the FPGA is in the configuration parameter reading state, the Nth processing type required by the Nth layer is obtained from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the FPGA enters an image feature reading state, when the FPGA is in the image feature reading state, an initial address of a position for storing feature data of the image and the byte number of the feature data are obtained from configuration parameters of the convolutional neural network by accessing the ARM chip, the FPGA determines a storage address of the feature data in the DDR according to the initial address, the byte number and the value of N, and the FPGA reads the feature data according to the storage address; when N is 1, the feature data is the second feature;
after the feature data is read, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, an Nth weight parameter corresponding to the Nth processing type is obtained from the ARM chip;
after receiving information used for representing the completion of reading of the weight parameters, the FPGA enters a calculation state, and when the FPGA is in the calculation state, the feature data is processed according to the Nth processing type and the Nth weight parameters to acquire the (N + 2) th feature of the image;
the FPGA stores the (N + 2) th feature to the DDR;
if information for representing the successful storage of the (N + 2) th feature is received, the FPGA enters a current layer ending state of the Nth layer;
when the nth layer is the last layer of the convolutional neural network, the N +2 th feature stored in the DDR is the first feature.
In this embodiment, when the FPGA processes an image using the nth layer of the convolutional neural network, the FPGA only needs to go through 6 states, i.e., an idle state, a configuration parameter reading state, an image feature reading state, a weight parameter reading state, a calculation state, and a current layer end state, and the FPGA processes the image according to a jump between the states.
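The 6-state flow described above (idle, configuration parameter reading, image feature reading, weight parameter reading, calculation, current layer end) can be sketched as a simple state machine. This is an interpretive model of the text, not the patent's hardware design; the layer-0 bypass of the feature-reading state follows the 5-state description given earlier for layer 0, whose input comes from the sensor rather than from DDR.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    READ_CONFIG = auto()
    READ_FEATURES = auto()   # skipped for layer 0, whose input is the raw sensor image
    READ_WEIGHTS = auto()
    COMPUTE = auto()
    LAYER_DONE = auto()

# The text describes a strict forward sequence of states per layer.
_ORDER = [State.IDLE, State.READ_CONFIG, State.READ_FEATURES,
          State.READ_WEIGHTS, State.COMPUTE, State.LAYER_DONE]

def next_state(state: State, layer: int) -> State:
    """Advance one step through the per-layer state sequence."""
    nxt = _ORDER[(_ORDER.index(state) + 1) % len(_ORDER)]
    if layer == 0 and nxt is State.READ_FEATURES:
        nxt = State.READ_WEIGHTS  # layer 0 has no stored features to read
    return nxt
```

Running the sequence from `IDLE` takes 4 transitions to reach `LAYER_DONE` for layer 0 and 5 for any later layer, matching the 5-state and 6-state descriptions.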
Optionally, when the FPGA is in the image feature reading state, the FPGA reads the feature data according to the storage address, including:
the FPGA determines that the characteristic data comprise M data blocks, wherein M is an integer greater than 1;
the FPGA sequentially acquires each data block of the M data blocks;
correspondingly, when the FPGA is in the weight parameter reading state, the step of obtaining the nth weight parameter corresponding to the nth processing type from the ARM chip includes:
the FPGA determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
and the FPGA sequentially acquires each group of weight parameters in the L groups of weight parameters.
In this embodiment, when the FPGA processes an image using the Nth layer of the convolutional neural network, if the data size of the image features and weights is large, they may be read in multiple chunks, avoiding the excessively long computation time that would result from operating on an overly large block of image features or weights in a single pass.
Optionally, the step of the FPGA entering a computing state, and when the FPGA is in the computing state, processing the feature data according to the nth processing type and the nth weight parameter to obtain an N +2 th feature of the image includes:
sequentially taking i as 1 to M, processing the ith data block by the FPGA according to at least one group of weight parameters corresponding to the ith data block to obtain the ith part of the (N + 2) th feature, and storing the ith part to the DDR;
and when i is M, if information for representing that all parts of the (N + 2) th feature are successfully stored in the DDR is received, the FPGA enters a current layer ending state of the Nth layer.
This embodiment provides the following technical effect: after the image features and weights are read in blocks, the FPGA sequentially processes each block of the image features with the corresponding block of weights, which guarantees the accuracy of the blocked result, reduces the amount of computation per pass, and accelerates the calculation.
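A minimal sketch of the blocked processing loop described above, assuming a caller-supplied mapping from data blocks to weight groups and a caller-supplied compute function; all names are hypothetical illustrations of the M-block / L-group scheme, not APIs from the patent.

```python
def process_layer_blocked(data_blocks, weight_groups, block_to_group, apply_weights):
    """Process M feature-data blocks, each with its assigned weight group,
    accumulating the parts of the (N+2)th output feature part by part."""
    output_parts = []
    for i, block in enumerate(data_blocks):        # i runs over the M blocks
        group = weight_groups[block_to_group[i]]   # at least one group serves each block
        output_parts.append(apply_weights(block, group))
    return output_parts                            # each part is then stored to DDR
```

The per-block result corresponds to the "ith part" of the output feature in the claim; the layer is finished only once every part has been stored successfully.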
In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:
the device comprises a receiving module, a processing module and a processing module, wherein the receiving module is used for receiving a first instruction sent by an advanced reduced instruction set processor (ARM) chip, and the first instruction is used for indicating the device to process an acquired image by using a convolutional neural network;
the acquisition module is used for acquiring the configuration parameters of the convolutional neural network by accessing the ARM chip;
and the convolutional neural network processing module is used for processing the image according to the configuration parameters of the convolutional neural network so as to obtain the first characteristic of the image.
Optionally, the configuration parameters at least include the number of layers of the convolutional neural network, a processing type required for each layer of the convolutional neural network, a weight parameter corresponding to each processing type, and related parameters of an image that can be processed by the convolutional neural network, where the processing types include convolution processing, pooling processing, and sampling processing, the weight parameters include a weight parameter for convolution processing, a stride parameter for pooling processing, and a stride parameter for sampling processing, and the related parameters include a size parameter of the image and a storage location of the image in a double data rate synchronous dynamic random access memory DDR connected to the apparatus.
Optionally, the apparatus further includes a storage module, and when the apparatus processes the image by using the layer 0 of the convolutional neural network, the apparatus includes the following steps:
when the device is in an idle state, if an enabling signal of the 0 th layer is received, the device enters a configuration parameter reading state, and the enabling signal of the 0 th layer is generated by the FPGA according to the first instruction;
when the device is in the configuration parameter reading state, the acquisition module acquires a first processing type required by the 0 th layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the first processing type is obtained, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the obtaining module obtains a first weight parameter corresponding to the first processing type from the ARM chip;
and when the device is in the calculation state, the convolutional neural network processing module processes the image indicated in the first instruction according to the first processing type and the first weight parameter to acquire a second feature of the image, wherein the image is an image sent by a sensor connected with the device.
The storage module stores the second feature to the DDR;
and if the convolutional neural network processing module receives information used for representing the successful storage of the second feature, the apparatus enters a current layer ending state of the 0th layer.
Optionally, when the apparatus processes the image by using an nth layer of the convolutional neural network, where N is a positive integer, the apparatus includes:
when the device is in an idle state or the device finishes processing the image by using the layer N-1 of the convolutional neural network, if the receiving module receives an enabling signal of the layer N, the device enters a configuration parameter reading state;
when the device is in the configuration parameter reading state, the acquisition module acquires the Nth processing type required by the Nth layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the device enters an image feature reading state, when the device is in the image feature reading state, the obtaining module obtains an initial address of a position for storing feature data of the image and the byte number of the feature data from configuration parameters of the convolutional neural network by accessing the ARM chip, the device determines a storage address of the feature data in the DDR according to the initial address, the byte number and the value of N, and the obtaining module reads the feature data according to the storage address; when N is 1, the feature data is the second feature;
after the characteristic data is read, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the acquisition module acquires an Nth weight parameter corresponding to the Nth processing type from the ARM chip;
after receiving information used for representing the completion of reading of the weight parameters, the device enters a calculation state, and when the device is in the calculation state, the convolutional neural network processing module processes the feature data according to the Nth processing type and the Nth weight parameters to acquire the (N + 2) th feature of the image;
the storage module stores the N +2 th feature to the DDR;
if the convolutional neural network processing module receives information used for representing the successful storage of the (N + 2) th feature, the device enters a current layer ending state of the Nth layer;
when the nth layer is the last layer of the convolutional neural network, the N +2 th feature stored in the DDR is the first feature.
Optionally, when the apparatus is in the image feature reading state, the obtaining module reads the feature data according to the storage address, including:
the obtaining module determines that the feature data comprises M data blocks, wherein M is an integer greater than or equal to 1;
the obtaining module obtains each data block of the M data blocks in sequence;
correspondingly, when the device is in the weight parameter reading state, the obtaining module obtains the nth weight parameter corresponding to the nth processing type from the ARM chip, including:
the obtaining module determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
the acquisition module sequentially acquires each set of weight parameters in the L sets of weight parameters.
Optionally, the entering of the apparatus into a computation state, and, when the apparatus is in the computation state, the convolutional neural network processing module processing the feature data according to the Nth processing type and the Nth weight parameter to obtain an (N + 2)th feature of the image, includes:
sequentially taking i as 1 to M, processing the ith data block by the convolutional neural network processing module according to at least one group of weight parameters corresponding to the ith data block to obtain the ith part of the (N + 2) th feature, and storing the ith part to the DDR;
and when i is M, if the convolutional neural network processing module receives information used for representing that all parts of the (N + 2) th feature are successfully stored in the DDR, the device enters a current layer ending state of the Nth layer.
In a third aspect, an embodiment of the present invention provides a computer-readable storage medium, including:
the computer-readable storage medium has stored thereon computer instructions which, when executed by at least one processor of the computer apparatus, implement the method as described in the first aspect above.
One or more technical solutions provided in the embodiments of the present invention have at least the following technical effects or advantages:
in the embodiment of the invention, the FPGA receives a first instruction sent by an ARM chip and then processes the acquired image using the convolutional neural network. While the FPGA processes the image, the configuration parameters are stored in the ARM chip; the FPGA actively accesses the ARM chip to acquire the configuration parameters of the convolutional neural network, and then processes the image according to these parameters to obtain the first feature of the image.
Drawings
FIG. 1 is a schematic diagram of a calculation process of a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a flowchart of an image processing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an architecture of an FPGA, an ARM chip, and a DDR according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating 6 states of a convolutional neural network according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an FPGA according to an embodiment of the present invention;
fig. 6 is a schematic physical structure diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to solve the technical problem, the technical scheme in the embodiment of the invention has the following general idea:
an image processing method and an FPGA are provided, the image processing method comprises:
the method comprises the steps that a Field Programmable Gate Array (FPGA) receives a first instruction sent by an advanced reduced instruction set processor (ARM) chip, wherein the first instruction is used for indicating the FPGA to process an acquired image by using a convolutional neural network;
the FPGA acquires configuration parameters of the convolutional neural network by accessing the ARM chip;
and the FPGA processes the image according to the configuration parameters of the convolutional neural network to obtain the first characteristic of the image.
In the embodiment of the invention, the FPGA receives a first instruction sent by the ARM chip, which causes the FPGA to process the acquired image using the convolutional neural network; that is, the ARM chip controls the FPGA to process the image. While the FPGA processes the image, the configuration parameters are stored in the ARM chip; the FPGA actively accesses the ARM chip to acquire the configuration parameters of the convolutional neural network, and then processes the image according to these parameters to obtain the first feature of the image.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments. In the description of the embodiments of the present application, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, "a plurality" means two or more unless otherwise specified.
In the embodiment of the invention, the structure of the convolutional neural network may be YOLOv3, AlexNet, or another structure.
The convolution calculation in a convolutional neural network may be done in the manner shown in fig. 1, where the image features are a 6 × 6 × 3 matrix and the filter (i.e., the weights) is a 3 × 3 × 3 matrix. During calculation, a 3 × 3 × 3 image sub-matrix selected by a sliding frame is taken from the 6 × 6 × 3 image matrix and convolved with the 3 × 3 × 3 filter: the 27 elements of the 3 × 3 × 3 image sub-matrix are multiplied by the 27 elements at the corresponding positions of the 3 × 3 × 3 filter, and the 27 products are summed to give the first convolution value (similar to the operation of matrix multiplication), which is stored at position 1 of the 4 × 4 convolution result. The selection frame is then moved one cell to the right, the image sub-matrix in the frame is convolved with the filter, and the result is placed at position 2 of the 4 × 4 convolution result. After the selection frame reaches the rightmost side, it is moved down one row and back to the leftmost side, and the convolution calculations continue from left to right, and so on, until the selection frame reaches the lower right corner; the image sub-matrix in the frame is convolved with the filter, the result is placed at position 16 of the 4 × 4 convolution result, and the convolution of the 6 × 6 × 3 image with the 3 × 3 × 3 filter is complete.
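The sliding-window walkthrough above can be reproduced with a few nested loops. This is a plain illustrative implementation of "valid" convolution, not the FPGA's circuit; for a 6 × 6 × 3 image and a 3 × 3 × 3 filter it yields the 4 × 4 result described in the text.

```python
def conv3d_valid(image, filt):
    """'Valid' convolution of an H x W x C image with a k x k x C filter:
    at every spatial position, multiply the overlapping elements pairwise
    (27 of them for a 3 x 3 x 3 filter) and sum, as in the walkthrough above."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    k = len(filt)
    out = [[0.0] * (W - k + 1) for _ in range(H - k + 1)]
    for r in range(H - k + 1):
        for c in range(W - k + 1):
            out[r][c] = sum(
                image[r + i][c + j][d] * filt[i][j][d]
                for i in range(k) for j in range(k) for d in range(C)
            )
    return out
```

With an all-ones image and an all-ones filter, each output entry is the sum of 27 products of 1, i.e., 27.0, which makes the shape and arithmetic easy to check by hand.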
Referring to fig. 2, an embodiment of the present invention provides an image processing method, where the method includes:
step S110, a Field Programmable Gate Array (FPGA) receives a first instruction sent by an advanced reduced instruction set processor (ARM) chip, wherein the first instruction is used for instructing the FPGA to process an acquired image by using a convolutional neural network;
step S120, the FPGA acquires configuration parameters of the convolutional neural network by accessing the ARM chip;
and S130, the FPGA processes the image according to the configuration parameters of the convolutional neural network to obtain a first characteristic of the image.
First, step S110 is executed: the FPGA receives a first instruction sent by the ARM chip. Referring to fig. 3, fig. 3 shows the architecture among the FPGA 10, the ARM chip 20, and the DDR 30. Data transmission between the FPGA 10 and the DDR 30, and between the FPGA 10 and the ARM chip 20, is performed through the AXI (Advanced eXtensible Interface) bus protocol. The ARM chip is a control chip used in a system on chip (SoC), such as a Cortex-A9 ARM chip.
The first instruction is used for instructing the FPGA to process the acquired image by using the convolutional neural network, namely, the FPGA can process the image after receiving the first instruction, the ARM chip controls the FPGA to start image processing, and the ARM chip at the moment serves as a main control unit.
In addition, the ARM chip is further configured to store configuration parameters of the convolutional neural network, the ARM chip may obtain initial configuration parameters from an SD card connected to the ARM chip, the initial configuration parameters are manually preset, and a specific setting method is not a key point of the present invention and is not described here.
The configuration parameters may include the number of layers of the convolutional neural network, the type of processing required by each layer, a weight parameter corresponding to each processing type, and related parameters of the images that the convolutional neural network can process, where the processing types include convolution processing, pooling processing, and sampling processing; the weight parameters include a weight parameter for convolution processing, a stride parameter for pooling processing, and a stride parameter for sampling processing; and the related parameters include a size parameter of the image and the storage location of the image in the double data rate synchronous dynamic random access memory (DDR) connected to the FPGA. Referring to table 1, the configuration table of a convolutional neural network is given in table 1. The configuration parameters may be stored in table form in the ARM chip, specifically in a block RAM (BR) of the ARM chip; the block RAM is a dual-port RAM and includes two complete 36-bit read-write data buses and the corresponding control buses. The configuration table may include the parameters required by all layers of the convolutional neural network, or the configuration parameters required by each layer may be stored in its own corresponding configuration table; for example, 16 layers of the convolutional neural network correspond to 16 configuration tables.
TABLE 1
[Table 1 appears as images (BDA0001961125440000121 through BDA0001961125440000141) in the original publication and is not reproduced here.]
For example, img W and img H in table 1 belong to the related parameters of an image; specifically, they give the size of the image that the convolutional neural network can process. The image acquired by the sensor is the processing object only of layer 0 of the convolutional neural network; the processing objects of the other layers are features obtained through one or more layers of processing, and these features conform to the img W and img H sizes. The Post type belongs to the processing types; specifically, it is a type of post-processing, other than the convolution processing required in a convolutional neural network, that includes pooling processing and sampling processing. Read image transfer num is also a related parameter of the image: if, for example, the image is too large to be moved in a single transfer, it must be processed in blocks, and the number of image read transfers is the number of blocks of the image. Read image DDR address is the starting address for reading an image (an address in the DDR), and read image DDR size is the size of the image that the FPGA needs to read; from the size and storage address of the image, the FPGA can accurately read the image that needs to be processed. The pre-processing addressing parameter of the current layer is an address offset: each layer in the convolutional neural network outputs calculation results that are stored in the DDR, but the configuration parameters include only an initial address for storing the calculation results, not the real storage address of each result; the storage address equals the initial address plus the address offset. The pre-processing addressing parameter of the current layer may also be calculated by the FPGA instead of being set in advance.
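A sketch of the address-offset scheme described above. The text states only that the storage address equals the initial address plus an offset; the `layer_index * bytes_per_feature` rule below is a hypothetical example of how such an offset might be derived, and both parameter names are illustrative.

```python
def feature_storage_address(base_address: int, bytes_per_feature: int,
                            layer_index: int) -> int:
    """Real storage address of a layer's output in DDR: the configured
    initial address plus a per-layer offset (offset formula assumed here)."""
    return base_address + layer_index * bytes_per_feature
```

Because only the base address needs to be configured, the FPGA can compute each layer's output address on the fly rather than storing one address per layer.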
As another example, 15:0 in Table 1 indicates that the parameter occupies 16 bits, stored in bit positions 0 through 15.
The initial configuration parameters acquired by the ARM chip may be the configuration parameters themselves, or may omit the address-parameter part of the configuration parameters; after the ARM chip acquires parameters such as the weight parameters, processing types, and image sizes, it allocates the corresponding address parameters.
After step S110 is executed, step S120 and step S130 are executed.
The FPGA can access the ARM chip to obtain the configuration parameters of the convolutional neural network and process the image according to those parameters, thereby obtaining the first feature of the image. Referring to fig. 4, in the embodiment of the present invention, the process of processing the image by the FPGA using the convolutional neural network involves 6 states of the convolutional neural network: an idle state (state 1), a configuration parameter reading state (state 2), an image feature reading state (state 3), a weight parameter reading state (state 4), a calculation state (state 5), and a current layer end state (state 6). In the idle state, the FPGA waits for the enabling signal of the current layer, and remains in the idle state until that signal is received. In the configuration parameter reading state, the FPGA acquires the processing type of the current layer of the convolutional neural network from the ARM chip. In the image feature reading state, the FPGA accesses the ARM chip to obtain the storage address of the feature data of the image, and reads the feature data from the DDR according to that storage address. In the weight parameter reading state, the FPGA accesses the ARM chip to obtain the number of weight channels, the depth of the image, the width and height of the weights, and the storage address of the weights, so as to obtain the specific values of the weights. In the calculation state, the FPGA applies the processing corresponding to the processing type acquired in the configuration parameter reading state to the feature data of the image using the acquired weights, and stores the calculation result in the DDR. In the current layer end state, the FPGA waits for the enabling signal of the next layer of the convolutional neural network, and if no enabling signal is received within a preset time length, the FPGA automatically enters the idle state.
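The six states and their nominal order can be modeled as a small state machine. This is a simplified sketch: enable-signal timeouts and error paths are omitted, and the assumption that layer 0 skips the image feature reading state is drawn from the layer-0 example below.

```python
from enum import Enum

class LayerState(Enum):
    IDLE = 1          # state 1: wait for the current layer's enable signal
    READ_CONFIG = 2   # state 2: read configuration parameters from the ARM chip
    READ_FEATURE = 3  # state 3: read the image's feature data from the DDR
    READ_WEIGHT = 4   # state 4: read the weight parameters
    COMPUTE = 5       # state 5: apply the processing type; store result in DDR
    LAYER_END = 6     # state 6: wait for the next layer's enable signal

def nominal_state_sequence(layer_index: int) -> list:
    """Nominal per-layer state order. Layer 0 processes the raw sensor
    image, so (by assumption) it does not enter the feature reading state."""
    seq = [LayerState.IDLE, LayerState.READ_CONFIG]
    if layer_index > 0:
        seq.append(LayerState.READ_FEATURE)
    seq += [LayerState.READ_WEIGHT, LayerState.COMPUTE, LayerState.LAYER_END]
    return seq

print([s.value for s in nominal_state_sequence(0)])  # [1, 2, 4, 5, 6]
print([s.value for s in nominal_state_sequence(3)])  # [1, 2, 3, 4, 5, 6]
```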
For example, when the FPGA processes an image by using the layer 0 of the convolutional neural network, the method includes the following steps:
when the FPGA is in an idle state, if an enabling signal of a 0 th layer is received, the FPGA enters a configuration parameter reading state;
when the FPGA is in a configuration parameter reading state, a first processing type required by a layer 0 is obtained from configuration parameters of the convolutional neural network by accessing an ARM chip;
after the first processing type is obtained, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, a first weight parameter corresponding to the first processing type is obtained from the ARM chip;
and after receiving information used for representing that the reading of the weight parameters is completed, the FPGA enters a calculation state; when the FPGA is in the calculation state, it processes the image indicated in the first instruction according to the first processing type and the first weight parameter to acquire a second feature of the image, wherein the image is the image sent by a sensor connected with the FPGA.
The FPGA stores the second characteristic to the DDR;
and if the information for representing the successful storage of the second feature is received, the FPGA enters the current layer ending state of the 0 th layer.
The transitions among the 6 states for layer 0 of the convolutional neural network are accordingly state 1 → state 2 → state 4 → state 5 → state 6; the image feature reading state (state 3) is skipped because layer 0 processes the image received directly from the sensor rather than feature data stored in the DDR.
For another example, when the FPGA processes an image by using the nth layer of the convolutional neural network, where N is a positive integer, the method includes the following steps:
when the FPGA is in an idle state or the FPGA finishes processing the image by using the layer N-1 of the convolutional neural network, if an enabling signal of the layer N is received, the FPGA enters a configuration parameter reading state;
when the FPGA is in a configuration parameter reading state, acquiring an Nth processing type required by an Nth layer from configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the FPGA enters an image feature reading state; when the FPGA is in the image feature reading state, it accesses the ARM chip to obtain, from the configuration parameters of the convolutional neural network, the initial address of the position where the feature data of the image is stored and the number of bytes of the feature data; the FPGA determines the storage address of the feature data in the DDR according to the initial address, the number of bytes, and the value of N, and reads the feature data according to that storage address; when N is 1, the feature data is the second feature;
after the characteristic data is read, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, an Nth weight parameter corresponding to the Nth processing type is obtained from the ARM chip;
after receiving information used for representing the completion of reading of the weight parameters, the FPGA enters a calculation state, and when the FPGA is in the calculation state, the FPGA processes the feature data according to the Nth processing type and the Nth weight parameters to acquire the (N + 2) th feature of the image;
the FPGA stores the (N + 2) th feature to the DDR;
if information for representing the (N + 2) th feature to be successfully stored is received, the FPGA enters a current layer ending state of the Nth layer;
and when the Nth layer is the last layer of the convolutional neural network, the (N + 2) th feature stored in the DDR is the first feature.
The transition of 6 states of the nth layer of the corresponding convolutional neural network is state 1 → state 2 → state 3 → state 4 → state 5 → state 6.
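The embodiment states only that the storage address of a layer's input feature is determined from the initial address, the byte count, and the value of N, without fixing a formula; the sketch below assumes one plausible linear layout purely for illustration:

```python
def feature_storage_address(initial_address: int, byte_counts: list, n: int) -> int:
    """Hypothetical DDR layout in which the input feature of layer N begins
    where the features consumed by layers 1..N-1 end. byte_counts[k] is the
    byte count of the feature read by layer k+1. This linear packing is an
    assumption; the embodiment only names the three inputs of the calculation."""
    return initial_address + sum(byte_counts[: n - 1])

base = 0x8100_0000
sizes = [4096, 2048, 1024]  # assumed per-layer feature byte counts
print(hex(feature_storage_address(base, sizes, 1)))  # layer 1 reads at the base
print(hex(feature_storage_address(base, sizes, 3)))  # base + 4096 + 2048
```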
In the embodiment of the invention, both the feature data of the image and the weight parameters are expressed in the form of three-dimensional matrices. If the feature data of the image is too large, it needs to be processed in blocks; similarly, if the weight parameters are too large, they need to be blocked, and the feature data is processed using the weights of each block.
In the embodiment of the invention, the starting address of the storage position of the Nth feature in the DDR and the starting addresses of the storage positions of the other features in the DDR may be the same or different.
When the feature data needs to be processed in blocks, the FPGA also needs to read the feature data multiple times, which may specifically be done as follows:
the FPGA determines that the characteristic data comprise M data blocks, wherein M is an integer greater than or equal to 1;
the FPGA sequentially acquires each data block of the M data blocks;
correspondingly, when the FPGA is in the weight parameter reading state, the nth weight parameter corresponding to the nth processing type is obtained from the ARM chip, including:
the FPGA determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
the FPGA sequentially acquires each set of weight parameters in the L sets of weight parameters.
For example, the number of blocks of the feature data that the FPGA acquires from the ARM chip is 4; the FPGA acquires the respective storage addresses of the 4 blocks from the ARM chip and reads the feature data in 4 passes. The weight parameters acquired by the FPGA from the ARM chip comprise 6 blocks; the FPGA acquires the respective storage addresses of the 6 weight parameter blocks from the ARM chip and reads the weight parameters in 6 passes.
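The multi-pass reading in this example can be sketched as follows; the toy DDR dictionary and the addresses are invented for illustration, since the embodiment states only that each block's storage address is obtained from the ARM chip:

```python
def read_in_blocks(block_addresses: list, read_block) -> list:
    """Read feature data (or weight parameters) block by block: one read
    per storage address, returning the blocks in acquisition order."""
    return [read_block(addr) for addr in block_addresses]

# Toy DDR mapping a block's starting address to its contents.
ddr = {0x1000: b"feat0", 0x2000: b"feat1", 0x3000: b"feat2", 0x4000: b"feat3"}
feature_addresses = [0x1000, 0x2000, 0x3000, 0x4000]  # obtained from the ARM chip
feature_blocks = read_in_blocks(feature_addresses, ddr.__getitem__)
print(len(feature_blocks))  # 4 blocks read in 4 passes, as in the example
```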
Because the image processed by layer 0 is acquired by the sensor, it does not need to be processed in blocks; the situation in which the feature data or the weight parameters need to be read in blocks arises in the Nth layer of the convolutional neural network, and the FPGA processes the feature data in the calculation state in the following manner:
sequentially taking i as 1 to M, processing the ith data block by the FPGA according to at least one group of weight parameters corresponding to the ith data block to obtain an ith part of the (N + 2) th characteristic, and storing the ith part to the DDR;
and when the i is M, if the information for representing that all the parts of the N +2 th feature are successfully stored in the DDR is received, the FPGA enters the current layer ending state of the Nth layer.
Since both the feature data and the weight parameter may include multiple blocks or no blocks, the feature data and the weight parameter include the following three cases:
in case 1, the feature data includes a plurality of blocks, and the weight parameter is not blocked;
case 2, the feature data is not partitioned, and the weight parameter includes a plurality of partitions;
in case 3, the feature data includes a plurality of blocks, and the weight parameter includes a plurality of blocks.
For case 1, suppose for example that the feature data includes 4 blocks. The FPGA first acquires the 1st block of the feature data and the weight parameters, processes the 1st block of the feature data using the weight parameters, and obtains the 1st part of the (N + 2)th feature. The FPGA then acquires the 2nd block of the feature data, processes it using the same weight parameters, and obtains the 2nd part of the (N + 2)th feature; it processes the 3rd and 4th blocks of the feature data in turn to obtain the 3rd and 4th parts. Throughout this process, the FPGA only needs to acquire the weight parameters once. The corresponding transitions among the 6 states of the convolutional neural network for case 1 are state 1 → state 2 → state 3 → state 4 → state 5 → state 3 → state 5 → state 6, where the state 3 → state 5 loop repeats once for each remaining block of the feature data.
For case 3, suppose for example that the feature data and the weight parameters each include 2 blocks. The FPGA first acquires the 1st block of the feature data and the 1st block of the weight parameters, processes the 1st feature block using the 1st weight block, and obtains the 1st part of the (N + 2)th feature. The FPGA then acquires the 2nd block of the feature data, processes it using the 1st weight block, and obtains the 2nd part of the (N + 2)th feature. Next, the FPGA acquires the 1st block of the feature data again together with the 2nd block of the weight parameters, processes the 1st feature block using the 2nd weight block, and obtains the 3rd part of the (N + 2)th feature. Finally, the FPGA acquires the 2nd block of the feature data again, processes it using the 2nd weight block, and obtains the 4th part of the (N + 2)th feature. The corresponding transitions among the 6 states of the convolutional neural network for case 3 are state 1 → state 2 → state 3 → state 4 → state 5 → state 3 → state 5 → state 3 → state 4 → state 5 → state 3 → state 5 → state 6, with the weight parameter reading state (state 4) re-entered for each new weight block.
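The block iteration of case 3 amounts to a nested loop: the outer loop advances the weight blocks and the inner loop re-reads every feature data block, so L weight blocks and M data blocks yield L × M parts of the output feature. The sketch below is a simplified model in which `process` stands in for the convolution/pooling hardware:

```python
def process_blocked(feature_blocks: list, weight_blocks: list, process) -> list:
    """Case 3: feature data and weight parameters are both blocked. Each
    weight block is applied to every feature block, in the order described
    above: part 1 = (w1, d1), part 2 = (w1, d2), part 3 = (w2, d1), ..."""
    parts = []
    for w in weight_blocks:        # re-enter state 4 for each weight block
        for f in feature_blocks:   # re-enter state 3 for each feature block
            parts.append(process(f, w))  # state 5: compute one part
    return parts

parts = process_blocked(["d1", "d2"], ["w1", "w2"], lambda f, w: (w, f))
print(parts)  # [('w1', 'd1'), ('w1', 'd2'), ('w2', 'd1'), ('w2', 'd2')]
```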
In the embodiment of the invention, both the ARM chip and the FPGA can serve as the control unit. When the ARM chip is the main control unit, it controls the FPGA to process the image and acquires the configuration parameters of the convolutional neural network; when the FPGA is the main control unit, the FPGA actively acquires the configuration parameters from the ARM chip and controls the processing of the image through the convolutional neural network.
As shown in fig. 5, a second embodiment of the present invention provides an image processing apparatus 200, including:
a receiving module 201, configured to receive a first instruction sent by an advanced reduced instruction set processor ARM chip, where the first instruction is used to instruct the apparatus to process an acquired image by using a convolutional neural network;
an obtaining module 202, configured to obtain configuration parameters of the convolutional neural network by accessing the ARM chip;
and the convolutional neural network processing module 203 is configured to process the image according to the configuration parameters of the convolutional neural network to obtain the first feature of the image.
Optionally, the configuration parameters at least include the number of layers of the convolutional neural network, a processing type required for each layer of the convolutional neural network, a weight parameter corresponding to each processing type, and related parameters of an image that can be processed by the convolutional neural network, where the processing types include convolution processing, pooling processing, and sampling processing, the weight parameters include a weight parameter for convolution processing, a stride parameter for pooling processing, and a stride parameter for sampling processing, and the related parameters include a size parameter of the image and a storage location of the image in a double data rate synchronous dynamic random access memory DDR connected to the apparatus.
Optionally, the apparatus further includes a storage module, and when the apparatus processes the image by using the layer 0 of the convolutional neural network, the apparatus includes the following steps:
when the device is in an idle state, if an enabling signal of the 0 th layer is received, the device enters a configuration parameter reading state, and the enabling signal of the 0 th layer is generated by the FPGA according to the first instruction;
when the device is in the configuration parameter reading state, the acquisition module acquires a first processing type required by the 0 th layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the first processing type is obtained, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the obtaining module obtains a first weight parameter corresponding to the first processing type from the ARM chip;
and when the device is in the calculation state, the convolutional neural network processing module processes the image indicated in the first instruction according to the first processing type and the first weight parameter to acquire a second feature of the image, wherein the image is an image sent by a sensor connected with the device.
The storage module stores the second feature to the DDR;
and if the convolutional neural network processing module receives information used for representing the successful storage of the second feature, the apparatus enters the current layer ending state of the 0th layer.
Specifically, the image processing apparatus may be an FPGA. When the image processing apparatus is the FPGA, the FPGA further includes a control module, and the control module sends a first control signal read bias start addr, a second control signal read bias start size, a third control signal read bias start, and a fourth control signal read bias finish to the obtaining module. The first control signal provides the obtaining module with the starting address of the feature data to be read in the DDR; the second control signal provides the obtaining module with the size of the feature data to be read in the DDR; the third control signal controls the obtaining module to start reading; and the fourth control signal controls the obtaining module to stop reading. In addition, the signal generated by the control module for controlling weight reading is also sent to the obtaining module, and the signal generated by the control module for controlling feature storage is sent to the storage module.
The obtaining module sends parameters such as the size of the image, the number of channels, the size of the weights, and the number of weights to the convolutional neural network processing module. When the feature data or the weight parameters are processed in blocks, the control module can determine, according to the block currently being processed, whether to expand the reading range of the feature data or weight parameter block, thereby ensuring that the computed parts of the feature can be assembled into a complete feature.
The FPGA also comprises a register module which is used for storing the layer number of the convolutional neural network and controlling the generation of the enabling signal of the convolutional neural network.
Optionally, when the apparatus processes the image by using an nth layer of the convolutional neural network, where N is a positive integer, the apparatus includes:
when the device is in an idle state or the device finishes processing the image by using the layer N-1 of the convolutional neural network, if the receiving module receives an enabling signal of the layer N, the device enters a configuration parameter reading state;
when the device is in the configuration parameter reading state, the acquisition module acquires the Nth processing type required by the Nth layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the device enters an image feature reading state, when the device is in the image feature reading state, the obtaining module obtains an initial address of a position for storing feature data of the image and the byte number of the feature data from configuration parameters of the convolutional neural network by accessing the ARM chip, the device determines a storage address of the feature data in the DDR according to the initial address, the byte number and the value of N, and the obtaining module reads the feature data according to the storage address; when N is 1, the feature data is the second feature;
after the characteristic data is read, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the acquisition module acquires an Nth weight parameter corresponding to the Nth processing type from the ARM chip;
after receiving information used for representing the completion of reading of the weight parameters, the device enters a calculation state, and when the device is in the calculation state, the convolutional neural network processing module processes the feature data according to the Nth processing type and the Nth weight parameters to acquire the (N + 2) th feature of the image;
the storage module stores the N +2 th feature to the DDR;
if the convolutional neural network processing module receives information used for representing the successful storage of the (N + 2) th feature, the device enters a current layer ending state of the Nth layer;
when the nth layer is the last layer of the convolutional neural network, the N +2 th feature stored in the DDR is the first feature.
Optionally, when the apparatus is in the image feature reading state, the obtaining module reads the feature data according to the storage address, including:
the obtaining module determines that the feature data comprises M data blocks, wherein M is an integer greater than or equal to 1;
the obtaining module obtains each data block of the M data blocks in sequence;
correspondingly, when the device is in the weight parameter reading state, the obtaining module obtains the nth weight parameter corresponding to the nth processing type from the ARM chip, including:
the obtaining module determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
the acquisition module sequentially acquires each set of weight parameters in the L sets of weight parameters.
Optionally, the entering of the device into a computation state, and when the device is in the computation state, the processing module of the convolutional neural network processes the feature data according to the nth processing type and the nth weight parameter to obtain an N +2 th feature of the image, includes:
sequentially taking i as 1 to M, processing the ith data block by the convolutional neural network processing module according to at least one group of weight parameters corresponding to the ith data block to obtain the ith part of the (N + 2) th feature, and storing the ith part to the DDR;
and when i is M, if the convolutional neural network processing module receives information used for representing that all parts of the (N + 2) th feature are successfully stored in the DDR, the device enters a current layer ending state of the Nth layer.
Referring to fig. 6, a third embodiment of the present invention provides an image processing apparatus 400, including:
at least one processor 402, and a memory 403 coupled to the at least one processor 402;
the memory 403 stores instructions executable by the at least one processor 402, and the at least one processor 402 performs the steps of the method described in the above method embodiments by executing the instructions stored in the memory 403.
Optionally, the processor 402 may specifically include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), may be one or more integrated circuits for controlling program execution, may be a hardware circuit developed by using a Field Programmable Gate Array (FPGA), or may be a baseband processor.
Optionally, processor 402 may include at least one processing core.
Optionally, the apparatus further includes a memory 403, and the memory 403 may include a Read Only Memory (ROM), a Random Access Memory (RAM), and a disk memory. The memory 403 is used for storing data required by the processor 402 during operation.
An embodiment of the present invention provides a computer-readable storage medium, including:
the computer-readable storage medium has stored thereon computer instructions which, when executed by at least one processor of the computer apparatus, implement the method as described in the first embodiment.
The technical scheme in the embodiment of the invention at least has the following technical effects or advantages:
in the embodiment of the invention, the FPGA receives a first instruction sent by the ARM chip and then processes the acquired image by using the convolutional neural network. When the FPGA processes the image, the configuration parameters are stored in the ARM chip; the FPGA actively accesses the ARM chip to acquire the configuration parameters of the convolutional neural network, and then processes the image according to the configuration parameters to acquire the first feature of the image.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (11)

1. An image processing method, characterized in that the method comprises:
the method comprises the steps that a Field Programmable Gate Array (FPGA) receives a first instruction sent by an advanced reduced instruction set processor (ARM) chip, the first instruction is used for instructing the FPGA to process an acquired image by utilizing a convolutional neural network, and configuration parameters of the convolutional neural network are stored in the ARM chip;
the FPGA acquires configuration parameters of the convolutional neural network by accessing the ARM chip;
the FPGA processes the image according to the configuration parameters of the convolutional neural network to obtain a first characteristic of the image;
when the FPGA processes the image by utilizing the Nth layer of the convolutional neural network, wherein N is a positive integer, the method comprises the following steps:
when the FPGA is in an idle state or after the FPGA finishes processing the image by using the layer N-1 of the convolutional neural network, if an enabling signal of the layer N is received, the FPGA enters a configuration parameter reading state;
when the FPGA is in the configuration parameter reading state, the N processing type required by the N layer is obtained from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the FPGA enters an image feature reading state, when the FPGA is in the image feature reading state, a storage address of the feature data in the DDR is obtained from configuration parameters of the convolutional neural network by accessing the ARM chip, and the FPGA reads the feature data according to the storage address; when N is 1, the feature data is a second feature;
after the feature data is read, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, an Nth weight parameter corresponding to the Nth processing type is obtained from the ARM chip;
after receiving information used for representing the completion of reading of the weight parameters, the FPGA enters a calculation state, and when the FPGA is in the calculation state, the feature data is processed according to the Nth processing type and the Nth weight parameters to acquire the (N + 2) th feature of the image;
the FPGA stores the (N + 2) th feature to the DDR;
if information for representing the successful storage of the (N + 2) th feature is received, the FPGA enters a current layer ending state of the Nth layer;
when the nth layer is the last layer of the convolutional neural network, the N +2 th feature stored in the DDR is the first feature.
2. The method of claim 1, wherein the configuration parameters at least include the number of layers of the convolutional neural network, the type of processing required for each layer of the convolutional neural network, a weight parameter corresponding to each processing type, and related parameters of an image that can be processed by the convolutional neural network, wherein the processing types include convolutional processing, pooling processing, and sampling processing, the weight parameters include a weight parameter for convolutional processing, a stride parameter for pooling processing, and a stride parameter for sampling processing, and the related parameters include a size parameter of the image and a storage location of a feature of the image in a double-rate synchronous dynamic random access memory (DDR) connected to the FPGA.
3. The method of claim 2, wherein the FPGA processing the image using layer 0 of the convolutional neural network comprises:
when the FPGA is in an idle state, if an enabling signal of the 0 th layer is received, the FPGA enters a configuration parameter reading state, and the enabling signal of the 0 th layer is generated by the FPGA according to the first instruction;
when the FPGA is in the configuration parameter reading state, the ARM chip is accessed, and a first processing type required by the 0 th layer is obtained from the configuration parameters of the convolutional neural network;
after the first processing type is obtained, the FPGA enters a weight parameter reading state, and when the FPGA is in the weight parameter reading state, a first weight parameter corresponding to the first processing type is obtained from the ARM chip;
after receiving information used for representing that the reading of the weight parameters is completed, the FPGA enters a calculation state, and when the FPGA is in the calculation state, the image indicated in the first instruction is processed according to the first processing type and the first weight parameters to obtain a second feature of the image, wherein the image is an image sent by a sensor connected with the FPGA;
the FPGA stores the second feature to the DDR;
and if information for representing the successful storage of the second feature is received, the FPGA enters a current layer ending state of the 0 th layer.
4. The method of claim 3, wherein when the FPGA is in the image feature reading state, the FPGA reads the feature data according to the storage address, comprising:
the FPGA determines that the characteristic data comprise M data blocks, wherein M is an integer greater than or equal to 1;
the FPGA sequentially acquires each data block of the M data blocks;
correspondingly, when the FPGA is in the weight parameter reading state, the step of obtaining the nth weight parameter corresponding to the nth processing type from the ARM chip includes:
the FPGA determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
and the FPGA sequentially acquires each group of weight parameters in the L groups of weight parameters.
5. The method of claim 4, wherein the FPGA enters a computation state, and the FPGA processes the feature data according to the Nth processing type and the Nth weight parameter to obtain an N +2 th feature of the image while in the computation state, comprising:
sequentially taking i as 1 to M, processing the ith data block by the FPGA according to at least one group of weight parameters corresponding to the ith data block to obtain the ith part of the (N + 2) th feature, and storing the ith part to the DDR;
and when i is M, if information for representing that all parts of the (N + 2) th feature are successfully stored in the DDR is received, the FPGA enters a current layer ending state of the Nth layer.
6. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises a receiving module, a processing module and a processing module, wherein the receiving module is used for receiving a first instruction sent by an advanced reduced instruction set processor (ARM) chip, the first instruction is used for indicating the device to process an acquired image by using a convolutional neural network, and configuration parameters of the convolutional neural network are stored in the ARM chip;
the acquisition module is used for acquiring the configuration parameters of the convolutional neural network by accessing the ARM chip;
the convolutional neural network processing module is used for processing the image according to the configuration parameters of the current layer of the convolutional neural network when an enabling signal of any layer in the convolutional neural network is received, and acquiring a first feature of the image after the image is processed according to the configuration parameters of each layer of the convolutional neural network;
when the device processes the image by utilizing the Nth layer of the convolutional neural network, wherein N is a positive integer, the device comprises the following steps:
when the device is in an idle state or the device finishes processing the image by using the layer N-1 of the convolutional neural network, if the receiving module receives an enabling signal of the layer N, the device enters a configuration parameter reading state;
when the device is in the configuration parameter reading state, the acquisition module acquires the Nth processing type required by the Nth layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the Nth processing type is obtained, the device enters an image feature reading state; when the device is in the image feature reading state, the obtaining module obtains, from the configuration parameters of the convolutional neural network by accessing the ARM chip, an initial address of the location at which feature data of the image is stored and the number of bytes of the feature data; the device determines a storage address of the feature data in the DDR according to the initial address, the number of bytes and the value of N, and the obtaining module reads the feature data according to the storage address; when N is 1, the feature data is the second feature;
after the characteristic data is read, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the acquisition module acquires an Nth weight parameter corresponding to the Nth processing type from the ARM chip;
after receiving information indicating that reading of the weight parameters is complete, the device enters a computation state; when the device is in the computation state, the convolutional neural network processing module processes the feature data according to the Nth processing type and the Nth weight parameter to obtain the (N+2)th feature of the image;
the storage module stores the (N+2)th feature to the DDR;
if the convolutional neural network processing module receives information indicating that the (N+2)th feature has been successfully stored, the device enters the current layer ending state of the Nth layer;
when the Nth layer is the last layer of the convolutional neural network, the (N+2)th feature stored in the DDR is the first feature.
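Claim 6 describes one pass through a layer as a sequence of states driven by events (an enable signal, a read-complete notification, a store-success notification). A hypothetical sketch of that state machine follows; the state names and the `run_layer` driver are illustrative, not taken from the patent.

```python
from enum import Enum, auto

class LayerState(Enum):
    IDLE = auto()           # waiting for the layer-N enable signal
    READ_CONFIG = auto()    # fetch the Nth processing type from the ARM chip
    READ_FEATURES = auto()  # compute the DDR address and read the feature data
    READ_WEIGHTS = auto()   # fetch the Nth weight parameter from the ARM chip
    COMPUTE = auto()        # produce the (N+2)th feature and store it to DDR
    LAYER_DONE = auto()     # current layer ending state

# Linear transition order of claim 6; each step advances on the event noted
# in the claim (enable signal, read-complete info, store-success info).
ORDER = [LayerState.IDLE, LayerState.READ_CONFIG, LayerState.READ_FEATURES,
         LayerState.READ_WEIGHTS, LayerState.COMPUTE, LayerState.LAYER_DONE]
NEXT = dict(zip(ORDER, ORDER[1:]))

def run_layer():
    # Drive one layer from idle to completion, returning the visited states.
    state, trace = LayerState.IDLE, [LayerState.IDLE]
    while state != LayerState.LAYER_DONE:
        state = NEXT[state]
        trace.append(state)
    return trace
```

Note that layer 0 (claim 8) follows the same order minus the feature-reading state, since its input is the raw sensor image rather than a feature stored in DDR.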
7. The apparatus of claim 6, wherein the configuration parameters include at least the number of layers of the convolutional neural network, the type of processing required by each layer of the convolutional neural network, a weight parameter corresponding to each type of processing, and related parameters of an image that can be processed by the convolutional neural network, wherein the types of processing include convolution processing, pooling processing and sampling processing, the weight parameters include a weight parameter for convolution processing, a step parameter for pooling processing and a step parameter for sampling processing, and the related parameters include a size parameter of the image and a storage location of the image in a double data rate synchronous dynamic random access memory (DDR) connected to the apparatus.
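Claims 6 and 7 derive the DDR read address for a layer's feature data from an initial address, a byte count and the layer index N. One possible realization is a back-to-back per-layer layout; this layout is an assumption made for illustration, since the claims only state that the address is determined from those three values.

```python
def feature_address(start_addr, num_bytes, n):
    # Assumed layout: each layer's output feature occupies num_bytes and the
    # features are stored back-to-back from start_addr, so layer N reads the
    # feature written by the previous layer at a fixed offset.
    return start_addr + n * num_bytes
```

For example, with a start address of 0x1000 and 256-byte features, layer 2 would read from 0x1200 under this assumed layout.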
8. The apparatus of claim 7, further comprising a storage module, wherein the processing of the image by the apparatus using layer 0 of the convolutional neural network comprises:
when the device is in an idle state, if an enable signal of the 0 th layer is received, the device enters a configuration parameter reading state, and the enable signal of the 0 th layer is generated by the device according to the first instruction;
when the device is in the configuration parameter reading state, the acquisition module acquires a first processing type required by the 0 th layer from the configuration parameters of the convolutional neural network by accessing the ARM chip;
after the first processing type is obtained, the device enters a weight parameter reading state, and when the device is in the weight parameter reading state, the obtaining module obtains a first weight parameter corresponding to the first processing type from the ARM chip;
after receiving information indicating that reading of the weight parameters is complete, the device enters a computation state; when the device is in the computation state, the convolutional neural network processing module processes the image indicated in the first instruction according to the first processing type and the first weight parameter to obtain a second feature of the image, wherein the image is an image sent by a sensor connected to the device;
the storage module stores the second feature to the DDR;
and if the convolutional neural network processing module receives information indicating that the second feature has been successfully stored, the device enters the current layer ending state of layer 0.
9. The apparatus according to claim 8, wherein, when the apparatus is in the image feature reading state, the reading, by the obtaining module, of the feature data according to the storage address comprises:
the obtaining module determines that the feature data comprises M data blocks, wherein M is an integer greater than or equal to 1;
the obtaining module obtains each data block of the M data blocks in sequence;
correspondingly, when the device is in the weight parameter reading state, the obtaining module obtains the nth weight parameter corresponding to the nth processing type from the ARM chip, including:
the obtaining module determines that the Nth weight parameter comprises L groups of weight parameters, wherein each group of weight parameters in the L groups of weight parameters is used for processing at least one data block in the M data blocks;
the acquisition module sequentially acquires each set of weight parameters in the L sets of weight parameters.
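Claim 9 splits the feature into M data blocks and the Nth weight parameter into L groups, each group serving at least one block. A hypothetical sketch of one such block-to-group assignment follows; the contiguous mapping chosen here is an assumption, as the patent does not fix how blocks are distributed over groups.

```python
def assign_blocks_to_groups(m, l):
    # Map each of the M block indices (0..m-1) to one of L weight-group
    # indices (0..l-1) in contiguous chunks, so that every group covers at
    # least one block, as claim 9 requires.
    assert 1 <= l <= m
    return {i: i * l // m for i in range(m)}
```

With M = 5 blocks and L = 3 groups this yields blocks {0, 1} for group 0, {2, 3} for group 1 and {4} for group 2, so blocks and weight groups can be streamed in the same sequential order in which the obtaining module fetches them.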
10. The apparatus of claim 9, wherein the apparatus enters a computation state, and, while in the computation state, the convolutional neural network processing module processes the feature data according to the Nth processing type and the Nth weight parameter to obtain the (N+2)th feature of the image, comprising:
for i taken sequentially from 1 to M, the convolutional neural network processing module processing the ith data block according to the at least one group of weight parameters corresponding to the ith data block to obtain the ith part of the (N+2)th feature, and storing the ith part to the DDR;
and, when i equals M, if the convolutional neural network processing module receives information indicating that all parts of the (N+2)th feature have been successfully stored in the DDR, the apparatus entering the current layer ending state of the Nth layer.
11. A computer-readable storage medium having stored thereon computer instructions which, when executed by at least one processor of a computer apparatus, implement the method of any one of claims 1 to 5.
CN201910083846.9A 2019-01-29 2019-01-29 Image processing method and image processing device Active CN109871939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910083846.9A CN109871939B (en) 2019-01-29 2019-01-29 Image processing method and image processing device

Publications (2)

Publication Number Publication Date
CN109871939A CN109871939A (en) 2019-06-11
CN109871939B true CN109871939B (en) 2021-06-15

Family

ID=66918227

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263861A (en) * 2019-06-21 2019-09-20 西北师范大学 A kind of medical image classification method, device and storage medium
CN113467610A (en) * 2021-05-28 2021-10-01 北京脑陆科技有限公司 Architecture method, device, terminal and medium of brain-computer interface BCI (brain computer interface) equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820657A (en) * 2015-05-14 2015-08-05 西安电子科技大学 Inter-core communication method and parallel programming model based on embedded heterogeneous multi-core processor
CN106355244A (en) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 CNN (convolutional neural network) construction method and system
CN107451653A (en) * 2017-07-05 2017-12-08 深圳市自行科技有限公司 Computational methods, device and the readable storage medium storing program for executing of deep neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240B (en) * 2016-07-30 2020-09-01 复旦大学 Deep convolution neural network implementation method based on FPGA
CN207458128U (en) * 2017-09-07 2018-06-05 哈尔滨理工大学 A kind of convolutional neural networks accelerator based on FPGA in vision application
CN108108809B (en) * 2018-03-05 2021-03-02 山东领能电子科技有限公司 Hardware architecture for reasoning and accelerating convolutional neural network and working method thereof
CN108764466B (en) * 2018-03-07 2022-02-11 东南大学 Convolution neural network hardware based on field programmable gate array and acceleration method thereof
CN108549935B (en) * 2018-05-03 2021-09-10 山东浪潮科学研究院有限公司 Device and method for realizing neural network model
CN108921182A (en) * 2018-09-26 2018-11-30 苏州米特希赛尔人工智能有限公司 The feature-extraction images sensor that FPGA is realized

Similar Documents

Publication Publication Date Title
CN108681984B (en) Acceleration circuit of 3*3 convolution algorithm
US11222240B2 (en) Data processing method and apparatus for convolutional neural network
CN109871510B (en) Two-dimensional convolution operation processing method, system, equipment and computer storage medium
CN109871939B (en) Image processing method and image processing device
CN109819161A (en) A kind of method of adjustment of frame per second, device, terminal and readable storage medium storing program for executing
CN107748723B (en) Storage method and access device supporting conflict-free stepping block-by-block access
CN112799599B (en) Data storage method, computing core, chip and electronic equipment
CN107220930A (en) Fish eye images processing method, computer installation and computer-readable recording medium
WO2021121162A1 (en) Picture processing method and apparatus, and electronic device and storage medium
CN112198878B (en) Instant map construction method and device, robot and storage medium
CN110599586A (en) Semi-dense scene reconstruction method and device, electronic equipment and storage medium
CN111984189B (en) Neural network computing device, data reading method, data storage method and related equipment
WO2021147276A1 (en) Data processing method and apparatus, and chip, electronic device and storage medium
CN103079016A (en) Photographed face transformation method and intelligent terminal
CN109741385A (en) A kind of image processing system, method, apparatus, electronic equipment and storage medium
CN117217274A (en) Vector processor, neural network accelerator, chip and electronic equipment
CN112734827A (en) Target detection method and device, electronic equipment and storage medium
CN110009103B (en) Deep learning convolution calculation method and device
CN108012191B (en) Video data processing method and device, computing equipment and medium
CN107180014B (en) A kind of quick sinc interpolation methods and system
CN116012233A (en) Training method of machine learning model and related products
CN115049529A (en) Image gradient determination method, device, equipment and storage medium
WO2021031154A1 (en) Method and device for loading feature map of neural network
CN112905239B (en) Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment
CN106057226B (en) The access control method of dual-port storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant