CN111832629A - FPGA-based fast-RCNN target detection method - Google Patents


Info

Publication number
CN111832629A
CN111832629A · Application CN202010579892.0A
Authority
CN
China
Prior art keywords
FPGA
Faster-RCNN
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010579892.0A
Other languages
Chinese (zh)
Inventor
王堃
王铭宇
吴晨
Current Assignee
Chengdu Star Innovation Technology Co ltd
Original Assignee
Chengdu Star Innovation Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Star Innovation Technology Co., Ltd.
Priority to CN202010579892.0A
Publication of CN111832629A
Legal status: Pending

Classifications

    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 — Fusion techniques
    • G06N3/045 — Combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/08 — Learning methods
    • G06V2201/07 — Target detection


Abstract

The invention discloses an FPGA (field programmable gate array)-based Faster-RCNN target detection method in the field of intelligent identification. The method comprises the steps of: preprocessing a data set; constructing a Faster-RCNN model; loading the data set into the Faster-RCNN model and customizing an FPGA according to the model; training the Faster-RCNN model on the customized FPGA; testing the training result and, if it falls below the average-precision (AP) threshold, modifying the parameters and retraining until the threshold is reached; and inputting a picture to be detected and performing target identification with the trained Faster-RCNN model. Because each processing module of the FPGA is customized to the Faster-RCNN model, the model can identify objects accurately while overcoming its low identification speed, thereby achieving higher detection speed, higher detection precision, better performance and lower power consumption.

Description

FPGA-based fast-RCNN target detection method
Technical Field
The invention relates to the field of intelligent identification, and in particular to an FPGA-based Faster-RCNN target detection method.
Background
With the development of intelligent identification technology, terminal systems need to detect surrounding objects. In the field of automatic driving in particular, target identification must be both fast and accurate because personal safety is at stake. A method that detects targets quickly and accurately therefore has important practical significance.
Target detection for automatic driving requires both high precision and high speed. Existing deep-learning detection algorithms such as SSD and YOLO are fast but not precise enough, while the Faster-RCNN algorithm is precise enough in target detection but not fast enough. Each class of method thus has its own shortcomings; in actual driving, a delayed or inaccurate detection can cause great harm to human safety.
In addition, in the field of computer vision the GPU is currently the most common processor, but its processing speed is limited and its power consumption and heat generation are high, requiring a fan to cool the main chip. These factors make the GPU unsuitable as the main processor for real-time target detection, so another solution is needed.
Disclosure of Invention
The invention aims, on the basis of existing deep-learning networks and computer-vision technology, to deeply customize an FPGA according to the Faster-RCNN model, so that the FPGA achieves accurate object recognition, solves the problem of low recognition speed, and realizes parallel computation of the Faster-RCNN model, thereby accelerating Faster-RCNN target detection.
The technical scheme adopted by the invention is as follows:
the invention relates to an FPGA-based Faster-RCNN target detection method, which specifically comprises the following steps:
Step 1: acquiring an existing data set for target detection, and preprocessing the data set;
Step 2: constructing a Faster-RCNN model;
Step 3: loading the data set of step 1 into the Faster-RCNN model, and customizing an FPGA according to the Faster-RCNN model;
Step 4: training the Faster-RCNN model of step 3 by using the customized FPGA;
Step 5: setting an average-precision (AP) threshold, testing the training result of the Faster-RCNN model, modifying the parameters if the test result is below the AP threshold, and repeating step 4 and the test until the test result reaches the threshold;
Step 6: inputting a picture to be detected, and performing target identification by using the trained Faster-RCNN model.
Further, the step of building the Faster-RCNN model in step 2 is as follows:
Step 21: building Conv layers for extracting a feature map of the picture, wherein the Conv layers comprise three kinds of layers, namely convolution, pooling and ReLU layers;
Step 22: building a region generation network layer, and using it to generate detection frames, i.e., preliminarily extracting target candidate regions in the picture;
Step 23: building a region-of-interest pooling layer, which takes the feature map of step 21 and the target candidate regions of step 22 and extracts candidate feature maps after integrating the information;
Step 24: building a classification layer, which obtains the final accurate position of the detection frame by bounding-box regression and judges the target category from the candidate feature map.
Furthermore, the FPGA is customized according to the characteristics of the Faster-RCNN model and comprises a controller, a Conv module, a region generation network module, a classification module, an on-chip storage and an off-chip storage, wherein the Conv module, the region generation network module and the classification module are all controlled by the controller.
Further, the Conv module comprises 16 convolution processing units, and convolution parallel optimization is realized by using a convolution parallel algorithm and a pipeline technology.
Further, the area generation network module receives the data processed by the Conv module, and performs data interaction with on-chip storage and off-chip storage through a weight cache region.
Furthermore, the classification module receives data processed by the area generation network module, performs singular value decomposition, and performs data interaction with on-chip storage and off-chip storage through the weight cache region.
Further, the on-chip storage uses a block random access memory, and the off-chip storage uses a dynamic random access memory.
In summary, due to the adoption of the above technical scheme, the beneficial effects of the invention are:
1. In the FPGA-based Faster-RCNN target detection method of the invention, each processing module of the FPGA is customized according to the Faster-RCNN model, parallel computation of the model is realized with a convolution parallel algorithm and a pipeline technique, and the large matrices produced in the model's computation are decomposed by singular value decomposition before being computed, so that the Faster-RCNN model achieves a high target detection speed.
2. The FPGA-based Faster-RCNN target detection method of the invention identifies and continuously learns features of the target object such as shape and color by using existing deep-learning networks and computer-vision technology, can identify targets in many scenes, and deeply customizes the FPGA for the Faster-RCNN model, thereby realizing higher detection speed, higher detection precision, better performance and lower power consumption.
Drawings
In order to illustrate the technical solutions of the embodiments more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the invention and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from them without creative effort. The proportions of the components in the drawings do not represent the proportions of an actual design; the drawings are only schematic diagrams of structure or position:
FIG. 1 is a flow chart of a method of object detection according to the present invention;
FIG. 2 is a diagram of the FPGA architecture customized according to the Faster-RCNN model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
It is noted that relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
The present invention will be described in detail with reference to the accompanying drawings.
Example one
As shown in fig. 1, the invention is an FPGA-based Faster-RCNN target detection method, which specifically comprises the following steps:
Step 1: acquiring an existing data set for target detection, and preprocessing the data set;
Step 2: constructing a Faster-RCNN model;
Step 3: loading the data set of step 1 into the Faster-RCNN model, and customizing an FPGA according to the Faster-RCNN model;
Step 4: training the Faster-RCNN model of step 3 by using the customized FPGA;
Step 5: setting an average-precision (AP) threshold, testing the training result of the Faster-RCNN model, modifying the parameters if the test result is below the AP threshold, and repeating step 4 and the test until the test result reaches the threshold;
Step 6: inputting a picture to be detected, and performing target identification by using the trained Faster-RCNN model.
In the invention, the user can select an existing target detection data set, for example the PASCAL VOC data set, which is commonly used for target detection. It contains about 10,000 manually annotated pictures with bounding boxes covering 20 categories. Preprocessing resizes all pictures to a consistent size, and the data set is divided into a training set and a test set in a ratio of 8:2.
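The preprocessing and the 8:2 split described above can be sketched in Python (an illustrative sketch only; the resize step is abstracted away and the sample objects are placeholders):

```python
import random

def preprocess_and_split(samples, ratio=0.8, seed=0):
    """Shuffle the annotated samples and split them 8:2.

    `samples` stands in for (image, boxes) pairs; on a real data set each
    image and its bounding boxes would also be rescaled to a common size
    before the split, as the preprocessing step requires.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

# 10,000 placeholder samples, matching the PASCAL VOC figure quoted above
samples = list(range(10000))
train, test = preprocess_and_split(samples)
print(len(train), len(test))  # 8000 2000
```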
In step 5, the AP is a percentage smaller than 1, and the closer its value is to 1, the better the effect; however, the AP of current target detection models is roughly 40% to 50%, so in this embodiment the AP threshold is set to 42%. The customized FPGA is used to train the Faster-RCNN model, and the test set of the data set is used for testing. If the AP is below 42%, the parameters are modified and the model is retrained and retested; once the AP reaches or exceeds 42%, a picture to be detected is input and target identification is performed with the trained Faster-RCNN model.
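The retrain-and-retest control of step 5 can be sketched as follows (an illustrative Python sketch; the training, evaluation and parameter-adjustment functions are hypothetical stubs standing in for real FPGA training runs):

```python
def train_until_threshold(train_fn, eval_fn, adjust_fn,
                          ap_threshold=0.42, max_rounds=10):
    """Step-5 control loop: retrain and retest until the AP threshold is met."""
    for round_ in range(max_rounds):
        model = train_fn()
        ap = eval_fn(model)
        if ap >= ap_threshold:
            return model, ap, round_ + 1
        adjust_fn()                      # modify the parameters, train again
    raise RuntimeError("AP threshold not reached within max_rounds")

# Stub training whose AP improves after each parameter adjustment.
state = {"lr_scale": 1.0}
def train_fn():  return {"ap": 0.30 + 0.05 * state["lr_scale"]}
def eval_fn(m):  return m["ap"]
def adjust_fn(): state["lr_scale"] += 1.0

model, ap, rounds = train_until_threshold(train_fn, eval_fn, adjust_fn)
print(rounds, ap)  # converges after 3 rounds with AP 0.45 >= 0.42
```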
Example two
This example is a further illustration of the present invention.
The step of building the Faster-RCNN model in step 2 is as follows:
Step 21: building Conv layers for extracting a feature map of the picture, wherein the Conv layers comprise three kinds of layers, namely convolution, pooling and ReLU layers;
Step 22: building a region generation network layer, and using it to generate detection frames, i.e., preliminarily extracting target candidate regions in the picture;
Step 23: building a region-of-interest pooling layer, which takes the feature map of step 21 and the target candidate regions of step 22 and extracts candidate feature maps after integrating the information;
Step 24: building a classification layer, which obtains the final accurate position of the detection frame by bounding-box regression and judges the target category from the candidate feature map.
In this embodiment, the Faster-RCNN model comprises the Conv layers, the region generation network layer, the region-of-interest pooling layer and the classification layer. The Conv layers extract the feature map of the image, which is then used by the subsequent region generation network layer, region-of-interest pooling layer and classification layer. The parameters of the Conv layers are set as follows: the convolution kernel size is 3 × 3, the stride is 1, and the padding is 1; the picture is converted into a matrix, and the matrix obtained after the Conv layers are computed is the feature map. In step 24, the classification layer performs a fully connected operation on the candidate feature map, determines the target category with a Softmax function, and obtains the final accurate position of the detection frame by bounding-box regression.
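The stated Conv-layer parameters (3 × 3 kernel, stride 1, padding 1) imply that the feature map keeps the spatial size of the input picture. The following minimal sketch demonstrates this (plain NumPy; the image and kernel values are random placeholders, and 224 × 224 is an illustrative input size, not one fixed by the patent):

```python
import numpy as np

def conv2d(image, kernel, stride=1, pad=1):
    """Plain 2-D convolution; a 3x3 kernel with stride 1, pad 1 keeps H and W."""
    k = kernel.shape[0]
    padded = np.pad(image, pad)
    out_h = (padded.shape[0] - k) // stride + 1
    out_w = (padded.shape[1] - k) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = padded[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.random.rand(224, 224)   # the picture converted into a matrix
kernel = np.random.rand(3, 3)      # one 3x3 convolution kernel
fmap = conv2d(image, kernel)
print(fmap.shape)                  # (224, 224): spatial size preserved
```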
EXAMPLE III
This example is a further illustration of the present invention.
As shown in fig. 2, building on the above embodiment, in a preferred embodiment of the invention the FPGA is customized according to the characteristics of the Faster-RCNN model and comprises a controller, a Conv module, a region generation network module, a classification module, an on-chip storage and an off-chip storage; the Conv module, the region generation network module and the classification module are all controlled by the controller.
An FPGA, i.e., an FPGA chip, is a semi-custom circuit among application-specific integrated circuits: a programmable logic array.
In a preferred embodiment of the present invention, the Conv module includes 16 convolution processing units, and the convolution parallel optimization is implemented by using a convolution parallel algorithm and a pipeline technique.
In the invention, the Conv module is customized according to the characteristics of the Conv layers in the Faster-RCNN model and comprises 16 convolution processing units. The different convolution kernels of the processing units are computed independently, so all 16 units compute simultaneously, and through the pipeline technique 16 convolution outputs can be produced in one clock cycle.
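The lock-step behaviour of the 16 convolution processing units can be modelled in software as a bank of 16 kernels applied to the same input window at once (an illustrative NumPy sketch of the parallelism; on the FPGA itself this happens in hardware within one clock cycle):

```python
import numpy as np

def conv_bank(patches, kernels):
    """Apply all 16 kernels to every input window in one vectorized step.

    On the FPGA each kernel lives in its own convolution processing unit,
    so the 16 multiply-accumulate results appear in the same clock cycle;
    the einsum models that lock-step parallelism.
    """
    # patches: (N, 3, 3) sliding windows; kernels: (16, 3, 3)
    return np.einsum('nij,kij->nk', patches, kernels)

patches = np.random.rand(100, 3, 3)   # 100 sliding windows of the image
kernels = np.random.rand(16, 3, 3)    # one kernel per processing unit
outputs = conv_bank(patches, kernels)
print(outputs.shape)                  # (100, 16): 16 outputs per window
```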
In the convolution parallel algorithm, the parameters of the convolution processing units are divided into several parts for training. When the FPGA trains the Faster-RCNN model, the model parameters are divided into four parts that are trained simultaneously. The method adopts data parallelism: the training data are divided across different operation units, a different model is obtained by training on each part, and the models are then fused.
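The fusion of the four separately trained models can be sketched as an averaging of their weight sets (illustrative Python; the per-shard training is stubbed out, the weight values are placeholders, and averaging is an assumption, since the patent does not specify the exact fusion rule):

```python
def fuse_models(models):
    """Fuse the models trained on the four data shards by averaging weights."""
    n = len(models)
    return {k: sum(m[k] for m in models) / n for k in models[0]}

# Divide the training data into four parts; one model is trained per part
# (training itself is stubbed out), then the weight sets are fused.
shards = [list(range(i, 1000, 4)) for i in range(4)]        # 4-way data split
models = [{"w": 0.1 * (i + 1), "b": float(i)} for i in range(4)]  # stub results
fused = fuse_models(models)
print(fused)  # fused['w'] is approximately 0.25, fused['b'] == 1.5
```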
In a preferred embodiment of the invention, the region generation network module receives the data processed by the Conv module and exchanges data with the on-chip storage and the off-chip storage through the weight cache region.
In the invention, when the FPGA trains the Faster-RCNN model, the data received by the region generation network module is the feature map computed and extracted by the Conv module. The module processes the data and passes it to the classification module, and it exchanges data with the on-chip and off-chip storage through the weight cache region: the controller reads the data into the weight cache region and builds a weight index at the same time; when a weight in the cache region must be read, the index is queried first and the weight is then read.
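The weight cache region with its weight index can be sketched as a two-tier store queried through an index (illustrative Python; the class and method names are hypothetical, and the tiers stand in for the on-chip and off-chip memories described later):

```python
class WeightCache:
    """Weight buffer with a weight index, as the controller uses it.

    Reads go through the index first; frequently used weights sit in the
    fast (on-chip) store and the rest fall back to the slow (off-chip) one.
    """
    def __init__(self):
        self.on_chip, self.off_chip, self.index = {}, {}, {}

    def store(self, name, weights, hot=False):
        tier = self.on_chip if hot else self.off_chip
        tier[name] = weights
        self.index[name] = 'on' if hot else 'off'    # build the weight index

    def load(self, name):
        # query the weight index first, then read from the right store
        tier = self.on_chip if self.index[name] == 'on' else self.off_chip
        return tier[name]

cache = WeightCache()
cache.store('conv1', [0.5, -0.2], hot=True)   # common weights: on-chip
cache.store('fc_rare', [0.01], hot=False)     # rarely used weights: off-chip
print(cache.load('conv1'))                    # [0.5, -0.2]
```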
In a preferred embodiment of the present invention, the classification module receives data processed by the area generation network module, performs singular value decomposition, and performs data interaction with on-chip storage and off-chip storage through the weight cache region.
In the invention, when the FPGA trains the Faster-RCNN model, the classification module receives the data of the region generation network module. During computation, the fully connected operation on the candidate feature map produces a large matrix; this matrix is decomposed into smaller matrices by singular value decomposition, and data are exchanged with the on-chip and off-chip storage through the weight cache region.
In singular value decomposition, the original large matrix is decomposed into smaller matrices, as follows:
A_{m×n} ≈ U_{m×r} Σ_{r×r} V^T_{r×n}
where A is an m × n matrix and r is a number much smaller than m and n. Singular value decomposition splits the large matrix A into three small matrices: the U, Σ and V matrices.
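With NumPy the decomposition can be demonstrated directly, including the parameter saving obtained when one large m × n matrix is replaced by the three rank-r factors (the concrete sizes below are illustrative, not taken from the patent):

```python
import numpy as np

m, n, r = 256, 4096, 32                            # illustrative sizes; r << m, n
A = np.random.rand(m, r) @ np.random.rand(r, n)    # a matrix of rank at most r

# A ~ U_{m x r} . Sigma_{r x r} . V^T_{r x n}, keeping the top r singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_r, S_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]
A_approx = U_r @ S_r @ Vt_r

# The three small factors replace one large matrix:
full_params = m * n                    # entries in A
small_params = m * r + r + r * n       # entries in U_r, the singular values, V^T_r
error = np.linalg.norm(A - A_approx) / np.linalg.norm(A)
print(small_params, full_params)       # far fewer parameters after decomposition
```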
In a preferred embodiment of the invention, the on-chip storage uses block random access memory (BRAM) and the off-chip storage uses dynamic random access memory (DRAM). Commonly used weights are cached on-chip, while less commonly used weights are stored off-chip.
In summary, an FPGA chip is a semi-custom circuit among application-specific integrated circuits: a programmable logic array. The FPGA chip can be customized according to the Faster-RCNN model, thereby achieving high detection speed, high detection precision, high performance and low power consumption. The invention was developed in the context of automatic driving to solve the problems of current target recognition for automatic driving, but because of the general applicability of the system and method of the invention, it is not limited to target recognition in automatic driving and can also be used to recognize moving objects.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be made by those skilled in the art without inventive work within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope defined by the claims.

Claims (7)

1. An FPGA-based Faster-RCNN target detection method, characterized by comprising the following steps:
Step 1: acquiring an existing data set for target detection, and preprocessing the data set;
Step 2: constructing a Faster-RCNN model;
Step 3: loading the data set of step 1 into the Faster-RCNN model, and customizing an FPGA according to the Faster-RCNN model;
Step 4: training the Faster-RCNN model of step 3 by using the customized FPGA;
Step 5: setting an average-precision (AP) threshold, testing the training result of the Faster-RCNN model, modifying the parameters if the test result is below the AP threshold, and repeating step 4 and the test until the test result reaches the threshold;
Step 6: inputting a picture to be detected, and performing target identification by using the trained Faster-RCNN model.
2. The FPGA-based Faster-RCNN target detection method according to claim 1, characterized in that the step of building the Faster-RCNN model in step 2 is as follows:
Step 21: building Conv layers for extracting a feature map of the picture, wherein the Conv layers comprise three kinds of layers, namely convolution, pooling and ReLU layers;
Step 22: building a region generation network layer, and using it to generate detection frames, i.e., preliminarily extracting target candidate regions in the picture;
Step 23: building a region-of-interest pooling layer, which takes the feature map of step 21 and the target candidate regions of step 22 and extracts candidate feature maps after integrating the information;
Step 24: building a classification layer, which obtains the final accurate position of the detection frame by bounding-box regression and judges the target category from the candidate feature map.
3. The FPGA-based Faster-RCNN target detection method according to claim 2, characterized in that the FPGA is customized according to the characteristics of the Faster-RCNN model and comprises a controller, a Conv module, a region generation network module, a classification module, an on-chip storage, an off-chip storage, an input cache region and a weight cache region, wherein the Conv module, the region generation network module and the classification module are all controlled by the controller.
4. The FPGA-based Faster-RCNN target detection method according to claim 3, characterized in that the Conv module comprises 16 convolution processing units, and convolution parallel optimization is realized by a convolution parallel algorithm and a pipeline technique.
5. The FPGA-based Faster-RCNN target detection method according to claim 3, characterized in that the region generation network module receives the data processed by the Conv module and exchanges data with the on-chip and off-chip storage through the weight cache region.
6. The FPGA-based Faster-RCNN target detection method according to claim 3, characterized in that the classification module receives the data processed by the region generation network module, performs singular value decomposition, and exchanges data with the on-chip and off-chip storage through the weight cache region.
7. The FPGA-based Faster-RCNN target detection method according to claim 3, characterized in that the on-chip storage uses block random access memory and the off-chip storage uses dynamic random access memory.
CN202010579892.0A — filed 2020-06-23, priority date 2020-06-23 — FPGA-based fast-RCNN target detection method — Pending — CN111832629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010579892.0A CN111832629A (en) 2020-06-23 2020-06-23 FPGA-based fast-RCNN target detection method

Publications (1)

Publication Number Publication Date
CN111832629A true CN111832629A (en) 2020-10-27

Family

ID=72899259


Country Status (1)

Country Link
CN (1) CN111832629A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688825A (en) * 2021-05-17 2021-11-23 海南师范大学 AI intelligent garbage recognition and classification system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190325605A1 (en) * 2016-12-29 2019-10-24 Zhejiang Dahua Technology Co., Ltd. Systems and methods for detecting objects in images
US20200193222A1 (en) * 2018-07-06 2020-06-18 Tata Consultancy Services Limited Method and system for automatic object annotation using deep network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHAOQING REN et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Advances in Neural Information Processing Systems *
LIAO Hui: "Face Detection Algorithm Based on a Lightweight Convolutional Neural Network" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
LI Bingjian et al.: "FPGA Accelerator Architecture Design for Convolutional Neural Networks" (in Chinese), 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination