WO2021077743A1 - Image target detection method, system, electronic device and storage medium - Google Patents


Info

Publication number
WO2021077743A1
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
information
target detection
bounding
position information
Prior art date
Application number
PCT/CN2020/092828
Inventor
刘海威 (Liu Haiwei)
董刚 (Dong Gang)
梁玲燕 (Liang Lingyan)
杨宏斌 (Yang Hongbin)
曹其春 (Cao Qichun)
赵雅倩 (Zhao Yaqian)
Original Assignee
浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Publication of WO2021077743A1


Classifications

    • G06V10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06N3/045 Combinations of networks
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V2201/07 Target detection

Definitions

  • This application relates to the field of computer technology, and in particular to an image target detection method and system, an electronic device and a storage medium.
  • NMS: Non-Maximum Suppression algorithm.
  • In the related art, the convolution is first computed by the neural network on the FPGA, and the CPU receives the convolution results and then completes the NMS calculation.
  • Because the CPU computes serially, it takes a long time to complete the NMS calculation.
  • The purpose of this application is to provide an image target detection method and system, an electronic device, and a storage medium that reduce the amount of data communicated between the parallel processing chip and the CPU during image target detection, and that improve the efficiency of image target detection.
  • To this end, the present application provides an image target detection method applied to the parallel processing chip of a heterogeneous platform.
  • The image target detection method includes:
  • transmitting the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
  • The bounding box information may further include bounding box classification probability information, in which case
  • storing the bounding box information in the random access memory (RAM) includes:
  • writing the bounding box information into the RAM in descending order of bounding box classification probability.
  • Performing the local maximum search based on the non-maximum suppression algorithm on the read bounding box position information in multiple parallel channels, to obtain a preset number of bounding boxes to be output, includes:
  • performing, in parallel and according to the K pieces of reference bounding box position information, the local maximum search based on the non-maximum suppression algorithm on all the bounding box position information, to obtain the preset number of bounding boxes to be output.
  • The method further includes:
  • deleting the target bounding box position information from the RAM.
  • The parallel processing chip is a field-programmable gate array (FPGA), an ASIC chip, or an embedded chip.
  • The bounding box position information includes the bounding box's X-axis coordinate, Y-axis coordinate, width, and height.
  • The bounding box information may further include the foreground confidence of each target object bounding box, in which case
  • storing the bounding box information in the RAM includes:
  • This application also provides an image target detection system applied to the parallel processing chip of a heterogeneous platform. The image target detection system includes:
  • an information receiving module configured to receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information;
  • an information storage module configured to store the bounding box information in a random access memory (RAM);
  • a bounding box selection module configured to sequentially read the bounding box position information from the bounding box information in the RAM and to perform, in multiple parallel channels, the local maximum search of the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
  • a result output module configured to transmit the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
  • The present application also provides a storage medium storing a computer program that, when executed, implements the steps of the above image target detection method.
  • The present application also provides an electronic device including a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above image target detection method when it invokes that program.
  • The present application provides an image target detection method applied to the parallel processing chip of a heterogeneous platform.
  • The method includes: receiving bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information; storing the bounding box information in a random access memory (RAM); sequentially reading the bounding box position information from the bounding box information in the RAM, and performing, in multiple parallel channels, the local maximum search based on the non-maximum suppression algorithm on the read bounding box position information to obtain a preset number of bounding boxes to be output; and transmitting the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
  • After the convolutional neural network performs target recognition on the image and produces the bounding box information, this application first stores that information in the RAM of the parallel processing chip. It then uses the chip's ability to process computational tasks in parallel to perform the NMS-based local maximum search on the bounding box information, obtaining a preset number of bounding boxes to be output.
  • The bounding box information of the bounding boxes to be output is transmitted to the CPU, which outputs the image detection result.
  • In this process, the parallel processing chip performs the non-maximum suppression (NMS) calculation, and only the bounding box information of the boxes that survive the local maximum search is transmitted to the CPU.
  • The solution of the present application therefore reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves the efficiency of image target detection.
  • This application also provides an image target detection system, an electronic device, and a storage medium, which have the same beneficial effects. These effects are not repeated here.
  • FIG. 1 is a flowchart of an image target detection method provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of a data format of bounding box information provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of a process for implementing NMS calculation on a CPU+FPGA heterogeneous platform provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of the principle of sorting bounding box information provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of the principle of an FPGA-accelerated NMS algorithm provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of an image target detection system provided by an embodiment of the application.
  • FIG. 1 is a flowchart of an image target detection method provided by an embodiment of the application.
  • S101: Receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image.
  • The execution body of this embodiment may be the parallel processing chip of a heterogeneous platform.
  • The heterogeneous platform may include a main processing chip and a co-processing chip.
  • The heterogeneous platform in this embodiment may be any of the following heterogeneous acceleration frameworks:
    • CPU+GPU (Graphics Processing Unit);
    • CPU+FPGA (Field-Programmable Gate Array);
    • CPU+ASIC (Application-Specific Integrated Circuit);
    • CPU+embedded chip.
  • The main processing chip of the heterogeneous platform is a CPU (Central Processing Unit). The co-processing chip may be any parallel processing chip capable of parallel computing and pipeline processing, and this embodiment does not limit its specific type.
  • Image target detection, also called target detection, refers to identifying and detecting specific types of objects in an image.
  • A detector typically produces multiple bounding boxes (bbox) for the same object, each with a score.
  • This redundancy needs to be removed so that only the bounding box with the highest score is kept.
  • NMS (Non-Maximum Suppression) is commonly used in target detection to remove such redundant bounding boxes.
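As a point of reference, the conventional serial NMS that the CPU would otherwise run can be sketched in Python as shown below. This is a minimal sketch, not the patent's implementation. It assumes each box is (x, y, w, h) with (x, y) at the box center. It also uses intersection-over-union (IoU) as the overlap measure. The text only speaks of an "overlapping area ratio", so IoU is an assumption. The names `iou` and `greedy_nms` are illustrative.

```python
from typing import List, Tuple

# A box is (x_center, y_center, width, height); the center format is assumed.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two center-format boxes (assumed overlap measure)."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], threshold: float) -> List[int]:
    """Serial NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept, higher-scoring box.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(50, 50, 20, 20), (51, 50, 20, 20), (120, 80, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, threshold=0.5))  # -> [0, 2]
```

The algorithm compares each candidate against every kept box, one after another. That serial cost is what the parallel-chip approach aims to avoid.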
  • The output of the convolutional neural network's convolution calculations may include bounding boxes of target detection objects, such as faces, QR codes, or vehicles. Because these bounding boxes may partly overlap, the redundant ones are filtered out by the operations of this embodiment.
  • This embodiment does not limit the type of target object bounding box. Each type of target detection object in an image may correspond to multiple target object bounding boxes.
  • The bounding box information in this embodiment may include bounding box position information.
  • The bounding box position information describes the position and size of a target object bounding box. From it, the position of the bounding box in the image can be determined.
  • Specifically, the bounding box position information may include the bounding box's X-axis coordinate, Y-axis coordinate, width, and height.
  • S102: After receiving the bounding box information, the parallel processing chip stores it in its random access memory (RAM). Specifically, each entry can be stored together with its correspondence to the bounding box ID of the target object.
  • S103: Sequentially read the bounding box position information from the bounding box information in the RAM. Then perform, in multiple parallel channels, the local maximum search based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output.
  • The purpose of this step is to determine the valid target object bounding boxes. For example, suppose an image contains bounding boxes A, B, and C, and the overlapping area ratio of any two of them is greater than 80%.
  • In that case, A, B, and C are all bounding boxes of the same object in the image.
  • Only the best of them should be output, so the bounding box with the highest score (that is, the highest bounding box classification probability) is selected as the bounding box to be output.
  • The parallel processing chip can read the bounding box position information from the RAM in a preset order. Its parallel computing and pipeline processing capabilities then let it perform the NMS-based local maximum search and obtain a preset number of bounding boxes to be output. The preset number, that is, the number of bounding boxes to be output, can be set before the local maximum search is performed.
  • In this embodiment, obtaining the preset number of bounding boxes to be output may include the following steps:
  • Step 1: Determine the number of parallel processing channels K of the parallel processing chip.
  • Step 2: Read the bounding box position information in descending order of bounding box classification probability to obtain K pieces of reference bounding box position information. The overlapping area between any two pieces of reference bounding box position information must be less than a preset value.
  • Step 3: According to the K pieces of reference bounding box position information, perform the NMS-based local maximum search on all the bounding box position information in parallel, to obtain the preset number of bounding boxes to be output.
  • The target bounding box position information is then deleted from the RAM.
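Steps 1 to 3 can be sketched in Python as shown below. This is one interpretation under stated assumptions, not the patent's circuit. The boxes are assumed to be already sorted by descending classification probability, and IoU is assumed as the overlap measure. The input RAM, output RAM, and an intermediate buffer are modelled as Python lists. Boxes that overlap a reference are not carried forward, which plays the role of deleting them from the RAM. `k_way_nms` is a hypothetical name.

```python
from typing import List, Tuple

# (x_center, y_center, width, height); the center format is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union, used here as the assumed "overlapping area ratio"."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def k_way_nms(boxes: List[Box], threshold: float, k: int) -> List[int]:
    """Hypothetical K-way NMS. `boxes` must be pre-sorted by descending score
    (as read from the sorting RAM). Returns indices of the kept boxes."""
    input_ram = list(range(len(boxes)))  # indices still to be processed
    output_ram: List[int] = []           # boxes to be output
    while input_ram:
        refs: List[int] = []
        pos = 0
        # Step 2: take up to K reference boxes in score order. A box that
        # overlaps an already chosen reference is a duplicate and is dropped.
        while pos < len(input_ram) and len(refs) < k:
            i = input_ram[pos]
            pos += 1
            if all(iou(boxes[i], boxes[r]) <= threshold for r in refs):
                refs.append(i)
        output_ram.extend(refs)  # references go straight to the output RAM
        # Step 3: compare each remaining box with all K references. The chip
        # would run these K comparisons concurrently; `any` stands in for that.
        buffer_ram = [i for i in input_ram[pos:]
                      if not any(iou(boxes[i], boxes[r]) > threshold for r in refs)]
        input_ram = buffer_ram  # survivors feed the next traversal
    return output_ram


boxes = [(50, 50, 20, 20), (51, 50, 20, 20), (120, 80, 30, 30),
         (121, 80, 30, 30), (200, 200, 10, 10), (300, 300, 10, 10)]
print(k_way_nms(boxes, threshold=0.5, k=2))  # -> [0, 2, 4, 5]
```

Under these assumptions, the result should match serial greedy NMS for any K. Every box is still compared against all higher-scoring kept boxes before it is kept. A larger K only lets more of those comparisons happen at once.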
  • When the bounding box information is read out of the RAM, the overlapping area between bounding boxes must be calculated and duplicate boxes removed.
  • The parallel processing chip can process K channels in parallel, where K can be any value.
  • The first address (i.e. address 0) holds the box with the highest score. This box is retained directly and sent to the output RAM for storage.
  • Next, the bounding box information at the second address is read, and its overlapping area ratio with the bounding box at the first address is calculated.
  • If the overlapping area ratio is less than the set threshold, the box is retained and sent to the output RAM for storage. If it is greater than the set threshold, the bounding box at that address is deemed a duplicate and its information is discarded.
  • S104: Transmit the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
  • This step runs once the calculation over the RAM data is finished, that is, once duplicate boxes have been removed and the NMS calculation is complete.
  • The bounding box information of the bounding boxes to be output can then be sent to the CPU.
  • With the NMS acceleration method proposed in this embodiment, only the final valid bounding box data needs to be transmitted to the CPU, which greatly reduces communication overhead.
  • In this embodiment, the bounding box information is first stored in the RAM of the parallel processing chip. The chip's ability to process computational tasks in a parallel pipeline is then used to perform the NMS-based local maximum search on all the bounding box information, obtaining a preset number of bounding boxes to be output.
  • The bounding box information of the bounding boxes to be output is transmitted to the CPU, which outputs the image detection result.
  • In this process, the parallel processing chip performs the NMS calculation and sends the CPU only the bounding box information of the boxes that survive the local maximum search. In the related art, all the bounding box position information is transmitted to the CPU. Compared with that approach, this embodiment reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves detection efficiency.
  • The bounding box information may also include bounding box classification probability information. This describes the probability that the image content inside the target object box belongs to the target detection object type.
  • The higher the bounding box classification probability, the more likely the region covered by the target object box contains the target detection object. The classification probability is therefore equivalent to the score of the target object box.
  • In this case, storing the bounding box information in the RAM in the embodiment of FIG. 1 may consist of writing it into the RAM in descending order of bounding box classification probability.
  • Sorting the bounding box scores with a traditional sorting algorithm requires a large amount of hardware resources.
  • This embodiment therefore proposes a sorting method based on hardware addresses. The higher a bounding box's classification probability, the larger the RAM address at which its information is stored. After all writes are complete, the data is read from the sorting RAM in order of address from high to low.
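The address-based sort amounts to using each score as an index into a RAM, much like a single pass of a counting sort. The Python sketch below models the sort RAM as a list with 4096 entries, matching the 12-bit score in FIG. 2. The function name `address_sort` and the payload strings are illustrative.

```python
from typing import List, Optional, Tuple

SCORE_BITS = 12              # 12-bit score, per the data format in FIG. 2
DEPTH = 1 << SCORE_BITS      # 4096 RAM addresses


def address_sort(records: List[Tuple[int, str]]) -> List[str]:
    """Sort (score, payload) records by descending score by using the score
    itself as the RAM write address, then reading from the top address down."""
    ram: List[Optional[str]] = [None] * DEPTH  # empty sort RAM
    for score, payload in records:
        if not 0 <= score < DEPTH:
            raise ValueError(f"score {score} does not fit in {SCORE_BITS} bits")
        ram[score] = payload  # one write per box, no comparisons
    # Read from the highest address to the lowest, skipping empty slots.
    return [ram[addr] for addr in range(DEPTH - 1, -1, -1) if ram[addr] is not None]


print(address_sort([(300, "A"), (4095, "B"), (17, "C")]))  # -> ['B', 'A', 'C']
```

Each box costs a single write, and the read-out is one linear scan with no comparators. That is the resource saving the text describes. One caveat applies to this sketch: two boxes with identical 12-bit scores map to the same address, so the later write overwrites the earlier one. The text does not say how such ties are handled.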
  • The bounding box information may also include the foreground confidence of each target object bounding box. This is the probability that the image region covered by the bounding box is foreground. The higher the foreground confidence, the more likely that region is foreground.
  • In this case, storing the bounding box information in the RAM in the embodiment of FIG. 1 may consist of storing only the entries whose foreground confidence is greater than or equal to a preset value.
  • That is, the embodiment may first filter the bounding box information by foreground confidence. Entries at or above the preset value are stored, and those below it are discarded.
  • The parallel processing chip first performs target object detection with the convolutional neural network. After the network completes its convolution and other calculations, the classification and location information (i.e. the bounding box information) for the different types of targets is obtained.
  • The bounding box information can include the foreground confidence, the coordinates of the bounding box center point, the size of the bounding box, and the bounding box classification probability (i.e. the score). Refer to FIG. 2.
  • FIG. 2 is a schematic diagram of a data format of bounding box information provided by an embodiment of the application.
  • Each category contains M target object bounding boxes. The bounding box information for each box includes the following 12-bit fields:
    • the score p_data;
    • the X-axis coordinate x_data;
    • the Y-axis coordinate y_data;
    • the bounding box width w_data;
    • the bounding box height h_data.
  • Completing the NMS calculation on the parallel processing chip can begin with preprocessing the scores, coordinates, and other bounding box information computed by the neural network:
    • discard entries whose foreground confidence is less than the foreground threshold;
    • retain entries whose foreground confidence is greater than or equal to the threshold, replacing their 16-bit confidence field with 0x1 to mark the bounding box as valid;
    • send the retained bounding box information to the sorting module.
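The preprocessing step can be sketched in Python as shown below. The field names follow FIG. 2. Representing the 16-bit foreground confidence as an integer, and the threshold used in the example, are assumptions. `BoxInfo` and `preprocess` are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import List

VALID = 0x1  # value that replaces the confidence field of a kept box


@dataclass(frozen=True)
class BoxInfo:
    confidence: int  # 16-bit foreground confidence (integer encoding assumed)
    score: int       # p_data, 12-bit classification probability
    x: int           # x_data
    y: int           # y_data
    w: int           # w_data
    h: int           # h_data


def preprocess(boxes: List[BoxInfo], fg_threshold: int) -> List[BoxInfo]:
    """Keep boxes with confidence >= fg_threshold and mark them valid (0x1)."""
    return [replace(b, confidence=VALID) for b in boxes if b.confidence >= fg_threshold]


raw = [BoxInfo(40000, 3000, 10, 20, 30, 40), BoxInfo(100, 4000, 1, 2, 3, 4)]
print(preprocess(raw, fg_threshold=32768))
# -> [BoxInfo(confidence=1, score=3000, x=10, y=20, w=30, h=40)]
```

The second box is dropped despite its higher score. Filtering uses the foreground confidence here, not the classification probability.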
  • The sorting module sorts the scores from high to low while keeping the bounding box coordinates and size associated with each score. After sorting, the overlapping area between the highest-scoring box and each other box is calculated. Any lower-scoring box whose overlap exceeds a threshold is removed.
  • The parallel processing chip can also compare against the threshold in K-way parallel mode to screen out invalid bounding boxes. It writes the coordinates and size of the boxes to be retained into the output FIFO. In K-way parallel mode, the threshold comparison is repeated until all bounding boxes have been traversed. The final result is then output, and the NMS calculation is complete.
  • This embodiment is a method of accelerating the NMS algorithm with a parallel processing chip.
  • In addition to the neural network, the NMS algorithm is implemented on the parallel processing chip. This makes full use of the chip's parallel processing and pipeline processing characteristics and improves NMS calculation speed.
  • A dedicated method that performs score sorting through the RAM write address is also proposed, which effectively reduces hardware resource consumption.
  • The NMS algorithm is implemented through the parallel processing chip's parallel computing and pipeline processing, replacing the traditional CPU implementation. This increases calculation speed and reduces the communication overhead between the processor (CPU) and the FPGA hardware.
  • The hardware structure that implements the NMS calculation on the parallel processing chip lets the RAM parameters be configured flexibly for different networks, so it can adapt to different requirements.
  • The score-sorting method of this embodiment uses the score directly as the RAM write address and then reads data starting from the high address. This achieves sorting quickly with low hardware resource consumption.
  • FIG. 3 is a schematic diagram of a process for implementing NMS calculation using a CPU+FPGA heterogeneous platform provided by an embodiment of the application.
  • After the convolution and other calculations, the bounding box information for the bounding boxes of the different types of target objects is obtained.
  • The bounding box information can include the foreground confidence, the coordinates of the bounding box center point, the size of the bounding box, and the bounding box classification probability.
  • FIG. 4 is a schematic diagram of the principle of sorting bounding box information provided by an embodiment of this application. This embodiment uses a sorting method based on hardware addresses, and the preprocessed data are all foreground bounding box data.
  • Each bounding box occupies 64 bits. The lower 48 bits hold x_data, y_data, w_data, and h_data, the 49th bit is a flag bit set to 1, and the remaining bits are 0.
  • The 12-bit score serves as the write address for writing this word into a sorting RAM that is 4096 deep (2^12, assuming M ≤ 4096) and 64 bits wide. As a result, the higher a box's score, the higher the address at which it is stored.
  • After all writes are complete, data is read from the sorting RAM from high address to low. Every word whose 49th bit is 1 is foreground bounding box data and is written into the input RAM, until all M entries have been read.
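The 64-bit word layout and the high-to-low read-out can be sketched in Python as shown below. The text fixes the field widths, the flag position, and the RAM depth and width. It does not fix the order of the four fields inside the low 48 bits, so placing x_data lowest is an assumption. The names `pack`, `unpack`, and `drain_sort_ram` are likewise illustrative.

```python
from typing import List, Tuple

FIELD_BITS = 12
FIELD_MASK = (1 << FIELD_BITS) - 1  # 0xFFF
VALID_BIT = 48                      # the "49th bit", counting from 1
DEPTH = 1 << FIELD_BITS             # 4096 addresses, one per 12-bit score
WIDTH = 64                          # bits per word


def pack(x: int, y: int, w: int, h: int) -> int:
    """Pack four 12-bit fields into bits 0..47 and set the valid flag.
    Placing x lowest and h highest is an assumption."""
    word = 0
    for i, value in enumerate((x, y, w, h)):
        word |= (value & FIELD_MASK) << (i * FIELD_BITS)
    return word | (1 << VALID_BIT)  # bits 49..63 stay 0


def unpack(word: int) -> Tuple[int, int, int, int]:
    """Recover (x, y, w, h) from the low 48 bits."""
    return (word & FIELD_MASK,
            (word >> FIELD_BITS) & FIELD_MASK,
            (word >> 2 * FIELD_BITS) & FIELD_MASK,
            (word >> 3 * FIELD_BITS) & FIELD_MASK)


def drain_sort_ram(sort_ram: List[int], m: int) -> List[Tuple[int, int, int, int]]:
    """Read from the highest address down; words with the flag set are
    foreground boxes and go to the input RAM. Stop after m boxes."""
    input_ram: List[Tuple[int, int, int, int]] = []
    for addr in range(len(sort_ram) - 1, -1, -1):
        if len(input_ram) == m:
            break
        word = sort_ram[addr]
        if (word >> VALID_BIT) & 1:
            input_ram.append(unpack(word))
    return input_ram


sort_ram = [0] * DEPTH                 # unwritten words assumed to read as 0
sort_ram[2000] = pack(10, 20, 30, 40)  # box with score 2000
sort_ram[3500] = pack(0, 0, 0, 0)      # all-zero box, still marked valid
print(drain_sort_ram(sort_ram, m=2))   # -> [(0, 0, 0, 0), (10, 20, 30, 40)]
print(DEPTH * WIDTH // 8)              # sort RAM size in bytes -> 32768
```

The text does not give a reason for the flag bit, but a plausible one is this: if unwritten words read as 0, an all-zero box such as the one at address 3500 would otherwise be indistinguishable from an empty slot. The last line shows the storage cost of one 4096 × 64-bit sort RAM, which is 32 KiB.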
  • FIG. 5 is a schematic diagram of the principle of an FPGA-accelerated NMS algorithm provided by an embodiment of the application.
  • The first address (i.e. address 0) holds the box with the highest score. This box is retained directly and transferred to the output RAM for storage. Next, the bounding box information at the second address is read, and its overlapping area ratio with the bounding box at the first address is calculated.
  • If the overlapping area ratio is less than the set threshold, the box is retained and sent to the output RAM for storage. If it is greater than the set threshold, the bounding box at that address is deemed a duplicate and its information is discarded.
  • The first bounding box read from the input RAM is sent directly to the output RAM for storage. The 4 reference bounding boxes used in each traversal are also stored directly in the output RAM.
  • Once all the data in the input RAM and the intermediate buffer RAM has been processed, duplicate boxes have been removed. The data in the output RAM is then output to the CPU, and the NMS calculation is complete.
  • With the FPGA-based NMS acceleration method of this embodiment, only the final valid bounding box data needs to be transmitted to the CPU, which greatly reduces communication overhead.
  • The foregoing embodiment describes the NMS calculation for one type of target.
  • NMS can also be computed for multiple types of targets simultaneously.
  • The NMS calculation process is the same for every target type, and an appropriate degree of parallelism can be chosen according to the available hardware resources.
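Because the per-class calculation is identical and independent, running several target types concurrently is straightforward. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor` only to illustrate that independence. Here `parallelism` stands in for the degree of parallelism chosen from hardware resources. IoU is assumed as the overlap measure, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, w, h), assumed
Scored = Tuple[float, Box]               # (score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union (assumed overlap measure)."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    inter = max(0.0, iw) * max(0.0, ih)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms_one_class(items: List[Scored], threshold: float) -> List[Box]:
    """Serial greedy NMS for a single target type."""
    kept: List[Box] = []
    for _, box in sorted(items, key=lambda t: t[0], reverse=True):
        if all(iou(box, k) <= threshold for k in kept):
            kept.append(box)
    return kept


def multiclass_nms(per_class: Dict[str, List[Scored]], threshold: float,
                   parallelism: int) -> Dict[str, List[Box]]:
    """Run the same NMS independently for each class, up to `parallelism` at a time."""
    names = list(per_class)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(lambda n: nms_one_class(per_class[n], threshold), names))
    return dict(zip(names, results))


detections = {
    "face": [(0.9, (50, 50, 20, 20)), (0.8, (51, 50, 20, 20))],
    "vehicle": [(0.7, (120, 80, 30, 30))],
}
print(multiclass_nms(detections, threshold=0.5, parallelism=2))
# -> {'face': [(50, 50, 20, 20)], 'vehicle': [(120, 80, 30, 30)]}
```

In CPython, threads will not speed up this pure-Python loop. On the chip, each class would presumably run on separate hardware channels instead.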
  • Implementing NMS on the FPGA makes full use of the FPGA's parallel and pipeline processing capabilities and improves NMS calculation speed.
  • This embodiment also sorts the bounding box scores through the RAM write address. Compared with a traditional sorting algorithm, this greatly reduces hardware resource consumption.
  • The FPGA transmits only valid bounding box data to the CPU. This reduces the communication overhead between the FPGA and the CPU and improves the overall efficiency of target detection.
  • FIG. 6 is a schematic structural diagram of an image target detection system provided by an embodiment of the application.
  • the system can include:
  • the information receiving module 100 is configured to receive bounding box information of multiple target object bounding boxes obtained by performing target recognition on an image by a convolutional neural network; wherein the bounding box information includes bounding box position information;
  • the information storage module 200 is configured to store the bounding box information in a random access memory RAM;
  • the bounding box selection module 300 is configured to sequentially read the bounding box position information from the bounding box information in the random access memory RAM, and to perform, in multiple parallel channels, the local maximum search of the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
  • the result output module 400 is configured to transmit the bounding box information of the bounding box to be output to the central processing unit CPU of the heterogeneous platform, so that the central processing unit CPU outputs the image target detection result.
  • In this embodiment, the bounding box information is first stored in the RAM of the parallel processing chip. The chip's ability to process computational tasks in a parallel pipeline is then used to perform the NMS-based local maximum search on all the bounding box information, obtaining a preset number of bounding boxes to be output.
  • The bounding box information of the bounding boxes to be output is transmitted to the CPU, which outputs the image detection result.
  • In this process, the parallel processing chip performs the NMS calculation and sends the CPU only the bounding box information of the boxes that survive the local maximum search. In the related art, all the bounding box position information is transmitted to the CPU. Compared with that approach, this embodiment reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves detection efficiency.
  • The bounding box information also includes bounding box classification probability information;
  • the information storage module 200 is specifically a module for writing the bounding box information into the RAM in descending order of the bounding box classification probability information.
  • The bounding box selection module 300 includes:
  • a parallel channel number determining unit, configured to determine the number K of parallel processing channels of the parallel processing chip;
  • a reference bounding box position determining unit, configured to read the bounding box position information in descending order of the bounding box classification probability information to obtain K pieces of reference bounding box position information, wherein the overlap area of any two pieces of reference bounding box position information is smaller than a preset value;
  • an NMS calculation module, configured to perform, in parallel and according to the K pieces of reference bounding box position information, a local maximum search operation based on the non-maximum suppression algorithm on all the bounding box position information, to obtain a preset number of the bounding boxes to be output.
  • A redundancy deletion module is configured to delete target bounding box position information from the RAM when the target object bounding box corresponding to it is not a bounding box to be output.
  • The parallel processing chip is a field programmable gate array (FPGA), an ASIC chip, or an embedded chip.
  • The bounding box position information includes the bounding box X-axis coordinate, the bounding box Y-axis coordinate, the bounding box width, and the bounding box height.
  • The bounding box information further includes the foreground confidence of each target object bounding box;
  • the information storage module 200 is specifically configured to store, in the RAM, the bounding box information whose foreground confidence is greater than or equal to a preset value.
  • The present application also provides a storage medium storing a computer program which, when executed, implements the steps provided in the above embodiments.
  • The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
  • The present application also provides an electronic device, which may include a memory and a processor.
  • The memory stores a computer program.
  • When the processor invokes the computer program in the memory, the steps provided in the foregoing embodiments can be implemented.
  • The electronic device may also include various network interfaces, a power supply, and other components.

Abstract

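An image target detection method and system, an electronic device, and a storage medium. The method includes: receiving bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image; storing the bounding box information in a random access memory (RAM); performing, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box information to obtain a preset number of bounding boxes to be output; and transmitting the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of a heterogeneous platform, so that the CPU outputs the image target detection result. The present application can reduce the amount of data communicated between the parallel processing chip and the CPU during image target detection and improve the efficiency of image target detection.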

Description

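Image target detection method and system, electronic device, and storage medium
This application claims priority to Chinese patent application No. 201911025107.0, filed with the Chinese Patent Office on October 25, 2019 and entitled "Image target detection method and system, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to an image target detection method and system, an electronic device, and a storage medium.
Background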
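In image target detection, bounding boxes are generated separately for each target category. For the bounding boxes of one category, NMS (Non-Maximum Suppression) sorts the scores of all bounding boxes in that category, first selects the highest score and its corresponding box, and then traverses the remaining boxes, deleting any box whose overlap area with the current highest-scoring box exceeds a certain threshold. Next, the highest-scoring box among the remaining unprocessed boxes is selected and the process is repeated until no boxes remain; applying the same procedure to each category in turn yields the detection results for all categories.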
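The per-category procedure described above can be sketched in Python as follows. This is a minimal software sketch of conventional greedy NMS, not the hardware method of this application. It assumes boxes are given as (cx, cy, w, h) with a centre point, and that the "overlap area" test is intersection-over-union (IoU) compared with a threshold, which the text does not specify. The names `iou`, `greedy_nms`, and `nms_per_category` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h); centre format is an assumption


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two centre-format boxes."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float) -> List[int]:
    """Indices of the boxes kept for ONE category, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring box not yet processed
        kept.append(best)
        # delete every remaining box that overlaps the current best box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return kept


def nms_per_category(dets: Dict[str, Tuple[List[Box], List[float]]],
                     thresh: float) -> Dict[str, List[int]]:
    """Categories are independent, so each one is suppressed separately."""
    return {cat: greedy_nms(b, s, thresh) for cat, (b, s) in dets.items()}
```

For example, with boxes `[(30, 30, 10, 10), (10, 10, 10, 10), (11, 10, 10, 10)]` and scores `[0.7, 0.9, 0.8]`, the second and third boxes overlap with an IoU of about 0.82, so at a threshold of 0.5 only indices 1 and 0 are kept.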
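In the related art, the convolution calculation is first performed by the neural network on the FPGA, and the CPU receives the convolution results and then completes the NMS calculation. However, because the CPU computes serially, the NMS calculation takes a long time. When image target detection is implemented with this related art, the CPU must receive a large amount of data transmitted from the FPGA and process it serially, which lowers detection efficiency and makes the NMS calculation the computational bottleneck of target detection.
Therefore, how to reduce the amount of data communicated between the parallel processing chip and the CPU during image target detection, and thereby improve the efficiency of image target detection, is a technical problem that those skilled in the art currently need to solve.
Summary
The purpose of the present application is to provide an image target detection method and system, an electronic device, and a storage medium that can reduce the amount of data communicated between the parallel processing chip and the CPU during image target detection and improve the efficiency of image target detection.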
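To solve the above technical problem, the present application provides an image target detection method applied to a parallel processing chip of a heterogeneous platform. The image target detection method includes:
receiving bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information;
storing the bounding box information in a random access memory (RAM);
sequentially reading the bounding box position information in the bounding box information in the RAM, and performing, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
transmitting the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
Optionally, the bounding box information further includes bounding box classification probability information;
correspondingly, storing the bounding box information in the RAM includes:
writing the bounding box information into the RAM in descending order of the bounding box classification probability information.
Optionally, performing, in multiple parallel channels, the local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information to obtain a preset number of bounding boxes to be output includes:
determining the number K of parallel processing channels of the parallel processing chip;
reading the bounding box position information in descending order of the bounding box classification probability information to obtain K pieces of reference bounding box position information, wherein the overlap area of any two pieces of reference bounding box position information is smaller than a preset value;
performing, in parallel and according to the K pieces of reference bounding box position information, the local maximum search operation based on the non-maximum suppression algorithm on all bounding box position information, to obtain a preset number of the bounding boxes to be output.
Optionally, the method further includes:
when the target object bounding box corresponding to target bounding box position information is not a bounding box to be output, deleting the target bounding box position information from the RAM.
Optionally, the parallel processing chip is a field programmable gate array (FPGA), an ASIC chip, or an embedded chip.
Optionally, the bounding box position information includes the bounding box X-axis coordinate, the bounding box Y-axis coordinate, the bounding box width, and the bounding box height.
Optionally, the bounding box information further includes the foreground confidence of each target object bounding box;
correspondingly, storing the bounding box information in the RAM includes:
storing, in the RAM, the bounding box information whose foreground confidence is greater than or equal to a preset value.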
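The present application also provides an image target detection system applied to a parallel processing chip of a heterogeneous platform. The image target detection system includes:
an information receiving module, configured to receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information;
an information storage module, configured to store the bounding box information in a random access memory (RAM);
a bounding box selection module, configured to sequentially read the bounding box position information in the bounding box information in the RAM, and to perform, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
a result output module, configured to transmit the bounding box information of the bounding boxes to be output to the central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs the image target detection result.
The present application also provides a storage medium storing a computer program which, when executed, implements the steps of the above image target detection method.
The present application also provides an electronic device including a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the above image target detection method when invoking the computer program in the memory.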
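The present application provides an image target detection method applied to a parallel processing chip of a heterogeneous platform. The method includes: receiving bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information; storing the bounding box information in a random access memory (RAM); sequentially reading the bounding box position information in the bounding box information in the RAM, and performing, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output; and transmitting the bounding box information of the bounding boxes to be output to the CPU of the heterogeneous platform, so that the CPU outputs the image target detection result.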
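After the convolutional neural network performs target recognition on the image and obtains the bounding box information, the present application first stores the bounding box information in the RAM of the parallel processing chip and, exploiting the chip's ability to process computational tasks in a parallel pipeline, performs a local maximum search operation based on the non-maximum suppression algorithm on all the bounding box information to obtain a preset number of bounding boxes to be output. The bounding box information of these boxes is then transmitted to the CPU, which outputs the image detection result. In this process, the parallel processing chip performs the non-maximum suppression (NMS) calculation and transmits to the CPU only the bounding box information of the boxes to be output after the local maximum search. Compared with the related-art solution that transmits all bounding box position information to the CPU, the solution of the present application reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves the efficiency of image target detection. The present application also provides an image target detection system, an electronic device, and a storage medium, which have the above beneficial effects and are not described again here.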
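Brief Description of the Drawings
To explain the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an image target detection method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a data format of bounding box information provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of implementing the NMS calculation on a CPU+FPGA heterogeneous platform provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a bounding box information sorting principle provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the principle of FPGA-accelerated NMS provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an image target detection system provided by an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Refer to FIG. 1, which is a flowchart of an image target detection method provided by an embodiment of the present application.
The specific steps may include: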
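S101: Receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image;
The execution subject of this embodiment may be the parallel processing chip of a heterogeneous platform. A heterogeneous processing platform may include a main processing chip and a co-processing chip; for example, the heterogeneous processing platform in this embodiment may be a CPU+GPU (Graphics Processing Unit) heterogeneous acceleration framework, a CPU+FPGA (Field Programmable Gate Array) heterogeneous acceleration framework, a CPU+ASIC (Application Specific Integrated Circuit) heterogeneous acceleration framework, or a CPU+embedded chip heterogeneous acceleration framework. The main processing chip of the heterogeneous platform is the CPU (Central Processing Unit), and the co-processing chip may be a parallel processing chip capable of parallel computation and pipelined processing; this embodiment does not limit the specific type of the parallel processing chip.
The purpose of this embodiment is to improve the efficiency of image target detection, which refers to recognizing and detecting specific kinds of objects in an image. Image target detection is also called target detection. During target detection, multiple bounding boxes (bbox) are usually generated for each target, each with a score; when the detection result is finally presented, the redundant bounding boxes must be removed and only the highest-scoring one kept. The NMS (Non-Maximum Suppression) algorithm, as its name suggests, suppresses elements that are not maxima and can be understood as a local maximum search. NMS is commonly used in target detection to remove redundant bounding boxes. For example, in face detection, after the multi-layer computation of a convolutional neural network (CNN), the same face in an image has multiple bounding boxes, each with a score, and different boxes may contain one another or largely overlap. NMS is then used to select the highest-scoring box, whose target is most likely to be a face, and to remove the redundant boxes, thereby achieving face detection.
Before this step, there may be an operation in which the parallel processing chip uses the convolutional neural network to perform target recognition on the image and obtain the bounding box information of multiple target object bounding boxes; that is, the convolution calculations of the network generate target object bounding boxes that may contain a detection target (such as a face, a QR code, or a vehicle). Because these bounding boxes may partly duplicate one another, the operations of this embodiment are needed to filter out the redundant ones. This embodiment does not limit the categories of target object bounding boxes; in one image, each category of detection target may correspond to multiple target object bounding boxes.
The bounding box information in this embodiment may include bounding box position information, which describes the position and size of a target object bounding box, so the position of the box in the image can be determined from it. Specifically, the bounding box position information may include the bounding box X-axis coordinate, the bounding box Y-axis coordinate, the bounding box width, and the bounding box height.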
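S102: Store the bounding box information in a random access memory (RAM);
After receiving the bounding box information, the parallel processing chip may store it in the chip's random access memory (RAM). Specifically, this step may store, in the RAM, the correspondence between the bounding box information and the target object bounding box ID.
S103: Sequentially read the bounding box position information in the bounding box information in the RAM, and perform, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
The purpose of this step is to determine the valid target object bounding boxes. For example, suppose bounding boxes A, B, and C exist in an image and the overlap ratio of any two of them exceeds 80%. Then A, B, and C are all bounding boxes of the same object, and the best of them must be selected for output, so the box with the highest score (that is, the highest bounding box classification probability) can be chosen as the bounding box to be output.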
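In this step, the parallel processing chip may read the bounding box position information from the RAM in a preset order and, exploiting its parallel computation and pipelined processing, perform the local maximum search based on the non-maximum suppression algorithm to obtain a preset number of bounding boxes to be output. Specifically, the preset number, i.e., the number of bounding boxes to be output that need to be obtained, may be set before the local maximum search is performed.
Specifically, obtaining the preset number of bounding boxes to be output in this embodiment may include the following steps:
Step 1: Determine the number K of parallel processing channels of the parallel processing chip;
Step 2: Read the bounding box position information in descending order of the bounding box classification probability information to obtain K pieces of reference bounding box position information, wherein the overlap area of any two pieces of reference bounding box position information is smaller than a preset value;
Step 3: According to the K pieces of reference bounding box position information, perform in parallel the local maximum search based on the non-maximum suppression algorithm on all bounding box position information to obtain the preset number of bounding boxes to be output.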
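Correspondingly, when the target object bounding box corresponding to target bounding box position information is not a bounding box to be output, the target bounding box position information is deleted from the RAM. This step reads out the bounding box information stored in the input RAM, computes the overlap areas between bounding boxes, and removes duplicate boxes. The process is illustrated by an example. To improve computational efficiency, the parallel processing chip can process K channels in parallel, where K can be any value; K=4 is used below to illustrate the parallel process. The first address (address 0) holds the highest-scoring box, which is kept directly and sent to the output RAM for storage. The bounding box information at the second address is read and its overlap ratio with the box at the first address is computed: if the ratio is smaller than the set threshold, the box is kept and sent to the output RAM; if it is larger than the threshold, the box at that address is deemed a duplicate and its information is discarded. The bounding box position information at the third address is then read and its overlap with the boxes at the first and second addresses is computed; if it is judged a duplicate of either, it is discarded, and if neither comparison judges it a duplicate, it is kept and sent to the output RAM. This continues until 4 valid boxes have been retained. These 4 valid boxes then serve as the references of 4 threshold comparison modules: bounding box information is read from the input RAM in order, and each box read is compared against all 4 reference boxes simultaneously; a box whose overlap ratio with any reference exceeds the set threshold is removed, and otherwise the box is written to intermediate buffer RAM-1, until all data in the input RAM has been processed. After the input RAM has been traversed, intermediate buffer RAM-1 serves as the input RAM and intermediate buffer RAM-2 serves as the new intermediate buffer. Another 4 valid boxes are read and computed from intermediate buffer RAM-1, and the above steps are repeated until the data in intermediate buffers RAM-1 and RAM-2 has been fully traversed.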
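The K-reference, multi-pass procedure can be modelled in software to check its behaviour. The sketch below is an assumption-laden model, not the FPGA design: Python lists stand in for the input RAM, the output RAM and the ping-pong intermediate buffers; the K threshold-comparison modules are modelled by `all(...)` over the K references; the overlap test is IoU on centre-format boxes; and `max_out` is a hypothetical parameter for the "preset number" of output boxes. For score-sorted input it returns the same boxes, in the same order, as plain greedy NMS (`greedy_nms` below), because every box is still compared against every higher-scoring kept box before it is accepted.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h); centre format is an assumption


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two centre-format boxes."""
    iw = min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2)
    ih = min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)


def greedy_nms(sorted_boxes: Sequence[Box], thresh: float) -> List[Box]:
    """Reference: sequential greedy NMS over boxes already in descending score order."""
    kept: List[Box] = []
    for box in sorted_boxes:
        if all(iou(box, k) <= thresh for k in kept):
            kept.append(box)
    return kept


def k_way_nms(sorted_boxes: Sequence[Box], k: int, thresh: float,
              max_out: Optional[int] = None) -> List[Box]:
    """Software model of the K-reference, multi-pass NMS with ping-pong buffers."""
    if k < 1:
        raise ValueError("k must be at least 1")
    output: List[Box] = []           # output RAM
    src: List[Box] = list(sorted_boxes)  # current input RAM, highest score first
    while src:
        # Phase 1: scan from the top until k mutually non-overlapping references are found.
        refs: List[Box] = []
        pos = 0
        while pos < len(src) and len(refs) < k:
            box = src[pos]
            pos += 1
            if all(iou(box, r) <= thresh for r in refs):
                refs.append(box)     # kept: becomes a reference, also goes to the output RAM
            # otherwise it duplicates an earlier reference and is discarded
        output.extend(refs)
        if max_out is not None and len(output) >= max_out:
            break
        # Phase 2: every later box is checked against all k references at once
        # (k comparators in hardware); survivors go to the other intermediate buffer.
        src = [box for box in src[pos:] if all(iou(box, r) <= thresh for r in refs)]
    return output[:max_out] if max_out is not None else output
```

With K = 1 the model degenerates to one pass per kept box; a larger K retires up to K kept boxes per pass, so fewer passes over the shrinking buffers are needed.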
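S104: Transmit the bounding box information of the bounding boxes to be output to the CPU of the heterogeneous platform, so that the CPU outputs the image target detection result.
This step takes place after all the data in the RAM has been processed, that is, after duplicate boxes have been removed and the NMS calculation is complete; the bounding box information of the bounding boxes to be output can then be sent to the CPU. Compared with performing the NMS calculation on the CPU as in the conventional technique, the method of this embodiment, in which the parallel processing chip accelerates the NMS calculation, only needs to transmit the final valid bounding box data to the CPU, which greatly reduces communication overhead.
After the convolutional neural network performs target recognition on the image and obtains the bounding box information, this embodiment first stores the bounding box information in the RAM of the parallel processing chip and, exploiting the chip's ability to process computational tasks in a parallel pipeline, performs a local maximum search operation based on the non-maximum suppression algorithm on all the bounding box information to obtain a preset number of bounding boxes to be output. The bounding box information of these boxes is then transmitted to the CPU, which outputs the image detection result. In this process, the parallel processing chip performs the non-maximum suppression (NMS) calculation and transmits to the CPU only the bounding box information of the boxes to be output after the local maximum search. Compared with the related-art solution that transmits all bounding box position information to the CPU, the solution of this embodiment reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves the efficiency of image target detection.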
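As a further supplement to the embodiment corresponding to FIG. 1, the bounding box information may also include bounding box classification probability information, which describes the probability that the image content in a target object box is of the detection target type. The higher the bounding box classification probability, the more likely the region corresponding to the box contains the detection target; the bounding box classification probability is equivalent to the score of the target object box.
Further, in the embodiment corresponding to FIG. 1, storing the bounding box information in the RAM may be: writing the bounding box information into the RAM in descending order of the bounding box classification probability information. Sorting bounding box scores with a conventional sorting algorithm consumes a large amount of hardware resources. This embodiment therefore proposes a sorting method based on hardware addresses, in which bounding box information with a higher classification probability is stored at a larger address value in the RAM, i.e., a higher address. After writing is complete, data is read from the sorting RAM in order from high address to low address.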
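As a further supplement to the embodiment corresponding to FIG. 1, the bounding box information may also include the foreground confidence of each target object bounding box, which describes the probability that the image corresponding to the box is a foreground image; the higher the foreground confidence, the more likely the image corresponding to the box is foreground.
Further, in the embodiment corresponding to FIG. 1, storing the bounding box information in the RAM may be: storing, in the RAM, the bounding box information whose foreground confidence is greater than or equal to a preset value. That is, this implementation may first filter each piece of bounding box information by foreground confidence, storing the information whose foreground confidence is greater than or equal to the preset value and discarding the information whose foreground confidence is below it.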
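Combining the embodiment corresponding to FIG. 1 with the above two further supplements yields a more preferred implementation, as follows:
The parallel processing chip first performs target object detection with the convolutional neural network. After the network completes the convolution and other calculations, the classification and position information (i.e., the bounding box information) corresponding to different kinds of targets is obtained; the bounding box information may include the foreground confidence, the center point coordinates of the bounding box, the size of the bounding box, and the bounding box classification probability (i.e., the score). Refer to FIG. 2, a schematic diagram of a data format of bounding box information provided by an embodiment of the present application. After the convolutional neural network for target object detection completes its calculation, target object bounding boxes of N categories are obtained, each category containing M target object bounding boxes. The bounding box information of each target object bounding box includes the score p_data (12 bit), the bounding box X-axis coordinate x_data (12 bit), the Y-axis coordinate y_data (12 bit), the bounding box width w_data (12 bit), and the bounding box height h_data (12 bit).
Completing the NMS calculation with the parallel processing chip may include the following. The scores, coordinates, and other information of the bounding boxes computed by the neural network are preprocessed: data whose foreground confidence is below the foreground threshold is filtered out, data whose foreground confidence is greater than or equal to the foreground threshold is kept, and the confidence (16 bit) is replaced with 0x1 to indicate that the bounding box is valid. The retained bounding box information is sent to the sorting module, which sorts the scores from high to low while keeping the box coordinates and size corresponding to each score. After the sorted result is obtained, the overlap area between boxes is computed starting from the highest score, and if it exceeds a certain threshold, the lower-scoring box is removed. In this embodiment, the parallel processing chip may also perform the threshold comparison in a K-way parallel manner to filter out invalid bounding boxes, and write the coordinates and size of the boxes to be kept into an output FIFO. With K-way parallelism the threshold comparison process is repeated, and once all bounding boxes have been traversed the final result is output, completing the NMS calculation.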
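This embodiment provides a method of accelerating the NMS algorithm on a parallel processing chip: on top of the neural network acceleration already performed by the parallel processing chip, the NMS algorithm is also implemented on the chip, which makes full use of its parallel and pipelined processing and increases NMS computation speed. It also proposes a method of sorting scores by RAM write address, which effectively reduces hardware resource consumption. By using the parallel processing chip's parallel computation and pipelining to implement NMS in place of the conventional CPU, this embodiment increases computation speed while reducing the communication overhead between the processor (CPU) and the FPGA hardware. The hardware structure for NMS on the parallel processing chip allows the RAM parameters to be configured flexibly for different networks. The RAM-write-address sorting method of this embodiment uses the score directly as the RAM write address and then reads data starting from the high address, which sorts quickly and reduces hardware resource consumption.
The flow described in the above embodiments is illustrated below by a practical application. Refer to FIG. 3, a schematic flowchart of implementing the NMS calculation on a CPU+FPGA heterogeneous platform provided by an embodiment of the present application.
After the convolutional neural network on the FPGA completes the convolution and other calculations, the bounding box information of target object bounding boxes of different kinds is obtained; it may include the foreground confidence, the center point coordinates of the bounding box, the size of the bounding box, and the bounding box classification probability.
First, the FPGA can sort the bounding box information by bounding box classification probability. Sorting bounding box scores with a conventional sorting algorithm consumes a large amount of hardware resources. Refer to FIG. 4, a schematic diagram of a bounding box information sorting principle provided by an embodiment of the present application. This embodiment proposes a sorting method based on hardware addresses. The preprocessed data are all foreground bounding box data. Each bounding box occupies 64 bits: the low 48 bits hold x_data, y_data, w_data, and h_data, the 49th bit is a flag set to 1, and the remaining bits are 0. Using the score (12 bit) as the write address, the data is written into a sorting RAM with a depth of 4096 (2^12, assuming M<4096) and a width of 64 bits, so that higher-scoring data is stored at a larger address value in the sorting RAM, i.e., a higher address. After writing is complete, data is read from the sorting RAM from high address to low address; words whose 49th bit is 1 are foreground bounding box data and are written into the input RAM, until all M entries have been read out.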
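After sorting by bounding box classification probability, the FPGA reads out the bounding box information from the input RAM, computes the overlap areas between bounding boxes, and removes duplicate boxes. Refer to FIG. 5, a schematic diagram of the principle of FPGA-accelerated NMS provided by an embodiment of the present application. To improve computational efficiency, the FPGA can process K channels in parallel, where K can be any value; K=4 is used below as an example. The first address (address 0) holds the highest-scoring box, which is kept directly and transmitted to the output RAM for storage. The bounding box information at the second address is read and its overlap ratio with the box at the first address is computed: if the ratio is smaller than the set threshold, the box is kept and sent to the output RAM; if it is larger than the threshold, the box at that address is deemed a duplicate and its information is discarded. The bounding box position information at the third address is then read and its overlap with the boxes at the first and second addresses is computed; if it is judged a duplicate of either, it is discarded, and if neither comparison judges it a duplicate, it is kept and sent to the output RAM. This continues until 4 valid boxes have been retained. These 4 valid boxes then serve as the references of 4 threshold comparison modules: bounding box information is read from the input RAM in order, and each box read is compared against all 4 reference boxes simultaneously; a box whose overlap ratio with any reference exceeds the set threshold is removed, and otherwise the box is written to intermediate buffer RAM-1, until all data in the input RAM has been processed. After the input RAM has been traversed, intermediate buffer RAM-1 serves as the input RAM and intermediate buffer RAM-2 serves as the new intermediate buffer. Another 4 valid boxes are read and computed from intermediate buffer RAM-1, and the above steps are repeated until the data in intermediate buffers RAM-1 and RAM-2 has been fully traversed.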
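In the above calculation, the first bounding box read from the input RAM is sent directly to the output RAM for storage, and the 4 reference boxes used in each traversal are also stored directly in the output RAM. When all data in the input RAM and the intermediate buffer RAMs has been processed, duplicate boxes have been removed; the data in the output RAM is then output to the CPU and the NMS calculation is complete. Compared with performing the NMS calculation on the CPU as in the conventional technique, the FPGA-accelerated NMS method of this embodiment only needs to transmit the final valid bounding box data to the CPU, which greatly reduces communication overhead.
The above embodiment describes the NMS calculation for one category of target. In practical applications, NMS can be computed for multiple categories simultaneously; the NMS process is the same for every category, and a suitable degree of parallelism can be chosen according to the available hardware resources.
This embodiment implements the NMS calculation on the FPGA, making full use of the FPGA's parallel and pipelined processing and increasing NMS computation speed. It also uses the RAM write address to sort the bounding box scores, which greatly reduces hardware resource consumption compared with a conventional sorting algorithm. In this embodiment the FPGA transmits only valid bounding box data to the CPU, which reduces the communication overhead between the FPGA and the CPU and improves the overall efficiency of target detection.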
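Refer to FIG. 6, a schematic structural diagram of an image target detection system provided by an embodiment of the present application.
The system may include:
an information receiving module 100, configured to receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information includes bounding box position information;
an information storage module 200, configured to store the bounding box information in a random access memory (RAM);
a bounding box selection module 300, configured to sequentially read the bounding box position information in the bounding box information in the RAM, and to perform, in multiple parallel channels, a local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
a result output module 400, configured to transmit the bounding box information of the bounding boxes to be output to the CPU of the heterogeneous platform, so that the CPU outputs the image target detection result.
After the convolutional neural network performs target recognition on the image and obtains the bounding box information, this embodiment first stores the bounding box information in the RAM of the parallel processing chip and, exploiting the chip's ability to process computational tasks in a parallel pipeline, performs a local maximum search operation based on the non-maximum suppression algorithm on all the bounding box information to obtain a preset number of bounding boxes to be output. The bounding box information of these boxes is then transmitted to the CPU, which outputs the image detection result. In this process, the parallel processing chip performs the non-maximum suppression (NMS) calculation and transmits to the CPU only the bounding box information of the boxes to be output after the local maximum search. Compared with the related-art solution that transmits all bounding box position information to the CPU, the solution of this embodiment reduces the amount of data communicated between the parallel processing chip and the CPU during image target detection and improves the efficiency of image target detection.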
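Further, the bounding box information also includes bounding box classification probability information;
correspondingly, the information storage module 200 is specifically a module for writing the bounding box information into the RAM in descending order of the bounding box classification probability information.
Further, the bounding box selection module 300 includes:
a parallel channel number determining unit, configured to determine the number K of parallel processing channels of the parallel processing chip;
a reference bounding box position determining unit, configured to read the bounding box position information in descending order of the bounding box classification probability information to obtain K pieces of reference bounding box position information, wherein the overlap area of any two pieces of reference bounding box position information is smaller than a preset value;
an NMS calculation module, configured to perform, in parallel and according to the K pieces of reference bounding box position information, the local maximum search operation based on the non-maximum suppression algorithm on all bounding box position information, to obtain a preset number of the bounding boxes to be output.
Further, the system also includes:
a redundancy deletion module, configured to delete the target bounding box position information from the RAM when the target object bounding box corresponding to the target bounding box position information is not a bounding box to be output.
Further, the parallel processing chip is a field programmable gate array (FPGA), an ASIC chip, or an embedded chip.
Further, the bounding box position information includes the bounding box X-axis coordinate, the bounding box Y-axis coordinate, the bounding box width, and the bounding box height.
Further, the bounding box information also includes the foreground confidence of each target object bounding box;
correspondingly, the information storage module 200 is specifically configured to store, in the RAM, the bounding box information whose foreground confidence is greater than or equal to a preset value.
Since the system embodiments correspond to the method embodiments, refer to the description of the method embodiments for the system embodiments; details are not repeated here.
The present application also provides a storage medium storing a computer program which, when executed, can implement the steps provided in the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other medium that can store program code.
The present application also provides an electronic device, which may include a memory and a processor; the memory stores a computer program, and when the processor invokes the computer program in the memory, the steps provided in the above embodiments can be implemented. Of course, the electronic device may also include various network interfaces, a power supply, and other components.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; refer to the method section for relevant details. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ……" does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.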

Claims (10)

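  1. An image target detection method, characterized in that it is applied to a parallel processing chip of a heterogeneous platform, the image target detection method comprising:
    receiving bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information comprises bounding box position information;
    storing the bounding box information in a random access memory (RAM);
    sequentially reading the bounding box position information in the bounding box information in the RAM, and performing, in multiple parallel channels, a local maximum search operation based on a non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
    transmitting the bounding box information of the bounding boxes to be output to a central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs an image target detection result.
  2. The image target detection method according to claim 1, characterized in that the bounding box information further comprises bounding box classification probability information;
    correspondingly, storing the bounding box information in the RAM comprises:
    writing the bounding box information into the RAM in descending order of the bounding box classification probability information.
  3. The image target detection method according to claim 2, characterized in that performing, in multiple parallel channels, the local maximum search operation based on the non-maximum suppression algorithm on the read bounding box position information to obtain the preset number of bounding boxes to be output comprises:
    determining the number K of parallel processing channels of the parallel processing chip;
    reading the bounding box position information in descending order of the bounding box classification probability information to obtain K pieces of reference bounding box position information, wherein the overlap area of any two pieces of the reference bounding box position information is smaller than a preset value;
    performing, in parallel and according to the K pieces of reference bounding box position information, the local maximum search operation based on the non-maximum suppression algorithm on all bounding box position information, to obtain the preset number of bounding boxes to be output.
  4. The image target detection method according to claim 3, characterized by further comprising:
    when the target object bounding box corresponding to target bounding box position information is not a bounding box to be output, deleting the target bounding box position information from the RAM.
  5. The image target detection method according to claim 1, characterized in that the parallel processing chip is a field programmable gate array (FPGA), an ASIC chip, or an embedded chip.
  6. The image target detection method according to claim 1, characterized in that the bounding box position information comprises a bounding box X-axis coordinate, a bounding box Y-axis coordinate, a bounding box width, and a bounding box height.
  7. The image target detection method according to any one of claims 1 to 6, characterized in that the bounding box information further comprises a foreground confidence of each target object bounding box;
    correspondingly, storing the bounding box information in the RAM comprises:
    storing, in the RAM, the bounding box information whose foreground confidence is greater than or equal to a preset value.
  8. An image target detection system, characterized in that it is applied to a parallel processing chip of a heterogeneous platform, the image target detection system comprising:
    an information receiving module, configured to receive bounding box information of multiple target object bounding boxes obtained by a convolutional neural network performing target recognition on an image, wherein the bounding box information comprises bounding box position information;
    an information storage module, configured to store the bounding box information in a random access memory (RAM);
    a bounding box selection module, configured to sequentially read the bounding box position information in the bounding box information in the RAM, and to perform, in multiple parallel channels, a local maximum search operation based on a non-maximum suppression algorithm on the read bounding box position information, to obtain a preset number of bounding boxes to be output;
    a result output module, configured to transmit the bounding box information of the bounding boxes to be output to a central processing unit (CPU) of the heterogeneous platform, so that the CPU outputs an image target detection result.
  9. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the image target detection method according to any one of claims 1 to 7 when invoking the computer program in the memory.
  10. A storage medium, characterized in that the storage medium stores computer-executable instructions which, when loaded and executed by a processor, implement the steps of the image target detection method according to any one of claims 1 to 7.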
PCT/CN2020/092828 2019-10-25 2020-05-28 Image target detection method and system, electronic device, and storage medium WO2021077743A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911025107.0A CN110781819A (zh) 2019-10-25 2019-10-25 Image target detection method and system, electronic device, and storage medium
CN201911025107.0 2019-10-25

Publications (1)

Publication Number Publication Date
WO2021077743A1 true WO2021077743A1 (zh) 2021-04-29

Family

ID=69386647

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092828 WO2021077743A1 (zh) 2019-10-25 2020-05-28 Image target detection method and system, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN110781819A (zh)
WO (1) WO2021077743A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
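CN110781819A (zh) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image target detection method and system, electronic device, and storage medium
CN111310759B (zh) * 2020-02-13 2024-03-01 中科智云科技有限公司 Dual-mode collaborative target detection suppression optimization method and device
CN112396048B (zh) * 2020-11-17 2023-09-29 中国平安人寿保险股份有限公司 Picture information extraction method and apparatus, computer device, and storage medium
CN112613564A (zh) * 2020-12-25 2021-04-06 桂林汉璟智能仪器有限公司 Target detection post-processing method for removing overlapping boxes
CN112766073B (zh) * 2020-12-31 2022-06-10 贝壳找房(北京)科技有限公司 Table extraction method and apparatus, electronic device, and readable storage medium
CN112801035B (zh) * 2021-02-24 2023-04-07 山东大学 Carried-type intelligent lithology identification method and system driven by both knowledge and data
CN112817881A (zh) * 2021-02-26 2021-05-18 上海阵量智能科技有限公司 Information processing method and apparatus, device, and storage medium
CN113111929B (zh) * 2021-04-01 2024-04-12 广东拓斯达科技股份有限公司 Template matching method and apparatus, computer device, and storage medium
CN114049616B (zh) * 2021-12-01 2022-09-09 清华大学 Three-dimensional space target detection method and system based on fuzzy classification
CN114998438B (zh) * 2022-08-02 2022-11-01 深圳比特微电子科技有限公司 Target detection method and apparatus, and machine-readable storage medium
CN116596990B (zh) * 2023-07-13 2023-09-29 杭州菲数科技有限公司 Target detection method and apparatus, device, and storage medium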

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104020983A (zh) * 2014-06-16 2014-09-03 上海大学 OpenCL-based KNN-GPU acceleration method
US20160133002A1 (en) * 2014-11-07 2016-05-12 Samsung Electronics Co., Ltd. Method and device to determine landmark from region of interest of image
CN108268869A (zh) * 2018-02-13 2018-07-10 北京旷视科技有限公司 Target detection method, apparatus, and system
CN110781819A (zh) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Image target detection method and system, electronic device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
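CN109214389B (zh) * 2018-09-21 2021-09-28 上海小萌科技有限公司 Target recognition method, computer apparatus, and readable storage medium
CN109784290B (zh) * 2019-01-23 2021-03-05 科大讯飞股份有限公司 Target detection method, apparatus, device, and readable storage medium
CN110298298B (zh) * 2019-06-26 2022-03-08 北京市商汤科技开发有限公司 Target detection method and target detection network training method, apparatus, and device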

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2604991A (en) * 2021-01-14 2022-09-21 Nvidia Corp Performing non-maximum suppression in parallel
GB2604991B (en) * 2021-01-14 2023-08-02 Nvidia Corp Performing non-maximum suppression in parallel
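CN113642510A (zh) * 2021-08-27 2021-11-12 北京京东乾石科技有限公司 Target detection method, apparatus, device, and computer-readable medium
CN113837086A (zh) * 2021-09-24 2021-12-24 南通大学 Reservoir angler detection method based on a deep convolutional neural network
CN115410060A (zh) * 2022-11-01 2022-11-29 山东省人工智能研究院 Global-perception small-target intelligent detection method for public security video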

Also Published As

Publication number Publication date
CN110781819A (zh) 2020-02-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20878615

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20878615

Country of ref document: EP

Kind code of ref document: A1