CN111340790B - Bounding box determination method, device, computer equipment and storage medium - Google Patents

Info

Publication number
CN111340790B
Authority
CN
China
Prior art keywords
thread
bitmap
target
bounding box
bit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010135898.9A
Other languages
Chinese (zh)
Other versions
CN111340790A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DeepRoute AI Ltd
Original Assignee
DeepRoute AI Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DeepRoute AI Ltd filed Critical DeepRoute AI Ltd
Priority to CN202010135898.9A priority Critical patent/CN111340790B/en
Publication of CN111340790A publication Critical patent/CN111340790A/en
Application granted granted Critical
Publication of CN111340790B publication Critical patent/CN111340790B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a bounding box determination method, a bounding box determination device, computer equipment and a storage medium. The method comprises the following steps: acquiring a bitmap in each thread contained in a thread group, wherein the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object; running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in a first target thread undergo a first bitwise AND operation, wherein the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread; obtaining, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group; and determining an independent bounding box from the initial bounding boxes corresponding to the threads of the thread group according to the overlap information, and taking the independent bounding box as a target bounding box of the target object. The entire bounding box determination process can thereby be completed quickly.

Description

Bounding box determination method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of bounding box determination technologies, and in particular, to a bounding box determination method, apparatus, computer device, and storage medium.
Background
Non-maximum suppression is widely used in the post-processing stage of artificial intelligence object detection. A neural network computes bounding boxes for a large number of objects, together with a score for each bounding box. Non-maximum suppression (NMS) removes most of the duplicate bounding boxes and selects the bounding box with the highest score as the most suitable bounding box for the detected object, thereby detecting the type, position and size of the object. The non-maximum suppression algorithm can be accelerated with parallel computing, so an important problem is how to achieve the best balance of efficiency and energy consumption in the parallel algorithm, that is, how to use fewer GPU (Graphics Processing Unit) hardware resources to perform more and faster NMS detection.
In the conventional art, the most suitable target bounding box is selected from the initial bounding boxes by a map-reduce operation. However, the inventor has found that in the prior art a large number of threads are occupied in storing bounding box information, so that each thread handles only a small number of bounding boxes and the efficiency of bounding box determination is low.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the invention and thus may include information that does not form the prior art that is already known to those of ordinary skill in the art.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a bounding box determination method, apparatus, computer device, and storage medium that are capable of quickly determining a target bounding box.
A method of determining a bounding box, the method comprising:
acquiring a bitmap in each thread contained in a thread group, wherein the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in a first target thread undergo a first bitwise AND operation, wherein the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
obtaining, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group; and
determining an independent bounding box from the initial bounding boxes corresponding to the threads of the thread group according to the overlap information, and taking the independent bounding box as a target bounding box of the target object.
A bounding box determination apparatus, the apparatus comprising:
a bitmap acquisition module, configured to acquire a bitmap in each thread contained in a thread group, wherein the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
a bitmap operation module, configured to run the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in a first target thread undergo a first bitwise AND operation, wherein the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
an overlap information determination module, configured to obtain, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group; and
a bounding box determination module, configured to determine an independent bounding box from the initial bounding boxes corresponding to the threads of the thread group according to the overlap information, and to take the independent bounding box as a target bounding box of the target object.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a bitmap in each thread contained in a thread group, wherein the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in a first target thread undergo a first bitwise AND operation, wherein the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
obtaining, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group; and
determining an independent bounding box from the initial bounding boxes corresponding to the threads of the thread group according to the overlap information, and taking the independent bounding box as a target bounding box of the target object.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a bitmap in each thread contained in a thread group, wherein the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in a first target thread undergo a first bitwise AND operation, wherein the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
obtaining, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group; and
determining an independent bounding box from the initial bounding boxes corresponding to the threads of the thread group according to the overlap information, and taking the independent bounding box as a target bounding box of the target object.
In the above bounding box determination method, device, computer equipment and storage medium, the information of the initial bounding boxes is recorded in the bitmaps of the threads, and the initial bounding boxes in the same thread are independent bounding boxes, that is, the initial bounding boxes corresponding to the same thread do not overlap. By running the threads in the thread group in parallel, the overlap information between the initial bounding boxes corresponding to the threads of the thread group is determined quickly, and mutually independent target bounding boxes are then determined from the initial bounding boxes, so that the entire bounding box determination process can be completed quickly.
Drawings
FIG. 1 is an application environment diagram of a method of bounding box determination in one embodiment;
FIG. 2 is a flow diagram of a method of bounding box determination in one embodiment;
FIG. 3 is a schematic diagram of a process from an initial bounding box to a target bounding box in one embodiment;
FIG. 4 is a schematic diagram of the structure of threads within a thread group in one embodiment;
FIG. 5 is a flow diagram of updating bits in a thread in one embodiment;
FIG. 6 is a block diagram of the structure of a kernel function in one embodiment;
FIG. 7 is a schematic diagram of a streaming multiprocessor in one embodiment;
FIG. 8 is a diagram of a bitwise AND operation performed on bitmaps in one embodiment;
FIG. 9 is a schematic diagram of a linear reduction process in one embodiment;
FIG. 10 is a schematic diagram of a one-stage reduction in one embodiment;
FIG. 11 is a schematic diagram of an exemplary configuration of an atomic operation reduction;
FIG. 12 is a schematic diagram of a two-stage reduction structure in one embodiment;
FIG. 13 is a schematic structural diagram of a bounding box determination device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The bounding box determining method provided by the application can be applied to computer equipment shown in fig. 1, wherein the computer equipment can be a terminal, and an internal structure diagram of the computer equipment can be shown in fig. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of determining bounding boxes. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, as shown in fig. 2, a method for determining a bounding box is provided. This embodiment is described as applied to a computer device (which may be equipped with an NVIDIA inflight GPU); it is understood that the method may also be applied to a server, or to a system including a computer device and a server, where it is implemented through interaction between the computer device and the server. In this embodiment, the method includes the following steps:
S201, obtaining a bitmap in each thread contained in a thread group; the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object.
The target object may be an object in an image or a video (a video may be split into a plurality of image frames), and may be one or more objects in the image; for example, in the autonomous driving field, the three lamps in a traffic light image may each be a target object. The target object may be detected by algorithms such as Fast R-CNN, Faster R-CNN, FPN, YOLO, SSD, and RetinaNet. These algorithms analyze the features in the image and mark each region that may contain the target object with a detection box (for example, a rectangular box), forming a bounding box in the image. Each bounding box may also have a score (the score may be the degree to which the bounding box matches the target object in size, and it also reflects, to some extent, the probability that the bounding box is the best-matching one). In this step, all bounding boxes detected by the algorithm are taken as initial bounding boxes. As shown on the left side of fig. 3, the algorithm determines bounding boxes for the three lamps. Because subsequent image processing usually needs only the single most accurate target bounding box for each target object, a selection must be made from the initial bounding boxes so that all target objects in the image are framed by as few bounding boxes as possible (as shown on the right side of fig. 3).
Further, the bitmaps in the threads may be identical in number of bits, for example, 8 bits, 16 bits, 32 bits, or the like, and the bitmaps may be binary bitmaps or the like.
Each initial bounding box corresponds to one bit of a bitmap in the thread group. Which initial bounding boxes of a thread are independent is determined from the positional overlap between the initial bounding boxes in the same thread, and the bits of the bitmap in the thread are filled with the corresponding bit values accordingly. When determining which bounding boxes intersect pairwise, using a multi-bit bitmap instead of the integer storage used in traditional algorithms greatly improves memory utilization and GPU compute utilization, and allows a single thread group to process a large number of bounding boxes. On a low-power GPU, where computing resources are scarce, the bitmap greatly reduces the use of key resources such as registers and memory bandwidth, and using a single thread block also avoids a large amount of memory and cache synchronization.
Further, when the bitmap is a binary bitmap, each bit may be filled with 0 or 1: when an initial bounding box is determined not to be independent, its bit in the bitmap may be set to 0, and when it is determined to be independent, its bit may be set to 1. Specifically, if initial bounding box A is independent of the other initial bounding boxes corresponding to the same thread (i.e., they do not overlap, or the overlapping area is smaller than the set threshold), the bit value corresponding to A may be set to 1. Conversely, if initial bounding box B overlaps other initial bounding boxes corresponding to the same thread (or the overlapping area is larger than the set threshold), B is not considered an independent bounding box, and the bit value corresponding to B may be set to 0.
A schematic diagram of the bitmaps contained in the threads of a thread group may be as shown in fig. 4. The thread group includes m threads (m may be determined by a hardware parameter of the computer device, for example, 32), each thread includes n bitmaps (n may also be determined by a hardware parameter of the computer device, for example, 2014), and each bitmap includes 32 bits. Each bit corresponds to one initial bounding box: the initial bounding boxes may be numbered, and the bits in the threads are associated with the initial bounding boxes in order of number, for example, the bitmaps from left to right and from top to bottom in fig. 3 correspond to initial bounding boxes with progressively larger numbers. The 1st bitmap of the 0th thread is filled with bit values of "0" and "1", where "1" indicates that the corresponding initial bounding box does not overlap the other initial bounding boxes in the same thread.
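The layout described above (m threads, n bitmaps per thread, 32 bits per bitmap, one bit per numbered bounding box) can be illustrated with a minimal Python sketch. The function names, the zero-based box numbering, and the choice of the least significant bit as bit 0 are assumptions made for illustration and are not specified by the source.

```python
BITS_PER_BITMAP = 32  # each bitmap holds 32 bits, as in the description


def locate(box_number, bitmaps_per_thread):
    """Map a box number to (thread index, bitmap index, bit index).

    Assumes zero-based numbering filled thread by thread, bitmap by bitmap.
    """
    bits_per_thread = BITS_PER_BITMAP * bitmaps_per_thread
    thread, offset = divmod(box_number, bits_per_thread)
    bitmap, bit = divmod(offset, BITS_PER_BITMAP)
    return thread, bitmap, bit


def make_thread_group(num_threads, bitmaps_per_thread):
    """Create m threads of n bitmaps each, with every bit set to 1."""
    return [[0xFFFFFFFF] * bitmaps_per_thread for _ in range(num_threads)]


def clear_bit(group, box_number):
    """Mark a box as not independent by setting its bit to 0."""
    thread, bitmap, bit = locate(box_number, len(group[0]))
    group[thread][bitmap] &= ~(1 << bit) & 0xFFFFFFFF


def is_independent(group, box_number):
    """Return True if the bit for this box is still 1."""
    thread, bitmap, bit = locate(box_number, len(group[0]))
    return (group[thread][bitmap] >> bit) & 1 == 1
```

With this layout, one 32-bit word records the independence flags of 32 bounding boxes, which is the source of the memory saving compared with storing one integer per box.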
In some embodiments, the number of thread groups may be more than one, and the number of thread groups may also be determined based on hardware parameters of the computer device.
S202, running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in the first target thread undergo a first bitwise AND operation; the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread.
The first target thread may be selected randomly from the threads of the thread group, or may be selected in a directed manner. Taking the thread group in fig. 4 as an example, the 0th thread may be determined as the first target thread and the 1st to (m-1)th threads as the first remaining threads; the 1st to (m-1)th threads are then bitwise ANDed with the 0th thread one after another, which yields the bitwise AND result for each thread in the thread group.
Before the bitwise AND operation is performed, the method may include analyzing the overlap relationship between the initial bounding boxes corresponding to the first remaining threads and the initial bounding boxes corresponding to the first target thread, and redefining the bit values of the bitmaps in the threads according to that relationship. For ease of understanding, the bitmaps are color-filled in fig. 5, which takes the bitwise AND operation between the 0th thread and the 1st thread as an example (in fig. 5 each thread is simplified to a single 8-bit bitmap). The black boxes in the upper half of fig. 5 represent the independent bounding boxes in the 0th and 1st threads; there are 6 of them. After the bitwise AND operation, the independent bounding boxes are determined from among these 6 (the determination process is the same as described above); as shown in the lower half of fig. 5, 4 independent bounding boxes remain. Accordingly, the bit values of the independent bounding boxes are set to 1, the bit values of the non-independent bounding boxes are set to 0 (gray boxes 501), and the bit values of the other bounding boxes are set to 1. The independent target bounding boxes can then be determined from the result.
In one embodiment, the step of running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in the first target thread undergo the first bitwise AND operation includes: running the threads in the thread group in parallel so that the bit values of the bitmaps in the first remaining threads are each bitwise ANDed with the bit values of the bitmap in the first target thread.
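As a rough illustration of the bitwise AND between the first remaining threads and the first target thread, the following Python sketch ANDs each remaining thread's bitmaps word by word with those of the target thread. The function name `and_with_target` is hypothetical, and the sequential loop merely stands in for threads that would run in parallel on a GPU; the pairing of bitmaps by position is an assumption.

```python
def and_with_target(group, target=0):
    """AND every remaining thread's bitmaps with the target thread's bitmaps.

    `group` is a list of threads; each thread is a list of integer bitmaps.
    Returns a dict mapping each remaining thread index to its AND result.
    """
    target_bitmaps = group[target]
    results = {}
    for t, bitmaps in enumerate(group):
        if t == target:
            continue  # the target thread is not ANDed with itself
        results[t] = [a & b for a, b in zip(bitmaps, target_bitmaps)]
    return results


# Three threads with one 8-bit bitmap each, loosely in the spirit of fig. 5.
group = [[0b11110000], [0b10101010], [0b00001111]]
results = and_with_target(group, target=0)
```

A bit survives in a result only if it is 1 in both the remaining thread and the target thread.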
S203, obtaining, according to the result of the first bitwise AND operation, overlap information between the initial bounding boxes corresponding to the threads of the thread group.
As described for S202, the result of the first bitwise AND operation identifies which initial bounding boxes of the first remaining threads are independent of the initial bounding boxes of the first target thread. Overlap information between these independent bounding boxes may then be determined, the bitmaps in the threads adjusted according to the overlap information, and the bitwise AND operation repeated, so as to determine the target bounding box from the independent bounding boxes (i.e., the step implemented in S204).
S204, determining independent bounding boxes from initial bounding boxes corresponding to the threads of the thread group according to the overlapping information, and taking the independent bounding boxes as target bounding boxes of the target object.
The determined target bounding box is a bounding box capable of independently characterizing the target object in the image, so that subsequent analysis of the target object can be performed (e.g., tracking the position of the target vehicle in the next frame, determining which frame color of the traffic light will change in the next frame, etc.).
In the above bounding box determination method, the information of the initial bounding boxes is recorded in the bitmaps of the threads, and one bitmap can record the information of a plurality of initial bounding boxes. The initial bounding boxes in the same thread are independent bounding boxes, that is, the initial bounding boxes corresponding to the same thread do not overlap. By running the threads in the thread group in parallel, the overlap information between the initial bounding boxes corresponding to the threads of the thread group is determined quickly, and mutually independent target bounding boxes are then determined from the initial bounding boxes, so that the entire bounding box determination process can be completed quickly. Although this is a software solution, it still makes full use of the hardware performance. Non-maximum suppression of a large number of bounding boxes is completed while occupying very few resources, so that tasks that cannot be completed with the traditional method become feasible. In addition, on a high-power platform, where occupancy need not be considered, the speed can be increased further.
In one embodiment, before the step of obtaining the bitmap in the threads contained in the thread group, the method further includes: determining a first bounding box and second bounding boxes from the target initial bounding boxes corresponding to a target thread, where the target thread is any thread in the thread group, the first bounding box traverses the target initial bounding boxes one by one, and the second bounding boxes are the target initial bounding boxes other than the first bounding box; determining the degree of positional overlap between the first bounding box and a second bounding box; obtaining the size matching values of the first bounding box and the second bounding box with respect to the target object, giving a first matching value and a second matching value; and, if the degree of positional overlap is higher than a preset threshold (the threshold may be set according to the actual situation, for example, 0.8) and the second matching value is higher than the first matching value, setting the bit value corresponding to the first bounding box in a target bitmap to 0, so as to obtain the bit values of all bits in the target bitmap, where the target bitmap is the bitmap corresponding to the target thread.
The degree of positional overlap may be characterized by the area of the intersection of the bounding boxes.
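A minimal Python sketch of an overlap measure follows. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` tuples, which the source does not specify. `intersection_area` follows the statement that the overlap may be characterized by the intersection area; `overlap_ratio` (intersection over union) is an additional assumption, offered only because an example threshold such as 0.8 suggests a normalized value.

```python
def intersection_area(a, b):
    """Area of the intersection of two axis-aligned boxes (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def box_area(box):
    """Area of a single box; degenerate boxes have zero area."""
    return max(box[2] - box[0], 0.0) * max(box[3] - box[1], 0.0)


def overlap_ratio(a, b):
    """Intersection over union, a normalized overlap in the range [0, 1]."""
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0
```

Either measure can be compared against the preset threshold; which one the threshold refers to depends on the implementation.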
Further, in one embodiment, the step of setting the bit value corresponding to the first bounding box in the target bitmap to 0 so as to obtain the bit values of all bits in the target bitmap includes: determining the number of the first bounding box, where the number corresponds to a particular bit in the target bitmap; filling 0 as the bit value into the corresponding bit of the target bitmap according to the number; and, once the first bounding box has traversed all the target initial bounding boxes, filling 1 into the remaining bits to obtain the bit values of all bits in the target bitmap, where the remaining bits are the bits in the target bitmap that have not been filled with 0.
The process of filling the bitmap according to the information of the initial bounding box can be understood as a mapping process in this embodiment. Taking fig. 4 as an example, the mapping process of the present embodiment may be as follows:
1. The 0th thread is determined as the target thread, and each 32-bit bitmap in the target thread is initialized to 0xFFFFFFFF. Each bit of this 32 × n-bit in-memory bitmap is then filled according to the following logic:
1.1. The bounding box corresponding to the bit with sequence number 1 is taken as the first bounding box C11, and the degree of intersection between C11 and each second bounding box, i.e., each bounding box at another sequence number in the range 1 to (32 × n - 1), is determined. If the intersection area of C11 and some second bounding box C12 is larger than a certain threshold and the matching score of C12 is higher than that of C11, the bit with sequence number 1 is set to 0. Otherwise, if none of the second bounding boxes satisfies these conditions after comparison with C11, C11 is not covered by any other bounding box, and its bit in the bitmap is left unchanged (it remains 1).
1.2. The bounding box corresponding to the bit with sequence number 2 is taken as the first bounding box C21, the degree of intersection between C21 and each second bounding box at another sequence number in the range 2 to (32 × n - 1) is determined, and the bit value of C21 in the bitmap is determined in the same manner as in 1.1.
1.3. Each initial bounding box corresponding to the 0th thread is traversed one by one in this way, which completes the mapping process for the bitmaps in the 0th thread.
2. The 1st thread is determined as the target thread, and each 32-bit bitmap in the target thread is initialized to 0xFFFFFFFF. Each bit of this 32 × n-bit in-memory bitmap is filled using the same logic as for the 0th thread.
3. The other threads in the thread group are traversed one by one in this way, which completes the mapping process for the bitmaps of every thread in the thread group.
The mapping for one thread can be regarded as a single pass of computation, which maps 32 × n bounding boxes.
The mapping process makes the initial bounding boxes corresponding to the set bits of the bitmaps in the same thread mutually independent. It can be understood as a coarse screening of the initial bounding boxes, in which overlapping initial bounding boxes within the same thread are removed.
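The mapping steps above can be sketched for a single thread as follows. This is a minimal illustration rather than the patented implementation: the box format `(x1, y1, x2, y2)`, the zero-based sequence numbers, the helper names, and the use of raw intersection area as the overlap measure are all assumptions.

```python
def overlap_area(a, b):
    """Intersection area of two axis-aligned boxes (x1, y1, x2, y2)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def map_thread(boxes, scores, threshold):
    """Fill the 32-bit bitmaps of one thread.

    Every bit starts at 1 (each word is 0xFFFFFFFF). The bit of box i is
    cleared when some other box overlaps it by more than `threshold` and
    has a higher score. Unused trailing bits in the last word stay 1.
    """
    num_words = (len(boxes) + 31) // 32
    bitmaps = [0xFFFFFFFF] * num_words
    for i, first in enumerate(boxes):
        for j, second in enumerate(boxes):
            if i == j:
                continue
            if overlap_area(first, second) > threshold and scores[j] > scores[i]:
                bitmaps[i // 32] &= ~(1 << (i % 32)) & 0xFFFFFFFF
                break  # one covering box is enough to clear the bit
    return bitmaps


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
bitmaps = map_thread(boxes, scores, threshold=50)
```

In this example box 1 overlaps box 0 by an area of 81 and has the lower score, so only its bit is cleared; box 2 overlaps nothing and keeps its bit.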
In one embodiment, the step of obtaining the bitmap in the threads contained in the thread group includes: obtaining the bitmap in the threads contained in the thread group through a target kernel function. The step of running the threads in the thread group in parallel so that the bitmaps in the first remaining threads and the bitmap in the first target thread undergo the first bitwise AND operation includes: running the threads in the thread group in parallel through the target kernel function so that the bitmaps in the first remaining threads and the bitmap in the first target thread undergo the first bitwise AND operation.
The process of filling the information of the initial bounding boxes into the bits of the bitmaps may be understood as a mapping process, and the process of operating on the bitmaps to determine the target bounding box from among the many initial bounding boxes may be understood as a reduction process. The mapping process of this embodiment can be implemented by a mapping module in the kernel function, i.e., the mapping module is responsible for recording which bounding boxes intersect; the reduction process can be implemented by a reduction module in the kernel function, i.e., the reduction module is responsible for collecting the bounding boxes that meet the conditions. The mapping module and the reduction module in the kernel function may be as shown in fig. 6.
The mapping process and the reduction process in this embodiment are completed in the same kernel function. Therefore, as shown in fig. 7, this method only needs to occupy one kernel function (the kernel) on the 8 streaming multiprocessors (SMs) of the GPU. This embodiment thus avoids synchronizing memory and cache (during reduction, the results of the mapping process do not need to be read back from the cache, which effectively reduces processing time) and also effectively reduces the time the CPU spends submitting tasks to the GPU and dispatching them.
In one embodiment, each thread in the thread group includes a preset number of horizontally arranged bitmaps; the preset number is determined according to hardware parameters. The step of running each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo the first bitwise AND operation, includes: running each thread in the thread group in parallel so as to change the bitmap in the first remaining thread from horizontal to vertical, and performing the first bitwise AND operation on the vertical bitmap in the first remaining thread and the horizontal bitmap in the first target thread, to obtain the result of the first bitwise AND operation in the vertical direction.
The horizontally arranged bitmaps may be as shown in fig. 4, where m is the aforementioned preset number. The hardware parameter may refer to the number of threads of the GPU in the computer device; for example, for NVIDIA-series graphics cards, the preset number may be 32.
The first bitwise AND operation is performed on the vertical bitmap in the first remaining thread and the horizontal bitmap in the first target thread to obtain the result of the first bitwise AND operation in the vertical direction, as shown in fig. 8: each bit value in the horizontal direction is ANDed with each bit value in the vertical direction, giving the vertical result on the right side of fig. 8. As fig. 8 shows, after the bitwise AND operation the bitmaps in two directions become bitmaps in only one direction. That is, after one operation cycle (one cycle being complete when all first remaining threads and the first target thread have finished the bitwise AND operation), the results held by the threads currently participating in the reduction are reduced into the first half of those threads. Further, the first bitwise AND operation may be performed not just once but multiple times, until all bitmaps within a thread group are collected into one thread (thread 0 may be selected). As shown in fig. 9, assuming that one thread group includes 32 threads (32 b), after one bitwise AND operation the results of the thread group are held by 16 threads; the bitwise AND operation is then continued on these 16 threads, leaving 8 threads, and so on, until all bitmaps are concentrated into one thread.
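The halving pattern of fig. 9 can be illustrated with a small Python sketch. It serializes what the threads of a thread group would do in parallel, and the function name and the list-of-integers representation are assumptions made for the example, not part of the embodiment. In each cycle, the first half of the active threads ANDs in the bitmap held by the second half.

```python
from typing import List, Tuple


def reduce_thread_group(bitmaps: List[int]) -> Tuple[int, int]:
    """AND-reduce one bitmap word per thread into thread 0.

    Returns (result held by thread 0, number of operation cycles).
    The number of threads is assumed to be a power of two.
    """
    lanes = list(bitmaps)
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "thread count must be a power of two"
    cycles = 0
    stride = n // 2
    while stride >= 1:
        # On a GPU, these 'stride' AND operations would run in parallel.
        for lane in range(stride):
            lanes[lane] &= lanes[lane + stride]
        stride //= 2
        cycles += 1
    return lanes[0], cycles
```

For a thread group of 32 threads, the loop runs with strides 16, 8, 4, 2 and 1, so five cycles concentrate all bitmaps into thread 0.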
In the traditional method, the information of an initial bounding box is stored as an int, with one int storing one initial bounding box; the bitmaps in the threads have to be operated on one by one when the bitwise AND operation is carried out, and only n bitwise AND operations can be performed in one cycle. In the embodiment of the invention, the information of the initial bounding boxes is stored in the bits of the bitmap; taking a 32-bit bitmap as an example, n × 32 bitwise AND operations can be performed in one cycle, which effectively increases the operation speed and improves the efficiency of determining the bounding boxes. On this basis, what takes m operations in the traditional method can be completed in log(m) operations, a speed-up of m/log(m). For an NVIDIA GPU, m = 32; taking the logarithm to base 2, consistent with the halving shown in fig. 9, this gives 32/5 = 6.4.
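The difference between the two storage schemes can be shown with a short Python sketch. The helper names and the convention that flag i is stored in bit i mod 32 of word i div 32 are assumptions made for the example. Packing 32 one-bit flags into a 32-bit word lets a single AND instruction combine 32 bounding-box flags at once, while giving the same result as ANDing the flags one by one.

```python
from typing import List


def pack_flags(flags: List[int], bits_per_word: int = 32) -> List[int]:
    """Pack a list of 0/1 flags into words, flag i going to bit (i % bits_per_word)."""
    words = [0] * ((len(flags) + bits_per_word - 1) // bits_per_word)
    for i, flag in enumerate(flags):
        if flag:
            words[i // bits_per_word] |= 1 << (i % bits_per_word)
    return words


def and_words(a: List[int], b: List[int]) -> List[int]:
    """One AND per word: each operation combines bits_per_word flags at once."""
    return [x & y for x, y in zip(a, b)]


def and_flags(a: List[int], b: List[int]) -> List[int]:
    """Traditional scheme: one AND per flag (one int per bounding box)."""
    return [x & y for x, y in zip(a, b)]
```

For 64 flags, the packed scheme needs 2 AND operations where the one-int-per-box scheme needs 64.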
In the embodiment of the invention, because each bit in the bitmap is mapped to an initial bounding box, the first-level bitwise AND operation alone does not achieve the goal of determining the pairwise overlap between initial bounding boxes and removing the overlapping bounding boxes, so a second-level bitwise AND operation may be performed. If the second-level bitwise AND operation is performed, the first-level bitwise AND operation may be understood as first-level reduction (a schematic diagram of first-level reduction may be as shown in fig. 10), and the second-level bitwise AND operation may be understood as second-level reduction. In some embodiments, the second-level bitwise AND operation may be omitted, or the target bounding boxes may be determined from the result of the first-level bitwise AND operation by other methods; for example, among the initial bounding boxes remaining after the first-level bitwise AND operation, those whose matching value is higher than a preset threshold are determined to be target bounding boxes.
In one embodiment, the step of obtaining the overlapping information between the initial bounding boxes corresponding to the threads of the thread group according to the result of the first bitwise AND operation includes: acquiring a shared memory space whose size matches the bitmap in any thread; writing the result of the first bitwise AND operation into the shared memory space by atomic AND (atomic and); changing the bitmap in the shared memory space from vertical to horizontal, and filling the horizontal bitmaps into the threads of the thread group respectively; running each thread in the thread group in parallel, so that the bitmap in the second remaining thread and the bitmap in the second target thread undergo a second bitwise AND operation, where the second target thread is selected from the thread group and the second remaining threads are the threads in the thread group other than the second target thread; and obtaining, according to the result of the second bitwise AND operation, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group.
Further, in an embodiment, the step of obtaining the overlapping information between the initial bounding boxes corresponding to the threads of the thread group according to the result of the second bitwise AND operation includes: determining two initial bounding boxes for which the result of the second bitwise AND operation is 1 to be mutually independent bounding boxes, and determining two initial bounding boxes for which the result of the second bitwise AND operation is 0 to be mutually overlapping bounding boxes, thereby obtaining the overlapping information. Accordingly, the initial bounding boxes for which the result of the second bitwise AND operation is 1 are determined to be the target bounding boxes.
In the computer device, the bitmaps contained in the threads are all of the same size, so this embodiment obtains a shared memory space whose size matches the bitmap of a thread. Further, the shared memory space may be 32 bits × n in size (n may take a value of 1024), with each 32-bit word initialized to 0xFFFFFFFF, and this shared memory is shared by all threads in the thread group.
After the first-level reduction, the bitmaps of each thread group are concentrated into one thread. As shown in fig. 11, the bitmaps of the first thread group are concentrated into thread 0, the bitmaps of the second thread group into thread 32 (thread m), the bitmaps of the third thread group into thread 64 (thread 2m), and so on (fig. 11 includes n thread groups, and n/m atomic AND operations are needed to write to the shared memory of 32-bit bitmaps). Because bitwise AND operations cannot be performed directly between thread groups, the bitmaps of these thread groups are concentrated into one thread group through the shared memory space. Specifically, the bitmaps of the thread groups are collected into the shared memory space by atomic AND (the atomic AND fills the bitmap of a given thread into the corresponding position of the shared memory space), and the threads of one thread group then perform the bitwise AND operation on the bitmaps in the shared memory space. Performing the bitwise AND operation on the bitmaps in the shared memory space can be understood as second-level reduction, and the implementation process may be as shown in fig. 12.
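The collection step can be illustrated with a small Python simulation. Python has no hardware atomics, so a lock stands in for the atomic read-modify-write (CUDA exposes the corresponding operation as `atomicAnd`); the class and function names, and the choice of which shared word each thread group writes to, are assumptions made for the example. Because every word starts as 0xFFFFFFFF, the order in which the groups write does not affect the result.

```python
import threading
from typing import List, Tuple

FULL_WORD = 0xFFFFFFFF  # initial value of every 32-bit word in shared memory


class SharedBitmap:
    """Stand-in for a block of shared memory made of 32-bit bitmap words."""

    def __init__(self, num_words: int) -> None:
        self.words = [FULL_WORD] * num_words
        self._lock = threading.Lock()

    def atomic_and(self, index: int, value: int) -> int:
        """AND 'value' into word 'index' as one indivisible step; return the old word."""
        with self._lock:
            old = self.words[index]
            self.words[index] = old & value
            return old


def collect_group_results(group_results: List[Tuple[int, int]],
                          num_words: int) -> List[int]:
    """Each (word_index, bitmap) pair is the result held by one thread group's thread 0."""
    shared = SharedBitmap(num_words)
    workers = [threading.Thread(target=shared.atomic_and, args=(index, bits))
               for index, bits in group_results]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return shared.words
```

Two groups writing to the same word leave only the bits that both groups kept set.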
Further, after the second-level reduction is performed, each thread searches the 32-bit bitmap it is responsible for in the shared memory for the positions whose bit is set to 1; the bounding boxes at those positions are the target bounding boxes that are finally retained.
The method for determining the bounding box provided by this embodiment has the following beneficial effects: non-maximum suppression (NMS) is implemented on the GPU, and the parallelism of the program is maximized through the two-dimensional reduction over thread groups and thread blocks and the two-dimensional reduction over flag-bit memory and bitmaps in the GPU architecture. This achieves parallel high-performance operation and makes the fullest use of the bandwidth of each level of storage in the GPU, so that speed is greatly increased, occupancy is reduced, and more and faster NMS detection can be achieved with fewer GPU hardware resources.
Further, consider the traffic light recognition scenario of an autonomous driving platform. On a low-power NVIDIA platform (about 50 W), while occupying only one streaming multiprocessor (1/8 of the total resources), non-maximum suppression of thousands of bounding boxes can be completed in only hundreds of microseconds (µs). The requirements of the scenario are thus met with almost zero occupancy.
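The final lookup can be sketched in Python as follows. The function name and the numbering convention (bit b of word w corresponds to bounding box w × 32 + b) are assumptions made for the example; on the GPU each thread would scan only its own 32-bit word rather than looping over all of them.

```python
from typing import List


def surviving_box_indices(shared_words: List[int], bits_per_word: int = 32) -> List[int]:
    """Return the numbers of the bounding boxes whose bit is still 1."""
    kept = []
    for w, word in enumerate(shared_words):
        for b in range(bits_per_word):
            if (word >> b) & 1:
                kept.append(w * bits_per_word + b)
    return kept
```

For the two words 0b101 and 0b10, the retained bounding boxes are numbers 0, 2 and 33.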
Further, after determining the target bounding box, the computer device may highlight the target bounding box in the interface to alert the user to where the target object is located.
It should be understood that, although the steps in the flowchart of fig. 2 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2 may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same idea as the bounding box determination method in the above embodiments, the present invention also provides a bounding box determination apparatus that can be used to perform the bounding box determination method described above. For ease of illustration, the schematic structural diagram of an embodiment of the bounding box determination apparatus shows only part of the apparatus; those skilled in the art will appreciate that the illustrated structure does not limit the apparatus, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
In one embodiment, as shown in fig. 13, a bounding box determination apparatus 1300 is provided, which may be implemented as a software module, a hardware module, or a combination of both, as part of a computer device. The apparatus specifically includes a bitmap acquisition module 1301, a bitmap operation module 1302, an overlapping information determination module 1303, and a bounding box determination module 1304, wherein:
The bitmap acquisition module 1301 is configured to acquire the bitmaps in the threads contained in a thread group; the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object.
The bitmap operation module 1302 is configured to run each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo a first bitwise AND operation; the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread.
The overlapping information determination module 1303 is configured to obtain, according to the result of the first bitwise AND operation, overlapping information between the initial bounding boxes corresponding to the threads of the thread group.
The bounding box determination module 1304 is configured to determine, according to the overlapping information, an independent bounding box from among the initial bounding boxes corresponding to the threads of the thread group, as a target bounding box of the target object.
In the bounding box determination apparatus, the information of the initial bounding boxes is recorded in the bitmaps in the threads, and one bitmap can record the information of a plurality of initial bounding boxes. The initial bounding boxes in the same thread are independent bounding boxes, that is, the initial bounding boxes corresponding to the same thread do not overlap. By running the threads of the thread group in parallel, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group is determined quickly, and mutually independent target bounding boxes are then determined from the initial bounding boxes, so that the whole bounding box determination process can be completed quickly.
In one embodiment, the bounding box determination apparatus 1300 further includes: a bounding box selection module, configured to determine a first bounding box and a second bounding box from the target initial bounding boxes corresponding to a target thread, where the target thread is any thread in the thread group, the first bounding box steps through the target initial bounding boxes one by one, and the second bounding box is a target initial bounding box other than the first bounding box; an overlapping degree determination module, configured to determine the position overlap degree between the first bounding box and the second bounding box; a matching value determination module, configured to obtain the matching values, in terms of size, between the first bounding box and the target object and between the second bounding box and the target object, giving a first matching value and a second matching value; and a bit value determination module, configured to determine, if the position overlap degree is higher than a preset threshold and the second matching value is higher than the first matching value, the bit value corresponding to the first bounding box in a target bitmap to be 0, so as to obtain the bit value of each bit in the target bitmap, where the target bitmap is the bitmap corresponding to the target thread.
In one embodiment, the bit value determination module includes: a number determination submodule, configured to determine the number of the first bounding box, where the number corresponds to a certain bit in the target bitmap; a first bit value filling submodule, configured to fill 0 as the bit value into the corresponding bit of the target bitmap according to the number; and a second bit value filling submodule, configured to fill 1 into the remaining bits once the first bounding box has traversed all the target initial bounding boxes, so as to obtain the bit value of each bit in the target bitmap, where the remaining bits are the bits in the target bitmap that have not been filled with 0.
In one embodiment, the bitmap acquisition module 1301 is further configured to acquire, through a target kernel function, the bitmaps in the threads contained in the thread group; the bitmap operation module 1302 is further configured to run each thread in the thread group in parallel through the target kernel function, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo the first bitwise AND operation.
In one embodiment, each thread in the thread group includes a preset number of horizontally arranged bitmaps; the preset number is determined according to hardware parameters. The bitmap operation module 1302 is further configured to run each thread in the thread group in parallel so as to change the bitmap in the first remaining thread from horizontal to vertical, and to perform the first bitwise AND operation on the vertical bitmap in the first remaining thread and the horizontal bitmap in the first target thread, to obtain the result of the first bitwise AND operation in the vertical direction.
In one embodiment, the overlapping information determination module 1303 includes: a memory space acquisition submodule, configured to acquire a shared memory space whose size matches the bitmap in any thread; a memory space writing submodule, configured to write the result of the first bitwise AND operation into the shared memory space by atomic AND; a thread filling submodule, configured to change the bitmap in the shared memory space from vertical to horizontal and to fill the horizontal bitmaps into the threads of the thread group respectively; a bitmap operation submodule, configured to run each thread in the thread group in parallel, so that the bitmap in the second remaining thread and the bitmap in the second target thread undergo a second bitwise AND operation, where the second target thread is selected from the thread group and the second remaining threads are the threads in the thread group other than the second target thread; and an overlapping information determination submodule, configured to obtain, according to the result of the second bitwise AND operation, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group.
In one embodiment, the overlapping information determination submodule is further configured to determine two initial bounding boxes for which the result of the second bitwise AND operation is 1 to be mutually independent bounding boxes, and to determine two initial bounding boxes for which the result of the second bitwise AND operation is 0 to be mutually overlapping bounding boxes, thereby obtaining the overlapping information.
For specific limitations of the bounding box determination apparatus, reference may be made to the limitations of the bounding box determination method above, which are not repeated here. The modules in the above bounding box determination apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to the above modules.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the method embodiments described above when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a non-transitory computer-readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. Although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A method of determining a bounding box, the method comprising:
acquiring bitmaps in threads contained in a thread group; the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
running each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo a first bitwise AND operation; the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
obtaining, according to the result of the first bitwise AND operation, overlapping information between the initial bounding boxes corresponding to the threads of the thread group;
and determining, according to the overlapping information, an independent bounding box from among the initial bounding boxes corresponding to the threads of the thread group, and taking the independent bounding box as a target bounding box of the target object.
2. The method of claim 1, wherein, before the step of acquiring the bitmaps in the threads contained in the thread group, the method further comprises:
determining a first bounding box and a second bounding box from the target initial bounding boxes corresponding to a target thread; the target thread is any thread in the thread group; the first bounding box steps through the target initial bounding boxes one by one, and the second bounding box is a target initial bounding box other than the first bounding box;
determining a position overlap degree between the first bounding box and the second bounding box;
obtaining the matching values, in terms of size, between the first bounding box and the target object and between the second bounding box and the target object, to obtain a first matching value and a second matching value;
if the position overlap degree is higher than a preset threshold and the second matching value is higher than the first matching value, determining the bit value corresponding to the first bounding box in a target bitmap to be 0, and obtaining the bit value of each bit in the target bitmap; the target bitmap is the bitmap corresponding to the target thread.
3. The method of claim 2, wherein the step of determining the bit value corresponding to the first bounding box in the target bitmap to be 0 and obtaining the bit value of each bit in the target bitmap comprises:
determining the number of the first bounding box; the number corresponds to a certain bit in the target bitmap;
filling 0 as the bit value into the corresponding bit of the target bitmap according to the number;
if the first bounding box has traversed all the target initial bounding boxes, filling 1 into the remaining bits to obtain the bit value of each bit in the target bitmap; the remaining bits are the bits in the target bitmap that have not been filled with 0.
4. The method of claim 1, wherein the step of acquiring the bitmaps in the threads contained in the thread group comprises:
acquiring, through a target kernel function, the bitmaps in the threads contained in the thread group;
the step of running each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo the first bitwise AND operation, comprises:
running each thread in the thread group in parallel through the target kernel function, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo the first bitwise AND operation.
5. The method of claim 1, wherein each thread in the thread group comprises a preset number of horizontally arranged bitmaps; the preset number is determined according to hardware parameters;
the step of running each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo the first bitwise AND operation, comprises:
running each thread in the thread group in parallel so as to change the bitmap in the first remaining thread from horizontal to vertical, and performing the first bitwise AND operation on the vertical bitmap in the first remaining thread and the horizontal bitmap in the first target thread, to obtain the result of the first bitwise AND operation in the vertical direction.
6. The method of claim 5, wherein the step of obtaining, according to the result of the first bitwise AND operation, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group comprises:
acquiring a shared memory space whose size matches the bitmap in any thread;
writing the result of the first bitwise AND operation into the shared memory space by atomic AND;
changing the bitmap in the shared memory space from vertical to horizontal, and filling the horizontal bitmaps into the threads of the thread group respectively;
running each thread in the thread group in parallel, so that the bitmap in the second remaining thread and the bitmap in the second target thread undergo a second bitwise AND operation; the second target thread is selected from the thread group, and the second remaining threads are the threads in the thread group other than the second target thread;
and obtaining, according to the result of the second bitwise AND operation, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group.
7. The method of claim 6, wherein the step of obtaining, according to the result of the second bitwise AND operation, the overlapping information between the initial bounding boxes corresponding to the threads of the thread group comprises:
determining two initial bounding boxes for which the result of the second bitwise AND operation is 1 to be mutually independent bounding boxes, and determining two initial bounding boxes for which the result of the second bitwise AND operation is 0 to be mutually overlapping bounding boxes, to obtain the overlapping information.
8. A bounding box determination apparatus, the apparatus comprising:
a bitmap acquisition module, configured to acquire bitmaps in threads contained in a thread group; the bitmap of any thread corresponds to at least one independent initial bounding box, and the initial bounding box is a detection box obtained by performing image target detection on a target object;
a bitmap operation module, configured to run each thread in the thread group in parallel, so that the bitmap in the first remaining thread and the bitmap in the first target thread undergo a first bitwise AND operation; the first target thread is selected from the thread group, and the first remaining threads are the threads in the thread group other than the first target thread;
an overlapping information determination module, configured to obtain, according to the result of the first bitwise AND operation, overlapping information between the initial bounding boxes corresponding to the threads of the thread group;
and a bounding box determination module, configured to determine, according to the overlapping information, an independent bounding box from among the initial bounding boxes corresponding to the threads of the thread group, and to take the independent bounding box as a target bounding box of the target object.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202010135898.9A 2020-03-02 2020-03-02 Bounding box determination method, device, computer equipment and storage medium Active CN111340790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010135898.9A CN111340790B (en) 2020-03-02 2020-03-02 Bounding box determination method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010135898.9A CN111340790B (en) 2020-03-02 2020-03-02 Bounding box determination method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111340790A CN111340790A (en) 2020-06-26
CN111340790B true CN111340790B (en) 2023-06-20

Family

ID=71184108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010135898.9A Active CN111340790B (en) 2020-03-02 2020-03-02 Bounding box determination method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111340790B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE112021007439T5 (en) * 2021-03-31 2024-01-25 Nvidia Corporation GENERATION OF BOUNDARY BOXES
CN114595070B (en) * 2022-05-10 2022-08-12 上海登临科技有限公司 Processor, multithreading combination method and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109377552A (en) * 2018-10-19 2019-02-22 珠海金山网络游戏科技有限公司 Image occlusion test method, apparatus calculates equipment and storage medium
CN110276317A (en) * 2019-06-26 2019-09-24 Oppo广东移动通信有限公司 A kind of dimension of object detection method, dimension of object detection device and mobile terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10692270B2 (en) * 2017-08-18 2020-06-23 Microsoft Technology Licensing, Llc Non-divergent parallel traversal of a bounding volume hierarchy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
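Zhang Zhengchang; He Fazhi; Zhou Yi. Bounding volume hierarchy construction algorithm based on dynamic task scheduling. Journal of Computer-Aided Design & Computer Graphics. 2018, (03), full text. *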

Similar Documents

Publication Publication Date Title
CN110546611B (en) Reducing power consumption in a neural network processor by skipping processing operations
US20220129752A1 (en) Memory bandwidth reduction techniques for low power convolutional neural network inference applications
US20210398287A1 (en) Image processing method and image processing device
US11055516B2 (en) Behavior prediction method, behavior prediction system, and non-transitory recording medium
US9971959B2 (en) Performing object detection operations via a graphics processing unit
US20190266747A1 (en) Object detection method, device, system and storage medium
CN110990516B (en) Map data processing method, device and server
CN103336758B (en) Sparse matrix storage method using compressed sparse row with local information, and SpMV implementation method based on that storage method
US11538244B2 (en) Extraction of spatial-temporal feature representation
US20150269773A1 (en) Graphics processing systems
CN111340790B (en) Bounding box determination method, device, computer equipment and storage medium
KR101609079B1 (en) Instruction culling in graphics processing unit
CN111985597B (en) Model compression method and device
Fan et al. Faster-than-real-time linear lane detection implementation using soc dsp tms320c6678
CN106709503A (en) Large spatial data clustering algorithm K-DBSCAN based on density
JP2014106736A (en) Information processor and control method thereof
CN107292002B (en) Method and device for reconstructing digital core
CN110399760A (en) Batch two-dimensional code localization method, device, electronic equipment and storage medium
CN109661671B (en) Improvement of image classification using boundary bitmaps
US10474574B2 (en) Method and apparatus for system resource management
US8953893B2 (en) System and method to determine feature candidate pixels of an image
CN105760484A (en) Crowd stampede early-warning method and system, and server with the system
Allegretti et al. Optimizing GPU-based connected components labeling algorithms
CN110580506A (en) Density-based clustering calculation method, device, equipment and storage medium
WO2020257517A1 (en) Optimizing machine learning model performance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant