CN111310824A - Multi-angle dense target detection inhibition optimization method and equipment

Info

Publication number: CN111310824A
Application number: CN202010091184.2A
Authority: CN
Inventors: 龚飞
Original assignee: Shanghai Dianze Intelligent Technology Co ltd; Zhongke Zhiyun Technology Co ltd
Current assignee: Shanghai Dianze Intelligent Technology Co ltd; Zhongke Zhiyun Technology Co ltd
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2020-06-19

Abstract

The invention provides a multi-angle dense target detection suppression optimization method and device. The suppression criterion combines the intersection over union (IOU) with the distance between target bounding boxes. This criterion identifies the bounding boxes that belong to the same target more precisely and suppresses redundant predictions more reasonably. It thereby avoids the false suppression that the original NMS algorithm applies to multi-angle dense targets that are rotated or closely adjacent, reduces missed and false detections, and improves the practical applicability of the model. The method and device are particularly applicable to commodity detection and recognition.

Description

Multi-angle dense target detection inhibition optimization method and equipment

Technical Field

The invention relates to the field of computers, in particular to a multi-angle dense target detection inhibition optimization method and equipment.

Background

The non-maximum suppression (NMS) algorithm is one of the most commonly used post-processing methods in target detection. In many detection algorithms based on neural network models, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), NMS suppresses the redundant bounding boxes detected by the model so that only the final detection results are retained.

The NMS algorithm is a key component of many current target detection frameworks and has a decisive effect on detection accuracy. A conventional NMS algorithm suppresses redundant detection results mainly according to the intersection over union (IOU) between bounding boxes (bbox).

As shown in FIG. 1, assume that the model predicts two bounding boxes, bbox1 and bbox2, for a certain target. The confidence of bbox1 is $P_{bbox1}$ and its area is $S_{bbox1}$; the confidence of bbox2 is $P_{bbox2}$ and its area is $S_{bbox2}$; the area of the intersection of bbox1 and bbox2 is $S_{bbox1 \cap bbox2}$. The IOU of the bounding boxes bbox1 and bbox2 is then:

$$\mathrm{IOU} = \frac{S_{bbox1 \cap bbox2}}{S_{bbox1} + S_{bbox2} - S_{bbox1 \cap bbox2}} \quad \text{(formula 1)}$$

Let the acceptable NMS suppression threshold be nms_threshold. If the IOU is greater than nms_threshold, only the bounding box with the higher confidence is retained and the other bounding box is deleted.
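Formula 1 and the threshold rule can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the invention: it assumes axis-aligned boxes stored as a hypothetical `(cx, cy, w, h)` tuple (center point, width, height), and the helper names `iou` and `should_suppress` are chosen here for illustration only.

```python
from typing import Tuple

# Assumed box layout (not prescribed by the text): center x, center y, width, height.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (formula 1)."""
    # Convert each center-format box to its left/right and top/bottom edges.
    ax1, ax2 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay1, ay2 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx1, bx2 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by1, by2 = b[1] - b[3] / 2, b[1] + b[3] / 2

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Denominator of formula 1: S_bbox1 + S_bbox2 - S_intersection.
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def should_suppress(a: Box, b: Box, nms_threshold: float) -> bool:
    """Conventional rule: the lower-confidence box is deleted when IOU > nms_threshold."""
    return iou(a, b) > nms_threshold
```

For two 2×2 boxes whose centers are one unit apart horizontally, the overlap is 2 and the union is 6, so `iou((1, 1, 2, 2), (2, 1, 2, 2))` evaluates to 1/3.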

FIG. 2 shows the implementation flow of a conventional NMS algorithm. The specific procedure is as follows:

First, the initial prediction results of the model are sorted by confidence in descending order and stored in Sequence2.

Then, the IOU between the head element of Sequence2 and each subsequent element is calculated, the subsequent elements whose IOU is greater than the threshold nms_threshold are deleted, and the head element is taken out of Sequence2 and stored in Sequence1.

The same operation is repeated on the remaining bounding boxes until Sequence2 is empty. The elements retained in Sequence1 are the final detection results output by the model.
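The conventional flow described above can be sketched as follows. The sketch assumes that each detection is a hypothetical `(confidence, (cx, cy, w, h))` pair with axis-aligned boxes; the function names are illustrative and do not come from any particular detection framework.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # assumed layout: cx, cy, w, h
Detection = Tuple[float, Box]             # assumed layout: (confidence, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned center-format boxes."""
    inter_w = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
                  - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    inter_h = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
                  - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def conventional_nms(detections: List[Detection],
                     nms_threshold: float) -> List[Detection]:
    """IOU-only suppression following the Sequence1/Sequence2 description."""
    # Sort by confidence, highest first, and store the result in Sequence2.
    sequence2 = sorted(detections, key=lambda d: d[0], reverse=True)
    sequence1: List[Detection] = []
    while sequence2:
        # Take the head element out of Sequence2.
        head = sequence2.pop(0)
        # Delete every later element whose IOU with the head exceeds the threshold.
        sequence2 = [d for d in sequence2 if iou(head[1], d[1]) <= nms_threshold]
        # Store the head element in Sequence1.
        sequence1.append(head)
    return sequence1
```

Because the IOU is the only criterion, any lower-confidence box that overlaps the head strongly enough is deleted, whether or not it covers a different target.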

When targets are densely and irregularly distributed, they are often rotated and closely adjacent to each other. Because the existing NMS algorithm uses only the IOU as its criterion, it is prone to unreasonable suppression in this case, which causes missed detections and reduces the practical applicability of the model.

In particular, FIG. 3 illustrates the common false suppression of different-class and same-class targets by the NMS algorithm:

In the lower right corner of FIG. 3a there are two targets of different classes, a box and a bottle. Because the two targets are rotated and closely adjacent, the IOU between the bounding boxes that the model predicts for them is high. This triggers a false suppression by the conventional NMS algorithm and causes a missed detection, as shown in FIG. 3b.

In the lower right corner of FIG. 3c there are two targets of the same class, both boxes. Because the two targets are rotated and closely adjacent, the IOU between the bounding boxes that the model predicts for them is high. This again triggers a false suppression by the conventional NMS algorithm and causes a missed detection, as shown in FIG. 3d.

Disclosure of Invention

The invention aims to provide a multi-angle dense target detection inhibition optimization method and equipment.

According to one aspect of the invention, a multi-angle dense object detection inhibition optimization method is provided, and the method comprises the following steps:

acquiring a picture of a target to be detected;

inputting the picture into a neural network model to obtain a plurality of initially predicted target boundary frames output by the neural network model and corresponding confidence coefficients;

and deleting the redundant initial predicted target bounding boxes based on the confidence degree of the initial predicted target bounding boxes, the intersection ratio and the distance between the initial predicted target bounding boxes, and taking the undeleted initial predicted target bounding boxes as final target bounding boxes.

Further, in the above method, the distance is a Euclidean distance, calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.

Further, in the above method, based on the intersection ratio and the distance between the initially predicted target bounding boxes, deleting the redundant initially predicted target bounding boxes, and taking the remaining, i.e., undeleted, initially predicted target bounding boxes as final target bounding boxes, includes:

calculating the intersection ratio and the distance between every two initially predicted target bounding boxes;

deleting one initial predicted target boundary box with lower confidence degree from every two initial predicted target boundary boxes with the intersection ratio larger than a preset intersection ratio threshold value and the distance smaller than a preset distance threshold value;

and taking the undeleted initial predicted target bounding box as a final target bounding box.

Further, in the above method, deleting one of the initially predicted target bounding boxes with a lower confidence degree from every two initially predicted target bounding boxes whose intersection ratio is greater than a preset intersection ratio threshold and whose distance is less than a preset distance threshold, includes:

step S321, setting a preset intersection ratio threshold and a preset distance threshold, and initializing an empty first sequence and an empty second sequence;

step S322, storing each initially predicted target boundary box into the second sequence according to the sequence of the confidence degrees from large to small;

step S323, sequentially calculating the intersection ratio and the distance between the first initially predicted target boundary frame of the second sequence and each initially predicted target boundary frame behind the first initially predicted target boundary frame, and deleting a target boundary frame meeting the condition from the initially predicted target boundary frames behind the first initially predicted target boundary frame, wherein the intersection ratio of the target boundary frame meeting the condition and the first initially predicted target boundary frame is greater than the preset intersection ratio threshold, and the distance of the target boundary frame meeting the condition is less than the preset distance threshold;

step S324, extracting a first initially predicted target bounding box from the second sequence and storing the first initially predicted target bounding box into the first sequence;

step S325, repeating the steps S323 to S324 in sequence until the second sequence is empty;

step S326, outputting the first sequence.

According to another aspect of the present invention, there is also provided a multi-angle dense object detection suppression optimizing apparatus, wherein the apparatus comprises:

the acquisition device is used for acquiring a picture of the target to be detected;

the initial prediction device is used for inputting the picture into a neural network model so as to obtain a plurality of initial predicted target boundary boxes output by the neural network model and corresponding confidence coefficients;

and the screening device is used for deleting the redundant initial predicted target boundary frames and taking the undeleted initial predicted target boundary frames as final target boundary frames based on the confidence degree of the initial predicted target boundary frames, and the intersection ratio and the distance between the initial predicted target boundary frames.

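Further, in the above device, the distance is a Euclidean distance, calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.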

Further, in the above apparatus, the screening device includes:

the calculation module is used for calculating the intersection ratio and the distance between every two initially predicted target boundary boxes;

the screening module is used for deleting one initial predicted target boundary frame with lower confidence coefficient from every two initial predicted target boundary frames with the intersection ratio larger than a preset intersection ratio threshold and the distance smaller than a preset distance threshold;

and the output module is used for taking the undeleted initial predicted target bounding box as a final target bounding box.

Further, in the above device, the screening module includes:

a first sub-module, configured to set a preset intersection ratio threshold and a preset distance threshold, and to initialize an empty first sequence and an empty second sequence;

the second submodule is used for storing each initially predicted target boundary box into the second sequence according to the sequence of confidence coefficients from large to small;

a third sub-module, configured to sequentially calculate an intersection ratio and a distance between a first initially predicted target bounding box of the second sequence and each initially predicted target bounding box subsequent to the first initially predicted target bounding box, and delete a eligible target bounding box from the initially predicted target bounding box subsequent to the first initially predicted target bounding box, where the eligible target bounding box is a target bounding box whose intersection ratio with the first initially predicted target bounding box is greater than the preset intersection ratio threshold and whose distance is less than a preset distance threshold;

a fourth sub-module for retrieving a first initially predicted target bounding box from said second sequence and storing it in said first sequence;

a fifth sub-module for repeatedly executing the third sub-module to the fourth sub-module in sequence until the second sequence is empty;

a sixth submodule for outputting the first sequence.

According to another aspect of the present invention, there is also provided a computing-based device, including:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

acquiring a picture of a target to be detected;

According to another aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon computer-executable instructions, wherein the computer-executable instructions, when executed by a processor, cause the processor to:

acquiring a picture of a target to be detected;

Compared with the prior art, the present application provides a non-maximum suppression (NMS) method for detecting dense targets. The suppression criterion combines the intersection over union (IOU) with the distance between target bounding boxes. This criterion identifies the bounding boxes that belong to the same target more precisely and suppresses redundant predictions more reasonably. It thereby avoids the false suppression that the original NMS algorithm applies to multi-angle dense targets that are rotated or closely adjacent, reduces missed and false detections, and improves the practical applicability of the model. The method and device are particularly applicable to commodity detection and recognition.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 illustrates a conventional bounding box IOU suppression diagram;

FIG. 2 illustrates a flow chart of a conventional NMS algorithm operation;

FIG. 3a shows a graph of the original prediction results of the model;

FIG. 3b shows the false suppression of different-class targets when the conventional NMS algorithm is applied to FIG. 3a;

FIG. 3c shows another graph of the original prediction results of the model;

FIG. 3d shows the false suppression of same-class targets when the conventional NMS algorithm is applied to FIG. 3c;

FIG. 4a shows the suppression result of applying the NMS method of the present invention to FIG. 3a;

FIG. 4b shows the suppression result of applying the NMS method of the present invention to FIG. 3c;

FIG. 5a shows a schematic diagram of the original prediction boxes and center points of the model in FIG. 3a;

FIG. 5b shows a schematic diagram of the original prediction boxes and center points of the model in FIG. 3c;

FIG. 6 shows a flow chart of the NMS algorithm according to an embodiment of the invention.

The same or similar reference numbers in the drawings identify the same or similar elements.

Detailed Description

The present invention is described in further detail below with reference to the attached drawing figures.

In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

When targets are densely distributed, they are often rotated and arranged irregularly. This causes the existing NMS algorithm to falsely suppress the output of the model and seriously limits the applicability of the algorithm in practical engineering.

The invention provides a multi-angle dense target detection inhibition optimization method, which comprises the following steps:

step S1, acquiring a picture of a target to be detected;

step S2, inputting the picture into a neural network model to obtain a plurality of initially predicted target bounding boxes and corresponding confidence degrees output by the neural network model;

in step S3, based on the confidence of the initially predicted target bounding box, the distance between the initially predicted target bounding boxes, and the intersection ratio, the redundant initially predicted target bounding box is deleted, and the remaining, i.e., non-deleted, initially predicted target bounding box is used as the final target bounding box.

Here, each bounding box predicted by the model typically contains five values: the confidence, the coordinates of the center point of the detection box (x, y), and the width and height of the detection box (w, h). To solve the unreasonable suppression of the conventional NMS algorithm, the distance between bounding boxes is used in addition to the original IOU criterion to make a finer judgment and thereby avoid false suppression.
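One way to hold these five values is sketched below. The class name `Detection`, its field names, and the helper `corners` are illustrative assumptions, as is the reading of (w, h) as width and height; the text itself only lists the five values.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One predicted bounding box: confidence, center point, and size."""
    confidence: float
    x: float  # center point, horizontal coordinate
    y: float  # center point, vertical coordinate
    w: float  # box width (assumed meaning of w)
    h: float  # box height (assumed meaning of h)

    @property
    def center(self) -> Tuple[float, float]:
        """Center point used for the distance criterion."""
        return (self.x, self.y)

    @property
    def area(self) -> float:
        """Box area used in the IOU denominator."""
        return self.w * self.h

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (x_min, y_min, x_max, y_max), convenient for intersection tests."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)
```

For example, `Detection(0.9, 50.0, 40.0, 20.0, 10.0).corners()` returns `(40.0, 35.0, 60.0, 45.0)`. Keeping the center point as a stored field means the distance criterion needs no extra conversion.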

In practical projects, instability of application scenes can bring great challenges to target detection and recognition algorithms.

The application provides a non-maximum suppression (NMS) method for detecting dense targets. The suppression criterion combines the intersection over union (IOU) with the distance between target bounding boxes. This criterion identifies the bounding boxes that belong to the same target more precisely and suppresses redundant predictions more reasonably. It thereby avoids the false suppression that the original NMS algorithm applies to multi-angle dense targets that are rotated or closely adjacent, reduces missed and false detections, and improves the practical applicability of the model. The method and device are particularly applicable to commodity detection and recognition.

Specifically, FIG. 4a and FIG. 4b show the suppression effect of the NMS method proposed by the present application. FIG. 3a shows an original prediction result graph of the model, and FIG. 4a shows the result of applying the NMS method of the present invention to FIG. 3a. FIG. 3c shows another original prediction result graph of the model, and FIG. 4b shows the result of applying the NMS method of the present invention to FIG. 3c. As the figures show, the proposed NMS algorithm suppresses correctly and retains the best detection result predicted by the model for each target.

At present, in computer vision applications, the YOLO series of target detection frameworks mostly use the NMS algorithm to suppress redundant model predictions. The present application is equally applicable to other target detection frameworks, such as SSD and Fast R-CNN (Fast Regions with Convolutional Neural Networks), to solve the false non-maximum suppression they encounter.

In an embodiment of the multi-angle dense target detection inhibition optimization method, the distance is a Euclidean distance, calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.

In the invention, redundant predictions are suppressed more reasonably by also using the distance between the centers of the bounding boxes, which effectively solves the false suppression of the conventional NMS algorithm.

In particular, fig. 5a and 5b show the center point of each target bounding box in fig. 3a and 3c, respectively.

As can be seen from FIG. 5a and FIG. 5b, the model usually predicts several bounding boxes for the same target, and although these bounding boxes differ in size, their center points are concentrated. The NMS method is therefore designed to screen the bounding boxes of the same target more accurately according to the distance between their center points. If the distance between the center points of two bounding boxes is very small, for example smaller than a preset distance threshold dis_threshold, the bounding boxes can be considered to belong to the same target. The bounding boxes belonging to the same target are then suppressed according to a preset intersection ratio threshold nms_threshold, so that only the optimal detection box is retained and the bounding boxes of other targets are not suppressed by mistake.

Suppose that the center points of two bounding boxes, bbox1 and bbox2, have coordinates $(x_1, y_1)$ and $(x_2, y_2)$. The Euclidean distance between them is calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
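The combined criterion can be illustrated with a small worked example. The sketch below assumes axis-aligned boxes stored as a hypothetical `(cx, cy, w, h)` tuple and measures the center distance in the same units as the box coordinates; the names `center_distance` and `is_redundant` and the threshold values are illustrative assumptions.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: cx, cy, w, h


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned center-format boxes."""
    inter_w = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2)
                  - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    inter_h = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2)
                  - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between the center points of two boxes."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_redundant(a: Box, b: Box, nms_threshold: float, dis_threshold: float) -> bool:
    """Two boxes count as the same target only if both conditions hold."""
    return iou(a, b) > nms_threshold and center_distance(a, b) < dis_threshold


# Two tall, narrow targets standing side by side: high overlap, distinct centers.
left = (0.0, 0.0, 4.0, 10.0)
right = (1.0, 0.0, 4.0, 10.0)
# A second prediction for the left target: high overlap, nearly the same center.
duplicate = (0.2, 0.0, 4.0, 10.0)
```

With `nms_threshold = 0.5` and `dis_threshold = 0.5`, `left` and `right` have an IOU of 0.6 but a center distance of 1.0, so `is_redundant(left, right, 0.5, 0.5)` is `False`, while `is_redundant(left, duplicate, 0.5, 0.5)` is `True`. An IOU-only rule with the same threshold would have treated `right` as redundant.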

In an embodiment of the multi-angle dense object detection suppression optimization method of the present invention, step S3, in which the redundant initially predicted target bounding boxes are deleted based on the intersection ratio and the distance between the initially predicted target bounding boxes and the remaining, i.e., undeleted, initially predicted target bounding boxes are used as final target bounding boxes, includes:

step S31, calculating the intersection ratio and the distance between every two initially predicted target bounding boxes;

step S32, for every two initially predicted target bounding boxes whose intersection ratio is greater than a preset intersection ratio threshold and whose distance is less than a preset distance threshold, deleting the one with the lower confidence;

step S33, using the remaining, i.e., undeleted, initially predicted target bounding boxes as final target bounding boxes.

Here, in each two initially predicted target bounding boxes whose intersection ratio is greater than the preset intersection ratio threshold and whose distance is less than the preset distance threshold, one initially predicted target bounding box with a lower confidence is deleted, so that a reliable final target bounding box can be obtained.

As shown in fig. 6, in an embodiment of the multi-angle dense object detection suppression optimization method of the present invention, in step S32, in every two initially predicted object bounding boxes whose intersection ratio is greater than a preset intersection ratio threshold and whose distance is less than a preset distance threshold, one of the initially predicted object bounding boxes with a lower confidence is deleted, including:

step S321, setting a preset intersection ratio threshold nms_threshold and a preset distance threshold dis_threshold, and initializing an empty first sequence Sequence1 and an empty second sequence Sequence2;

step S322, storing each initially predicted target bounding box into the second sequence Sequence2 in descending order of confidence;

step S323, sequentially calculating the intersection ratio IOU and the distance between the first initially predicted target bounding box of the second sequence Sequence2 and each initially predicted target bounding box after it, and deleting the target bounding boxes that meet the condition from the initially predicted target bounding boxes after the first one, wherein a target bounding box meets the condition if its intersection ratio with the first initially predicted target bounding box is greater than the preset intersection ratio threshold nms_threshold and its distance from the first initially predicted target bounding box is less than the preset distance threshold dis_threshold;

step S324, taking the first initially predicted target bounding box out of the second sequence Sequence2 and storing it in the first sequence Sequence1;

step S325, repeating steps S323 to S324 in sequence until the second sequence Sequence2 is empty;

step S326, outputting the first sequence Sequence1.

Here, the first sequence Sequence1 output in step S326 is the final prediction result of the model, and the target bounding boxes in Sequence1 can be used as the final target bounding boxes.

In this embodiment, setting the preset intersection ratio threshold nms_threshold and the preset distance threshold dis_threshold and initializing the empty sequences Sequence1 and Sequence2 allows the final target bounding boxes to be obtained from Sequence1 accurately and efficiently.
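Steps S321 to S326 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: detections are hypothetical `(confidence, cx, cy, w, h)` tuples with axis-aligned boxes, the center distance is measured in the same units as the coordinates, and the function name `distance_aware_nms` is chosen here for illustration.

```python
import math
from typing import List, Tuple

# Assumed layout of the five values: confidence, center x, center y, width, height.
Detection = Tuple[float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    _, ax, ay, aw, ah = a
    _, bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    inter_h = max(0.0, min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_distance(a: Detection, b: Detection) -> float:
    """Euclidean distance between the center points of two boxes."""
    return math.hypot(a[1] - b[1], a[2] - b[2])


def distance_aware_nms(detections: List[Detection],
                       nms_threshold: float,
                       dis_threshold: float) -> List[Detection]:
    # S321: thresholds arrive as parameters; Sequence1 starts empty.
    sequence1: List[Detection] = []
    # S322: fill Sequence2 in descending order of confidence.
    sequence2 = sorted(detections, key=lambda d: d[0], reverse=True)
    # S325: repeat S323 and S324 until Sequence2 is empty.
    while sequence2:
        head = sequence2[0]
        # S323: delete later boxes with IOU above nms_threshold AND
        # center distance below dis_threshold.
        rest = [d for d in sequence2[1:]
                if not (iou(head, d) > nms_threshold
                        and center_distance(head, d) < dis_threshold)]
        # S324: move the head from Sequence2 to Sequence1.
        sequence1.append(head)
        sequence2 = rest
    # S326: output Sequence1.
    return sequence1
```

As a usage example, take `[(0.9, 0.0, 0.0, 4.0, 10.0), (0.8, 1.0, 0.0, 4.0, 10.0), (0.6, 0.2, 0.0, 4.0, 10.0)]` with both thresholds set to 0.5. The 0.6 box is deleted as a duplicate of the 0.9 box, while the 0.8 box survives because its center is 1.0 away. Passing `float("inf")` as `dis_threshold` makes the distance condition always true, so the sketch falls back to IOU-only behavior and the 0.8 box is suppressed as well.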

The invention provides a multi-angle dense target detection inhibition optimization device, which comprises:

and the screening device is used for deleting the redundant initial predicted target boundary frames and taking the residual initial predicted target boundary frames which are not deleted as final target boundary frames on the basis of the confidence degree of the initial predicted target boundary frames, and the intersection ratio and the distance between the initial predicted target boundary frames.

In an embodiment of the multi-angle dense target detection inhibition optimization device, the distance is a Euclidean distance, calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.

In an embodiment of the multi-angle dense target detection suppression optimizing apparatus of the present invention, the screening device includes:

and the output module is used for taking the residual, namely the undeleted, initial predicted target bounding box as a final target bounding box.

In an embodiment of the multi-angle dense target detection suppression optimizing apparatus of the present invention, the screening module includes:

a first sub-module, configured to set a preset intersection ratio threshold nms_threshold and a preset distance threshold dis_threshold, and to initialize an empty first sequence Sequence1 and an empty second sequence Sequence2;

a second sub-module, configured to store each initially predicted target bounding box into the second sequence Sequence2 in descending order of confidence;

a third sub-module, configured to sequentially calculate the intersection ratio IOU and the distance between the first initially predicted target bounding box of the second sequence Sequence2 and each initially predicted target bounding box after it, and to delete the target bounding boxes that meet the condition from the initially predicted target bounding boxes after the first one, wherein a target bounding box meets the condition if its intersection ratio with the first initially predicted target bounding box is greater than the preset intersection ratio threshold nms_threshold and its distance from the first initially predicted target bounding box is less than the preset distance threshold dis_threshold;

a fourth sub-module, configured to take the first initially predicted target bounding box out of the second sequence Sequence2 and store it in the first sequence Sequence1;

a fifth sub-module, configured to repeatedly execute the third sub-module to the fourth sub-module in sequence until the second sequence Sequence2 is empty;

a sixth sub-module, configured to output the first sequence Sequence1.

Here, the first sequence Sequence1 output by the sixth sub-module is the final prediction result of the model, and the target bounding boxes in Sequence1 can be used as the final target bounding boxes.

a processor; and

acquiring a picture of a target to be detected;

For details of embodiments of each device and storage medium of the present invention, reference may be made to corresponding parts of each method embodiment, and details are not described herein again.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

It should be noted that the present invention may be implemented in software and/or in a combination of software and hardware, for example, as an Application Specific Integrated Circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Also, the software programs (including associated data structures) of the present invention can be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Further, some of the steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.

In addition, some of the present invention can be applied as a computer program product, such as computer program instructions, which when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. Program instructions which invoke the methods of the present invention may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the invention herein comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or solution according to embodiments of the invention as described above.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims

1. A multi-angle dense object detection inhibition optimization method comprises the following steps:

acquiring a picture of a target to be detected;

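2. The method of claim 1, wherein the distance is a Euclidean distance, the Euclidean distance being calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.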

3. The method of claim 1 or 2, wherein deleting redundant initial predicted target bounding boxes based on the intersection ratio and distance between the initial predicted target bounding boxes, and taking the remaining, i.e. non-deleted, initial predicted target bounding boxes as final target bounding boxes comprises:

4. The method of claim 3, wherein deleting one of the initial predicted target bounding boxes with lower confidence in every two of the initial predicted target bounding boxes with a cross ratio greater than a preset cross ratio threshold and a distance less than a preset distance threshold comprises:

step S326, outputting the first sequence.

5. A multi-angle dense object detection suppression optimization apparatus, wherein the apparatus comprises:

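6. The apparatus of claim 5, wherein the distance is a Euclidean distance, and the Euclidean distance is calculated as follows:

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the center points of the two target bounding boxes.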

7. The apparatus of claim 5 or 6, wherein the screening device comprises:

8. The device of claim 7, wherein the screening module comprises:

a first sub-module, configured to set a preset intersection ratio threshold and a preset distance threshold, and to initialize an empty first sequence and an empty second sequence;

a sixth submodule for outputting the first sequence.

9. A computing-based device, comprising:

a processor; and

acquiring a picture of a target to be detected;

10. A computer-readable storage medium having computer-executable instructions stored thereon, wherein the computer-executable instructions, when executed by a processor, cause the processor to:

acquiring a picture of a target to be detected;