CN110738125A - Method, device and storage medium for selecting detection frame by using Mask R-CNN - Google Patents

Method, device and storage medium for selecting detection frame by using Mask R-CNN

Info

Publication number
CN110738125A
Authority
CN
China
Prior art keywords
iou
frame
polygon
candidate detection
outline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910885674.7A
Other languages
Chinese (zh)
Other versions
CN110738125B (en)
Inventor
陈欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910885674.7A priority Critical patent/CN110738125B/en
Priority to PCT/CN2019/118279 priority patent/WO2021051601A1/en
Publication of CN110738125A publication Critical patent/CN110738125A/en
Application granted granted Critical
Publication of CN110738125B publication Critical patent/CN110738125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/00 Scenes; Scene-specific elements
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V2201/07 Target detection

Abstract

The invention relates to the technical field of image recognition, and provides a method, a device and a storage medium for selecting a detection frame by using Mask R-CNN. The method comprises the following steps: performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame; calculating the IOU values of the candidate detection frame and the polygonal outline respectively; and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1. According to the invention, the detection precision of the detection frame is improved through the secondary IOU screening of the polygonal outline.

Description

Method, device and storage medium for selecting detection frame by using Mask R-CNN
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for selecting a detection frame by using Mask R-CNN, and a storage medium.
Background
Video-based detection and tracking of moving human bodies is widely applied to the monitoring of dense places with high safety requirements, such as banks and railway stations. Human body tracking in real-time scenes is complex and subject to interference factors such as background change and occlusion, making it difficult to meet the requirements of detection accuracy, robustness and real-time performance.
The current human detection and tracking method is realized by a rectangular search box. The method has the following disadvantages:
1. the search box evaluates the detection result through the IOU; even when the search box meets the IOU criterion, interference images may still be present;
2. the detection target classification of the search box is currently limited to broad classes such as human or animal, and finer distinctions such as male versus female or old versus young cannot be made;
3. when a human body is detected against a complex background, detection is strongly influenced by the surrounding environment; for example, when the color of a pedestrian's clothes is similar to the background color, or the background lighting changes greatly, the moving human body is difficult to separate from the background;
4. when a shadow or a mirror exists in a scene, the complexity of the features in the search box increases and interferes with detection, causing the misjudgment that the reflection in the mirror is a person or that the shadow area is a person; moving objects in the scene, such as cars, swaying trees or a fluctuating water surface, likewise increase the complexity of the features in the search box and raise the detection difficulty.
In view of the above problems, there is a need for a target detection method that better eliminates interference, distinguishes false targets and provides a more detailed classification.
Disclosure of Invention
The invention provides a method for selecting a detection frame by using Mask R-CNN, an electronic device and a computer-readable storage medium. The method mainly obtains a rectangular frame and a polygonal outline point set of a target by using an instance segmentation technique, performs a primary screening of the obtained rectangular frame by its IOU value, performs a secondary screening of the polygonal outline point set by its IOU value, and continues target detection with a rectangular frame that passes both screenings as the target detection frame.
In order to achieve the above object, the present invention provides a method for selecting a detection frame by using Mask R-CNN, applied to an electronic device, the method comprising:
S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
Preferably, calculating the IOU value of the polygonal outline comprises calculating it by a two-dimensional array mapping coding method: the polygonal outline and its prediction frame are respectively mapped onto a plane template divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized blocks; the mapping results of the polygonal outline and its prediction frame are then represented on a binary map of the same size as the plane template, and each block is encoded as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B is the coding state of the block with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside.
The IOU value is calculated by counting the codes of the blocks: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
Preferably, calculating the IOU value of the polygonal outline comprises calculating it by an intersection area method, which comprises the following steps: obtaining the key points of the polygonal outline and its prediction frame and labeling them, wherein the key points comprise each vertex of the polygonal outline and of its prediction frame, and each intersection point between the polygonal outline and its prediction frame; forming, by sorting, a point set of the intersection polygon from the intersection points and the points lying inside both shapes; and calculating the areas of the polygonal outline, its prediction frame and the intersection polygon, and from these the IOU value of the polygonal outline, where IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area).
Preferably, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7.
Preferably, after screening out the candidate detection frame as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all the screened candidate detection frames; comparing the coincidence degree of the coded candidate detection frames; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, judging that a mirror image exists among the targets detected by the two candidate detection frames.
In order to achieve the above object, the invention also provides an electronic device, which includes a memory and a processor. The memory stores a selection program of the detection frame, and when the program is executed by the processor, the following steps are implemented: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively and comparing them with their preset thresholds, wherein the preset threshold of the candidate detection frame is IOU1, the preset threshold of the polygonal outline is IOU2, and IOU2 is greater than IOU1; S130, screening out as the target detection frame a candidate detection frame whose IOU value is larger than IOU1 and whose polygonal outline has an IOU value larger than IOU2. Preferably, calculating the IOU value of the polygonal outline comprises calculating it by the two-dimensional array mapping coding method: S210, mapping the polygonal outline and its prediction frame onto a plane template divided in advance by a line segment combination, the line segment combination dividing the plane template into equally sized blocks; S220, representing the mapping results of the polygonal outline and its prediction frame on a binary map of the same size as the plane template, and encoding each block as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B the coding state with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside; S230, calculating the IOU value by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)]. Preferably, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7. Preferably, after screening out the candidate detection frame as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all the screened candidate detection frames; comparing the coincidence degree of the coded candidate detection frames; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, judging that a mirror image exists among the targets detected by the two candidate detection frames.
In addition, in order to achieve the above object, the present invention further provides a computer-readable storage medium storing a computer program that includes a detection frame selection program; when the detection frame selection program is executed by a processor, the steps of the above method for selecting a detection frame by using Mask R-CNN are implemented.
The invention provides a method for selecting a detection frame by using Mask R-CNN, an electronic device and a computer-readable storage medium. Using the operation method of the Mask R-CNN (Mask Region-based Convolutional Neural Network), a monitoring image is repeatedly convolved and pooled in the deep neural network, and key features of the image are extracted and processed by the neural network algorithm to obtain a detection result and class (namely a rectangular frame of an object in the image). The overlapping part between the obtained rectangular frame and the real target is subjected to a primary IOU screening; a further step then obtains a polygon point set through the Mask (namely the polygonal outline obtained by instance segmentation), the polygons between the polygon point set and the real target are subjected to a secondary IOU screening, and a frame that finally meets the set thresholds is taken as the detection frame. The beneficial effects are as follows:
(1) a polygon point set of the target is obtained through the Mask branch of Mask R-CNN, and the pixel range is reduced on the basis of the rectangular candidate frame (namely the bounding-box range is reduced), thereby realizing a more detailed target classification;
(2) according to the characteristics of shadows, an analysis method for judging whether a mirror image exists is formed in combination with the two-dimensional array coding, so that false shadow targets are eliminated;
(3) the IOU of the polygonal outline is calculated by means of the two-dimensional array coding, which is both accurate and fast;
(4) when selecting the candidate frame, a primary screening is first performed on the IOU of the candidate frame and a secondary screening is then performed on the IOU of the polygon point set, and regression then yields a more accurate target detection frame.
Drawings
FIG. 1 is a flowchart of a preferred embodiment of the method for selecting a detection frame by using Mask R-CNN according to the present invention;
FIG. 2 is a flowchart of a preferred embodiment of the method for calculating IOU value by using two-dimensional array mapping coding method according to the present invention;
FIG. 3 is a diagram illustrating a two-dimensional array mapping encoding method according to a preferred embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a preferred embodiment of the invention;
The objects, features, and advantages of the present invention are further described below with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the words "first" and "second" are used herein only to distinguish identical names from each other, and do not imply a relationship or order between the names.
The purpose of target detection is to identify and locate objects of a specific class in a picture or video; the detection process can be regarded as a classification process that distinguishes the target from the background.
The invention provides a method for selecting a detection frame by using Mask R-CNN. Referring to FIG. 1, the method can be executed by a device, and the device can be realized by software and/or hardware.
Here, Mask R-CNN (Mask Region-based Convolutional Neural Network) is used for predicting the class of a detected object in an image, fine-tuning its frame, and further segmenting the Mask of the polygonal outline of the detected object; a bounding box is the smallest rectangular box that can contain a given object in the image.
In this embodiment, the method for selecting a detection box by using Mask R-CNN includes: step S110-step S130.
S110, carrying out example segmentation on the target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline thereof.
Mask R-CNN instance segmentation comprises two steps: the first step selects the position and class of the candidate frame (i.e. predicts the class of the image object and refines the frame), the selected candidate frame being a rectangle; the second step segments the selected candidate frame into a polygonal outline (obtained through the Mask branch). A sketch of how these two outputs can be obtained in practice follows.
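As an illustration only, the sketch below shows one way to obtain these two branch outputs with an off-the-shelf Mask R-CNN: torchvision's pretrained model supplies the rectangular boxes, and the polygonal outline point set is recovered from the Mask branch output with OpenCV's contour extraction. The score threshold and the choice of the largest contour are assumptions made for this sketch, not part of the patent.

```python
import cv2
import numpy as np
import torch
import torchvision

# Pretrained Mask R-CNN; any instance-segmentation model that returns
# rectangular boxes plus per-instance binary masks would serve equally well.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def candidates_with_contours(image_bgr, score_thresh=0.5):
    """Return (rectangular candidate box, polygon outline point set) pairs."""
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]          # step one: boxes, labels, scores, masks
    results = []
    for box, score, mask in zip(out["boxes"], out["scores"], out["masks"]):
        if score < score_thresh:          # assumed confidence cut-off
            continue
        binary = (mask[0].numpy() > 0.5).astype(np.uint8)
        # Step two: the Mask branch output becomes a polygon outline point set.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            outline = max(contours, key=cv2.contourArea).reshape(-1, 2)
            results.append((box.numpy(), outline))
    return results
```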
S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively; when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1. It should be noted that the IOU (Intersection over Union) can be understood as the degree of coincidence between a prediction box and a candidate detection box.
In specific embodiments, the first preset threshold IOU1 and the second preset threshold IOU2 can be set according to different scenes; moreover, in order to improve the detection precision of the rectangular detection frame, the second preset threshold IOU2 is set to be greater than the first preset threshold IOU1.
First, the candidate detection frame is matched against the prediction target, and the first matching result is screened; that is, candidate detection frames whose IOU value is larger than IOU1 are retained.
Then, the polygonal outline is matched against the prediction target, and the second matching result is screened; that is, polygonal outlines whose IOU value is larger than IOU2 are retained.
The candidate detection frame that passes both screenings is taken as the final target detection frame, as in the sketch below.
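A minimal sketch of this two-stage screening follows; `box_iou` and `polygon_iou` stand for the rectangle IOU and the polygonal-outline IOU described in this document, and the triple layout of `detections` is an assumption made for illustration.

```python
def select_target_frames(detections, box_iou, polygon_iou, iou1=0.5, iou2=0.7):
    """Keep only candidate frames that pass both IOU screenings (IOU2 > IOU1).

    detections: iterable of (candidate_box, polygon_outline, prediction_box)
    box_iou / polygon_iou: IOU functions for rectangles and polygon outlines
    """
    assert iou2 > iou1, "the second threshold must exceed the first"
    targets = []
    for candidate_box, outline, prediction_box in detections:
        # First screening: rectangle IOU against the prediction target.
        if box_iou(candidate_box, prediction_box) <= iou1:
            continue
        # Second screening: polygonal-outline IOU against the prediction target.
        if polygon_iou(outline, prediction_box) <= iou2:
            continue
        targets.append(candidate_box)
    return targets
```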
In a specific embodiment, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7.
In summary, for the two branch results obtained by Mask R-CNN instance segmentation (the candidate detection frame and the polygonal outline), a new judgment relationship is established between the two parallel, non-intersecting branch results: the candidate detection frame is used for a primary IOU screening and the polygonal outline for a secondary IOU screening, yielding a target detection frame with higher detection precision.
Referring to FIG. 2, a flowchart of a preferred embodiment of the method for calculating the IOU value using the two-dimensional array mapping coding method of the present invention is shown; as FIG. 2 shows, the method comprises steps S210 to S230.
S210, mapping the polygonal outline and its prediction frame respectively onto a plane template divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized blocks.
referring to FIG. 3, a diagram of a two-dimensional array mapping coding method according to a preferred embodiment of the present invention is shown; fig. 3 shows an encoding process of the two-dimensional array mapping encoding method.
The right side shows the object under detection, with the polygonal outline around it; the polygonal outline is mapped onto a binary map. As shown in FIG. 3, the binary map is divided into equally sized blocks by the line segment combination, and the blocks in the binary map include blocks coded 1 and blocks coded 0.
S220, representing the mapping results of the polygonal outline and its prediction frame on a binary map of the same size as the plane template, and encoding each block as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B the coding state with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside.
As shown in FIG. 3, the human-shaped outline on the right is mapped onto the binary map on the left; a block is assigned 1 when it lies inside the polygonal outline and 0 when it lies outside. The assigned binary map is shown in FIG. 3.
Specifically, each block may be assigned different values with respect to the polygonal outline and the prediction frame corresponding to it: if a block lies within both the polygonal outline and its prediction frame, the block is coded (1, 1); if a block lies only within the polygonal outline and not within its prediction frame, it is coded (1, 0); if a block lies only within the prediction frame and not within the polygonal outline, it is coded (0, 1); and if a block lies within neither, it is coded (0, 0). Hence four coding cases occur: (1, 1), (1, 0), (0, 1) and (0, 0).
S230, calculating the IOU value by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area);
here, the intersection polygon area is the area of intersection between the polygonal outline and its prediction frame, i.e. the total area of all blocks coded (1, 1); the union polygon area equals the polygonal outline area plus the prediction frame area minus the intersection polygon area, i.e. the total area of blocks coded (1, 0), (0, 1) and (1, 1). Since all blocks are equally sized, IOU = intersection polygon area / union polygon area = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)], as in the sketch below.
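As an illustration, the sketch below rasterizes the polygonal outline and its prediction box onto a shared template and counts the (A, B) block codes; the template size, block grid and the centre-pixel convention for deciding whether a block lies "inside" are all assumptions of this sketch, and OpenCV's fillPoly is used only as a convenient rasterizer.

```python
import cv2
import numpy as np

def grid_mapped_iou(outline, box, template=(512, 512), grid=(64, 64)):
    """IOU via two-dimensional array mapping coding on an equal-block grid."""
    h, w = template
    # A-plane: 1 inside the polygon outline, 0 outside.
    a_plane = np.zeros((h, w), np.uint8)
    cv2.fillPoly(a_plane, [np.asarray(outline, np.int32).reshape(-1, 1, 2)], 1)
    # B-plane: 1 inside the (axis-aligned) prediction box, 0 outside.
    b_plane = np.zeros((h, w), np.uint8)
    x1, y1, x2, y2 = [int(v) for v in box]
    b_plane[y1:y2, x1:x2] = 1

    gy, gx = grid
    by, bx = h // gy, w // gx
    # One block per grid cell, sampled at the cell centre (a chosen convention).
    a = a_plane[by // 2::by, bx // 2::bx][:gy, :gx].astype(bool)
    b = b_plane[by // 2::by, bx // 2::bx][:gy, :gx].astype(bool)

    n11 = int(np.sum(a & b))      # blocks coded (1, 1)
    n10 = int(np.sum(a & ~b))     # blocks coded (1, 0)
    n01 = int(np.sum(~a & b))     # blocks coded (0, 1)
    union = n10 + n01 + n11
    return n11 / union if union else 0.0
```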
In a specific embodiment, when a "shadow" or a "mirror" exists in the detected scene, detection frames are generated simultaneously for the detected target and for its "mirror image" (or "shadow"), which easily causes the misjudgment that two detected targets exist. Therefore, two-dimensional array mapping coding is performed on all the obtained candidate detection frames; the coded candidate detection frames are compared for coincidence degree; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, it is judged that a mirror image exists among the targets detected by the two frames.
The coincidence threshold here is set to 75%; that is, if the code coincidence degree of two candidate detection frames reaches 75%, it is determined that mirror or shadow interference exists, and the interference is eliminated. A sketch of such a pairwise check follows.
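The patent text does not fix the exact comparison, so the sketch below reads "coincidence degree" as the fraction of matching block codes between two frames; both that reading and the boolean-grid input format (e.g. grids produced by the mapping coding above) are assumptions.

```python
import itertools
import numpy as np

def find_mirror_pairs(coded_frames, coincidence_threshold=0.75):
    """Flag frame pairs whose block codes coincide above the threshold.

    coded_frames: list of equal-shape boolean grids, one per screened
    candidate detection frame.
    """
    suspects = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coded_frames), 2):
        coincidence = float(np.mean(a == b))   # share of identical block codes
        if coincidence > coincidence_threshold:
            suspects.append((i, j))            # likely mirror/shadow pair
    return suspects
```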
In some embodiments, calculating the IOU value of the polygonal outline comprises calculating it by an intersection area method, which comprises: S310, obtaining the key points of the polygonal outline and its prediction frame and labeling them, the key points comprising each vertex of the polygonal outline and of its prediction frame and each intersection point between them; S320, forming, by sorting, a point set of the intersection polygon from the intersection points and the points lying inside both shapes; and S330, calculating the areas of the polygonal outline, its prediction frame and the intersection polygon, and from these the IOU value of the polygonal outline, where IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area).
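A sketch of the intersection area method follows, assuming the prediction frame is an axis-aligned rectangle (x1, y1, x2, y2) and the outline is a simple polygon: Sutherland-Hodgman clipping yields the intersection point set, whose in-order output plays the role of the sorted key points, and areas come from the shoelace formula.

```python
import numpy as np

def shoelace_area(points):
    """Area of a polygon given ordered vertices, by the shoelace formula."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def clip_to_box(poly, box):
    """Sutherland-Hodgman clip of an ordered polygon against a rectangle."""
    x1, y1, x2, y2 = box

    def clip(poly, inside, cross):
        out = []
        for k in range(len(poly)):
            p, q = poly[k - 1], poly[k]
            if inside(q):
                if not inside(p):
                    out.append(cross(p, q))   # entering intersection point
                out.append(q)
            elif inside(p):
                out.append(cross(p, q))       # leaving intersection point
        return out

    def at_x(c):
        return lambda p, q: (c, p[1] + (q[1] - p[1]) * (c - p[0]) / (q[0] - p[0]))

    def at_y(c):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (c - p[1]) / (q[1] - p[1]), c)

    for inside, cross in ((lambda p: p[0] >= x1, at_x(x1)),
                          (lambda p: p[0] <= x2, at_x(x2)),
                          (lambda p: p[1] >= y1, at_y(y1)),
                          (lambda p: p[1] <= y2, at_y(y2))):
        poly = clip(poly, inside, cross)
        if not poly:
            return []
    return poly

def intersection_area_iou(outline, box):
    """IOU = intersection area / (outline area + box area - intersection area)."""
    inter = clip_to_box([tuple(p) for p in outline], box)
    inter_area = shoelace_area(inter) if len(inter) >= 3 else 0.0
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    union = shoelace_area(outline) + box_area - inter_area
    return inter_area / union if union else 0.0
```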
The neural network structure for improving the detection precision of the rectangular detection frame based on Mask R-CNN is as follows:
the Mask R-CNN generally divides the target pixel while realizing the target detection; in other words, a Mask branch network is added to the basic frame recognition architecture, wherein the Mask branch network is used for segmenting the target pixels, so as to obtain the target polygon outline point set.
The CNN convolutional layers are followed by the RoI Align layer, which is in turn followed by the mask branch, the classifier and the RoI border correction training (fully connected layers). Mask R-CNN inherits the RPN portion of Faster R-CNN.
The task is executed as follows: features are extracted from the target image by a shared convolutional layer, and the obtained feature maps are fed into the RPN, which generates the frames to be detected (designating the positions of the RoIs) and performs a first correction of the RoI bounding boxes; Fast R-CNN is then constructed: according to the output of the RPN, RoIAlign selects the features corresponding to each RoI on the feature maps and sets their dimension to a fixed value; finally, the frames are classified by a fully connected layer (FC layer) and a second correction of the target bounding boxes is performed, finally yielding the candidate detection frames (box regression) and their classification.
The other branch is the head part: Mask R-CNN finally expands the output dimension of RoIAlign to predict the masks; that is, the result obtained by the Mask branch is the point set of the polygonal outline.
Before training the Mask R-CNN model, its hyperparameters are set to the parameter values of the Faster R-CNN model, and the model is pre-trained by using ResNet50, ResNet101 and an FPN network; the Mask R-CNN model is then trained with a large number of samples. After the trained Mask R-CNN model is obtained, it is tested with test samples to verify its accuracy.
In a specific example, the training dataset was COCO trainval35k, with 80 object classes and 1.5 million object instances.
In specific embodiments, the results obtained by detection with the trained Mask R-CNN model are stored in a distributed database, so that the trained Mask R-CNN model is updated by using the distributed database.
In conclusion, the input images are multi-angle images of the target, which form a sample library; the samples are fed into the Mask R-CNN detection and recognition model for training, image features are extracted by the convolutional layers, and finally the accurate target classification frame, the corresponding target state and the instance-segmented polygon point set are obtained.
The invention further provides a method for selecting a detection frame by using Mask R-CNN applied to an electronic device 4. Referring to FIG. 4, it is a schematic structural diagram of the electronic device in a preferred embodiment of the method for selecting a detection frame by using Mask R-CNN of the present invention.
In this embodiment, the electronic device 4 may be a terminal device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer or a desktop computer.
The electronic device 4 includes: a processor 42, a memory 41, a communication bus 43, and a network interface 44.
The memory 41 includes at least one type of readable storage medium, which can be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the readable storage medium can be an internal storage unit of the electronic device 4, such as its hard disk. In other embodiments, the readable storage medium can also be an external memory of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the electronic device 4.
In this embodiment, the readable storage medium of the memory 41 is generally used for storing the detection frame selection program 40 installed in the electronic device 4, and the like. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The processor 42, which in some embodiments may be a Central Processing Unit (CPU), microprocessor or other data processing chip, is configured to run the program code stored in the memory 41 or to process data, for example to execute the detection frame selection program 40.
The communication bus 43 is used to realize connection communication between these components.
The network interface 44 may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication link between the electronic apparatus 4 and other electronic devices.
FIG. 4 only shows the electronic device 4 with components 41 to 44, but it should be understood that not all of the shown components need be implemented; more or fewer components may be implemented instead.
Optionally, the electronic device 4 may further include a user interface, which may include an input unit such as a keyboard, a voice input device such as a microphone or other equipment with a voice recognition function, and a voice output device such as a loudspeaker or a headset; optionally, the user interface may also include a standard wired interface or a wireless interface.
In some embodiments, the electronic device 4 may also include a display, such as an LED display, a liquid crystal display, a touch-sensitive liquid crystal display or an Organic Light-Emitting Diode (OLED) touch screen. The display is used for displaying information processed in the electronic device 4 and for displaying a visual user interface.
Optionally, the electronic device 4 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the embodiment of the apparatus shown in FIG. 4, the memory 41, as a computer storage medium, may include an operating system and the detection frame selection program 40; when executing the detection frame selection program 40 stored in the memory 41, the processor 42 implements the following steps: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and the polygonal outline corresponding to it; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
In other embodiments, the detection frame selection program 40 may be further divided into one or more modules, the one or more modules being stored in the memory 41 and executed by the processor 42 to implement the present invention.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium that includes a detection frame selection program; when the detection frame selection program is executed by a processor, the following operations are implemented: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned method and electronic device for selecting a detection frame by using Mask R-CNN, and will not be described herein again.
In summary, using the operation method of the Mask R-CNN neural network, the invention repeatedly convolves and pools the monitored image in the deep neural network, extracts and processes the key features of the image by the neural network algorithm to obtain the rectangular frame of the object in the image, performs a primary IOU screening on the overlapping part between the rectangular frame and the real target, then performs a secondary IOU screening on the polygons between the polygon point set (the polygonal outline obtained through the Mask) and the real target, and finally takes the frame conforming to the set thresholds as the detection frame.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a series of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method.
Based on this understanding, the technical solution of the present invention, in itself or the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disk) as described above, including several instructions for causing a terminal device (such as a mobile phone, computer, server or network device) to execute the methods described in the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for selecting a detection frame by using Mask R-CNN, applied to an electronic device, the method comprising:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame;
respectively calculating the IOU values of the candidate detection frame and the polygonal outline, and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygonal outline is greater than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
2. The method of claim 1, wherein the calculating the IOU value of the polygon outline comprises calculating the IOU value of the polygon outline by a two-dimensional array mapping encoding method;
the two-dimensional array mapping coding method comprises the following steps:
mapping the polygonal outline and a prediction frame thereof onto a plane template which is divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized partition blocks;
respectively corresponding the mapping results of the polygonal contour and the prediction frame thereof to a binary image which is as large as the plane template, and representing each partition block as a mapping coding (A, B) form of a two-dimensional array; the coding state of the polygonal contour corresponding to the partition block is assigned to be A, and the coding state of the prediction frame corresponding to the partition block is assigned to be B;
when the segmentation block is positioned in the polygonal contour, A is 1, and when the segmentation block is positioned outside the polygonal contour, A is 0; when the partition block is located inside the prediction box, B is 1, and when the partition block is located outside the prediction box, B is 0;
calculating an IOU value by counting the codes of the partition blocks; wherein IOU = the number of partition blocks coded as (1, 1) / [the number of partition blocks coded as (1, 0) + the number of partition blocks coded as (0, 1) + the number of partition blocks coded as (1, 1)].
3. The method of claim 1, wherein the calculating the IOU value of the polygon outline comprises calculating the IOU value of the polygon outline by an intersection area method;
the intersection set area method comprises the following steps:
obtaining key points of the polygon outline and a prediction frame thereof, and labeling the key points, wherein the key points comprise each vertex of the polygon outline and the prediction frame thereof and each intersection point of the polygon outline and the prediction frame thereof;
forming, through sorting, a point set of an intersection polygon from the intersection points and the points lying inside;
and calculating the area of the polygon outline and the prediction frame thereof and the area of the intersection polygon, and calculating the IOU value of the polygon outline according to the area of the polygon outline and the prediction frame thereof and the area of the intersection polygon, wherein the IOU is the area of the intersection polygon/(the area of the polygon outline + the area of the prediction frame-the area of the intersection polygon).
4. The method for selecting a detection frame by using Mask R-CNN as claimed in claim 1, wherein the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range of 0.5 to 0.7.
5. The method for selecting a detection box by using Mask R-CNN according to claim 2, further comprising, after screening out the candidate detection boxes as target detection boxes:
carrying out two-dimensional array mapping coding on all the screened candidate detection frames;
carrying out coincidence degree comparison on the coded candidate detection frames;
and when the coincidence degree of the two candidate detection frames is greater than the coincidence threshold value, judging that the mirror image exists in the target detected by the two candidate detection frames.
6. An electronic device, comprising a memory and a processor, wherein the memory includes a selection program of a detection frame, and the selection program of the detection frame, when executed by the processor, implements the following steps:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame;
respectively calculating the IOU values of the candidate detection frame and the polygonal outline, and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygonal outline is greater than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
7. The electronic device of claim 6, wherein calculating the IOU value for the polygon outline comprises calculating the IOU value for the polygon outline by a two-dimensional array mapping encoding method;
mapping the polygonal outline and a prediction frame thereof onto a plane template which is divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized partition blocks;
respectively corresponding the mapping results of the polygonal contour and the prediction frame thereof to a binary image which is as large as the plane template, and representing each partition block as a mapping coding (A, B) form of a two-dimensional array; the coding state of the polygonal contour corresponding to the partition block is assigned to be A, and the coding state of the prediction frame corresponding to the partition block is assigned to be B;
when the segmentation block is positioned in the polygonal contour, A is 1, and when the segmentation block is positioned outside the polygonal contour, A is 0; when the partition block is located inside the prediction box, B is 1, and when the partition block is located outside the prediction box, B is 0;
calculating an IOU value by counting the codes of the partition blocks; wherein IOU = the number of partition blocks coded as (1, 1) / [the number of partition blocks coded as (1, 0) + the number of partition blocks coded as (0, 1) + the number of partition blocks coded as (1, 1)].
8. The electronic device of claim 6,
wherein the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range of 0.5 to 0.7.
9. The electronic device according to claim 7, further comprising, after said screening out the candidate detection frame as a target detection frame:
carrying out two-dimensional array mapping coding on all the screened candidate detection frames;
carrying out coincidence degree comparison on the coded candidate detection frames;
and when the coincidence degree of the two candidate detection frames is greater than the coincidence threshold value, judging that the mirror image exists in the target detected by the two candidate detection frames.
10. A computer-readable storage medium, characterized in that it stores a computer program comprising a selection program of a detection frame; when executed by a processor, the selection program of the detection frame implements the steps of the method for selecting a detection frame by using Mask R-CNN according to any one of claims 1 to 5.
CN201910885674.7A 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN Active CN110738125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN
PCT/CN2019/118279 WO2021051601A1 (en) 2019-09-19 2019-11-14 Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Publications (2)

Publication Number Publication Date
CN110738125A true CN110738125A (en) 2020-01-31
CN110738125B CN110738125B (en) 2023-08-01

Family

ID=69268320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885674.7A Active CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Country Status (2)

Country Link
CN (1) CN110738125B (en)
WO (1) WO2021051601A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409255A (en) * 2021-06-07 2021-09-17 同济大学 Zebra fish morphological classification method based on Mask R-CNN
CN113409267B (en) * 2021-06-17 2023-04-18 西安热工研究院有限公司 Pavement crack detection and segmentation method based on deep learning
CN113591734B (en) * 2021-08-03 2024-02-20 中国科学院空天信息创新研究院 Target detection method based on improved NMS algorithm
CN113469302A (en) * 2021-09-06 2021-10-01 南昌工学院 Multi-circular target identification method and system for video image
CN114526709A (en) * 2022-02-21 2022-05-24 中国科学技术大学先进技术研究院 Area measurement method and device based on unmanned aerial vehicle and storage medium
CN116486265B (en) * 2023-04-26 2023-12-19 北京卫星信息工程研究所 Airplane fine granularity identification method based on target segmentation and graph classification


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009554A (en) * 2017-12-01 2018-05-08 国信优易数据有限公司 A kind of image processing method and device
CN109389640A (en) * 2018-09-29 2019-02-26 北京字节跳动网络技术有限公司 Image processing method and device
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150071529A1 (en) * 2013-09-12 2015-03-12 Kabushiki Kaisha Toshiba Learning image collection apparatus, learning apparatus, and target object detection apparatus
US20170287137A1 (en) * 2016-03-31 2017-10-05 Adobe Systems Incorporated Utilizing deep learning for boundary-aware image segmentation
CN106529565A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Target identification model training and target identification method and device, and computing equipment
US20190147372A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. Systems and Methods for Object Detection, Tracking, and Motion Prediction
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109903310A (en) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 Method for tracking target, device, computer installation and computer storage medium
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507341A (en) * 2020-04-20 2020-08-07 广州文远知行科技有限公司 Method, device and equipment for adjusting target bounding box and storage medium
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium
CN111898411B (en) * 2020-06-16 2021-08-31 华南理工大学 Text image labeling system, method, computer device and storage medium
US11748890B2 (en) 2020-08-21 2023-09-05 Inspur Suzhou Intelligent Technology Co., Ltd. Instance segmentation method and system for enhanced image, and device and medium
WO2022037170A1 (en) * 2020-08-21 2022-02-24 苏州浪潮智能科技有限公司 Instance segmentation method and system for enhanced image, and device and medium
CN112861711A (en) * 2021-02-05 2021-05-28 深圳市安软科技股份有限公司 Regional intrusion detection method and device, electronic equipment and storage medium
CN113343779A (en) * 2021-05-14 2021-09-03 南方电网调峰调频发电有限公司 Environment anomaly detection method and device, computer equipment and storage medium
CN113343779B (en) * 2021-05-14 2024-03-12 南方电网调峰调频发电有限公司 Environment abnormality detection method, device, computer equipment and storage medium
CN113408531B (en) * 2021-07-19 2023-07-14 北博(厦门)智能科技有限公司 Target object shape frame selection method and terminal based on image recognition
CN113408531A (en) * 2021-07-19 2021-09-17 北博(厦门)智能科技有限公司 Target object shape framing method based on image recognition and terminal
CN113705643A (en) * 2021-08-17 2021-11-26 荣耀终端有限公司 Target detection method and device and electronic equipment
WO2023109151A1 (en) * 2021-12-14 2023-06-22 青岛海尔电冰箱有限公司 Method for identifying information of item in refrigerator, and refrigerator
WO2023185779A1 (en) * 2022-03-29 2023-10-05 青岛海尔电冰箱有限公司 Method for identifying article information in refrigerator

Also Published As

Publication number Publication date
WO2021051601A1 (en) 2021-03-25
CN110738125B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant