CN110738125B - Method, device and storage medium for selecting detection frame by Mask R-CNN - Google Patents


Info

Publication number
CN110738125B
Authority
CN
China
Prior art keywords
IOU
frame
polygonal
candidate detection
preset threshold
Prior art date
Legal status
Active
Application number
CN201910885674.7A
Other languages
Chinese (zh)
Other versions
CN110738125A (en)
Inventor
Chen Xin (陈欣)
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910885674.7A
Priority to PCT/CN2019/118279
Publication of CN110738125A
Application granted
Publication of CN110738125B


Classifications

    • G06V 20/00 Scenes; scene-specific elements
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06N 3/045 Neural networks; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 2201/07 Target detection

Abstract

The invention relates to the technical field of image recognition, and provides a method, a device and a storage medium for selecting a detection frame using Mask R-CNN. The method comprises: performing instance segmentation on a target image with Mask R-CNN to obtain rectangular candidate detection frames and the polygonal contour corresponding to each candidate detection frame; respectively calculating the IOU values of the candidate detection frame and of the polygonal contour; and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, screening the candidate detection frame as a target detection frame, where the second preset threshold IOU₂ is greater than the first preset threshold IOU₁. Through this secondary screening on the IOU of the polygonal contour, the invention improves the detection accuracy of the detection frame.

Description

Method, device and storage medium for selecting detection frame by Mask R-CNN
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a method, an apparatus, and a storage medium for selecting a detection frame by using Mask R-CNN.
Background
Video-based moving-body detection and tracking is widely used for monitoring crowded places with high security requirements, such as banks and railway stations. Human-body tracking in real-time scenes is complex, with interference factors such as background change and occlusion, and it is difficult to meet the requirements of detection accuracy, robustness and real-time performance simultaneously.
Current human detection and tracking methods rely on rectangular search boxes, with the following disadvantages:
1. The search box is evaluated by its IOU; even when it meets the IOU criterion, interfering images may remain inside it.
2. The detection classes of the search box are limited to broad categories, such as human or animal, and cannot be further refined into finer classes such as male/female or elderly/infirm.
3. When detecting a human body against a complex background, the result is strongly affected by the surroundings; for example, when the color of a pedestrian's clothes is similar to the background color, or the background lighting changes markedly, it is difficult to separate the moving body from the background.
4. When shadows or mirrors are present in the scene, the features inside the search box become more complex and its detection is disturbed, which can lead to misjudging a figure in a mirror, or a shadow region, as a person; likewise, moving objects in the scene, such as cars, swaying trees or rippling water, increase feature complexity and detection difficulty.
In view of the above, a target detection method is needed that better eliminates interference to rule out false targets and classifies more finely.
Disclosure of Invention
The invention provides a method for selecting a detection frame using Mask R-CNN, an electronic device, and a computer-readable storage medium. A rectangular frame and a polygonal contour point set of the target are obtained by instance segmentation; the rectangular frames are first screened by their IOU values, the polygonal contour point sets are then screened by their IOU values, and the rectangular frames passing both screenings are used as target detection frames for further target detection.
To achieve the above object, the invention provides a method for selecting a detection frame using Mask R-CNN, applied to an electronic device, the method comprising:
S110, performing instance segmentation on the target image with Mask R-CNN to obtain rectangular candidate detection frames and their polygonal contours; S120, respectively calculating the IOU values of the candidate detection frame and of the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, the candidate detection frame is screened as a target detection frame, where the second preset threshold IOU₂ is greater than the first preset threshold IOU₁.
Preferably, calculating the IOU value of the polygonal contour comprises calculating it by a two-dimensional array mapping coding method: the polygonal contour and its prediction frame are mapped onto a planar template divided in advance by a combination of line segments into equally sized blocks; the mapping results of the polygonal contour and of its prediction frame are each represented on a binary map of the same size as the planar template, and each block is expressed as a two-dimensional mapping code (A, B), where A is the coding state of the block with respect to the polygonal contour and B its coding state with respect to the prediction frame; A = 1 when the block lies inside the polygonal contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside.
The IOU value is then calculated by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
Preferably, calculating the IOU value of the polygonal contour includes calculating it by an intersection area method, which comprises: obtaining and marking key points of the polygonal contour and its prediction frame, the key points comprising the vertices of both shapes and the intersection points between them; forming the point set of the intersection polygon by ordering the intersection points together with the vertices lying inside the other shape; and calculating the areas of the polygonal contour, of its prediction frame and of the intersection polygon, from which the IOU value of the polygonal contour is obtained as IOU = intersection polygon area / (polygonal contour area + prediction frame area - intersection polygon area).
Preferably, the first preset threshold IOU₁ and the second preset threshold IOU₂ each take values in the range 0.5–0.7.
Preferably, after the candidate detection frame is screened as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all screened candidate detection frames; comparing the overlap ratios of the encoded candidate detection frames; and when the overlap ratio of two candidate detection frames is greater than the overlap threshold, determining that a mirror image exists among the targets detected by the two frames.
To achieve the above object, the invention provides an electronic device comprising a memory and a processor, the memory containing a detection-frame selection program which, when executed by the processor, implements the following steps: S110, performing instance segmentation on the target image with Mask R-CNN to obtain rectangular candidate detection frames and their polygonal contours; S120, respectively calculating the IOU values of the candidate detection frame and of the polygonal contour, and comparing them with their respective preset thresholds, where the preset threshold of the candidate detection frame is IOU₁, the preset threshold of the polygonal contour is IOU₂, and IOU₂ is greater than IOU₁; S130, screening, as the target detection frame, the candidate detection frame whose IOU value is greater than IOU₁ and whose polygonal contour has an IOU value greater than IOU₂.
Preferably, calculating the IOU value of the polygonal contour comprises calculating it by the two-dimensional array mapping coding method: S210, mapping the polygonal contour and its prediction frame onto a planar template divided in advance by a combination of line segments into equally sized blocks; S220, representing the mapping results of the polygonal contour and of its prediction frame each on a binary map of the same size as the planar template, each block being expressed as a two-dimensional mapping code (A, B), where A is the coding state of the block with respect to the polygonal contour and B its coding state with respect to the prediction frame; A = 1 when the block lies inside the polygonal contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside; S230, calculating the IOU value by counting the block codes, where IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
Preferably, the first preset threshold IOU₁ and the second preset threshold IOU₂ each take values in the range 0.5–0.7.
Preferably, after the candidate detection frame is screened as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all screened candidate detection frames; comparing the overlap ratios of the encoded candidate detection frames; and when the overlap ratio of two candidate detection frames is greater than the overlap threshold, determining that a mirror image exists among the targets detected by the two frames.
In addition, to achieve the above object, the invention further provides a computer-readable storage medium storing a computer program that includes a detection-frame selection program; when executed by a processor, the detection-frame selection program implements the steps of the method for selecting a detection frame using Mask R-CNN described above.
The method, electronic device and computer-readable storage medium for selecting a detection frame using Mask R-CNN provided by the invention use the Mask R-CNN (Mask region-based Convolutional Neural Network) to repeatedly convolve and pool the monitored image in a deep neural network, extract and process the key features of the image with the neural network algorithm, and obtain the detection result and category (i.e., the rectangular frame of the object in the image). The IOU value of the overlap between the obtained rectangular frame and the real target is screened first; then, using the polygonal point set obtained by the mask (i.e., the polygonal contour from instance segmentation), the polygon IOU value between the point set and the real target is screened a second time, and the frames meeting both set thresholds are finally taken as detection frames. The beneficial effects are as follows:
(1) the polygonal point set of the target is obtained through the mask of Mask R-CNN, narrowing the pixel range (i.e., narrowing the bounding-box range) relative to the rectangular candidate frame and enabling finer target classification;
(2) based on the characteristics of shadows, an analysis method combined with the two-dimensional array coding judges whether a mirror image exists, thereby eliminating false shadow targets;
(3) calculating the IOU of the polygonal contour by two-dimensional array coding is accurate and fast;
(4) when selecting the candidate frame, the candidate-frame IOU is screened first and the polygon-point-set IOU is screened second, after which regression yields a more accurate target detection frame.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of a method for selecting a detection box using Mask R-CNN according to the present invention;
FIG. 2 is a flow chart of a method for calculating IOU values using a two-dimensional array mapping encoding method according to the present invention;
FIG. 3 is a schematic diagram of a two-dimensional array mapping encoding method according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of an electronic device according to a preferred embodiment of the present invention;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that, herein, the words "first" and "second" are merely used to distinguish between identically named items and do not imply any relationship or order between them.
Object detection aims to identify and locate objects of a specific category in an image or video; the detection process can be regarded as classification that separates objects from background. The choice of detection frame affects both how well interference is eliminated during detection and how fine-grained the classification can be.
The invention provides a method for selecting a detection frame by using Mask R-CNN. Referring to FIG. 1, a flow chart of a preferred embodiment of a method for selecting a detection box using Mask R-CNN according to the present invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
Mask R-CNN (Mask region-based Convolutional Neural Network) is used to predict the category of the detected object in the image and fine-tune the frame, and to segment the polygonal contour of the detected object; the frame (bounding box) is the smallest rectangular box that can contain a given object in the image.
In this embodiment, the method for selecting a detection frame by using Mask R-CNN includes: step S110-step S130.
S110, performing instance segmentation on the target image using Mask R-CNN to obtain rectangular candidate detection frames and their polygonal contours.
Instance segmentation in Mask R-CNN proceeds in two steps: first, the position and class of the candidate frame are determined (i.e., the category of the object in the image is predicted and the frame is fine-tuned); the candidate frame is rectangular. Second, the mask branch segments the polygonal contour.
S120, respectively calculating the IOU values of the candidate detection frame and of the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, the candidate detection frame is screened as the target detection frame, where the second preset threshold IOU₂ is greater than the first preset threshold IOU₁. The IOU (Intersection over Union) can be understood as the degree of overlap between the prediction frame and the candidate detection frame.
In a specific embodiment, the first preset threshold IOU₁ and the second preset threshold IOU₂ can be set for different scenes; to improve the detection precision of the rectangular detection frame, the second preset threshold IOU₂ is set greater than the first preset threshold IOU₁.
First, the candidate detection frame is matched against the predicted target, and the first matching result is screened: candidate detection frames whose IOU value is greater than IOU₁ are retained.
Then, the polygonal contour is matched against the predicted target, and the second matching result is screened: contours whose IOU value is greater than IOU₂ are retained.
Candidate detection frames passing both screenings are used as the final target detection frames.
In a specific embodiment, the first preset threshold IOU₁ and the second preset threshold IOU₂ take values in the range 0.5–0.7.
In summary, the invention establishes a new judgment from two parallel, non-intersecting branch results of Mask R-CNN instance segmentation: the candidate detection frame and the polygonal contour. The candidate detection frame is used for the initial IOU screening and the polygonal contour for the secondary IOU screening, yielding a target detection frame with higher detection precision.
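The two-stage screening can be summarized in a short sketch. This is a minimal illustration, not the patent's implementation: rect_iou, the candidate tuple layout and the default thresholds (chosen within the 0.5–0.7 range above) are assumptions for the example.

# IOU of two axis-aligned rectangles, each given as (x1, y1, x2, y2).
def rect_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def two_stage_screen(candidates, iou1=0.5, iou2=0.6):
    """Keep frames passing both screenings.

    candidates: iterable of (box, box_iou, contour_iou), where box_iou is
    the rectangular frame's IOU (e.g. from rect_iou) and contour_iou the
    polygonal contour's IOU, computed by either method described below.
    """
    assert iou2 > iou1  # the method requires IOU2 > IOU1
    return [box for box, box_iou, contour_iou in candidates
            if box_iou > iou1 and contour_iou > iou2]

Either of the two contour-IOU methods described below (two-dimensional array mapping coding or the intersection area method) can supply the contour_iou values.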
Referring to FIG. 2, a flow chart of a preferred embodiment of the method for calculating IOU values using the two-dimensional array mapping encoding method is shown; the method comprises steps S210–S230.
S210, mapping the polygonal contour and its prediction frame onto a planar template divided in advance by a combination of line segments into equally sized blocks.
referring to FIG. 3, a schematic diagram of a two-dimensional array mapping encoding method according to a preferred embodiment of the present invention is shown; fig. 3 shows the encoding process of the two-dimensional array mapping encoding method.
In FIG. 3, the object under detection is shown on the right with its polygonal contour; the contour is mapped onto a binary map. As shown, the chosen line-segment combination divides the binary map into equally sized blocks, each coded 1 or 0.
S220, the mapping results of the polygonal contour and of its prediction frame are each represented on a binary map of the same size as the planar template, and each block is expressed as a two-dimensional mapping code (A, B), where A is the coding state of the block with respect to the polygonal contour and B its coding state with respect to the prediction frame; A = 1 when the block lies inside the polygonal contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside.
As shown in FIG. 3, the human-shaped contour on the right is mapped to the binary image on the left; a block is assigned 1 when it lies inside the polygonal contour and 0 when it lies outside. The assigned binary map is shown in FIG. 3.
Specifically, because the polygonal contour differs from its prediction frame, a block may be assigned differently with respect to each. If a block lies within both the polygonal contour and the prediction frame, it is coded (1, 1); if it lies only within the polygonal contour, it is coded (1, 0); if it lies only within the prediction frame, it is coded (0, 1); and if it lies within neither, it is coded (0, 0). Each block therefore takes one of the four codes (1, 1), (1, 0), (0, 1) or (0, 0).
S230, the IOU value is calculated by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
IOU = area of the intersection polygon / (polygonal contour area + prediction frame area - intersection polygon area).
Here the area of the intersection polygon is the area of overlap between the polygonal contour and its prediction frame, i.e., the total area of blocks coded (1, 1); and the area of the union polygon equals contour area + frame area - intersection area, i.e., the total area of blocks coded (1, 0), (0, 1) or (1, 1). Since the blocks are equally sized, intersection area / union area = IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
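A minimal sketch of this counting scheme, assuming the polygonal contour and the prediction frame have already been rasterized onto the same planar template as boolean block masks (the function and argument names are illustrative):

import numpy as np

def grid_code_iou(contour_blocks, frame_blocks):
    """IOU from the two-dimensional mapping codes (A, B) of the blocks."""
    a = np.asarray(contour_blocks, dtype=bool)  # A = 1 inside the contour
    b = np.asarray(frame_blocks, dtype=bool)    # B = 1 inside the frame
    n11 = np.count_nonzero(a & b)               # blocks coded (1, 1)
    n10 = np.count_nonzero(a & ~b)              # blocks coded (1, 0)
    n01 = np.count_nonzero(~a & b)              # blocks coded (0, 1)
    return n11 / (n10 + n01 + n11)

Because the method reduces to boolean operations and counting, it avoids explicit polygon geometry, which is why the patent describes it as accurate and fast.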
In a specific embodiment, when a shadow or a mirror exists in the detected scene, detection frames are generated simultaneously for the detected target and for its mirror image (or shadow), which easily causes the misjudgment that two targets exist. To handle this, two-dimensional array mapping coding is performed on all obtained candidate detection frames; the overlap ratios of the encoded candidate detection frames are compared; and when the overlap ratio of two candidate detection frames is greater than the overlap threshold, it is determined that a mirror image exists among the targets they detect.
The overlap threshold here is set to 75%; that is, if the code overlap ratio of two candidate detection frames reaches 75%, mirror-image or shadow interference is judged to exist and is excluded.
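A sketch of the mirror-image check, under two assumptions: each screened candidate frame is taken to have been encoded onto its own fixed-size code grid, so that frames in different image regions can be compared; and the extra comparison against the horizontally flipped grid is an addition here (a mirror image is left-right reversed), since the patent specifies only the 75% overlap-ratio test.

import numpy as np

def pattern_overlap(grid_a, grid_b):
    """Overlap ratio of two equally sized code grids (shared 1-blocks / union)."""
    inter = np.count_nonzero(grid_a & grid_b)
    union = np.count_nonzero(grid_a | grid_b)
    return inter / union if union else 0.0

def is_mirror_pair(grid_a, grid_b, threshold=0.75):
    # Compare directly and against the horizontal flip; the flip comparison
    # is an assumption beyond the patent text.
    direct = pattern_overlap(grid_a, grid_b)
    flipped = pattern_overlap(grid_a, np.fliplr(grid_b))
    return max(direct, flipped) > threshold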
In a specific embodiment, calculating the IOU value of the polygonal contour includes calculating it by an intersection area method, as follows: S310, obtain and mark the key points of the polygonal contour and its prediction frame, the key points comprising the vertices of both shapes and the intersection points between them; S320, form the point set of the intersection polygon by ordering the intersection points together with the vertices lying inside the other shape; S330, calculate the areas of the polygonal contour, of its prediction frame and of the intersection polygon, and from these areas calculate the IOU value of the polygonal contour, where IOU = intersection polygon area / (polygonal contour area + prediction frame area - intersection polygon area).
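A sketch of the intersection area method. Instead of hand-ordering the key points of S310–S320, it delegates construction of the intersection polygon to the shapely library; the final formula is the one in S330.

from shapely.geometry import Polygon, box

def intersection_area_iou(contour_points, pred_box):
    """IOU between a polygonal contour and its rectangular prediction frame.

    contour_points: [(x, y), ...] vertices of the instance contour
    pred_box: (x1, y1, x2, y2) prediction frame
    """
    poly = Polygon(contour_points)
    rect = box(*pred_box)
    inter = poly.intersection(rect).area   # area of the intersection polygon
    return inter / (poly.area + rect.area - inter)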
The Mask R-CNN-based neural network structure that improves the detection accuracy of the rectangular detection frame is as follows.
In general, Mask R-CNN segments the target pixels while performing target detection; in other words, a mask branch network is added to the basic frame-recognition architecture, and this branch segments the target pixels to obtain the point set of the target's polygonal contour.
The CNN convolution layers are followed by an RoIAlign layer, then the mask branch, the classifier, and RoI bounding-frame regression (fully connected layers). Mask R-CNN inherits the RPN portion of Faster R-CNN.
The task pipeline is as follows: features are extracted from the detection target image using shared convolution layers, and the resulting feature maps are fed to the RPN, which generates frames to be detected (specifying RoI positions) and performs a first correction of each RoI's bounding frame. Following the Faster R-CNN construction, RoIAlign then selects the feature corresponding to each RoI on the feature map according to the RPN output and resizes it to a fixed dimension. Finally, a fully connected layer (FC layer) classifies the frames and performs a second correction of the target bounding frame, yielding the candidate detection frame (box regression) and the classification.
The other branch is the mask head: Mask R-CNN expands the output dimension of RoIAlign to predict a mask; the result of the mask branch is the point set of the polygonal contour.
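As an illustration of this pipeline, the sketch below runs the off-the-shelf Mask R-CNN from torchvision (ResNet50-FPN backbone) in place of the patent's trained model, and extracts a polygonal contour point set from each predicted mask with OpenCV; the detect function, the 0.5 mask threshold and the use of the first contour are assumptions for the example.

import numpy as np
import torch
import torchvision
import cv2

# Pretrained Mask R-CNN with ResNet50-FPN backbone from torchvision.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def detect(image):
    """image: float tensor of shape (3, H, W), values in [0, 1]."""
    with torch.no_grad():
        out = model([image])[0]   # dict with 'boxes', 'scores', 'masks'
    results = []
    for bbox, score, mask in zip(out['boxes'], out['scores'], out['masks']):
        binary = (mask[0].numpy() > 0.5).astype(np.uint8)
        # Extract the polygonal contour point set from the predicted mask.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            # (rectangular frame, confidence, polygonal contour point set)
            results.append((bbox.numpy(), float(score),
                            contours[0].squeeze(1)))
    return results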
For Mask R-CNN, the predicted mask, the classification and the candidate detection boxes are all training targets. Before training, the hyperparameters of the Mask R-CNN model are set to the parameter values of a Faster R-CNN model and pre-trained using ResNet50, ResNet101 and FPN networks; the model is then trained on a large number of samples to obtain the Mask R-CNN model. After training, the Mask R-CNN model is tested with test samples to verify its accuracy.
In a specific embodiment, the training dataset is COCO train35k, with 80 object categories and 1.5 million object instances.
In a specific embodiment, the detection results of the trained Mask R-CNN model are stored in a distributed database, which is then used to update the trained model.
In summary, multi-angle images of the target are taken as input to form a sample library; the samples are fed into the Mask R-CNN detection and recognition model for training, image features are extracted in the convolution layers, and the output comprises the target classification frame, the corresponding target state, and the polygonal point set from instance segmentation.
The invention provides a method for selecting a detection frame by using Mask R-CNN, which is applied to an electronic device 4. Referring to FIG. 4, a schematic view of an application environment of a preferred embodiment of a method for selecting a detection frame using Mask R-CNN according to the present invention is shown.
In this embodiment, the electronic device 4 may be a terminal device with an operation function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device 4 includes: a processor 42, a memory 41, a communication bus 43 and a network interface 44.
The memory 41 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 4, such as a hard disk of the electronic device 4. In other embodiments, the readable storage medium may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the electronic device 4.
In the present embodiment, the readable storage medium of the memory 41 is generally used to store the detection-frame selection program 40 installed on the electronic device 4, and the like. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The processor 42 may in some embodiments be a central processing unit (CPU), a microprocessor or another data processing chip for running program code or processing data stored in the memory 41, e.g. executing the detection-frame selection program 40.
The communication bus 43 is used to enable connection communication between these components.
The network interface 44 may optionally comprise a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication connection between the electronic apparatus 4 and other electronic devices.
Fig. 4 shows only an electronic device 4 having components 41-44, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components may alternatively be implemented.
Optionally, the electronic device 4 may further comprise a user interface, which may comprise an input unit such as a Keyboard (Keyboard), a voice input device such as a microphone or the like with voice recognition function, a voice output device such as a sound box, a headset or the like, and optionally a standard wired interface, a wireless interface.
Optionally, the electronic device 4 may also comprise a display, which may also be referred to as a display screen or display unit. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-control liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device 4 and for displaying a visualized user interface.
Optionally, the electronic device 4 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, etc., which are not described herein.
In the embodiment of the apparatus shown in FIG. 4, the memory 41, as a computer storage medium, may include an operating system and a detection-frame selection program 40; when the processor 42 executes the detection-frame selection program 40 stored in the memory 41, the following steps are implemented: S110, performing instance segmentation on the target image with Mask R-CNN to obtain rectangular candidate detection frames and the polygonal contour corresponding to each candidate detection frame; S120, respectively calculating the IOU values of the candidate detection frame and of the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, the candidate detection frame is screened as the target detection frame, where the second preset threshold IOU₂ is greater than the first preset threshold IOU₁.
In other embodiments, the detection-frame selection program 40 may also be divided into one or more modules stored in the memory 41 and executed by the processor 42 to carry out the invention. A module herein refers to a series of computer program instruction segments capable of performing a specified function.
In addition, an embodiment of the invention also provides a computer-readable storage medium containing a detection-frame selection program which, when executed by a processor, implements the following operations: S110, performing instance segmentation on the target image with Mask R-CNN to obtain rectangular candidate detection frames and their polygonal contours; S120, respectively calculating the IOU values of the candidate detection frame and of the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, the candidate detection frame is screened as the target detection frame, where the second preset threshold IOU₂ is greater than the first preset threshold IOU₁.
The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiment of the method and the electronic device for selecting a detection frame by using Mask R-CNN, and will not be described herein.
In general, following the operation of the Mask R-CNN neural network, the monitored image is repeatedly convolved and pooled in the deep neural network, key features of the image are extracted and processed with the neural network algorithm, and the rectangular frame of the object in the image is obtained; the IOU value of the overlap between the obtained rectangular frame and the real target is screened first; then, using the polygonal contour obtained by the mask, the polygon IOU value between the polygonal point set and the real target is screened a second time, and the frames meeting the set thresholds are finally taken as detection frames.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method. Without further limitation, an element introduced by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, apparatus, article or method that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. A method for selecting a detection frame by using Mask R-CNN, which is applied to an electronic device, and is characterized in that the method comprises the following steps:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal contour corresponding to the candidate detection frame;
respectively calculating IOU values of the candidate detection frame and the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, screening the candidate detection frame as a target detection frame, wherein the second preset threshold IOU₂ is greater than the first preset threshold IOU₁;
Calculating the IOU value of the polygonal contour comprises calculating the IOU value of the polygonal contour through a two-dimensional array mapping coding method;
the two-dimensional array mapping coding method comprises the following steps:
mapping the polygonal contour and its prediction frame onto a planar template divided in advance by a combination of line segments into equally sized blocks;
representing the mapping results of the polygonal contour and of its prediction frame each on a binary map of the same size as the planar template, each block being expressed as a two-dimensional mapping code (A, B), wherein A is the coding state of the block with respect to the polygonal contour and B is its coding state with respect to the prediction frame;
A = 1 when the block lies inside the polygonal contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside;
calculating the IOU value by counting the block codes, wherein IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
2. The method for selecting a detection frame using Mask R-CNN according to claim 1, wherein the step of calculating the IOU value of the polygonal contour is replaced by calculating the IOU value of the polygonal contour by an intersection area method;
the intersection area method comprises the following steps:
obtaining and marking key points of the polygonal contour and its prediction frame, wherein the key points comprise the vertices of the polygonal contour and of its prediction frame and the intersection points between them;
forming the point set of the intersection polygon by ordering the intersection points together with the vertices lying inside the other shape; and
calculating the areas of the polygonal contour, of its prediction frame and of the intersection polygon, and calculating the IOU value of the polygonal contour from these areas, wherein IOU = intersection polygon area / (polygonal contour area + prediction frame area - intersection polygon area).
3. The method for selecting a detection frame using Mask R-CNN according to claim 1, wherein the first preset threshold IOU₁ and the second preset threshold IOU₂ each take values in the range 0.5–0.7.
4. The method for selecting a detection frame using Mask R-CNN according to claim 1, further comprising, after said screening the candidate detection frame as a target detection frame:
performing two-dimensional array mapping coding on all the screened candidate detection frames;
comparing the overlap ratios of the encoded candidate detection frames; and
when the overlap ratio of two candidate detection frames is greater than the overlap threshold, determining that a mirror image exists among the targets detected by the two candidate detection frames.
5. An electronic device, comprising a memory and a processor, wherein the memory contains a detection-frame selection program which, when executed by the processor, implements the following steps:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal contour corresponding to the candidate detection frame;
respectively calculating IOU values of the candidate detection frame and the polygonal contour; when the IOU value of the candidate detection frame is greater than a first preset threshold IOU₁ and the IOU value of the polygonal contour is greater than a second preset threshold IOU₂, screening the candidate detection frame as a target detection frame, wherein the second preset threshold IOU₂ is greater than the first preset threshold IOU₁;
Calculating the IOU value of the polygonal contour comprises calculating the IOU value of the polygonal contour through a two-dimensional array mapping coding method;
the two-dimensional array mapping coding method comprises the following steps:
mapping the polygonal contour and its prediction frame onto a planar template divided in advance by a combination of line segments into equally sized blocks;
representing the mapping results of the polygonal contour and of its prediction frame each on a binary map of the same size as the planar template, each block being expressed as a two-dimensional mapping code (A, B), wherein A is the coding state of the block with respect to the polygonal contour and B is its coding state with respect to the prediction frame;
A = 1 when the block lies inside the polygonal contour and A = 0 when it lies outside; B = 1 when the block lies inside the prediction frame and B = 0 when it lies outside;
calculating the IOU value by counting the block codes, wherein IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
6. The electronic device of claim 5, wherein the electronic device comprises a plurality of electronic devices,
the first preset threshold IOU 1 The second preset threshold IOU 2 The range of the values of the (C) is 0.5-0.7.
7. The electronic device of claim 5, further comprising, after said screening said candidate detection frames as target detection frames:
performing two-dimensional array mapping coding on all the screened candidate detection frames;
comparing the overlap ratios of the encoded candidate detection frames; and
when the overlap ratio of two candidate detection frames is greater than the overlap threshold, determining that a mirror image exists among the targets detected by the two candidate detection frames.
8. A computer-readable storage medium storing a computer program comprising a detection-frame selection program which, when executed by a processor, implements the steps of the method for selecting a detection frame using Mask R-CNN according to any one of claims 1 to 4.
CN201910885674.7A 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN Active CN110738125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN
PCT/CN2019/118279 WO2021051601A1 (en) 2019-09-19 2019-11-14 Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Publications (2)

Publication Number Publication Date
CN110738125A CN110738125A (en) 2020-01-31
CN110738125B true CN110738125B (en) 2023-08-01

Family

ID=69268320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885674.7A Active CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Country Status (2)

Country Link
CN (1) CN110738125B (en)
WO (1) WO2021051601A1 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507341B (en) * 2020-04-20 2022-06-28 广州文远知行科技有限公司 Method, device and equipment for adjusting target bounding box and storage medium
CN111898411B (en) * 2020-06-16 2021-08-31 华南理工大学 Text image labeling system, method, computer device and storage medium
CN112132832B (en) 2020-08-21 2021-09-28 苏州浪潮智能科技有限公司 Method, system, device and medium for enhancing image instance segmentation
CN112861711A (en) * 2021-02-05 2021-05-28 深圳市安软科技股份有限公司 Regional intrusion detection method and device, electronic equipment and storage medium
CN113343779B (en) * 2021-05-14 2024-03-12 南方电网调峰调频发电有限公司 Environment abnormality detection method, device, computer equipment and storage medium
CN113409255A (en) * 2021-06-07 2021-09-17 同济大学 Zebra fish morphological classification method based on Mask R-CNN
CN113409267B (en) * 2021-06-17 2023-04-18 西安热工研究院有限公司 Pavement crack detection and segmentation method based on deep learning
CN113408531B (en) * 2021-07-19 2023-07-14 北博(厦门)智能科技有限公司 Target object shape frame selection method and terminal based on image recognition
CN113591734B (en) * 2021-08-03 2024-02-20 中国科学院空天信息创新研究院 Target detection method based on improved NMS algorithm
CN113705643B (en) * 2021-08-17 2022-10-28 荣耀终端有限公司 Target detection method and device and electronic equipment
CN113469302A (en) * 2021-09-06 2021-10-01 南昌工学院 Multi-circular target identification method and system for video image
CN114863265A (en) * 2021-12-14 2022-08-05 青岛海尔电冰箱有限公司 Method for identifying information of articles in refrigerator, refrigerator and computer storage medium
CN114526709A (en) * 2022-02-21 2022-05-24 中国科学技术大学先进技术研究院 Area measurement method and device based on unmanned aerial vehicle and storage medium
CN114882348A (en) * 2022-03-29 2022-08-09 青岛海尔电冰箱有限公司 Method for identifying information of articles in refrigerator, refrigerator and computer storage medium
CN116486265B (en) * 2023-04-26 2023-12-19 北京卫星信息工程研究所 Airplane fine granularity identification method based on target segmentation and graph classification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529565A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Target identification model training and target identification method and device, and computing equipment
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109903310A (en) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 Method for tracking target, device, computer installation and computer storage medium
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9158996B2 (en) * 2013-09-12 2015-10-13 Kabushiki Kaisha Toshiba Learning image collection apparatus, learning apparatus, and target object detection apparatus
US9972092B2 (en) * 2016-03-31 2018-05-15 Adobe Systems Incorporated Utilizing deep learning for boundary-aware image segmentation
US11475351B2 (en) * 2017-11-15 2022-10-18 Uatc, Llc Systems and methods for object detection, tracking, and motion prediction
CN108009554A (en) * 2017-12-01 2018-05-08 国信优易数据有限公司 A kind of image processing method and device
CN109389640A (en) * 2018-09-29 2019-02-26 北京字节跳动网络技术有限公司 Image processing method and device
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529565A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Target identification model training and target identification method and device, and computing equipment
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109903310A (en) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 Method for tracking target, device, computer installation and computer storage medium
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO

Also Published As

Publication number Publication date
CN110738125A (en) 2020-01-31
WO2021051601A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN110738125B (en) Method, device and storage medium for selecting detection frame by Mask R-CNN
CN108229509B (en) Method and device for identifying object class and electronic equipment
KR101880004B1 (en) Method and apparatus for identifying television channel information
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
KR101896357B1 (en) Method, device and program for detecting an object
CN108268867B (en) License plate positioning method and device
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN108805016B (en) Head and shoulder area detection method and device
CN114241548A (en) Small target detection algorithm based on improved YOLOv5
CN109858327B (en) Character segmentation method based on deep learning
CN106372624A (en) Human face recognition method and human face recognition system
Han et al. Moving object detection revisited: Speed and robustness
CN110502977B (en) Building change classification detection method, system, device and storage medium
CN112001362A (en) Image analysis method, image analysis device and image analysis system
CN110462634A (en) Mark Detection video analysis method
CN111738164B (en) Pedestrian detection method based on deep learning
CN112766218A (en) Cross-domain pedestrian re-identification method and device based on asymmetric joint teaching network
Lou et al. Smoke root detection from video sequences based on multi-feature fusion
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN109543716B (en) K-line form image identification method based on deep learning
CN117115824A (en) Visual text detection method based on stroke region segmentation strategy
CN113343987B (en) Text detection processing method and device, electronic equipment and storage medium
Li et al. Research on hybrid information recognition algorithm and quality of golf swing
Hong et al. Saliency-based feature learning for no-reference image quality assessment
CN112784691A (en) Target detection model training method, target detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant