CN110738125A - Method, device and storage medium for selecting detection frame by using Mask R-CNN - Google Patents

Method, device and storage medium for selecting detection frame by using Mask R-CNN

Info

Publication number
CN110738125A
Authority
CN
China
Prior art keywords
iou
frame
polygon
candidate detection
outline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910885674.7A
Other languages
Chinese (zh)
Other versions
CN110738125B (en)
Inventor
陈欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910885674.7A priority Critical patent/CN110738125B/en
Priority to PCT/CN2019/118279 priority patent/WO2021051601A1/en
Publication of CN110738125A publication Critical patent/CN110738125A/en
Application granted granted Critical
Publication of CN110738125B publication Critical patent/CN110738125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V20/00 Scenes; Scene-specific elements
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V2201/07 Target detection

Abstract

The invention relates to the technical field of image recognition, and provides a method, a device and a storage medium for selecting a detection frame by using Mask R-CNN. The method comprises the following steps: performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame; calculating the IOU values of the candidate detection frame and the polygonal outline respectively; and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1. According to the invention, the detection precision of the detection frame is improved through the secondary IOU screening of the polygonal outline.

Description

Method, device and storage medium for selecting detection frame by using Mask R-CNN
Technical Field
The invention relates to the technical field of image recognition, in particular to a method and a device for selecting a detection frame by using Mask R-CNN, and a storage medium.
Background
Video-based detection and tracking of moving human bodies is widely applied to the monitoring of dense places with high safety requirements, such as banks and railway stations. Human body tracking in real-time scenes is complex and subject to interference factors such as background change and occlusion, making it difficult to meet the requirements of detection accuracy, robustness and real-time performance.
The current human detection and tracking method is realized by a rectangular search box. The method has the following disadvantages:
1. the search box evaluates the detection result through the IOU; even when the search box meets the IOU criterion, interference images may still be present;
2. the detection target classification of the search box is currently limited to broad classes such as human or animal, and finer distinctions such as male versus female or old versus young cannot be made;
3. when a human body is detected against a complex background, detection is strongly influenced by the surrounding environment; for example, when the color of a pedestrian's clothes is similar to the background color, or the background lighting changes greatly, the moving human body is difficult to separate from the background;
4. when a shadow or a mirror exists in a scene, the complexity of the features in the search box increases and interferes with detection, causing the misjudgment that the reflection in the mirror is a person or that the shadow area is a person; moving objects in the scene, such as cars, swaying trees or a fluctuating water surface, likewise increase the complexity of the features in the search box and raise the detection difficulty.
In view of the above problems, there is a need for a target detection method that better eliminates interference, distinguishes false targets and provides a more detailed classification.
Disclosure of Invention
The invention provides a method for selecting a detection frame by using Mask R-CNN, an electronic device and a computer-readable storage medium. The method mainly obtains a rectangular frame and a polygonal outline point set of a target by using an instance segmentation technique, performs a primary screening of the obtained rectangular frame by its IOU value, performs a secondary screening of the polygonal outline point set by its IOU value, and continues target detection with a rectangular frame that passes both screenings as the target detection frame.
In order to achieve the above object, the present invention provides a method for selecting a detection frame by using Mask R-CNN, applied to an electronic device, the method comprising:
S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
Preferably, calculating the IOU value of the polygonal outline comprises calculating it by a two-dimensional array mapping coding method: the polygonal outline and its prediction frame are respectively mapped onto a plane template divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized blocks; the mapping results of the polygonal outline and its prediction frame are then represented on a binary map of the same size as the plane template, and each block is encoded as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B is the coding state of the block with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside.
The IOU value is calculated by counting the codes of the blocks: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
Preferably, calculating the IOU value of the polygonal outline comprises calculating it by an intersection area method, which comprises the following steps: obtaining the key points of the polygonal outline and its prediction frame and labeling them, wherein the key points comprise each vertex of the polygonal outline and of its prediction frame, and each intersection point between the polygonal outline and its prediction frame; forming, by sorting, a point set of the intersection polygon from the intersection points and the points lying inside both shapes; and calculating the areas of the polygonal outline, its prediction frame and the intersection polygon, and from these the IOU value of the polygonal outline, where IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area).
Preferably, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7.
Preferably, after screening out the candidate detection frame as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all the screened candidate detection frames; comparing the coincidence degree of the coded candidate detection frames; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, judging that a mirror image exists among the targets detected by the two candidate detection frames.
In order to achieve the above object, the invention also provides an electronic device, which includes a memory and a processor. The memory stores a selection program of the detection frame, and when the program is executed by the processor, the following steps are implemented: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively and comparing them with their preset thresholds, wherein the preset threshold of the candidate detection frame is IOU1, the preset threshold of the polygonal outline is IOU2, and IOU2 is greater than IOU1; S130, screening out as the target detection frame a candidate detection frame whose IOU value is larger than IOU1 and whose polygonal outline has an IOU value larger than IOU2. Preferably, calculating the IOU value of the polygonal outline comprises calculating it by the two-dimensional array mapping coding method: S210, mapping the polygonal outline and its prediction frame onto a plane template divided in advance by a line segment combination, the line segment combination dividing the plane template into equally sized blocks; S220, representing the mapping results of the polygonal outline and its prediction frame on a binary map of the same size as the plane template, and encoding each block as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B the coding state with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside; S230, calculating the IOU value by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)]. Preferably, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7. Preferably, after screening out the candidate detection frame as the target detection frame, the method further comprises: performing two-dimensional array mapping coding on all the screened candidate detection frames; comparing the coincidence degree of the coded candidate detection frames; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, judging that a mirror image exists among the targets detected by the two candidate detection frames.
In addition, in order to achieve the above object, the present invention further provides a computer-readable storage medium storing a computer program that includes a detection frame selection program; when the detection frame selection program is executed by a processor, the steps of the above method for selecting a detection frame by using Mask R-CNN are implemented.
The invention provides a method for selecting a detection frame by using Mask R-CNN, an electronic device and a computer-readable storage medium. Using the operation method of the Mask R-CNN (Mask Region-based Convolutional Neural Network), a monitoring image is repeatedly convolved and pooled in the deep neural network, and key features of the image are extracted and processed by the neural network algorithm to obtain a detection result and class (namely a rectangular frame of an object in the image). The overlapping part between the obtained rectangular frame and the real target is subjected to a primary IOU screening; a further step then obtains a polygon point set through the Mask (namely the polygonal outline obtained by instance segmentation), the polygons between the polygon point set and the real target are subjected to a secondary IOU screening, and a frame that finally meets the set thresholds is taken as the detection frame. The beneficial effects are as follows:
(1) a polygon point set of the target is obtained through the Mask branch of Mask R-CNN, and the pixel range is reduced on the basis of the rectangular candidate frame (namely the bounding-box range is reduced), thereby realizing a more detailed target classification;
(2) according to the characteristics of shadows, an analysis method for judging whether a mirror image exists is formed in combination with the two-dimensional array coding, so that false shadow targets are eliminated;
(3) the IOU of the polygonal outline is calculated by means of the two-dimensional array coding, which is both accurate and fast;
(4) when selecting the candidate frame, a primary screening is first performed on the IOU of the candidate frame and a secondary screening is then performed on the IOU of the polygon point set, and regression then yields a more accurate target detection frame.
Drawings
FIG. 1 is a flowchart of a preferred embodiment of the method for selecting a detection frame by using Mask R-CNN according to the present invention;
FIG. 2 is a flowchart of a preferred embodiment of the method for calculating IOU value by using two-dimensional array mapping coding method according to the present invention;
FIG. 3 is a diagram illustrating a two-dimensional array mapping encoding method according to a preferred embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to a preferred embodiment of the invention;
The objects, features, and advantages of the present invention are further described below with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that the words "first" and "second" are used herein only to distinguish identical names from each other, and do not imply a relationship or order between the names.
The purpose of target detection is to identify and locate objects of a specific class in a picture or video; the detection process can be regarded as a classification process that distinguishes the target from the background.
The invention provides a method for selecting a detection frame by using Mask R-CNN. Referring to FIG. 1, the method can be executed by a device, and the device can be realized by software and/or hardware.
Here, Mask R-CNN (Mask Region-based Convolutional Neural Network) is used for predicting the class of a detected object in an image, fine-tuning its frame, and further segmenting the Mask of the polygonal outline of the detected object; a bounding box is the smallest rectangular box that can contain a given object in the image.
In this embodiment, the method for selecting a detection box by using Mask R-CNN includes: step S110-step S130.
S110, carrying out example segmentation on the target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline thereof.
Mask R-CNN instance segmentation comprises two steps: the first step selects the position and class of the candidate frame (i.e. predicts the class of the image object and refines the frame), the selected candidate frame being a rectangle; the second step segments the selected candidate frame into a polygonal outline (obtained through the Mask branch). A sketch of how these two outputs can be obtained in practice follows.
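As an illustration only, the sketch below shows one way to obtain these two branch outputs with an off-the-shelf Mask R-CNN: torchvision's pretrained model supplies the rectangular boxes, and the polygonal outline point set is recovered from the Mask branch output with OpenCV's contour extraction. The score threshold and the choice of the largest contour are assumptions made for this sketch, not part of the patent.

```python
import cv2
import numpy as np
import torch
import torchvision

# Pretrained Mask R-CNN; any instance-segmentation model that returns
# rectangular boxes plus per-instance binary masks would serve equally well.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def candidates_with_contours(image_bgr, score_thresh=0.5):
    """Return (rectangular candidate box, polygon outline point set) pairs."""
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([tensor])[0]          # step one: boxes, labels, scores, masks
    results = []
    for box, score, mask in zip(out["boxes"], out["scores"], out["masks"]):
        if score < score_thresh:          # assumed confidence cut-off
            continue
        binary = (mask[0].numpy() > 0.5).astype(np.uint8)
        # Step two: the Mask branch output becomes a polygon outline point set.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            outline = max(contours, key=cv2.contourArea).reshape(-1, 2)
            results.append((box.numpy(), outline))
    return results
```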
S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively; when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1. It should be noted that the IOU (Intersection over Union) can be understood as the degree of coincidence between a prediction box and a candidate detection box.
In specific embodiments, the first preset threshold IOU1 and the second preset threshold IOU2 can be set according to different scenes; moreover, in order to improve the detection precision of the rectangular detection frame, the second preset threshold IOU2 is set to be greater than the first preset threshold IOU1.
First, the candidate detection frame is matched against the prediction target, and the first matching result is screened; that is, candidate detection frames whose IOU value is larger than IOU1 are retained.
Then, the polygonal outline is matched against the prediction target, and the second matching result is screened; that is, polygonal outlines whose IOU value is larger than IOU2 are retained.
The candidate detection frame that passes both screenings is taken as the final target detection frame, as in the sketch below.
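A minimal sketch of this two-stage screening follows; `box_iou` and `polygon_iou` stand for the rectangle IOU and the polygonal-outline IOU described in this document, and the triple layout of `detections` is an assumption made for illustration.

```python
def select_target_frames(detections, box_iou, polygon_iou, iou1=0.5, iou2=0.7):
    """Keep only candidate frames that pass both IOU screenings (IOU2 > IOU1).

    detections: iterable of (candidate_box, polygon_outline, prediction_box)
    box_iou / polygon_iou: IOU functions for rectangles and polygon outlines
    """
    assert iou2 > iou1, "the second threshold must exceed the first"
    targets = []
    for candidate_box, outline, prediction_box in detections:
        # First screening: rectangle IOU against the prediction target.
        if box_iou(candidate_box, prediction_box) <= iou1:
            continue
        # Second screening: polygonal-outline IOU against the prediction target.
        if polygon_iou(outline, prediction_box) <= iou2:
            continue
        targets.append(candidate_box)
    return targets
```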
In a specific embodiment, the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range 0.5 to 0.7.
In summary, for the two branch results obtained by Mask R-CNN instance segmentation (the candidate detection frame and the polygonal outline), a new judgment relationship is established between the two parallel, non-intersecting branch results: the candidate detection frame is used for a primary IOU screening and the polygonal outline for a secondary IOU screening, yielding a target detection frame with higher detection precision.
Referring to FIG. 2, a flowchart of a preferred embodiment of the method for calculating the IOU value using the two-dimensional array mapping coding method of the present invention is shown; as FIG. 2 shows, the method comprises steps S210 to S230.
S210, mapping the polygonal outline and its prediction frame respectively onto a plane template divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized blocks.
referring to FIG. 3, a diagram of a two-dimensional array mapping coding method according to a preferred embodiment of the present invention is shown; fig. 3 shows an encoding process of the two-dimensional array mapping encoding method.
The right side shows the object under detection, with the polygonal outline around it; the polygonal outline is mapped onto a binary map. As shown in FIG. 3, the binary map is divided into equally sized blocks by the line segment combination, and the blocks in the binary map include blocks coded 1 and blocks coded 0.
S220, representing the mapping results of the polygonal outline and its prediction frame on a binary map of the same size as the plane template, and encoding each block as a two-dimensional array (A, B), where A is the coding state of the block with respect to the polygonal outline and B the coding state with respect to the prediction frame: A is 1 when the block lies inside the polygonal outline and A is 0 when it lies outside; B is 1 when the block lies inside the prediction frame and B is 0 when it lies outside.
As shown in FIG. 3, the human-shaped outline on the right is mapped onto the binary map on the left; a block is assigned 1 when it lies inside the polygonal outline and 0 when it lies outside. The assigned binary map is shown in FIG. 3.
Specifically, each block may be assigned different values with respect to the polygonal outline and the prediction frame corresponding to it: if a block lies within both the polygonal outline and its prediction frame, the block is coded (1, 1); if a block lies only within the polygonal outline and not within its prediction frame, it is coded (1, 0); if a block lies only within the prediction frame and not within the polygonal outline, it is coded (0, 1); and if a block lies within neither, it is coded (0, 0). Hence four coding cases occur: (1, 1), (1, 0), (0, 1) and (0, 0).
S230, calculating the IOU value by counting the block codes: IOU = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)].
IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area);
here, the intersection polygon area is the area of intersection between the polygonal outline and its prediction frame, i.e. the total area of all blocks coded (1, 1); the union polygon area equals the polygonal outline area plus the prediction frame area minus the intersection polygon area, i.e. the total area of blocks coded (1, 0), (0, 1) and (1, 1). Since all blocks are equally sized, IOU = intersection polygon area / union polygon area = number of blocks coded (1, 1) / [number of blocks coded (1, 0) + number of blocks coded (0, 1) + number of blocks coded (1, 1)], as in the sketch below.
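As an illustration, the sketch below rasterizes the polygonal outline and its prediction box onto a shared template and counts the (A, B) block codes; the template size, block grid and the centre-pixel convention for deciding whether a block lies "inside" are all assumptions of this sketch, and OpenCV's fillPoly is used only as a convenient rasterizer.

```python
import cv2
import numpy as np

def grid_mapped_iou(outline, box, template=(512, 512), grid=(64, 64)):
    """IOU via two-dimensional array mapping coding on an equal-block grid."""
    h, w = template
    # A-plane: 1 inside the polygon outline, 0 outside.
    a_plane = np.zeros((h, w), np.uint8)
    cv2.fillPoly(a_plane, [np.asarray(outline, np.int32).reshape(-1, 1, 2)], 1)
    # B-plane: 1 inside the (axis-aligned) prediction box, 0 outside.
    b_plane = np.zeros((h, w), np.uint8)
    x1, y1, x2, y2 = [int(v) for v in box]
    b_plane[y1:y2, x1:x2] = 1

    gy, gx = grid
    by, bx = h // gy, w // gx
    # One block per grid cell, sampled at the cell centre (a chosen convention).
    a = a_plane[by // 2::by, bx // 2::bx][:gy, :gx].astype(bool)
    b = b_plane[by // 2::by, bx // 2::bx][:gy, :gx].astype(bool)

    n11 = int(np.sum(a & b))      # blocks coded (1, 1)
    n10 = int(np.sum(a & ~b))     # blocks coded (1, 0)
    n01 = int(np.sum(~a & b))     # blocks coded (0, 1)
    union = n10 + n01 + n11
    return n11 / union if union else 0.0
```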
In a specific embodiment, when a "shadow" or a "mirror" exists in the detected scene, detection frames are generated simultaneously for the detected target and for its "mirror image" (or "shadow"), which easily causes the misjudgment that two detected targets exist. Therefore, two-dimensional array mapping coding is performed on all the obtained candidate detection frames; the coded candidate detection frames are compared for coincidence degree; and when the coincidence degree of two candidate detection frames is greater than a coincidence threshold, it is judged that a mirror image exists among the targets detected by the two frames.
The coincidence threshold here is set to 75%; that is, if the code coincidence degree of two candidate detection frames reaches 75%, it is determined that mirror or shadow interference exists, and the interference is eliminated. A sketch of such a pairwise check follows.
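The patent text does not fix the exact comparison, so the sketch below reads "coincidence degree" as the fraction of matching block codes between two frames; both that reading and the boolean-grid input format (e.g. grids produced by the mapping coding above) are assumptions.

```python
import itertools
import numpy as np

def find_mirror_pairs(coded_frames, coincidence_threshold=0.75):
    """Flag frame pairs whose block codes coincide above the threshold.

    coded_frames: list of equal-shape boolean grids, one per screened
    candidate detection frame.
    """
    suspects = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coded_frames), 2):
        coincidence = float(np.mean(a == b))   # share of identical block codes
        if coincidence > coincidence_threshold:
            suspects.append((i, j))            # likely mirror/shadow pair
    return suspects
```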
In some embodiments, calculating the IOU value of the polygonal outline comprises calculating it by an intersection area method, which comprises: S310, obtaining the key points of the polygonal outline and its prediction frame and labeling them, the key points comprising each vertex of the polygonal outline and of its prediction frame and each intersection point between them; S320, forming, by sorting, a point set of the intersection polygon from the intersection points and the points lying inside both shapes; and S330, calculating the areas of the polygonal outline, its prediction frame and the intersection polygon, and from these the IOU value of the polygonal outline, where IOU = intersection polygon area / (polygonal outline area + prediction frame area - intersection polygon area).
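A sketch of the intersection area method follows, assuming the prediction frame is an axis-aligned rectangle (x1, y1, x2, y2) and the outline is a simple polygon: Sutherland-Hodgman clipping yields the intersection point set, whose in-order output plays the role of the sorted key points, and areas come from the shoelace formula.

```python
import numpy as np

def shoelace_area(points):
    """Area of a polygon given ordered vertices, by the shoelace formula."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def clip_to_box(poly, box):
    """Sutherland-Hodgman clip of an ordered polygon against a rectangle."""
    x1, y1, x2, y2 = box

    def clip(poly, inside, cross):
        out = []
        for k in range(len(poly)):
            p, q = poly[k - 1], poly[k]
            if inside(q):
                if not inside(p):
                    out.append(cross(p, q))   # entering intersection point
                out.append(q)
            elif inside(p):
                out.append(cross(p, q))       # leaving intersection point
        return out

    def at_x(c):
        return lambda p, q: (c, p[1] + (q[1] - p[1]) * (c - p[0]) / (q[0] - p[0]))

    def at_y(c):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (c - p[1]) / (q[1] - p[1]), c)

    for inside, cross in ((lambda p: p[0] >= x1, at_x(x1)),
                          (lambda p: p[0] <= x2, at_x(x2)),
                          (lambda p: p[1] >= y1, at_y(y1)),
                          (lambda p: p[1] <= y2, at_y(y2))):
        poly = clip(poly, inside, cross)
        if not poly:
            return []
    return poly

def intersection_area_iou(outline, box):
    """IOU = intersection area / (outline area + box area - intersection area)."""
    inter = clip_to_box([tuple(p) for p in outline], box)
    inter_area = shoelace_area(inter) if len(inter) >= 3 else 0.0
    box_area = (box[2] - box[0]) * (box[3] - box[1])
    union = shoelace_area(outline) + box_area - inter_area
    return inter_area / union if union else 0.0
```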
The neural network structure for improving the detection precision of the rectangular detection frame based on Mask R-CNN is as follows:
the Mask R-CNN generally divides the target pixel while realizing the target detection; in other words, a Mask branch network is added to the basic frame recognition architecture, wherein the Mask branch network is used for segmenting the target pixels, so as to obtain the target polygon outline point set.
The CNN convolutional layers are followed by the RoI Align layer, which is in turn followed by the mask branch, the classifier and the RoI border correction training (fully connected layers). Mask R-CNN inherits the RPN portion of Faster R-CNN.
The task is executed as follows: features are extracted from the target image by a shared convolutional layer, and the obtained feature maps are fed into the RPN, which generates the frames to be detected (designating the positions of the RoIs) and performs a first correction of the RoI bounding boxes; Fast R-CNN is then constructed: according to the output of the RPN, RoIAlign selects the features corresponding to each RoI on the feature maps and sets their dimension to a fixed value; finally, the frames are classified by a fully connected layer (FC layer) and a second correction of the target bounding boxes is performed, finally yielding the candidate detection frames (box regression) and their classification.
The other branch is the head part: Mask R-CNN finally expands the output dimension of RoIAlign to predict the masks; that is, the result obtained by the Mask branch is the point set of the polygonal outline.
Before training the Mask R-CNN model, its hyperparameters are set to the parameter values of the Faster R-CNN model, and the model is pre-trained by using ResNet50, ResNet101 and an FPN network; the Mask R-CNN model is then trained with a large number of samples. After the trained Mask R-CNN model is obtained, it is tested with test samples to verify its accuracy.
In a specific example, the training dataset was COCO trainval35k, with 80 object classes and 1.5 million object instances.
In specific embodiments, the results obtained by detection with the trained Mask R-CNN model are stored in a distributed database, so that the trained Mask R-CNN model is updated by using the distributed database.
In conclusion, the input images are multi-angle images of the target, which form a sample library; the samples are fed into the Mask R-CNN detection and recognition model for training, image features are extracted by the convolutional layers, and finally the accurate target classification frame, the corresponding target state and the instance-segmented polygon point set are obtained.
The invention further provides a method for selecting a detection frame by using Mask R-CNN applied to an electronic device 4. Referring to FIG. 4, it is a schematic structural diagram of the electronic device in a preferred embodiment of the method for selecting a detection frame by using Mask R-CNN of the present invention.
In this embodiment, the electronic device 4 may be a terminal device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer or a desktop computer.
The electronic device 4 includes: a processor 42, a memory 41, a communication bus 43, and a network interface 44.
The memory 41 includes at least one type of readable storage medium, which can be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card or a card-type memory. In some embodiments, the readable storage medium can be an internal storage unit of the electronic device 4, such as its hard disk. In other embodiments, the readable storage medium can also be an external memory of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the electronic device 4.
In this embodiment, the readable storage medium of the memory 41 is generally used for storing the detection frame selection program 40 installed in the electronic device 4, and the like. The memory 41 may also be used to temporarily store data that has been output or is to be output.
The processor 42, which in some embodiments may be a Central Processing Unit (CPU), microprocessor or other data processing chip, is configured to run the program code stored in the memory 41 or to process data, for example to execute the detection frame selection program 40.
The communication bus 43 is used to realize connection communication between these components.
The network interface 44 may optionally include a standard wired interface, a wireless interface (e.g., a WI-FI interface), and is typically used to establish a communication link between the electronic apparatus 4 and other electronic devices.
FIG. 4 only shows the electronic device 4 with components 41 to 44, but it should be understood that not all of the shown components need be implemented; more or fewer components may be implemented instead.
Optionally, the electronic device 4 may further include a user interface, which may include an input unit such as a keyboard, a voice input device such as a microphone or other equipment with a voice recognition function, and a voice output device such as a loudspeaker or a headset; optionally, the user interface may also include a standard wired interface or a wireless interface.
In some embodiments, the electronic device 4 may also include a display, such as an LED display, a liquid crystal display, a touch-sensitive liquid crystal display or an Organic Light-Emitting Diode (OLED) touch screen. The display is used for displaying information processed in the electronic device 4 and for displaying a visual user interface.
Optionally, the electronic device 4 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.
In the embodiment of the apparatus shown in FIG. 4, the memory 41, as a computer storage medium, may include an operating system and the detection frame selection program 40; when executing the detection frame selection program 40 stored in the memory 41, the processor 42 implements the following steps: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and the polygonal outline corresponding to it; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
In other embodiments, the detection frame selection program 40 may be further divided into one or more modules, the one or more modules being stored in the memory 41 and executed by the processor 42 to implement the present invention.
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium that includes a detection frame selection program; when the detection frame selection program is executed by a processor, the following operations are implemented: S110, performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and its polygonal outline; S120, calculating the IOU values of the candidate detection frame and the polygonal outline respectively, and when the IOU value of the candidate detection frame is larger than a first preset threshold IOU1 and the IOU value of the polygonal outline is larger than a second preset threshold IOU2, screening out the candidate detection frame as the target detection frame, wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned method and electronic device for selecting a detection frame by using Mask R-CNN, and will not be described herein again.
In summary, using the operation method of the Mask R-CNN neural network, the invention repeatedly convolves and pools the monitored image in the deep neural network, extracts and processes the key features of the image by the neural network algorithm to obtain the rectangular frame of the object in the image, performs a primary IOU screening on the overlapping part between the rectangular frame and the real target, then performs a secondary IOU screening on the polygons between the polygon point set (the polygonal outline obtained through the Mask) and the real target, and finally takes the frame conforming to the set thresholds as the detection frame.
It should be noted that, in this document, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article or method that comprises a series of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article or method.
Based on this understanding, the technical solution of the present invention, in itself or the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disk) as described above, including several instructions for causing a terminal device (such as a mobile phone, computer, server or network device) to execute the methods described in the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for selecting a detection frame by using Mask R-CNN, applied to an electronic device, the method comprising:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame;
respectively calculating the IOU values of the candidate detection frame and the polygonal outline, and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygonal outline is greater than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
2. The method of claim 1, wherein the calculating the IOU value of the polygon outline comprises calculating the IOU value of the polygon outline by a two-dimensional array mapping encoding method;
the two-dimensional array mapping coding method comprises the following steps:
mapping the polygonal outline and a prediction frame thereof onto a plane template which is divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized partition blocks;
respectively corresponding the mapping results of the polygonal contour and the prediction frame thereof to a binary image which is as large as the plane template, and representing each partition block as a mapping coding (A, B) form of a two-dimensional array; the coding state of the polygonal contour corresponding to the partition block is assigned to be A, and the coding state of the prediction frame corresponding to the partition block is assigned to be B;
when the segmentation block is positioned in the polygonal contour, A is 1, and when the segmentation block is positioned outside the polygonal contour, A is 0; when the partition block is located inside the prediction box, B is 1, and when the partition block is located outside the prediction box, B is 0;
calculating an IOU value by counting the codes of the partition blocks; wherein IOU = the number of partition blocks coded as (1, 1) / [the number of partition blocks coded as (1, 0) + the number of partition blocks coded as (0, 1) + the number of partition blocks coded as (1, 1)].
3. The method of claim 1, wherein the calculating the IOU value of the polygon outline comprises calculating the IOU value of the polygon outline by an intersection area method;
the intersection set area method comprises the following steps:
obtaining key points of the polygon outline and a prediction frame thereof, and labeling the key points, wherein the key points comprise each vertex of the polygon outline and the prediction frame thereof and each intersection point of the polygon outline and the prediction frame thereof;
forming, through sorting, a point set of an intersection polygon from the intersection points and the points lying inside;
and calculating the area of the polygon outline and the prediction frame thereof and the area of the intersection polygon, and calculating the IOU value of the polygon outline according to the area of the polygon outline and the prediction frame thereof and the area of the intersection polygon, wherein the IOU is the area of the intersection polygon/(the area of the polygon outline + the area of the prediction frame-the area of the intersection polygon).
4. The method for selecting a detection frame by using Mask R-CNN as claimed in claim 1, wherein the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range of 0.5 to 0.7.
5. The method for selecting a detection box by using Mask R-CNN according to claim 2, further comprising, after screening out the candidate detection boxes as target detection boxes:
carrying out two-dimensional array mapping coding on all the screened candidate detection frames;
carrying out coincidence degree comparison on the coded candidate detection frames;
and when the coincidence degree of the two candidate detection frames is greater than the coincidence threshold value, judging that the mirror image exists in the target detected by the two candidate detection frames.
6. An electronic device, comprising a memory and a processor, wherein the memory includes a selection program of a detection frame, and the selection program of the detection frame, when executed by the processor, implements the following steps:
performing instance segmentation on a target image by using Mask R-CNN to obtain a rectangular candidate detection frame and a polygonal outline corresponding to the candidate detection frame;
respectively calculating the IOU values of the candidate detection frame and the polygonal outline, and when the IOU value of the candidate detection frame is greater than a first preset threshold IOU1 and the IOU value of the polygonal outline is greater than a second preset threshold IOU2, screening out the candidate detection frame as a target detection frame; wherein the second preset threshold IOU2 is greater than the first preset threshold IOU1.
7. The electronic device of claim 6, wherein calculating the IOU value for the polygon outline comprises calculating the IOU value for the polygon outline by a two-dimensional array mapping encoding method;
mapping the polygonal outline and a prediction frame thereof onto a plane template which is divided in advance by a line segment combination, wherein the line segment combination divides the plane template into equally sized partition blocks;
respectively corresponding the mapping results of the polygonal contour and the prediction frame thereof to a binary image which is as large as the plane template, and representing each partition block as a mapping coding (A, B) form of a two-dimensional array; the coding state of the polygonal contour corresponding to the partition block is assigned to be A, and the coding state of the prediction frame corresponding to the partition block is assigned to be B;
when the segmentation block is positioned in the polygonal contour, A is 1, and when the segmentation block is positioned outside the polygonal contour, A is 0; when the partition block is located inside the prediction box, B is 1, and when the partition block is located outside the prediction box, B is 0;
calculating an IOU value by counting the codes of the partition blocks; wherein IOU = the number of partition blocks coded as (1, 1) / [the number of partition blocks coded as (1, 0) + the number of partition blocks coded as (0, 1) + the number of partition blocks coded as (1, 1)].
8. The electronic device of claim 6,
wherein the first preset threshold IOU1 and the second preset threshold IOU2 both take values in the range of 0.5 to 0.7.
9. The electronic device according to claim 7, further comprising, after said screening out the candidate detection frame as a target detection frame:
carrying out two-dimensional array mapping coding on all the screened candidate detection frames;
carrying out coincidence degree comparison on the coded candidate detection frames;
and when the coincidence degree of the two candidate detection frames is greater than the coincidence threshold value, judging that the mirror image exists in the target detected by the two candidate detection frames.
10. A computer-readable storage medium, characterized in that it stores a computer program comprising a selection program of a detection frame; when executed by a processor, the selection program of the detection frame implements the steps of the method for selecting a detection frame by using Mask R-CNN according to any one of claims 1 to 5.
CN201910885674.7A 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN Active CN110738125B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN
PCT/CN2019/118279 WO2021051601A1 (en) 2019-09-19 2019-11-14 Method and system for selecting detection box using mask r-cnn, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910885674.7A CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Publications (2)

Publication Number Publication Date
CN110738125A true CN110738125A (en) 2020-01-31
CN110738125B CN110738125B (en) 2023-08-01

Family

ID=69268320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910885674.7A Active CN110738125B (en) 2019-09-19 2019-09-19 Method, device and storage medium for selecting detection frame by Mask R-CNN

Country Status (2)

Country Link
CN (1) CN110738125B (en)
WO (1) WO2021051601A1 (en)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409255A (en) * 2021-06-07 2021-09-17 同济大学 Zebra fish morphological classification method based on Mask R-CNN
CN113409267B (en) * 2021-06-17 2023-04-18 西安热工研究院有限公司 Pavement crack detection and segmentation method based on deep learning
CN113591734B (en) * 2021-08-03 2024-02-20 中国科学院空天信息创新研究院 Target detection method based on improved NMS algorithm
CN113469302A (en) * 2021-09-06 2021-10-01 南昌工学院 Multi-circular target identification method and system for video image
CN114526709A (en) * 2022-02-21 2022-05-24 中国科学技术大学先进技术研究院 Area measurement method and device based on unmanned aerial vehicle and storage medium
CN116486265B (en) * 2023-04-26 2023-12-19 北京卫星信息工程研究所 Airplane fine granularity identification method based on target segmentation and graph classification


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009554A (en) * 2017-12-01 2018-05-08 国信优易数据有限公司 A kind of image processing method and device
CN109389640A (en) * 2018-09-29 2019-02-26 北京字节跳动网络技术有限公司 Image processing method and device
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150071529A1 (en) * 2013-09-12 2015-03-12 Kabushiki Kaisha Toshiba Learning image collection apparatus, learning apparatus, and target object detection apparatus
US20170287137A1 (en) * 2016-03-31 2017-10-05 Adobe Systems Incorporated Utilizing deep learning for boundary-aware image segmentation
CN106529565A (en) * 2016-09-23 2017-03-22 北京市商汤科技开发有限公司 Target identification model training and target identification method and device, and computing equipment
US20190147372A1 (en) * 2017-11-15 2019-05-16 Uber Technologies, Inc. Systems and Methods for Object Detection, Tracking, and Motion Prediction
CN108875577A (en) * 2018-05-11 2018-11-23 深圳市易成自动驾驶技术有限公司 Object detection method, device and computer readable storage medium
CN109903310A (en) * 2019-01-23 2019-06-18 平安科技(深圳)有限公司 Method for tracking target, device, computer installation and computer storage medium
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507341A (en) * 2020-04-20 2020-08-07 广州文远知行科技有限公司 Method, device and equipment for adjusting target bounding box and storage medium
CN111898411A (en) * 2020-06-16 2020-11-06 华南理工大学 Text image labeling system, method, computer device and storage medium
CN111898411B (en) * 2020-06-16 2021-08-31 华南理工大学 Text image labeling system, method, computer device and storage medium
US11748890B2 (en) 2020-08-21 2023-09-05 Inspur Suzhou Intelligent Technology Co., Ltd. Instance segmentation method and system for enhanced image, and device and medium
WO2022037170A1 (en) * 2020-08-21 2022-02-24 苏州浪潮智能科技有限公司 Instance segmentation method and system for enhanced image, and device and medium
CN112861711A (en) * 2021-02-05 2021-05-28 深圳市安软科技股份有限公司 Regional intrusion detection method and device, electronic equipment and storage medium
CN113343779A (en) * 2021-05-14 2021-09-03 南方电网调峰调频发电有限公司 Environment anomaly detection method and device, computer equipment and storage medium
CN113343779B (en) * 2021-05-14 2024-03-12 南方电网调峰调频发电有限公司 Environment abnormality detection method, device, computer equipment and storage medium
CN113408531B (en) * 2021-07-19 2023-07-14 北博(厦门)智能科技有限公司 Target object shape frame selection method and terminal based on image recognition
CN113408531A (en) * 2021-07-19 2021-09-17 北博(厦门)智能科技有限公司 Target object shape framing method based on image recognition and terminal
CN113705643A (en) * 2021-08-17 2021-11-26 荣耀终端有限公司 Target detection method and device and electronic equipment
WO2023109151A1 (en) * 2021-12-14 2023-06-22 青岛海尔电冰箱有限公司 Method for identifying information of item in refrigerator, and refrigerator
WO2023185779A1 (en) * 2022-03-29 2023-10-05 青岛海尔电冰箱有限公司 Method for identifying article information in refrigerator

Also Published As

Publication number Publication date
WO2021051601A1 (en) 2021-03-25
CN110738125B (en) 2023-08-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant