CN111191621A - Rapid and accurate identification method for multi-scale target under large-focus monitoring scene - Google Patents
Rapid and accurate identification method for multi-scale target under large-focus monitoring scene
- Publication number
- CN111191621A (application CN202010004300.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- anchor
- detection
- branch
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/20 — Scenes; scene-specific elements in augmented reality scenes
- G06N3/02, G06N3/08 — Neural networks; learning methods
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V2201/07 — Target detection
Abstract
A rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene, relating to the fields of artificial intelligence and computer vision. The method comprises the following steps. 1) Dynamic anchor setting: acquire training data, fit the training targets, analyse the anchor characteristics through the fitted data, and set the anchor values dynamically. 2) Design of the DAnchorNet network structure: design a target detection branch and a target segmentation branch in DAnchorNet; the combination of the two branches removes the need to set a target-detection hyper-parameter threshold. 3) Design of the DAnchorNet loss function: optimise the loss function during training through a dynamic weight design scheme, focusing on the average probability value of the target region to adjust the total loss. The dynamic anchor effectively improves the detection rate of multi-scale targets in a large-focus monitoring scene, the network structure combining segmentation with dynamic-anchor detection effectively improves detection accuracy, and the overall effect of target identification is thereby improved.
Description
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene.
Background
Target detection and recognition are widely used in many areas of life. The computer-vision task is to distinguish targets in an image or video from the parts that are not of interest, determine whether a target is present and, if so, determine its position and identify it. Target detection and recognition form a very important research direction in computer vision. With the rapid development of the internet, artificial-intelligence technology and intelligent hardware, a large amount of image and video data now exists in human life, so computer-vision technology plays an ever greater role and research on it grows ever more active. As a cornerstone of the field, target detection and recognition receive increasing attention and are widely applied in practice, for example in target tracking, video monitoring, information security, automatic driving, image retrieval, medical image analysis, network data mining, unmanned-aerial-vehicle navigation, remote-sensing image analysis and national defense systems.
Target detection is also an important branch of the image processing and computer vision disciplines and a core part of intelligent monitoring systems. At the same time it is a basic algorithm in the field of general identity recognition and plays an important role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. Improving the accuracy of target detection and reducing the target miss rate therefore has important practical significance.
Currently there are two main families of target detection and identification methods: methods based on traditional image processing and machine-learning algorithms, and methods based on deep learning.
1. Target detection and identification based on traditional image processing and machine-learning algorithms:
The conventional pipeline can be expressed as: target feature extraction -> target recognition -> target positioning. The features used are designed by hand, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients) and SURF (Speeded-Up Robust Features). The target is recognised from these features and then positioned with a corresponding strategy.
2. Target detection and identification based on deep learning:
Deep-learning-based target detection and recognition has become the mainstream approach and can be expressed as: image deep-feature extraction -> target recognition and positioning based on a deep neural network, where the deep neural network model used is a convolutional neural network (CNN). Existing deep-learning-based detection and recognition algorithms can be roughly divided into three categories:
1) Region-proposal-based algorithms, such as R-CNN and Fast R-CNN.
2) Regression-based algorithms, such as YOLO and SSD.
3) Search-based algorithms, such as AttentionNet, which is based on visual attention, and algorithms based on reinforcement learning.
The prior art has the following shortcomings:
1. Shortcomings of target detection based on traditional image processing and machine-learning algorithms:
(1) In a large-focus monitoring scene the difference between near and far targets is very large, so targets of many scales coexist in the same scene. When candidate regions are selected with a sliding window, no single window size and aspect ratio can be set effectively, so the exhaustive sliding-window search is slow and highly redundant.
(2) In a large-focus monitoring scene a target appears large near the camera and small far from it, so target size varies greatly. Traditional methods therefore cannot accurately identify both near and far targets in such a scene, and they generalise poorly.
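The exhaustive sliding-window search criticised in point (1) can be sketched as follows; the window sizes and stride here are illustrative assumptions, not values from the patent:

```python
# Naive multi-scale sliding-window candidate generation. Even for a small
# image and a handful of fixed scales, the number of candidate regions
# grows quickly, illustrating the redundancy of the exhaustive search.
def sliding_windows(img_w, img_h,
                    sizes=((32, 32), (64, 64), (128, 128)), stride=16):
    """Yield (x, y, w, h) candidate regions at several fixed scales."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)
```

Counting the windows for even a 128x128 image makes the redundancy concrete: dozens of candidates per scale, almost all of which contain no target.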
2. Shortcomings of deep-learning-based target detection and identification:
(1) Most existing deep-learning detection methods regress against fixed anchors. When a large-focus monitoring scene contains many targets of different sizes, fixed anchors cannot accommodate the large size differences, so the detection network may fail to converge or train poorly, easily causing missed and false detections.
(2) When a deep network is used for detection, a hyper-parameter confidence threshold must be set: a prediction box is accepted as a target only when its predicted confidence exceeds the threshold. This threshold therefore strongly influences both detection rate and accuracy, and in practice an empirical value is used. However, a high threshold causes missed detections while a low one causes false detections, so the trained network model cannot be fully exploited for target identification.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene. The dynamic anchor effectively improves the detection rate of multi-scale targets in such a scene, and the network structure combining segmentation with dynamic-anchor detection effectively improves detection accuracy, thereby improving the overall effect of target identification.
In order to achieve the above object, the technical solution of the present invention is implemented as follows:
a method for quickly and accurately identifying a multi-scale target in a large-focus monitoring scene comprises the following steps:
1) Dynamic anchor setting:
Acquire training data, fit the training targets, analyse the anchor characteristics through the fitted data, and set the anchor values dynamically.
2) Design of the DAnchorNet network structure:
Design a target detection branch and a target segmentation branch in DAnchorNet; the combination of the two branches removes the need to set a target-detection hyper-parameter threshold.
3) Design of the DAnchorNet loss function:
Optimise the loss function during training through a dynamic weight design scheme, incorporate a target attention mechanism, and focus on the average probability value of the target region to adjust the total loss.
Compared with the prior art, the method has the following advantages:
1. Setting the anchor value dynamically according to target position effectively improves anchor utilisation in detection, accommodates both large and small targets in a large-focus scene, makes the network easier to converge, and effectively improves the detection rate of multi-scale targets in such a scene.
2. DAnchorNet combines the loss functions of the detection and segmentation branches. Since fusing a segmentation network undoubtedly increases training difficulty, a dynamic weight design scheme is proposed to optimise the loss function during training: a target attention mechanism is incorporated and the average probability value of the target region is used to adjust the total loss. When the average probability value of the target region is high, the segmentation network is well trained and its loss contribution can be reduced; when it is low, the segmentation network has not converged well enough and its loss contribution should be increased. This reduces training difficulty and improves the training effect.
3. A new network structure, DAnchorNet, is provided to improve detection: a fused detection-and-segmentation method adds a segmentation branch at little extra computational cost. DAnchorNet computes the intersection over union (IOU) of the detections from the two branches to obtain the final result; a prediction box is accepted as a target when the IOU of the two branches meets the set requirement. This structure avoids setting a target-confidence hyper-parameter threshold as in a stand-alone detection method, fully exploits the network model, and effectively improves detection accuracy.
The invention is further described with reference to the following figures and detailed description.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 shows the fused detection-and-segmentation network structure DAnchorNet according to an embodiment of the present invention.
Detailed Description
Referring to FIGS. 1 and 2, the invention provides a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1. dynamic anchor setting:
Acquire training data and fit the training targets to obtain the anchor fitting result, as follows:
(1) Obtain the data M(x, y, w, h), where M_i is the i-th sample in the data set, (x_i, y_i) are the coordinates of the upper-left corner of the i-th target, and (w_i, h_i) are its width and height. Recombine M(x, y, w, h) into two sets of data, M_h(y, h) and M_w(y, w).
(2) Perform linear fitting on the obtained M_h(y, h) and M_w(y, w) respectively, obtaining the slope k_w and intercept b_w of the M_w(y, w) fit, and the slope k_h and intercept b_h of the M_h(y, h) fit.
(3) During network training, the anchor width anchor_w and height anchor_h are set dynamically from k_w, b_w, k_h and b_h; from the linear fits this gives anchor_w = k_w · y + b_w and anchor_h = k_h · y + b_h, where y is the height coordinate in the original image converted from j of grid (i, j) on each feature map.
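The dynamic-anchor fit described above can be sketched as follows; the sample boxes and the use of `np.polyfit` for the degree-1 fit are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Each training box is (x, y, w, h): top-left corner plus width/height.
# In a large-focus scene, boxes lower in the image (larger y) are nearer
# the camera and therefore larger — illustrative sample data.
boxes = np.array([
    [10,  50, 20,  40],
    [30, 120, 35,  70],
    [60, 200, 55, 110],
    [90, 300, 80, 160],
], dtype=float)

y, w, h = boxes[:, 1], boxes[:, 2], boxes[:, 3]

# Linear fits for M_w(y, w) and M_h(y, h): w = k_w*y + b_w, h = k_h*y + b_h.
k_w, b_w = np.polyfit(y, w, 1)
k_h, b_h = np.polyfit(y, h, 1)

def dynamic_anchor(y_img):
    """Anchor (width, height) for a grid cell mapped to image height y_img."""
    return k_w * y_img + b_w, k_h * y_img + b_h
```

Because the fitted slopes are positive for this data, anchors grow smoothly from the top (far) to the bottom (near) of the image instead of being fixed.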
2. Designing the DAnchorNet network structure:
(1) Obtain the target detection result R_d through the detection branch. R_d contains the coordinate position (R_d_x, R_d_y) of the predicted target, its width and height (R_d_w, R_d_h) and its confidence R_d_conf.
(2) Obtain the target segmentation result F_seg through the segmentation branch. The result comprises two single-channel segmentation maps, F_full_seg and F_inter_seg, where F_full_seg is the predicted segmentation of all targets and F_inter_seg is the segmentation of the adherent (touching) parts of all targets; the individual segmentation result Seg of each image target is obtained from F_full_seg and F_inter_seg.
(3) Perform contour extraction on the obtained segmentation result Seg to obtain the outer bounding rectangle Seg_boud of each target, where Seg_boud comprises the upper-left coordinates (S_x, S_y) of the segmented target, its width and height (S_w, S_h) and its confidence S_conf.
(4) Obtain the final results R_1 and R_2 of part of the targets from S_conf and R_d_conf, as shown in equation (4).
(5) Compute the intersection over union (IOU) between the remaining detection results of step (4) to obtain the target detection result R_3. With the IOU threshold Th_IOU set to 0.7, the Seg_boud obtained from the lower-confidence segmentation predictions is combined with the detected R_d for target judgment: if the IOU of the two boxes satisfies IOU > Th_IOU, the final target detection result R_3 is obtained as shown in equation (5); if IOU < Th_IOU, the current target is discarded.
(6) Merge the R_1, R_2 and R_3 obtained in steps (4) and (5) to obtain the final detection result R_all.
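The IOU cross-check of step (5) can be sketched as follows; since equations (4) and (5) are not reproduced in the text, the box layout and the keep/discard rule here are hedged assumptions:

```python
# Boxes are (x, y, w, h). A detected box is kept only when some
# segmentation-derived box agrees with it (IOU > Th_IOU); otherwise
# the candidate is discarded, replacing a fixed confidence threshold.

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

TH_IOU = 0.7  # IOU threshold from the text

def cross_check(det_boxes, seg_boxes):
    """Keep a detected box only if a segmentation box agrees (IOU > Th_IOU)."""
    return [d for d in det_boxes
            if any(iou(d, s) > TH_IOU for s in seg_boxes)]
```

The design choice here is that agreement between two independently trained branches replaces a hand-tuned confidence threshold, which is the point made in the advantages section.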
3. Designing the DAnchorNet loss function:
(1) Obtain the loss L_1 of the detection branch; its loss function is that of YOLOv3.
(2) Obtain the loss L_2 of the segmentation branch; its loss function is a sigmoid loss function. Let P_i,j be the probability value at position (i, j) of the final segmentation feature map, and take the probability values of the feature map over the ground-truth target regions. With N ground-truth target boxes whose total region area is Area, the total probability value P is obtained, and the average probability value of the N target regions follows as P_avg = P / Area.
(3) From the P_avg obtained in step (2), the total loss L is computed dynamically: the higher P_avg, the smaller the contribution of the segmentation loss L_2 to L; the lower P_avg, the larger its contribution.
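The dynamic weighting of step (3) might be sketched as follows, assuming the simple form L = L1 + (1 − P_avg) · L2; the exact formula is not reproduced in the text, so this coefficient is an illustrative assumption:

```python
import numpy as np

def average_target_probability(prob_map, gt_boxes):
    """Mean predicted probability over all ground-truth (x, y, w, h) regions."""
    total, area = 0.0, 0
    for (x, y, w, h) in gt_boxes:
        region = prob_map[y:y + h, x:x + w]
        total += float(region.sum())
        area += region.size
    return total / area if area else 0.0

def total_loss(l_det, l_seg, p_avg):
    """Down-weight the segmentation loss as the segmentation branch converges."""
    return l_det + (1.0 - p_avg) * l_seg
```

This matches the behaviour described in the advantages section: a well-converged segmentation branch (high P_avg) contributes little extra loss, while a poorly converged one (low P_avg) contributes more.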
in the method, the segmented target detection network structure DANCHorrNet is combined, an original target detection method is optimized, a large target and a small target under a large scene are effectively considered by utilizing a dynamic anchor, and the detection rate of the network under the condition of multi-scale targets is improved; and then, a segmentation network is led out from the detected branch, the setting of the confidence coefficient of an individual target detection network is avoided by combining the two, and the detection rate and the accuracy of the target are effectively improved under the condition of small increase of the calculated amount.
Claims (1)
1. A method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1) dynamic anchor setting:
acquiring training data, fitting the training targets, analysing the anchor characteristics through the fitted data, and dynamically setting the anchor values;
2) designing the DAnchorNet network structure:
designing the DAnchorNet network structure comprising two branches, a target detection branch and a target segmentation branch, the two branches sharing one base network, the combination of the target detection branch and the segmentation branch removing the need to set a target-detection hyper-parameter threshold;
3) designing the DAnchorNet loss function:
optimising the loss function during training through a dynamic weight design scheme, incorporating a target attention mechanism, and focusing on the average probability value of the target region to adjust the total loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010004300.2A CN111191621A (en) | 2020-01-03 | 2020-01-03 | Rapid and accurate identification method for multi-scale target under large-focus monitoring scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111191621A true CN111191621A (en) | 2020-05-22 |
Family
ID=70708022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010004300.2A Pending CN111191621A (en) | 2020-01-03 | 2020-01-03 | Rapid and accurate identification method for multi-scale target under large-focus monitoring scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191621A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171297A1 (en) * | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning |
US20180137642A1 (en) * | 2016-11-15 | 2018-05-17 | Magic Leap, Inc. | Deep learning system for cuboid detection |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN108694401A (en) * | 2018-05-09 | 2018-10-23 | 北京旷视科技有限公司 | Object detection method, apparatus and system |
CN109325418A (en) * | 2018-08-23 | 2019-02-12 | 华南理工大学 | Based on pedestrian recognition method under the road traffic environment for improving YOLOv3 |
CN109816024A (en) * | 2019-01-29 | 2019-05-28 | 电子科技大学 | A kind of real-time automobile logo detection method based on multi-scale feature fusion and DCNN |
CN109902629A (en) * | 2019-03-01 | 2019-06-18 | 成都康乔电子有限责任公司 | A kind of real-time vehicle target detection model under vehicles in complex traffic scene |
CN109919000A (en) * | 2019-01-23 | 2019-06-21 | 杭州电子科技大学 | A kind of Ship Target Detection method based on Multiscale Fusion strategy |
CN109934236A (en) * | 2019-01-24 | 2019-06-25 | 杰创智能科技股份有限公司 | A kind of multiple dimensioned switch target detection algorithm based on deep learning |
KR20190085464A (en) * | 2018-01-10 | 2019-07-18 | 삼성전자주식회사 | A method of processing an image, and apparatuses performing the same |
Non-Patent Citations (1)
Title |
---|
张楚楚; 吕学斌: "Pedestrian detection in dense crowd scenes based on an improved YOLOv2 network", Modern Computer (Professional Edition), no. 28 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||