CN111191621A - Rapid and accurate identification method for multi-scale target under large-focus monitoring scene


Info

Publication number
CN111191621A
CN111191621A (application CN202010004300.2A)
Authority
CN
China
Prior art keywords
target
anchor
detection
branch
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010004300.2A
Other languages
Chinese (zh)
Inventor
魏世安
刘立强
江龙
王亚涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua Tongfang Co Ltd
Beijing Tongfang Software Co Ltd
SG Biofuels Ltd
Original Assignee
Beijing Tongfang Software Co Ltd
SG Biofuels Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tongfang Software Co Ltd, SG Biofuels Ltd filed Critical Beijing Tongfang Software Co Ltd
Priority to CN202010004300.2A
Publication of CN111191621A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection


Abstract

A rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene, relating to the fields of artificial intelligence and computer vision. The method comprises the following steps: 1) Dynamic anchor setting: acquire training data, perform data fitting on the training targets, analyze the anchor characteristics through the fitted data, and set the anchor values dynamically. 2) Design of the DAnchorNet network structure: design a target detection branch and a target segmentation branch within DAnchorNet, and resolve the setting of the target detection hyper-parameter threshold by combining the detection branch with the segmentation branch. 3) Design of the DAnchorNet loss function: optimize the loss function during training through a dynamic weight design scheme, adjusting the total loss by focusing on the average probability value of the target regions. The dynamic anchor effectively improves the detection rate of multi-scale targets in a large-focus monitoring scene, the network structure combining segmentation with dynamic-anchor detection effectively improves the accuracy of target detection, and the overall effect of target identification is thereby improved.

Description

Rapid and accurate identification method for multi-scale target under large-focus monitoring scene
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene.
Background
Target detection and recognition are widely used in many areas of life: they distinguish the objects of interest in an image or video from the background to determine whether a target is present and, if so, to locate and identify it. This is a core computer vision task and a very important research direction in the field. With the rapid development of the internet, artificial intelligence, and intelligent hardware, vast amounts of image and video data now exist in daily life, so computer vision technology plays an ever greater role and research on it grows ever more active. As a cornerstone of the field, target detection and recognition receive increasing attention and are widely applied in practice, for example in target tracking, video monitoring, information security, automatic driving, image retrieval, medical image analysis, network data mining, unmanned aerial vehicle navigation, remote sensing image analysis, and national defense systems.
Target detection is also an important branch of image processing and computer vision and a core part of intelligent monitoring systems; it is likewise a basic algorithm in the field of general identity recognition, playing an important role in subsequent tasks such as face recognition, gait recognition, crowd counting, and instance segmentation. Improving the accuracy of target detection and reducing the target miss rate therefore has important practical significance.
Currently, there are two main classes of research methods for target detection and recognition: methods based on traditional image processing and machine learning algorithms, and methods based on deep learning.
1. Target detection and recognition based on traditional image processing and machine learning algorithms:
The conventional pipeline can be expressed as: target feature extraction -> target recognition -> target positioning. The features used are hand-crafted, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients), and SURF (Speeded-Up Robust Features). Targets are recognized through these features and then localized with a corresponding strategy.
2. Target detection and recognition based on deep learning:
Deep-learning-based target detection and recognition has become the mainstream approach and can be expressed as: image deep-feature extraction -> target recognition and positioning based on a deep neural network, where the deep neural network used is a convolutional neural network (CNN). Existing deep-learning detection and recognition algorithms can be roughly divided into three categories:
1) Region-proposal-based algorithms, such as R-CNN and Fast R-CNN.
2) Regression-based algorithms, such as YOLO and SSD.
3) Search-based algorithms, such as AttentionNet (based on visual attention) and reinforcement-learning-based methods.
The prior art also has the following defects:
1. Defects of target detection based on traditional image processing and machine learning algorithms:
(1) In a large-focus monitoring scene, near-end and far-end targets differ greatly in size, so targets of multiple scales coexist in the same scene. When candidate target regions are selected with a sliding window, the window size and aspect ratio cannot be set effectively, so the exhaustive sliding-window search is time-consuming and highly redundant.
(2) In a large-focus monitoring scene, a target appears large when close to the camera and small when far away, so target sizes vary widely. Traditional methods therefore cannot accurately recognize both near-end and far-end targets, and their generalization ability is poor.
2. Defects of deep-learning-based target detection and recognition:
(1) Most existing deep-learning detectors regress against fixed anchors. In a large-focus monitoring scene containing many targets of different sizes, fixed anchors cannot effectively accommodate the large size differences, so the detection network may fail to converge or train poorly, easily causing missed and false detections.
(2) When a deep network is used for detection, a hyper-parameter threshold must be set: a prediction box is accepted as a target only when its predicted confidence exceeds the threshold. This threshold therefore strongly affects both detection rate and accuracy, and in practice it is usually set from experience. A high threshold causes missed detections, while a low one causes false detections, so the trained network model cannot be fully exploited for target identification.
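The threshold trade-off described above can be seen in a minimal sketch; the prediction list, field names, and the filter_by_confidence helper are illustrative, not from the patent:

```python
# Hypothetical illustration of the hyper-parameter threshold defect:
# no single fixed confidence threshold keeps the weak far-end target
# without also admitting background clutter.
def filter_by_confidence(preds, th):
    """Standard single-branch post-processing: keep predictions whose
    confidence exceeds the hyper-parameter threshold th."""
    return [p for p in preds if p["conf"] > th]

preds = [
    {"id": "near_target", "conf": 0.90},  # easy, large near-end target
    {"id": "far_target", "conf": 0.35},   # true target, weak response
    {"id": "clutter", "conf": 0.55},      # background false positive
]

high = filter_by_confidence(preds, 0.6)  # high threshold: far target missed
low = filter_by_confidence(preds, 0.3)   # low threshold: clutter admitted
```

Whatever threshold is chosen, one of the two failure modes occurs, which is the motivation for replacing the threshold with the cross-branch check of DAnchorNet.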
Disclosure of Invention
To address the above defects in the prior art, the invention aims to provide a rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene. The dynamic anchor effectively improves the detection rate of multi-scale targets in such a scene, and the network structure combining segmentation with dynamic-anchor detection effectively improves the accuracy of target detection, thereby effectively improving the overall effect of target identification.
In order to achieve the above object, the technical solution of the present invention is implemented as follows:
a method for quickly and accurately identifying a multi-scale target in a large-focus monitoring scene comprises the following steps:
1) dynamic anchor setting:
and acquiring training data, performing data fitting on a training target, analyzing the characteristic of the anchor through big data fitting, and dynamically setting the value of the anchor.
2) Designing the network structure of DAnchorNet:
and designing a target detection branch and a target segmentation branch in the DANCHORNet, and solving the setting of a target detection parameter-exceeding threshold value through the combination of the target detection branch and the segmentation branch.
3) Designing the loss function of DAnchorNet:
Optimize the loss function during training through a dynamic weight design scheme that fuses a target attention mechanism, adjusting the total loss by focusing on the average probability value of the target regions.
Owing to the adoption of the above method, the invention has the following advantages over the prior art:
1. The anchor value is set dynamically according to the target's position, which effectively improves anchor utilization in detection, accommodates both large and small targets in a large-focus scene, makes the network easier to converge, and effectively improves the detection rate of multi-scale targets in such a scene.
2. DAnchorNet fuses the loss functions of the detection and segmentation branches. Since fusing a segmentation network undoubtedly increases training difficulty, a dynamic weight design scheme is proposed to optimize the loss function during training: a target attention mechanism is fused, and the average probability value of the target regions is used to adjust the total loss. When the average probability value of the target regions is high, the segmentation network is well trained and its loss contribution can be reduced; when it is low, the segmentation network has not yet converged well and its loss contribution should be increased. This reduces training difficulty and improves the training effect.
3. A new network structure, DAnchorNet, is provided to improve the target detection effect: a segmentation branch is added on the premise that the computation does not increase greatly, and it is fused with the detection branch. DAnchorNet computes the IoU between the detection results of the two branches to obtain the final result; a prediction box is accepted as a target only when the IoU of the two branches meets the set requirement. This structure avoids setting the target-confidence hyper-parameter threshold required by a standalone detection method, makes full use of the network model, and effectively improves the accuracy of target detection.
The invention is further described with reference to the following figures and detailed description.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 shows the fused detection-segmentation network structure DAnchorNet according to an embodiment of the present invention.
Detailed Description
Referring to FIGS. 1 and 2, the invention provides a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1. dynamic anchor setting:
Acquire training data and perform data fitting on the training targets to obtain the anchor fitting result, as follows:
(1) Obtain data M(x, y, w, h), where M_i is the i-th sample in the data set, x_i and y_i are the upper-left corner coordinates of the i-th target, and w_i and h_i are its width and height. Recombine M(x, y, w, h) into two groups of data, M_h(y, h) and M_w(y, w).
(2) Perform linear fitting on M_h(y, h) and M_w(y, w) respectively, obtaining slope k_w and intercept b_w for the M_w(y, w) fit, and slope k_h and intercept b_h for the M_h(y, h) fit.
(3) During network training, the anchor width anchor_w and anchor height anchor_h are set dynamically from k_w, b_w, k_h and b_h:

anchor_w = k_w × y + b_w
anchor_h = k_h × y + b_h

where y is the height coordinate in the original image corresponding to index j of grid cell (i, j) on each feature map.
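Steps (1)-(3) can be sketched as follows, assuming numpy is available; fit_anchor_params and dynamic_anchor are illustrative names, not identifiers from the patent:

```python
import numpy as np

def fit_anchor_params(boxes):
    """Fit the width and height of training boxes as linear functions of
    the box's upper-left y coordinate: w ~ k_w*y + b_w, h ~ k_h*y + b_h."""
    boxes = np.asarray(boxes, dtype=float)  # rows of (x, y, w, h)
    y, w, h = boxes[:, 1], boxes[:, 2], boxes[:, 3]
    k_w, b_w = np.polyfit(y, w, deg=1)  # slope and intercept for width
    k_h, b_h = np.polyfit(y, h, deg=1)  # slope and intercept for height
    return k_w, b_w, k_h, b_h

def dynamic_anchor(y, params):
    """Anchor size for a grid cell whose index maps back to image row y."""
    k_w, b_w, k_h, b_h = params
    return k_w * y + b_w, k_h * y + b_h
```

Targets low in the frame (large y, near the camera) thus get large anchors, and targets high in the frame get small ones, which is the multi-scale behavior the dynamic anchor is meant to capture.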
2. Designing the network structure of DAnchorNet:
(1) Obtain the target detection result R_d through the detection branch, where R_d comprises the predicted target's coordinate position R_d_x, R_d_y, its width and height R_d_w, R_d_h, and its confidence R_d_conf.
(2) Obtain the target segmentation result F_seg through the segmentation branch. The result comprises two single-channel segmentation maps, F_full_seg and F_inter_seg, where F_full_seg is the predicted segmentation of all targets and F_inter_seg is the segmentation of the adhered (touching) parts of the targets. The final individual segmentation result Seg of the image targets is obtained from F_full_seg and F_inter_seg:
Seg = F_full_seg - F_inter_seg
(3) Perform contour extraction on the segmentation result Seg to obtain each target's outer bounding rectangle Seg_boud, which comprises the upper-left coordinates S_x and S_y, the width and height S_w and S_h, and the confidence S_conf of the segmented target.
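Step (3) can be sketched with a plain connected-component pass over the segmentation map; mask_to_boxes is an illustrative helper, not from the patent, and a production implementation would typically use a library contour extractor instead:

```python
import numpy as np
from collections import deque

def mask_to_boxes(seg, prob_thresh=0.5):
    """Binarize a single-channel segmentation map and return one
    (x, y, w, h, conf) bounding box per connected component, where conf
    is the mean predicted probability inside the component."""
    binary = seg >= prob_thresh
    visited = np.zeros(seg.shape, dtype=bool)
    H, W = seg.shape
    boxes = []
    for si in range(H):
        for sj in range(W):
            if binary[si, sj] and not visited[si, sj]:
                # flood-fill one 4-connected component
                queue = deque([(si, sj)])
                visited[si, sj] = True
                pixels = []
                while queue:
                    i, j = queue.popleft()
                    pixels.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < H and 0 <= nj < W
                                and binary[ni, nj] and not visited[ni, nj]):
                            visited[ni, nj] = True
                            queue.append((ni, nj))
                rows = [p[0] for p in pixels]
                cols = [p[1] for p in pixels]
                x, y = min(cols), min(rows)
                w, h = max(cols) - x + 1, max(rows) - y + 1
                conf = float(np.mean([seg[p] for p in pixels]))
                boxes.append((x, y, w, h, conf))
    return boxes
```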
(4) From S_conf and R_d_conf, obtain the partial final results R_1 and R_2, computed as shown in equation (4):
[Equation (4)]
(5) Compute the IoU between the remaining detection results from step (4) to obtain the target detection result R_3, with the IoU threshold set to Th_IOU = 0.7. The Seg_boud boxes obtained from lower-confidence segmentation predictions and the detected R_d boxes are combined for target judgment: if the two overlap with IoU > Th_IOU, the final target detection result R_3 is obtained as shown in equation (5); if IoU < Th_IOU, the current target is discarded.
[Equation (5)]
(6) Merge R_1, R_2 and R_3 obtained in steps (4) and (5) to obtain the final detection result R_all.
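Steps (5)-(6) rest on a standard IoU computation between boxes from the two branches. The sketch below shows the cross-branch check with Th_IOU = 0.7; iou and cross_branch_match are illustrative names, not identifiers from the patent:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def cross_branch_match(det_boxes, seg_boxes, th_iou=0.7):
    """Keep a detection-branch box only if some segmentation-branch box
    overlaps it with IoU above th_iou (the R_3 selection step)."""
    return [d for d in det_boxes
            if any(iou(d, s) > th_iou for s in seg_boxes)]
```

Because a box must be confirmed by both branches, no single confidence threshold has to be tuned, which is the point made in step (6) and in the advantages above.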
3. Designing the loss function of DAnchorNet:
(1) Obtain the loss L_1 of the detection branch; its loss function is that of YOLOv3.
(2) Obtain the loss L_2 of the segmentation branch; its loss function is a sigmoid loss. Let P_i,j be the probability value at position (i, j) of the final segmentation feature map. Collect the probability values over the ground-truth target regions: if the ground truth contains N target boxes whose total area is Area, the total probability value P is obtained, and from it the average probability value P_avg of the N target regions:
P = Σ_{(i,j) ∈ Area} P_i,j
P_avg = P / Area
(3) From P_avg obtained in step (2), the total loss L is computed dynamically as follows:
[Equation: dynamic total loss L computed from L_1, L_2 and P_avg]
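A minimal sketch of the dynamic-weighting idea: P_avg is the mean predicted probability over the ground-truth target regions, and the segmentation loss contribution shrinks as P_avg rises. The area-weighted mean and the (1 - P_avg) weight are assumed forms chosen for illustration; the patent gives its exact equation only as an image.

```python
import numpy as np

def mean_target_prob(prob_map, gt_boxes):
    """P_avg: mean predicted probability over the ground-truth target
    regions (boxes given as (x, y, w, h) on the segmentation map)."""
    total, area = 0.0, 0
    for (x, y, w, h) in gt_boxes:
        region = prob_map[y:y + h, x:x + w]
        total += float(region.sum())
        area += region.size
    return total / area if area else 0.0

def total_loss(l_det, l_seg, p_avg):
    """Dynamically weighted total loss: the segmentation contribution
    shrinks as p_avg rises (assumed weight form 1 - p_avg)."""
    return l_det + (1.0 - p_avg) * l_seg
```

A well-trained segmentation branch (P_avg near 1) thus contributes little extra loss, while a poorly converged one (P_avg near 0) dominates, matching the behavior described in advantage 2 above.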
In this method, the fused detection-segmentation network structure DAnchorNet optimizes the original target detection approach: the dynamic anchor effectively accommodates both large and small targets in a large-focus scene, improving the network's detection rate on multi-scale targets; a segmentation branch is then led out of the detection branch, and combining the two avoids setting the confidence threshold of a standalone detection network, effectively improving detection rate and accuracy with only a small increase in computation.

Claims (1)

1. A method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1) dynamic anchor setting:
acquiring training data, performing data fitting on the training targets, analyzing the anchor characteristics through the fitted data, and dynamically setting the anchor values;
2) designing the network structure of DAnchorNet:
designing the DAnchorNet network structure, which comprises two branches, a target detection branch and a target segmentation branch, the two branches sharing one base network, and resolving the setting of the target detection hyper-parameter threshold by combining the detection branch with the segmentation branch;
3) designing the loss function of DAnchorNet:
optimizing the loss function during training through a dynamic weight design scheme that fuses a target attention mechanism, adjusting the total loss by focusing on the average probability value of the target regions.
CN202010004300.2A 2020-01-03 2020-01-03 Rapid and accurate identification method for multi-scale target under large-focus monitoring scene Pending CN111191621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010004300.2A CN111191621A (en) 2020-01-03 2020-01-03 Rapid and accurate identification method for multi-scale target under large-focus monitoring scene


Publications (1)

Publication Number Publication Date
CN111191621A true CN111191621A (en) 2020-05-22

Family

ID=70708022


Country Status (1)

Country Link
CN (1) CN111191621A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3171297A1 (en) * 2015-11-18 2017-05-24 CentraleSupélec Joint boundary detection image segmentation and object recognition using deep learning
US20180137642A1 (en) * 2016-11-15 2018-05-17 Magic Leap, Inc. Deep learning system for cuboid detection
CN108460403A (en) * 2018-01-23 2018-08-28 上海交通大学 The object detection method and system of multi-scale feature fusion in a kind of image
CN108694401A (en) * 2018-05-09 2018-10-23 北京旷视科技有限公司 Object detection method, apparatus and system
CN109325418A (en) * 2018-08-23 2019-02-12 华南理工大学 Based on pedestrian recognition method under the road traffic environment for improving YOLOv3
CN109816024A (en) * 2019-01-29 2019-05-28 电子科技大学 A kind of real-time automobile logo detection method based on multi-scale feature fusion and DCNN
CN109902629A (en) * 2019-03-01 2019-06-18 成都康乔电子有限责任公司 A kind of real-time vehicle target detection model under vehicles in complex traffic scene
CN109919000A (en) * 2019-01-23 2019-06-21 杭州电子科技大学 A kind of Ship Target Detection method based on Multiscale Fusion strategy
CN109934236A (en) * 2019-01-24 2019-06-25 杰创智能科技股份有限公司 A kind of multiple dimensioned switch target detection algorithm based on deep learning
KR20190085464A (en) * 2018-01-10 2019-07-18 삼성전자주식회사 A method of processing an image, and apparatuses performing the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Chuchu; Lü Xuebin: "Pedestrian detection in dense crowd scenes based on an improved YOLOv2 network", Modern Computer (Professional Edition), no. 28 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination