CN111191621A - Rapid and accurate identification method for multi-scale target under large-focus monitoring scene - Google Patents
Rapid and accurate identification method for multi-scale target under large-focus monitoring scene
- Publication number
- CN111191621A (application CN202010004300.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- anchor
- detection
- branch
- target detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/20 — Scenes; scene-specific elements in augmented reality scenes
- G06N3/02, G06N3/08 — Neural networks; learning methods
- G06V20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V2201/07 — Target detection
Abstract
A rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene, relating to the fields of artificial intelligence and computer vision. The method comprises the following steps. 1) Dynamic anchor setting: acquire training data, fit the training targets, analyse the anchor characteristics through the fitted data, and set the anchor values dynamically. 2) Design of the DAnchorNet network structure: design a target detection branch and a target segmentation branch in DAnchorNet; the combination of the two branches removes the need to set a target-detection hyper-parameter threshold. 3) Design of the DAnchorNet loss function: optimise the loss function during training through a dynamic weight design scheme, focusing on the average probability value of the target region to adjust the total loss. The dynamic anchor effectively improves the detection rate of multi-scale targets in a large-focus monitoring scene, the network structure combining segmentation with dynamic-anchor detection effectively improves detection accuracy, and the overall effect of target identification is thereby improved.
Description
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene.
Background
Target detection and recognition are widely used in many areas of life. The computer-vision task is to distinguish targets in an image or video from the parts that are not of interest, determine whether a target is present and, if so, determine its position and identify it. Target detection and recognition form a very important research direction in computer vision. With the rapid development of the internet, artificial-intelligence technology and intelligent hardware, a large amount of image and video data now exists in human life, so computer-vision technology plays an ever greater role and research on it grows ever more active. As a cornerstone of the field, target detection and recognition receive increasing attention and are widely applied in practice, for example in target tracking, video monitoring, information security, automatic driving, image retrieval, medical image analysis, network data mining, unmanned-aerial-vehicle navigation, remote-sensing image analysis and national defense systems.
Target detection is also an important branch of the image processing and computer vision disciplines and a core part of intelligent monitoring systems. At the same time it is a basic algorithm in the field of general identity recognition and plays an important role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. Improving the accuracy of target detection and reducing the target miss rate therefore has important practical significance.
Currently there are two main families of target detection and identification methods: methods based on traditional image processing and machine-learning algorithms, and methods based on deep learning.
1. Target detection and identification based on traditional image processing and machine-learning algorithms:
The conventional pipeline can be expressed as: target feature extraction -> target recognition -> target positioning. The features used are designed by hand, such as SIFT (Scale-Invariant Feature Transform), HOG (Histogram of Oriented Gradients) and SURF (Speeded-Up Robust Features). The target is recognised from these features and then positioned with a corresponding strategy.
2. Target detection and identification based on deep learning:
Deep-learning-based target detection and recognition has become the mainstream approach and can be expressed as: image deep-feature extraction -> target recognition and positioning based on a deep neural network, where the deep neural network model used is a convolutional neural network (CNN). Existing deep-learning-based detection and recognition algorithms can be roughly divided into three categories:
1) Region-proposal-based algorithms, such as R-CNN and Fast R-CNN.
2) Regression-based algorithms, such as YOLO and SSD.
3) Search-based algorithms, such as AttentionNet, which is based on visual attention, and algorithms based on reinforcement learning.
The prior art has the following shortcomings:
1. Shortcomings of target detection based on traditional image processing and machine-learning algorithms:
(1) In a large-focus monitoring scene the difference between near and far targets is very large, so targets of many scales coexist in the same scene. When candidate regions are selected with a sliding window, no single window size and aspect ratio can be set effectively, so the exhaustive sliding-window search is slow and highly redundant.
(2) In a large-focus monitoring scene a target appears large near the camera and small far from it, so target size varies greatly. Traditional methods therefore cannot accurately identify both near and far targets in such a scene, and they generalise poorly.
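The exhaustive sliding-window search criticised in point (1) can be sketched as follows; the window sizes and stride here are illustrative assumptions, not values from the patent:

```python
# Naive multi-scale sliding-window candidate generation. Even for a small
# image and a handful of fixed scales, the number of candidate regions
# grows quickly, illustrating the redundancy of the exhaustive search.
def sliding_windows(img_w, img_h,
                    sizes=((32, 32), (64, 64), (128, 128)), stride=16):
    """Yield (x, y, w, h) candidate regions at several fixed scales."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)
```

Counting the windows for even a 128x128 image makes the redundancy concrete: dozens of candidates per scale, almost all of which contain no target.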
2. Shortcomings of deep-learning-based target detection and identification:
(1) Most existing deep-learning detection methods regress against fixed anchors. When a large-focus monitoring scene contains many targets of different sizes, fixed anchors cannot accommodate the large size differences, so the detection network may fail to converge or train poorly, easily causing missed and false detections.
(2) When a deep network is used for detection, a hyper-parameter confidence threshold must be set: a prediction box is accepted as a target only when its predicted confidence exceeds the threshold. This threshold therefore strongly influences both detection rate and accuracy, and in practice an empirical value is used. However, a high threshold causes missed detections while a low one causes false detections, so the trained network model cannot be fully exploited for target identification.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a rapid and accurate identification method for multi-scale targets in a large-focus monitoring scene. The dynamic anchor effectively improves the detection rate of multi-scale targets in such a scene, and the network structure combining segmentation with dynamic-anchor detection effectively improves detection accuracy, thereby improving the overall effect of target identification.
In order to achieve the above object, the technical solution of the present invention is implemented as follows:
a method for quickly and accurately identifying a multi-scale target in a large-focus monitoring scene comprises the following steps:
1) Dynamic anchor setting:
Acquire training data, fit the training targets, analyse the anchor characteristics through the fitted data, and set the anchor values dynamically.
2) Design of the DAnchorNet network structure:
Design a target detection branch and a target segmentation branch in DAnchorNet; the combination of the two branches removes the need to set a target-detection hyper-parameter threshold.
3) Design of the DAnchorNet loss function:
Optimise the loss function during training through a dynamic weight design scheme, incorporate a target attention mechanism, and focus on the average probability value of the target region to adjust the total loss.
Compared with the prior art, the method has the following advantages:
1. Setting the anchor value dynamically according to target position effectively improves anchor utilisation in detection, accommodates both large and small targets in a large-focus scene, makes the network easier to converge, and effectively improves the detection rate of multi-scale targets in such a scene.
2. DAnchorNet combines the loss functions of the detection and segmentation branches. Since fusing a segmentation network undoubtedly increases training difficulty, a dynamic weight design scheme is proposed to optimise the loss function during training: a target attention mechanism is incorporated and the average probability value of the target region is used to adjust the total loss. When the average probability value of the target region is high, the segmentation network is well trained and its loss contribution can be reduced; when it is low, the segmentation network has not converged well enough and its loss contribution should be increased. This reduces training difficulty and improves the training effect.
3. A new network structure, DAnchorNet, is provided to improve detection: a fused detection-and-segmentation method adds a segmentation branch at little extra computational cost. DAnchorNet computes the intersection over union (IOU) of the detections from the two branches to obtain the final result; a prediction box is accepted as a target when the IOU of the two branches meets the set requirement. This structure avoids setting a target-confidence hyper-parameter threshold as in a stand-alone detection method, fully exploits the network model, and effectively improves detection accuracy.
The invention is further described with reference to the following figures and detailed description.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 shows the fused detection-and-segmentation network structure DAnchorNet according to an embodiment of the present invention.
Detailed Description
Referring to FIGS. 1 and 2, the invention provides a method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1. dynamic anchor setting:
Acquire training data and fit the training targets to obtain the anchor fitting result, as follows:
(1) Obtain the data M(x, y, w, h), where M_i is the i-th sample in the data set, (x_i, y_i) are the coordinates of the upper-left corner of the i-th target, and (w_i, h_i) are its width and height. Recombine M(x, y, w, h) into two sets of data, M_h(y, h) and M_w(y, w).
(2) Perform linear fitting on the obtained M_h(y, h) and M_w(y, w) respectively, obtaining the slope k_w and intercept b_w of the M_w(y, w) fit, and the slope k_h and intercept b_h of the M_h(y, h) fit.
(3) During network training, the anchor width anchor_w and height anchor_h are set dynamically from k_w, b_w, k_h and b_h; from the linear fits this gives anchor_w = k_w · y + b_w and anchor_h = k_h · y + b_h, where y is the height coordinate in the original image converted from j of grid (i, j) on each feature map.
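The dynamic-anchor fit described above can be sketched as follows; the sample boxes and the use of `np.polyfit` for the degree-1 fit are illustrative assumptions, not taken from the patent:

```python
import numpy as np

# Each training box is (x, y, w, h): top-left corner plus width/height.
# In a large-focus scene, boxes lower in the image (larger y) are nearer
# the camera and therefore larger — illustrative sample data.
boxes = np.array([
    [10,  50, 20,  40],
    [30, 120, 35,  70],
    [60, 200, 55, 110],
    [90, 300, 80, 160],
], dtype=float)

y, w, h = boxes[:, 1], boxes[:, 2], boxes[:, 3]

# Linear fits for M_w(y, w) and M_h(y, h): w = k_w*y + b_w, h = k_h*y + b_h.
k_w, b_w = np.polyfit(y, w, 1)
k_h, b_h = np.polyfit(y, h, 1)

def dynamic_anchor(y_img):
    """Anchor (width, height) for a grid cell mapped to image height y_img."""
    return k_w * y_img + b_w, k_h * y_img + b_h
```

Because the fitted slopes are positive for this data, anchors grow smoothly from the top (far) to the bottom (near) of the image instead of being fixed.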
2. Designing the DAnchorNet network structure:
(1) Obtain the target detection result R_d through the detection branch. R_d contains the coordinate position (R_d_x, R_d_y) of the predicted target, its width and height (R_d_w, R_d_h) and its confidence R_d_conf.
(2) Obtain the target segmentation result F_seg through the segmentation branch. The result comprises two single-channel segmentation maps, F_full_seg and F_inter_seg, where F_full_seg is the predicted segmentation of all targets and F_inter_seg is the segmentation of the adherent (touching) parts of all targets; the individual segmentation result Seg of each image target is obtained from F_full_seg and F_inter_seg.
(3) Perform contour extraction on the obtained segmentation result Seg to obtain the outer bounding rectangle Seg_boud of each target, where Seg_boud comprises the upper-left coordinates (S_x, S_y) of the segmented target, its width and height (S_w, S_h) and its confidence S_conf.
(4) Obtain the final results R_1 and R_2 of part of the targets from S_conf and R_d_conf, as shown in equation (4).
(5) Compute the intersection over union (IOU) between the remaining detection results of step (4) to obtain the target detection result R_3. With the IOU threshold Th_IOU set to 0.7, the Seg_boud obtained from the lower-confidence segmentation predictions is combined with the detected R_d for target judgment: if the IOU of the two boxes satisfies IOU > Th_IOU, the final target detection result R_3 is obtained as shown in equation (5); if IOU < Th_IOU, the current target is discarded.
(6) Merge the R_1, R_2 and R_3 obtained in steps (4) and (5) to obtain the final detection result R_all.
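The IOU cross-check of step (5) can be sketched as follows; since equations (4) and (5) are not reproduced in the text, the box layout and the keep/discard rule here are hedged assumptions:

```python
# Boxes are (x, y, w, h). A detected box is kept only when some
# segmentation-derived box agrees with it (IOU > Th_IOU); otherwise
# the candidate is discarded, replacing a fixed confidence threshold.

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

TH_IOU = 0.7  # IOU threshold from the text

def cross_check(det_boxes, seg_boxes):
    """Keep a detected box only if a segmentation box agrees (IOU > Th_IOU)."""
    return [d for d in det_boxes
            if any(iou(d, s) > TH_IOU for s in seg_boxes)]
```

The design choice here is that agreement between two independently trained branches replaces a hand-tuned confidence threshold, which is the point made in the advantages section.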
3. Designing the DAnchorNet loss function:
(1) Obtain the loss L_1 of the detection branch; its loss function is that of YOLOv3.
(2) Obtain the loss L_2 of the segmentation branch; its loss function is a sigmoid loss function. Let P_i,j be the probability value at position (i, j) of the final segmentation feature map, and take the probability values of the feature map over the ground-truth target regions. With N ground-truth target boxes whose total region area is Area, the total probability value P is obtained, and the average probability value of the N target regions follows as P_avg = P / Area.
(3) From the P_avg obtained in step (2), the total loss L is computed dynamically: the higher P_avg, the smaller the contribution of the segmentation loss L_2 to L; the lower P_avg, the larger its contribution.
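The dynamic weighting of step (3) might be sketched as follows, assuming the simple form L = L1 + (1 − P_avg) · L2; the exact formula is not reproduced in the text, so this coefficient is an illustrative assumption:

```python
import numpy as np

def average_target_probability(prob_map, gt_boxes):
    """Mean predicted probability over all ground-truth (x, y, w, h) regions."""
    total, area = 0.0, 0
    for (x, y, w, h) in gt_boxes:
        region = prob_map[y:y + h, x:x + w]
        total += float(region.sum())
        area += region.size
    return total / area if area else 0.0

def total_loss(l_det, l_seg, p_avg):
    """Down-weight the segmentation loss as the segmentation branch converges."""
    return l_det + (1.0 - p_avg) * l_seg
```

This matches the behaviour described in the advantages section: a well-converged segmentation branch (high P_avg) contributes little extra loss, while a poorly converged one (low P_avg) contributes more.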
in the method, the segmented target detection network structure DANCHorrNet is combined, an original target detection method is optimized, a large target and a small target under a large scene are effectively considered by utilizing a dynamic anchor, and the detection rate of the network under the condition of multi-scale targets is improved; and then, a segmentation network is led out from the detected branch, the setting of the confidence coefficient of an individual target detection network is avoided by combining the two, and the detection rate and the accuracy of the target are effectively improved under the condition of small increase of the calculated amount.
Claims (1)
1. A method for rapidly and accurately identifying multi-scale targets in a large-focus monitoring scene, comprising the following steps:
1) dynamic anchor setting:
acquiring training data, fitting the training targets, analysing the anchor characteristics through the fitted data, and dynamically setting the anchor values;
2) designing the DAnchorNet network structure:
designing the DAnchorNet network structure comprising two branches, a target detection branch and a target segmentation branch, the two branches sharing one base network, the combination of the target detection branch and the segmentation branch removing the need to set a target-detection hyper-parameter threshold;
3) designing the DAnchorNet loss function:
optimising the loss function during training through a dynamic weight design scheme, incorporating a target attention mechanism, and focusing on the average probability value of the target region to adjust the total loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010004300.2A CN111191621A (en) | 2020-01-03 | 2020-01-03 | Rapid and accurate identification method for multi-scale target under large-focus monitoring scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111191621A true CN111191621A (en) | 2020-05-22 |
Family
ID=70708022
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010004300.2A Pending CN111191621A (en) | 2020-01-03 | 2020-01-03 | Rapid and accurate identification method for multi-scale target under large-focus monitoring scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111191621A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3171297A1 (en) * | 2015-11-18 | 2017-05-24 | CentraleSupélec | Joint boundary detection image segmentation and object recognition using deep learning |
US20180137642A1 (en) * | 2016-11-15 | 2018-05-17 | Magic Leap, Inc. | Deep learning system for cuboid detection |
CN108460403A (en) * | 2018-01-23 | 2018-08-28 | 上海交通大学 | The object detection method and system of multi-scale feature fusion in a kind of image |
CN108694401A (en) * | 2018-05-09 | 2018-10-23 | 北京旷视科技有限公司 | Object detection method, apparatus and system |
CN109325418A (en) * | 2018-08-23 | 2019-02-12 | 华南理工大学 | Based on pedestrian recognition method under the road traffic environment for improving YOLOv3 |
CN109816024A (en) * | 2019-01-29 | 2019-05-28 | 电子科技大学 | A kind of real-time automobile logo detection method based on multi-scale feature fusion and DCNN |
CN109902629A (en) * | 2019-03-01 | 2019-06-18 | 成都康乔电子有限责任公司 | A kind of real-time vehicle target detection model under vehicles in complex traffic scene |
CN109919000A (en) * | 2019-01-23 | 2019-06-21 | 杭州电子科技大学 | A kind of Ship Target Detection method based on Multiscale Fusion strategy |
CN109934236A (en) * | 2019-01-24 | 2019-06-25 | 杰创智能科技股份有限公司 | A kind of multiple dimensioned switch target detection algorithm based on deep learning |
KR20190085464A (en) * | 2018-01-10 | 2019-07-18 | 삼성전자주식회사 | A method of processing an image, and apparatuses performing the same |
Non-Patent Citations (1)
Title |
---|
张楚楚; 吕学斌: "Pedestrian detection in dense crowd scenes based on an improved YOLOv2 network", Modern Computer (Professional Edition), no. 28 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||