CN112686923A - Target tracking method and system based on double-stage convolutional neural network - Google Patents

Target tracking method and system based on double-stage convolutional neural network

Info

Publication number
CN112686923A
CN112686923A
Authority
CN
China
Prior art keywords
target
detection
frame
target detection
yolov3
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011632331.9A
Other languages
Chinese (zh)
Inventor
范梦婷
刘浩
宋春红
郑谊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing Hengyun Data Technology Co ltd
Zhejiang Aerospace Hengjia Data Technology Co ltd
Original Assignee
Jiaxing Hengyun Data Technology Co ltd
Zhejiang Aerospace Hengjia Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiaxing Hengyun Data Technology Co ltd, Zhejiang Aerospace Hengjia Data Technology Co ltd filed Critical Jiaxing Hengyun Data Technology Co ltd
Priority to CN202011632331.9A
Publication of CN112686923A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a target tracking method and system based on a two-stage convolutional neural network. The method acquires a video stream from a detection area and preprocesses it into multiple frame pictures; inputs the frame pictures into a pre-trained improved YOLOv3 target detection model to obtain frame pictures with detected targets; and then tracks those detected targets with the DeepSORT multi-target tracking algorithm, following each detected target's real-time position across the frame pictures. By combining improved YOLOv3 detection with DeepSORT multi-target tracking in a two-stage convolutional neural network, the invention monitors targets in real time, obtains the real-time position of the same moving object more accurately, avoids association errors for multiple and overlapping targets, achieves real-time tracking of multiple moving targets, and attains high detection accuracy.

Description

Target tracking method and system based on double-stage convolutional neural network
Technical Field
The invention relates to the field of parking detection, in particular to a target tracking method and a target tracking system based on a two-stage convolutional neural network.
Background
Dangerous chemical transport vehicles (dangerous chemical vehicles for short) are extremely hazardous because their cargo is usually flammable, explosive or highly toxic, so they must be parked only at designated places, in strict accordance with the management regulations for dangerous chemical vehicles. They must not be parked arbitrarily during transport, and temporary parking must keep clear of potentially hazardous locations such as open flames, high-temperature sites and crowded areas. The invention addresses the serious harm that can result when dangerous chemical vehicles are not parked as regulated during road transport, especially within chemical industry parks.
At present, single-frame target detection can be realized with a simple detection algorithm such as the YOLO series; that is, the position of a target is detected in each frame independently, but the continuous position of the target across a video cannot be obtained. On the one hand, when a single frame contains multiple targets, the correspondence between targets in adjacent frames must be computed carefully; on the other hand, when targets lie close to each other, it cannot be confirmed which target a moving object in adjacent frames belongs to. A two-stage algorithm combining target detection with multi-target tracking can track multiple moving objects in real time with higher accuracy.
The existing NMS method used in YOLOv3 target detection removes redundant overlapping boxes by sorting the proposal boxes from high to low score, keeping the highest-scoring detection box, and suppressing any other box that overlaps the selected box significantly. This deletes target boxes with a high overlap rate outright: when two real targets are close together, the lower-scoring box is wrongly deleted merely because the overlap area is too large.
The loss function Loss in the YOLOv3 framework has three parts: the error from the center coordinates and the width and height, the confidence loss, and the class loss; the three are summed to form the final loss. Within the center-and-size error, the losses of the center abscissa, center ordinate, width and height of the target detection frame are simply added. Because these four quantities are not independent in actual calculation but depend on one another, this loss calculation is biased and cannot accurately reflect the actual loss caused by the center coordinates and the width and height. For example, different combinations of center abscissa, center ordinate, width and height losses can yield the same total center-and-size loss.
Disclosure of Invention
In order to solve the above-mentioned problems in the prior art, the present invention provides a target tracking method and system based on a two-stage convolutional neural network.
The technical scheme of the invention is as follows: a target tracking method based on a two-stage convolutional neural network comprises the following steps:
s1, acquiring a video stream in a detection area, and preprocessing the video stream to obtain a multi-frame picture;
s2, inputting the frames of pictures into a pre-trained YOLOv3 target detection improved model for target detection to obtain a frame of picture with a detection target;
and S3, tracking the detected targets in the frame pictures that contain them with the DeepSORT multi-target tracking algorithm, and tracking the real-time position of each detected target in the frame pictures.
The invention has the beneficial effects that: the target detection and tracking function is realized by a two-stage detection and tracking model that combines the improved YOLOv3 target detection model with the DeepSORT multi-target tracking algorithm. The real-time position of the same detected target within the detection area is detected more accurately; calculation errors in real-time position detection when multiple or overlapping targets appear in the detection area are avoided; multiple moving targets are tracked in real time with high positional accuracy; computation runs in real time; the time-consuming and labor-intensive burden of manual monitoring is eased; and intelligent monitoring of the detection area is realized.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the method also includes S4: making an alarm decision according to the real-time position information of the detected target in the frame picture.
Further, in step S4, if the Euclidean distance between the center points of the same detected target in consecutive frame pictures remains smaller than a preset threshold for a preset time, the detected target is judged to be stationary and an alarm is issued.
The beneficial effect of the above scheme is that, by detecting the real-time position of the same target in the video stream over a preset time, the current motion state and position of the detected target are judged and promptly fed back as an alarm, so that managers can make corresponding judgments and take measures in time, realizing intelligent control of targets in the detection area.
Further, S1 specifically comprises acquiring the video stream in the detection area and skipping frames at a preset frame rate to obtain the multi-frame pictures from the video stream.
Further, before step S2, the method also comprises training the YOLOv3 target detection improved model;
the method for training the YOLOv3 target detection improved model specifically comprises the following steps,
acquiring pictures with targets in various scenes, performing data enhancement processing and labeling processing to obtain a data set, and dividing the data set into a training set and a verification set according to a preset proportion;
constructing a YOLOv3 target detection improved model based on a YOLOv3 target detection model;
and performing multiple iterative training on the YOLOv3 target detection improved model by using the training set, verifying the YOLOv3 target detection improved model after each training by using the verification set, outputting the average precision mean value of the verification set, and selecting the YOLOv3 target detection improved model with the largest output average precision mean value as the trained YOLOv3 target detection improved model.
Further, the specific steps of performing the data enhancement processing are,
horizontally flipping and/or vertically flipping and/or randomly cropping and/or rotating by a random angle and/or adjusting the contrast and/or adjusting the brightness of the collected pictures with targets in various scenes, to obtain processed pictures with targets in various scenes;
the specific steps for performing the labeling process are,
and framing the targets in the pictures with targets in various scenes, and in the processed pictures with targets in various scenes, with a target labeling frame using a labeling tool, to generate a label information file in xml format in which the upper-left corner coordinates and lower-right corner coordinates of each target labeling frame are recorded.
The method has the advantage that the data set is built from frame pictures that have undergone labeling and data enhancement, so the improved YOLOv3 target detection model obtained by training on this set generalizes better: it can detect targets in pictures of different angles and types in various scenes, and it effectively avoids detection errors, or even complete failures to detect, caused by the picture variability that arises from the many uncontrolled factors of practical applications.
Further, constructing the YOLOv3 target detection improved model based on the YOLOv3 target detection model specifically means improving the YOLOv3 target detection model to obtain the YOLOv3 target detection improved model; the improvement comprises the following three points,
Improvement point one: improving the input size of the YOLOv3 target detection model;
Improvement point two: replacing the loss caused by the center coordinates, width and height of the target detection frame in the loss function of the YOLOv3 target detection model with CIoU, where the CIoU formulas are as follows;
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv

v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))²

α = v / ((1 − IoU) + v)
wherein L_CIoU is the loss function of the YOLOv3 target detection model; IoU is the intersection-over-union between the target detection frame and the target labeling frame; ρ(·) computes the Euclidean distance between two center points; b and b^gt denote the center points of the target detection frame and the target labeling frame respectively; c is the diagonal length of the smallest enclosing region that can contain both the target detection frame and the target labeling frame; α is the weight function of the YOLOv3 target detection model; v is the aspect-ratio similarity; w^gt and h^gt are the width and height of the target labeling frame; and w and h are the width and height of the target detection frame;
the improvement point is three: in the process of removing overlapped target detection frames by using Soft-NMS in the YOLOv3 target detection model, performing Soft-NMS operation according to the intersection ratio of the target detection frames and the target marking frames, performing bounding box suppression on the target detection frames with the reliability greater than a second preset threshold value, reducing the confidence of the target detection frames, and outputting the final target detection frames and types.
The scheme has the advantage that the detection algorithm and the multi-target tracking algorithm are combined into a two-stage detection and tracking algorithm, and the Soft-NMS algorithm removes redundant overlapping frames during detection, which improves the recall rate of target detection without increasing the complexity of the algorithm and markedly improves the average accuracy on overlapping targets. It prevents a detection frame from being wrongly deleted when two detection frames containing real targets lie close together and their overlap area is large. The real-time positions of multiple moving objects in the detection area are therefore tracked more accurately, and calculation errors in real-time position detection when multiple or overlapping targets appear are avoided. In addition, using the CIoU calculation as the loss function of the detection algorithm makes the loss calculation more accurate and improves detection accuracy.
Based on the target tracking method based on the double-stage convolutional neural network, the invention also provides a target tracking system based on the double-stage convolutional neural network.
A target tracking system based on a two-stage convolutional neural network comprises the following modules,
the frame picture acquisition module is used for acquiring a video stream in a detection area and preprocessing the video stream to obtain a plurality of frame pictures;
the target detection module is used for inputting the frame pictures into a pre-trained YOLOv3 target detection improved model for target detection, to obtain frame pictures with detected targets;
and the target tracking module is used for tracking the detected targets in those frame pictures with the DeepSORT multi-target tracking algorithm, and tracking the real-time position of each detected target in the frame pictures.
The invention has the beneficial effects that: the parking detection function for the monitored targets is realized by the two-stage detection and tracking model, combining the improved target detection algorithm with the multi-target tracking algorithm. The real-time position of the same detected target within the detection area is detected more accurately; calculation errors in real-time position detection when multiple or overlapping targets appear in the detection area are avoided; multiple moving targets are tracked in real time with high positional accuracy; computation runs in real time; the time-consuming and labor-intensive burden of manual monitoring is eased; and intelligent monitoring of the detection area is realized.
On the basis of the technical scheme, the invention can be further improved as follows.
And the decision alarm module is used for making an alarm decision according to the real-time position information of the detected target in the frame picture.
Based on the target tracking method based on the double-stage convolutional neural network, the invention also provides a computer readable storage medium.
A computer-readable storage medium comprises a memory in which a computer program is stored; when executed by a processor, the computer program implements the target tracking method based on the two-stage convolutional neural network described above.
Drawings
FIG. 1 is a flow chart of a target tracking method based on a two-stage convolutional neural network according to the present invention;
FIG. 2 is a flow chart of training the YOLOv3 target detection improved model;
FIG. 3 is a schematic diagram illustrating the principle of normalizing the center points of the same detection target in the previous and subsequent frames;
fig. 4 is a structural block diagram of a target tracking system based on a two-stage convolutional neural network according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1, a target tracking method based on a dual-stage convolutional neural network includes the following steps:
s1, acquiring a video stream in a detection area, and preprocessing the video stream to obtain a multi-frame picture;
s2, inputting the frames of pictures into a pre-trained YOLOv3 target detection improved model for target detection to obtain a frame of picture with a detection target;
and S3, tracking the detected targets in the frame pictures that contain them with the DeepSORT multi-target tracking algorithm, and tracking the real-time position of each detected target in the frame pictures.
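As a minimal sketch of the two-stage flow in S1 to S3, the loop below runs a per-frame detector and hands its boxes to a tracker. Here `detect` and `track` are hypothetical stand-ins for the improved YOLOv3 model and the DeepSORT tracker, not the actual implementations:

```python
def run_two_stage(frames, detect, track):
    """Stage 1: detect boxes in each frame; stage 2: assign persistent track IDs."""
    positions = []
    for frame in frames:
        boxes = detect(frame)    # stage 1: detection boxes for this frame
        tracks = track(boxes)    # stage 2: (track_id, box) pairs
        positions.append(tracks)
    return positions
```

Any detector/tracker pair with these call shapes can be slotted in; the real system would also carry class labels and confidences through the pipeline.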
The invention realizes the target detection and tracking function with a two-stage model that combines the improved YOLOv3 target detection model with the DeepSORT multi-target tracking algorithm. It detects the real-time position of the same moving object more accurately, avoids calculation errors in real-time position detection when multiple or overlapping targets appear in the detection area, tracks multiple moving targets in real time with high positional accuracy, runs in real time, relieves the time-consuming and labor-intensive burden of manual monitoring, and realizes intelligent monitoring of the detection area.
In this particular embodiment:
specifically, S1 is to obtain a video stream in the detection area, skip frames according to a preset frame rate, and obtain a multi-frame picture from the video stream.
For example, when performing target detection and tracking of dangerous chemical transport vehicles in a chemical industry park, image acquisition devices are installed where they can capture the video stream of the whole detection area, and the video stream is frame-skipped at a preset frame rate to obtain multi-frame pictures of the whole detection area. For example, cameras are placed facing the parking spaces of important roads and the chemical industry park; the video streams from these cameras are acquired and frame-skipped at a suitable frame rate to obtain the multi-frame pictures.
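Frame skipping at a preset rate can be sketched as a simple index computation; this assumes the source frame rate is known and is not tied to any particular capture library:

```python
def frames_to_keep(total_frames, src_fps, target_fps):
    """Indices of the frames retained when a src_fps stream is
    downsampled to roughly target_fps by skipping frames."""
    step = max(1, round(src_fps / target_fps))
    return list(range(0, total_frames, step))
```

With a capture library such as OpenCV, a reader would keep only those decoded frames whose index appears in this list.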
In this particular embodiment:
before the step S2, the method further comprises the step of training a YOLOv3 target detection improvement model,
as shown in fig. 2, training the YOLOv3 target detection improvement model specifically includes the following steps,
acquiring pictures with targets in various scenes, performing data enhancement processing and labeling processing to obtain a data set, and dividing the data set into a training set and a verification set according to a preset proportion;
constructing a YOLOv3 target detection improved model based on a YOLOv3 target detection model;
and performing multiple iterative training on the YOLOv3 target detection improved model by using the training set, verifying the YOLOv3 target detection improved model after each training by using the verification set, outputting the average precision mean value of the verification set, and selecting the YOLOv3 target detection improved model with the largest output average precision mean value as the trained YOLOv3 target detection improved model.
The specific steps for performing the data enhancement processing are,
horizontally flipping and/or vertically flipping and/or randomly cropping and/or rotating by a random angle and/or adjusting the contrast and/or adjusting the brightness of the collected pictures with targets in various scenes, to obtain processed pictures with targets in various scenes;
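The flips and photometric adjustments listed above can be sketched with NumPy as below; cropping and rotation are omitted for brevity, and the probabilities and adjustment ranges are illustrative assumptions:

```python
import random
import numpy as np

def augment(img):
    """Randomly flip and adjust brightness/contrast of an HxWxC uint8 image."""
    if random.random() < 0.5:
        img = img[:, ::-1]                  # horizontal flip
    if random.random() < 0.5:
        img = img[::-1, :]                  # vertical flip
    if random.random() < 0.5:
        alpha = random.uniform(0.8, 1.2)    # contrast factor
        beta = random.uniform(-20.0, 20.0)  # brightness offset
        img = np.clip(alpha * img.astype(np.float32) + beta,
                      0, 255).astype(np.uint8)
    return np.ascontiguousarray(img)
```

Note that in a full pipeline the flips must also transform the label coordinates of each target labeling frame consistently.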
Pictures with targets are collected in various scenes, for example from the cameras at a parking-lot gate and on surrounding roads, at different times, in different weather and from different angles. Data enhancement is applied to these pictures to expand the data, and detection-target pictures collected from search websites may also be used for expansion. The improved YOLOv3 target detection model obtained by iterative training on the data set built from the expanded frame pictures has strong generalization ability: it can detect targets in frame pictures of different angles and types in various scenes, and it effectively avoids detection errors, or even complete failures to detect, caused by blurred, flipped or tilted frame pictures in the video stream that arise from the many uncontrolled factors of practical application.
The specific steps for performing the labeling process are,
using the labeling tool LabelImg, the targets in the pictures with targets in various scenes and in the processed pictures with targets in various scenes are framed with target labeling boxes, and a label information file in xml format is generated that records the upper-left corner coordinates (x1, y1) and lower-right corner coordinates (x2, y2) of each target labeling box; detected targets with a resolution lower than 40x40 pixels are not labeled, and frame pictures with a resolution lower than 600x600 are deleted.
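A sketch of reading such a label file is shown below. The element names follow the common Pascal-VOC xml layout that LabelImg emits, and the 40-pixel minimum follows the rule above; the exact schema of the patent's files is an assumption:

```python
import xml.etree.ElementTree as ET

MIN_SIDE = 40  # targets below 40x40 pixels are not labeled, per the text

def parse_label_xml(xml_text):
    """Return [(name, x1, y1, x2, y2)] from a VOC-style label file,
    dropping boxes smaller than MIN_SIDE on either side."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        x1, y1 = int(bb.findtext("xmin")), int(bb.findtext("ymin"))
        x2, y2 = int(bb.findtext("xmax")), int(bb.findtext("ymax"))
        if x2 - x1 >= MIN_SIDE and y2 - y1 >= MIN_SIDE:
            boxes.append((obj.findtext("name"), x1, y1, x2, y2))
    return boxes
```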
Constructing the YOLOv3 target detection improved model based on the YOLOv3 target detection model specifically means improving the YOLOv3 target detection model to obtain the YOLOv3 target detection improved model; the improvement comprises the following three points,
Improvement point one: improving the input size of the YOLOv3 target detection model;
Because dangerous chemical vehicle targets are comparatively large, an input size of 256 × 256 is used, which reduces the amount of computation. In the improved YOLOv3 target detection model, the feature maps output by the third, fourth and fifth downsampling layers are fused in turn to obtain multi-scale feature maps of sizes 8 × 8, 16 × 16 and 32 × 32. This reduces the computation needed to detect targets in a frame picture and obtain the detection frames and their confidences, and improves the efficiency of vehicle detection.
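For reference, the three detection-scale grid sizes follow directly from the input size and the standard YOLOv3 strides of 32, 16 and 8:

```python
def yolo_grid_sizes(input_size=256, strides=(32, 16, 8)):
    """Grid sizes of the three YOLOv3 detection scales for a square input."""
    assert all(input_size % s == 0 for s in strides), "input must divide strides"
    return tuple(input_size // s for s in strides)
```

A 256 × 256 input gives the 8 × 8, 16 × 16 and 32 × 32 grids stated above; the common 416 × 416 input would give 13, 26 and 52.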
Improvement point two: replacing the loss caused by the center coordinates, width and height of the target detection frame in the loss function of the YOLOv3 target detection model with CIoU, where the CIoU formulas are as follows;
L_CIoU = 1 − IoU + ρ²(b, b^gt)/c² + αv

v = (4/π²) · (arctan(w^gt/h^gt) − arctan(w/h))²

α = v / ((1 − IoU) + v)
wherein L_CIoU is the loss function of the YOLOv3 target detection model; IoU is the intersection-over-union between the target detection frame and the target labeling frame; ρ(·) computes the Euclidean distance between two center points; b and b^gt denote the center points of the target detection frame and the target labeling frame respectively; c is the diagonal length of the smallest enclosing region that can contain both the target detection frame and the target labeling frame; α is the weight function of the YOLOv3 target detection model; v is the aspect-ratio similarity; w^gt and h^gt are the width and height of the target labeling frame; and w and h are the width and height of the target detection frame;
the Loss function Loss in the YOLOv3 target detection model is divided into three parts, one is error caused by center coordinates and width and height, one is Loss caused by confidence, the last is Loss caused by class, and finally the three losses are added to form the final Loss function. And in errors caused by the central coordinate and the width and the height, the loss of the central abscissa, the central ordinate, the width and the height of the target detection frame are added. Because the four losses are not independent from each other and have mutual dependency relationship in actual calculation, the loss calculation method has deviation and cannot accurately reflect the actual loss caused by the center coordinates and the width and height. For example, there may be different center abscissas, center ordinates, widths, and height penalties, with the total center coordinate and width height penalty being the same. The IOU is an index combining the central coordinate and the width and the height, so that the IOU index is more accurate as a loss function of the network, but the IOU is directly used as the loss function without considering the conditions of the coverage area of the detection frame, the distance of a central point and the shape of the prediction frame. Therefore, the invention adopts CIoU (improved IOU) as a loss function of target detection, and the loss of the target detection box is calculated more accurately. On the basis of the IOU, the method comprises the conditions that a target detection frame and a target marking frame are not intersected, the condition of the specific position of the IOU and the aspect ratio problem.
Improvement point three: in the process of removing overlapping target detection frames with Soft-NMS in the YOLOv3 target detection model, the Soft-NMS operation is performed according to the intersection-over-union between target detection frames; frames whose overlap with the selected frame exceeds a second preset threshold have their confidence reduced, and the final target detection frames and classes are output.
Soft-NMS removes overlapping frames of the same target during detection; the Soft-NMS operation is performed according to the intersection-over-union (IoU, the overlap ratio between the predicted bounding box and the actual bounding box), bounding-box suppression is applied, predicted boxes whose confidence falls below a given threshold are discarded, and the final predicted boxes and classes are output. Conventional NMS sorts the detection boxes by score, keeps only the highest-scoring box, and deletes any other box whose overlap exceeds a given proportion. The idea of Soft-NMS is not to delete every detection box whose IoU exceeds the threshold outright, but to reduce its confidence instead:
s_i = s_i,                        if IoU(M, b_i) < N_t
s_i = s_i · (1 − IoU(M, b_i)),    if IoU(M, b_i) ≥ N_t
wherein b_i is the target detection box being processed, M is the target detection box with the highest current score, and N_t is the threshold;
When IoU(M, b_i) is less than or equal to the threshold N_t, the detection score remains s_i; when IoU(M, b_i) is greater than N_t, the detection score becomes s_i · (1 − IoU(M, b_i)): the larger the IoU between b_i and M, the more severely the score s_i drops. Removing overlapping frames with Soft-NMS in this way removes redundant generated frames while retaining overlapping frames that should not be removed, which solves the missed-detection problem when two or more targets overlap each other and improves the detection recall rate.
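The piecewise rescoring rule above can be sketched as a linear Soft-NMS over plain Python lists; the small score floor used to finally drop near-zero boxes is an illustrative assumption:

```python
def soft_nms(boxes, scores, iou_thresh=0.3, score_min=0.001):
    """Linear Soft-NMS over [x1, y1, x2, y2] boxes: decay the scores of boxes
    that overlap the current best box instead of deleting them outright."""
    def iou(a, b):
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    pool = list(zip(boxes, scores))
    keep = []
    while pool:
        pool.sort(key=lambda p: p[1], reverse=True)
        box_m, s_m = pool.pop(0)          # M: highest-scoring remaining box
        keep.append((box_m, s_m))
        rescored = []
        for b, s in pool:
            o = iou(box_m, b)
            if o > iou_thresh:            # decay instead of hard deletion
                s *= 1.0 - o
            if s > score_min:             # drop only near-zero scores
                rescored.append((b, s))
        pool = rescored
    return keep
```

Unlike hard NMS, a heavily overlapping second target survives with a decayed score rather than being deleted.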
The detection algorithm and the multi-target tracking algorithm are combined into a two-stage detection and tracking algorithm, and the Soft-NMS algorithm removes redundant overlapping frames during detection, which improves the recall rate of target detection without increasing the complexity of the algorithm and markedly improves the average accuracy on overlapping targets. It prevents a detection frame from being wrongly deleted when two detection frames containing real targets lie close together and their overlap area is large. The real-time positions of multiple moving objects in the detection area are therefore tracked more accurately, and calculation errors in real-time position detection when multiple or overlapping targets appear are avoided. In addition, using the CIoU calculation as the loss function of the detection algorithm makes the loss calculation more accurate and improves detection accuracy.
In the process of training and verifying the YOLOv3 improved target detection model, the whole training set passes through the target detection network for at least 100 forward and 100 backward propagations, and after each complete forward and backward propagation of the training set (one epoch), verification is performed with the verification set. Verifying the YOLOv3 improved target detection model after each round of training means computing the mean average precision (mAP) of the target detection model on the verification set once that round finishes; the network model weights with the largest mAP are finally selected.
In this particular embodiment:
the method of the invention further comprises S4: performing a decision alarm according to the real-time position information of the detected target in the frame picture.

Specifically, in S4, if within a preset time the Euclidean distance between the center points of the detected target in the previous and subsequent frame pictures is smaller than a preset threshold, the detected target is judged to be in a static state and an alarm is issued.
Specifically, the frame picture is input into the pre-trained YOLOv3 improved target detection model to obtain the frame picture with the detected target framed by a target detection box together with the position information of the detected target; that frame picture is then input into the Deepsort multi-target tracking algorithm, which tracks the detected target and assigns it a mark. Within the preset time, the marks are used to judge whether the detected targets in the previous and subsequent frames of the video stream are the same detected target. When the mark of the detected target in the current frame picture is the same as that in the previous frame picture of the video stream, the distance between the center positions of the detected target in the two frames is compared to judge whether the target is in a static state: if the distance between the center position of the detected target in the detection area and its center position in the previous frame is smaller than a first threshold, the detected target is judged to be static, an alarm is issued, and the alarm information is sent to the background management system.
Specifically, as shown in fig. 3, when the target mark in the current frame picture is the same as the detected target mark in the previous frame picture of the surveillance video stream, the center points of the detected target in the two frames are normalized with respect to the detection box, and the Euclidean distance between them is calculated. When the Euclidean distance s between the detected target's center point and the same target's center point in the previous frame is smaller than the preset threshold e, the vehicle is considered to exhibit a parking behavior. The Euclidean distance s between the center point of the detected target and the center point of the same detected target in the previous frame is calculated as follows,
s = sqrt( ((x_i - x_{i-1}) / w)^2 + ((y_i - y_{i-1}) / h)^2 )
wherein w is the width of the target detection box, h is the height of the target detection box, x_i and y_i are the abscissa and ordinate of the center point in the i-th frame picture, and x_{i-1} and y_{i-1} are the abscissa and ordinate of the center point in the (i-1)-th frame picture. The threshold e is chosen according to the specific conditions of the site in different scenes: e is set smaller when the site is narrow, when the road conditions of the site are poor, or when the vehicles carry particular potential safety hazards.
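The normalized distance check above amounts to the following sketch (function names are illustrative, not from the patent):

```python
import math

def normalized_center_shift(prev_center, cur_center, w, h):
    """Euclidean distance between consecutive center points of the same
    detected target, normalized by the detection box width w and height h."""
    (px, py), (cx, cy) = prev_center, cur_center
    return math.sqrt(((cx - px) / w) ** 2 + ((cy - py) / h) ** 2)

def is_static(shift, e=0.03):
    """A target whose normalized shift is below the threshold e is judged static."""
    return shift < e
```

Normalizing by the box size makes the threshold e independent of how large the vehicle appears in the image, so one value of e works at different distances from the camera.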
Within the preset time, by detecting the real-time position of the same detected target in the video stream, the current motion state and position information of the detected target are judged and fed back in time as an alarm, so that management personnel can make corresponding judgments and take measures promptly, realizing intelligent management and control of targets in the detection area.
The invention is illustrated below taking the vicinity of a parking lot for dangerous chemical vehicles as an example.
Constructing a data set: 2000 pictures of dangerous chemical vehicles are collected at different times, in different weather and from different angles using cameras at the gate of a dangerous chemical vehicle parking lot and on the surrounding roads; the data are expanded to 4000 pictures by horizontal and vertical flipping, random cropping, random-angle rotation, and image contrast and brightness changes; a further 4000 pictures of dangerous chemical vehicles are collected from search websites. The data set is labeled manually with the labeling tool LabelImg: each dangerous chemical vehicle in a picture is framed with a rectangular box, vehicles whose targets are too small (less than 40x40 pixels) are not labeled, and pictures with a resolution below 600x600 are deleted. All labeled pictures that meet the standard are divided into a training set and a verification set at a ratio of 8:2, where the pictures acquired by the cameras, the expanded data set and the network pictures are divided between the two sets in a 1:1 proportion. After labeling, an annotation file in xml format is generated containing the category information, the picture width and height, and the coordinates of each labeled dangerous chemical vehicle, namely the upper-left corner coordinates (x1, y1) and the lower-right corner coordinates (x2, y2) of the target rectangular box;
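The expansion step can be sketched with NumPy as below; the crop ratio, jitter ranges and RNG seed are illustrative assumptions, and arbitrary-angle rotation (which needs an imaging library such as Pillow or OpenCV) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Random flips, a random crop, and brightness/contrast jitter on an
    HxWxC uint8 image, mirroring the expansion operations listed above."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                    # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]                    # vertical flip
    h, w = img.shape[:2]
    ch, cw = int(h * 0.9), int(w * 0.9)       # crop to 90% of each side
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    img = img[y:y + ch, x:x + cw]
    a = rng.uniform(0.8, 1.2)                 # contrast gain
    b = rng.uniform(-20, 20)                  # brightness shift
    out = np.clip(img.astype(np.float32) * a + b, 0, 255)
    return out.astype(np.uint8)
```

When augmenting detection data, the labeled box coordinates must of course be transformed consistently with each geometric operation (flip, crop, rotation).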
Constructing the improved YOLOv3 target detection network: the invention replaces the loss caused by the center coordinates and the width and height in the target detection loss function with the CIoU calculation method, removes overlapping boxes in the target detection process with the Soft-NMS calculation method, extracts features using Darknet53 as the backbone network, and adopts multi-scale prediction on feature maps of sizes 13x13, 26x26 and 52x52 respectively. Convolution operations are applied successively to the feature maps of different scales; the feature map obtained by the upsampling layer is tensor-concatenated with that of the previous layer; after the convolution operations, target detection and position regression are performed on the different feature maps, and the coordinate and category results are finally output through the YOLOv3 detection layer. The CIoU and Soft-NMS calculation methods improve the accuracy of the target detection algorithm and reduce the missed-detection rate when target vehicles overlap.
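The CIoU loss that replaces the center-coordinate and width-height loss terms can be sketched for a single box pair as follows; this is a plain-Python illustration of the standard CIoU definition, not the patent's training code:

```python
import math

def ciou_loss(box, gt):
    """CIoU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    L = 1 - IoU + rho^2(b, b_gt) / c^2 + a * v."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    # intersection-over-union between the two boxes
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # squared distance between the two center points (rho^2)
    rho2 = ((x1 + x2 - gx1 - gx2) / 2) ** 2 + ((y1 + y2 - gy1 - gy2) / 2) ** 2
    # squared diagonal of the smallest enclosing box (c^2)
    c2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2
    # aspect-ratio consistency term v and its weight a
    v = (4 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1)) - math.atan((x2 - x1) / (y2 - y1))
    ) ** 2
    a = v / ((1 - iou) + v) if (1 - iou) + v > 1e-12 else 0.0
    return 1 - iou + rho2 / c2 + a * v
```

Compared with a plain IoU loss, the rho^2/c^2 term keeps a useful gradient even when the boxes do not overlap, and the a*v term penalizes aspect-ratio mismatch.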
Training the model: the improved YOLOv3 target detection network is trained on the training set for 100 epochs; after each epoch it is verified on the verification set and the mAP is calculated (the larger the mAP, the better the algorithm performs on the verification set), and the network model weights with the largest mAP are finally selected.
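This select-best-by-mAP procedure amounts to the following generic sketch; the `train_one_epoch` / `evaluate_map` callables and the dict-of-weights model are stand-ins for the actual YOLOv3 training code:

```python
def train_and_select(model, train_one_epoch, evaluate_map, epochs=100):
    """Train for `epochs` epochs; after each epoch compute mAP on the
    validation set and keep the weights that achieved the largest mAP."""
    best_map, best_weights = float("-inf"), None
    for _ in range(epochs):
        train_one_epoch(model)          # one full pass over the training set
        m = evaluate_map(model)         # mAP on the validation set
        if m > best_map:
            best_map, best_weights = m, dict(model)  # snapshot the weights
    return best_weights, best_map
```

Snapshotting on every improvement means the returned weights come from whichever epoch generalized best, not necessarily the last one.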
Detecting targets: the trained improved YOLOv3 target detection network performs target detection on the frame pictures to be detected.
Tracking targets: the target detection results are input into the Deepsort target tracking algorithm, which outputs the results (position information of the dangerous chemical vehicles and statistics on them); the confidence threshold P_0 of the algorithm is 0.5 and the IoU threshold is 0.5;
Parking detection: the target detection and target tracking algorithm model outputs the detected real-time position information of the dangerous chemical vehicles, and the distance between the target's center positions in the previous and subsequent frames of the video is calculated to judge whether a vehicle exhibits parking behavior. If the center-position distance between consecutive frames of a dangerous chemical vehicle remains smaller than a threshold e over 100 frames, the vehicle is considered to be parking and a parking alarm is sent. Here the threshold e is set to 0.03.
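Put together, the per-track parking decision over consecutive frames can be sketched as below; the class name and the consecutive-frame counter are illustrative (the patent specifies only the normalized center distance, the threshold e = 0.03 and the 100-frame window):

```python
from collections import defaultdict

class ParkingDetector:
    """Raise a parking alarm when a track's normalized frame-to-frame
    center shift stays below e for `window` consecutive frames."""

    def __init__(self, e=0.03, window=100):
        self.e, self.window = e, window
        self.still = defaultdict(int)   # track_id -> consecutive still frames
        self.last = {}                  # track_id -> (cx, cy)

    def update(self, track_id, cx, cy, w, h):
        """Feed one detection (center cx, cy and box size w, h);
        returns True when the parking condition is met."""
        if track_id in self.last:
            px, py = self.last[track_id]
            shift = (((cx - px) / w) ** 2 + ((cy - py) / h) ** 2) ** 0.5
            self.still[track_id] = self.still[track_id] + 1 if shift < self.e else 0
        self.last[track_id] = (cx, cy)
        return self.still[track_id] >= self.window
```

Keeping one counter per Deepsort track ID is what lets several vehicles be monitored for parking simultaneously without confusing their histories.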
In the present invention:
YOLOv3: a target detection algorithm based on a convolutional neural network, characterized by high detection speed and high accuracy under the same hardware conditions.

Epoch: one complete forward pass and one backward pass of the entire data set through the neural network; that is, every training sample has been propagated forward and backward in the network once. Put plainly, an epoch is the process of training on all training samples once.

mAP: mean Average Precision, a general metric for evaluating target detection.

Deepsort: a multi-target tracking algorithm based on a convolutional neural network that realizes real-time tracking with high accuracy.
Based on the target tracking method based on the double-stage convolutional neural network, the invention also provides a target tracking system based on the double-stage convolutional neural network.
As shown in fig. 4, a target tracking system based on a dual-stage convolutional neural network includes the following modules,
the frame picture acquisition module is used for acquiring a video stream in a detection area and preprocessing the video stream to obtain a plurality of frame pictures;
the target detection module is used for inputting a plurality of frames of the frame pictures into a pre-trained Yolov3 target detection improved model for target detection to obtain a frame picture with a detection target;
and the target tracking module is used for tracking the detection target of the frame picture with the detection target by using a deepsort multi-target tracking algorithm and tracking the real-time position of the detection target in the frame picture.
The invention realizes the parking detection function for detected targets based on a two-stage target detection and tracking model. The improved target detection algorithm combined with the multi-target tracking algorithm forms a two-stage detection and tracking algorithm that detects the real-time position information of the same detected target in the detection area more accurately, avoids calculation errors in real-time position detection when multiple targets appear in the detection area and the detected targets overlap, and realizes real-time tracking of multiple moving detected targets with high accuracy in the real-time detection of target position information. Real-time calculation with high accuracy improves on time-consuming and labor-intensive manual monitoring and realizes intelligent monitoring of the detection area.
In this particular embodiment: the system also comprises a decision alarm module which is used for carrying out decision alarm according to the real-time position information of the detected target in the frame picture.
Based on the target tracking method based on the double-stage convolutional neural network, the invention also provides a computer readable storage medium.
A computer-readable storage medium comprising a memory in which a computer program is stored; when executed by a processor, the program implements the above-described target tracking method based on a two-stage convolutional neural network.
The invention has the following effects:
1) Based on the improved YOLOv3 target detection algorithm in deep learning, the invention trains an algorithm model that detects dangerous chemical vehicles in a video stream in real time, using a large number of dangerous chemical vehicle pictures collected from the network and from cameras on actual roads and in parks, and realizes the parking detection function based on the vehicles' real-time position information. This improves on time-consuming and labor-intensive manual monitoring, realizes intelligent monitoring of roads and parks, raises an alarm for the parking behavior of dangerous chemical vehicles entering no-parking areas such as chemical parks, and enhances park safety;
2) various data enhancement means are utilized to enhance the collected data set of the dangerous chemical vehicle, so that the target detection model has better generalization capability;
3) removing redundant overlapping frames in the detection process of the target detection algorithm by using a Soft-NMS algorithm, so that the recall rate of the detection is improved and the average accuracy of the overlapped objects is obviously improved under the condition that the complexity of the algorithm is not increased;
4) the loss caused by the center coordinate and the width and the height in the loss function of the target detection is replaced by the CIOU calculation method, the loss description of the target detection is more accurate, and the accuracy of the target detection is improved.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A target tracking method based on a two-stage convolutional neural network, characterized by comprising the following steps:
s1, acquiring a video stream in a detection area, and preprocessing the video stream to obtain a multi-frame picture;
s2, inputting the frames of pictures into a pre-trained YOLOv3 target detection improved model for target detection to obtain a frame of picture with a detection target;
and S3, tracking the detection target of the frame picture with the detection target by using a deepsort multi-target tracking algorithm, and tracking the real-time position of the detection target in the frame picture.
2. The target tracking method based on the dual-stage convolutional neural network of claim 1, characterized by further comprising the following step:
and S4, performing decision alarm according to the real-time position information of the detected target in the frame picture.
3. The target tracking method based on the dual-stage convolutional neural network of claim 2, wherein: specifically, in S4, if the euclidean distance between the central points of the detection targets in the previous and subsequent frame images is smaller than the preset threshold within the preset time, it is determined that the detection target is in a static state, and an alarm is issued.
4. The target tracking method based on the dual-stage convolutional neural network as claimed in any one of claims 1 to 3, wherein: specifically, S1 is to obtain a video stream in the detection area, skip frames according to a preset frame rate, and obtain a multi-frame picture from the video stream.
5. The target tracking method based on the dual-stage convolutional neural network as claimed in any one of claims 1 to 3, wherein: before the step S2, the method further comprises the step of training a YOLOv3 target detection improvement model,
the method for training the improved Yolov3 target detection model specifically comprises the following steps,
acquiring pictures with targets in various scenes, performing data enhancement processing and labeling processing to obtain a data set, and dividing the data set into a training set and a verification set according to a preset proportion;
constructing a YOLOv3 target detection improved model based on a YOLOv3 target detection model;
and performing multiple iterative training on the YOLOv3 target detection improved model by using the training set, verifying the YOLOv3 target detection improved model after each training by using the verification set, outputting the average precision mean value of the verification set, and selecting the YOLOv3 target detection improved model with the largest output average precision mean value as the trained YOLOv3 target detection improved model.
6. The target tracking method based on the dual-stage convolutional neural network of claim 5, wherein: the specific steps for performing the data enhancement processing are,
carrying out horizontal flipping and/or vertical flipping and/or random cropping and/or random-angle rotation and/or contrast adjustment and/or brightness adjustment on the collected pictures with targets in various scenes to obtain the processed pictures with targets in various scenes;
the specific steps for performing the labeling process are,
and framing the images with the targets in various scenes and the targets in the processed images with the targets in various scenes by using a labeling tool by using a target labeling frame to generate a label information file in an xml format, wherein the label information file records the upper left corner coordinate and the lower right corner coordinate of the target labeling frame.
7. The target tracking method based on the dual-stage convolutional neural network of claim 6, wherein: constructing a YOLOv3 target detection improved model based on a YOLOv3 target detection model, specifically, improving the YOLOv3 target detection model to obtain the YOLOv3 target detection improved model; wherein, the YOLOv3 target detection model is improved, and comprises the following three improvement points,
improvement point one: improving the input size of the YOLOv3 target detection model;

improvement point two: replacing, with CIoU, the loss caused by the center coordinates of the target detection box, the width of the target detection box and the height of the target detection box in the loss function of the YOLOv3 target detection model, wherein the CIoU formula is as follows;
L_CIoU = 1 - IoU + ρ²(b, b_gt) / c² + a·v

a = v / ((1 - IoU) + v)

v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w / h))²
wherein L_CIoU is the loss function of the YOLOv3 target detection model, IoU is the intersection-over-union between the target detection box and the target labeling box, ρ(·) is the function computing the Euclidean distance between two center points, b and b_gt respectively denote the center points of the target detection box and the target labeling box, c denotes the diagonal distance of the smallest closure region that can contain both the target detection box and the target labeling box, a is the weight function of the YOLOv3 target detection model, v is the similarity of the aspect ratios, w_gt is the width of the target labeling box, h_gt is the height of the target labeling box, w is the width of the target detection box, and h is the height of the target detection box;
improvement point three: when removing overlapped target detection boxes with Soft-NMS in the YOLOv3 target detection model, the Soft-NMS operation is performed according to the intersection-over-union between the target detection boxes and the target labeling boxes; bounding-box suppression is applied to target detection boxes whose confidence exceeds a second preset threshold, their confidence is reduced, and the final target detection boxes and categories are output.
8. A target tracking system based on a two-stage convolution neural network is characterized by comprising the following modules,
the frame picture acquisition module is used for acquiring a video stream in a detection area and preprocessing the video stream to obtain a plurality of frame pictures;
the target detection module is used for inputting a plurality of frames of the frame pictures into a pre-trained Yolov3 target detection improved model for target detection to obtain a frame picture with a detection target;
and the target tracking module is used for tracking the detection target of the frame picture with the detection target by using a deepsort multi-target tracking algorithm and tracking the real-time position of the detection target in the frame picture.
9. The dual stage convolutional neural network-based target tracking system of claim 8, wherein: the system also comprises a decision alarm module which is used for carrying out decision alarm according to the real-time position information of the detected target in the frame picture.
10. A computer-readable storage medium characterized by: comprising a memory in which a computer program is stored which, when being executed by a processor, carries out the method of target tracking based on a two-stage convolutional neural network of any one of claims 1 to 7.
CN202011632331.9A 2020-12-31 2020-12-31 Target tracking method and system based on double-stage convolutional neural network Pending CN112686923A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632331.9A CN112686923A (en) 2020-12-31 2020-12-31 Target tracking method and system based on double-stage convolutional neural network


Publications (1)

Publication Number Publication Date
CN112686923A true CN112686923A (en) 2021-04-20


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361351A (en) * 2021-05-27 2021-09-07 湖南信达通信息技术有限公司 Image recognition-based retrograde determination method and system
CN113378748A (en) * 2021-06-22 2021-09-10 上海中通吉网络技术有限公司 Target detection method based on improved algorithm
CN113420615A (en) * 2021-06-03 2021-09-21 深圳海翼智新科技有限公司 Face living body detection method and device
CN114882393A (en) * 2022-03-29 2022-08-09 华南理工大学 Road reverse running and traffic accident event detection method based on target detection
CN114882101A (en) * 2022-07-11 2022-08-09 合肥工业大学 Sealed container leakage amount measuring method based on deep learning and image processing
CN115346169A (en) * 2022-08-08 2022-11-15 航天神舟智慧系统技术有限公司 Method and system for detecting sleep post behaviors
WO2023065395A1 (en) * 2021-10-18 2023-04-27 中车株洲电力机车研究所有限公司 Work vehicle detection and tracking method and system
CN116740753A (en) * 2023-04-20 2023-09-12 安徽大学 Target detection and tracking method and system based on improved YOLOv5 and deep SORT

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108489535A (en) * 2018-02-07 2018-09-04 北京东方通捷燃气有限责任公司 A kind of on-line water flushing equipment and remote monitoring system for Transportation of Dangerous Chemicals vehicle
CN109934161A (en) * 2019-03-12 2019-06-25 天津瑟威兰斯科技有限公司 Vehicle identification and detection method and system based on convolutional neural network
CN111257957A (en) * 2020-02-25 2020-06-09 西安交通大学 Identification tracking system and method based on passive terahertz imaging
CN111915583A (en) * 2020-07-29 2020-11-10 西安电子科技大学 Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN111931582A (en) * 2020-07-13 2020-11-13 中国矿业大学 Image processing-based highway traffic incident detection method
CN111986508A (en) * 2020-08-24 2020-11-24 广州信息投资有限公司 Roadside parking management method and system based on multi-target tracking and visual positioning
CN112016461A (en) * 2020-08-28 2020-12-01 深圳市信义科技有限公司 Multi-target behavior identification method and system
CN112069868A (en) * 2020-06-28 2020-12-11 南京信息工程大学 Unmanned aerial vehicle real-time vehicle detection method based on convolutional neural network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang Kang: "Research on Vehicle Detection Algorithms Based on Deep Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 2020, 15 August 2020 (2020-08-15), pages 034-205 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination