CN116740753A - Target detection and tracking method and system based on improved YOLOv5 and deep SORT - Google Patents


Info

Publication number
CN116740753A
CN116740753A (application CN202310424367.5A)
Authority
CN
China
Prior art keywords
target
tracking
personnel
helmet
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310424367.5A
Other languages
Chinese (zh)
Inventor
郝喆
崔宇鑫
王子皓
孙立龙
王珍亦
Current Assignee
Anhui University
Original Assignee
Anhui University
Priority date
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202310424367.5A priority Critical patent/CN116740753A/en
Publication of CN116740753A publication Critical patent/CN116740753A/en
Pending legal-status Critical Current

Classifications

    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/08: Neural network learning methods
    • G06V10/82: Image or video recognition using neural networks
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54: Surveillance or monitoring of traffic, e.g. cars on the road, trains or boats
    • G06V2201/07: Target detection
    • Y02T10/40: Engine management systems


Abstract

The application discloses a target detection and tracking method and system based on improved YOLOv5 and DeepSORT. The method comprises: acquiring and preprocessing target images to be detected, and splitting the preprocessed image set into a training set, a verification set and a test set at a ratio of 6:2:2; constructing a target detection model by improving YOLOv5; and detecting the target vehicles in the current frame with the trained detection model to obtain the helmet-wearing status and position information of electric-vehicle riders in the current frame. According to the application, improving YOLOv5 raises detection precision, and DeepSORT aggregates the helmet-wearing detections of a tracked person over multiple frames for comprehensive analysis, reducing the missed and false detections that a single frame may suffer from image blurring, mutual occlusion among multiple targets and similar problems; the expected trajectory can be tracked accurately, realizing accurate detection and tracking of targets.

Description

Target detection and tracking method and system based on improved YOLOv5 and deep SORT
Technical Field
The application relates to the technical field of image recognition, and in particular to a target detection and tracking method and system based on improved YOLOv5 and DeepSORT.
Background
In recent years, target detection technology has developed and innovated rapidly; it routinely serves information acquisition in daily life and is widely applied in fields such as autonomous driving and urban traffic. With the continuous advance of urban transport, urban traffic safety remains a social concern. According to the Ministry of Industry and Information Technology, the standard "Helmets for motorcycle and electric bicycle passengers" (GB 811-2022, hereinafter "the new standard"), published on December 1, 2022, took effect on July 1, 2023. It aims to standardize and raise the quality and safety performance of helmets for electric-bicycle riders and to safeguard the traffic safety of cyclists. Meanwhile, the number of casualties caused by riding electric vehicles without a helmet has been increasing year by year.
At present, traditional target detection algorithms struggle to handle targets of different sizes well, while some advanced detection algorithms are computationally heavy and slow, making it hard to meet the demands of certain real-time applications. In real tracking scenes, the background keeps changing due to lighting variation or other interference, which leaves tracking models prone to disturbance and degrades the tracking effect; during motion, a target may also be occluded by other objects, making it difficult for the tracker to follow. The application therefore provides a YOLOv5-based target detection and DeepSORT-based target tracking method that comprehensively identifies, through detection and tracking, persons who do not wear helmets as regulated.
Disclosure of Invention
The present application is directed to a method and a system for target detection and tracking based on improved YOLOv5 and DeepSORT, which solve the above-mentioned problems in the prior art.
In order to achieve the above purpose, the present application provides the following technical solution: a method for target detection and tracking based on improved YOLOv5 and DeepSORT, comprising the steps of:
acquiring target images to be detected and preprocessing them, then splitting the preprocessed image set into a training set, a verification set and a test set at a ratio of 6:2:2;
constructing a target detection model by improving YOLOv 5;
detecting the target vehicles in the current frame through the trained target detection model to obtain the helmet-wearing status and position information of the electric-vehicle riders in the current frame;
detecting and tracking target persons in the video through the DeepSORT algorithm, and comprehensively judging, from the helmet-wearing status of a detected person over a plurality of frames during tracking, whether the tracked person wears the helmet correctly;
and detecting and tracking persons who do not wear helmets as regulated in traffic video through the YOLOv5 and DeepSORT target detection and tracking model.
Preferably, the acquiring and preprocessing of target images and the 6:2:2 split into a training set, a verification set and a test set include:
obtaining images of targets to be detected, stitching groups of 4 pictures with the Mosaic method through random scaling, random cropping and random arrangement, and splitting the processed pictures, as the preprocessed image set, into a training set, a verification set and a test set at a ratio of 6:2:2.
Preferably, the stitching of 4 pictures by random scaling, random cropping and random arrangement with the Mosaic method includes:
randomly selecting reference point coordinates (xc, yc) of picture stitching, and randomly selecting four pictures;
the four pictures are respectively placed at the left upper part, the right upper part, the left lower part and the right lower part of the large picture with the specified size after being subjected to size adjustment and scaling according to the datum point;
remapping the label coordinates of each picture according to that picture's scaling transformation;
and splicing the large images according to the designated abscissa and ordinate, and processing the coordinates of the detection frame exceeding the boundary.
Preferably, in the construction of the target detection model by improving YOLOv5, the target detection model comprises an input end, a backbone network, a feature fusion network and a detector network;
the input end comprises Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling;
the backbone network comprises a Focus+CSP structure and adopts the CSP2 structure;
the feature fusion network adopts a structure of combining FPN and PAN;
the detector network uses DIoU_Loss instead of IoU_Loss and DIoU_NMS instead of NMS.
Preferably, the detecting and tracking of target persons in the video through the DeepSORT algorithm, comprehensively judging from the helmet-wearing status of a detected person over a plurality of frames whether the tracked person wears the helmet correctly, includes:
detecting and tracking a target person in the video through the DeepSORT algorithm, predicting the position of the target person at the next moment with a Kalman filter, performing cascade matching and IoU matching between predicted tracks and detection results with the Hungarian algorithm, and comprehensively judging the matching degree between them to complete tracking and matching of the target person.
The application also provides a system for target detection and tracking based on improved YOLOv5 and DeepSORT, comprising:
the target detection stage module is used for acquiring an image and carrying out target detection on the image;
a target tracking stage module for tracking whether the matched target wears a helmet;
and the information feedback stage module is used for recording the target video, collecting evidence and integrating image information for uploading and feedback.
Preferably, the target detection stage module includes:
the target detection state unit is used for performing target detection on the current frame image, judging whether an electric bicycle is present, and obtaining detection results including the positions of detected electric-bicycle riders and helmets; if a detected person's position does not coincide with any helmet position, the person is determined not to be wearing a helmet correctly;
the information association state unit is used for determining the person detection frame and the helmet detection frame from the position information of the target persons and helmets in the current frame, calculating the proportion of the overlap area of the two frames to the person-frame area, and matching a person frame with a helmet frame when the proportion exceeds a threshold, thereby obtaining the helmet-wearing status of the detected person;
the sample separation state unit is used for separating, from the results of the information association state unit, the target persons for which a tracker is to be built; a tracker is created for each detected person who does not wear a helmet correctly, facilitating subsequent tracking and timely feedback of the person's position information, appearance features and the like;
the target tracking stage module comprises:
the target prediction state unit is used for obtaining a prediction track of the position information of the detected person at the next moment through Kalman filter processing;
the target matching state unit is used for matching the predicted tracks obtained from the Kalman filter with the detection tracks using the Hungarian algorithm and judging the matching degree between them, thereby completing tracking and matching of detected persons;
the target checking state unit is used for checking over consecutive frames whether target persons wear helmets correctly, reducing missed and false detections, and also serving as a basis for verification;
the target judging state unit is used for integrating the helmet wearing conditions of the detected personnel in all frames in the target tracking process and judging whether the detected personnel wear the helmet correctly or not;
the information feedback stage module comprises:
the violation recording state unit is used for automatically recording video to document evidence of detected persons not wearing helmets correctly, and for feeding the related image information back to the backend to help traffic managers evaluate bad behavior and handle violations;
and the violation alarm state unit is used for triggering the alarm device according to the target analysis result under the condition that the detected personnel does not wear the helmet correctly so as to remind traffic management personnel to process.
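The information-association step described above, which matches a rider's detection frame to a helmet frame by overlap ratio, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the (x1, y1, x2, y2) box format and the 0.05 threshold are assumptions.

```python
def box_overlap_ratio(person, helmet):
    """Ratio of the person/helmet intersection area to the person-box area.

    Boxes are (x1, y1, x2, y2); both the format and the downstream
    threshold are illustrative assumptions, not values from the patent.
    """
    ix1, iy1 = max(person[0], helmet[0]), max(person[1], helmet[1])
    ix2, iy2 = min(person[2], helmet[2]), min(person[3], helmet[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    person_area = (person[2] - person[0]) * (person[3] - person[1])
    return inter / person_area if person_area > 0 else 0.0


def associate_helmets(persons, helmets, thresh=0.05):
    """For each person box, check whether any helmet box overlaps it by
    more than `thresh` of the person area; persons with no such helmet
    are flagged as not wearing a helmet correctly (False)."""
    results = []
    for p in persons:
        ratios = [box_overlap_ratio(p, h) for h in helmets]
        results.append(max(ratios, default=0.0) > thresh)
    return results
```

A person without any sufficiently overlapping helmet box would then have a tracker created for it by the sample separation state unit.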
The application also provides an electronic device, which is a physical device, comprising:
the device comprises a processor and a memory, wherein the memory is in communication connection with the processor;
the memory stores executable instructions for the at least one processor, and the processor executes the instructions to implement the above method of target detection and tracking based on improved YOLOv5 and DeepSORT.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of target detection and tracking based on improved YOLOv5 and DeepSORT as described above.
Compared with the prior art, the application has the beneficial effects that:
according to the application, the detection precision is improved by improving the YOLOv5, the multiple frames of the situation that a tracked person wears the helmet is detected by the deep SORT for comprehensive analysis, the condition of missed detection and false detection possibly existing in a single frame due to the problems of image blurring, mutual shielding among multiple targets and the like is reduced, the expected track can be accurately tracked, and the accurate detection and tracking of the targets are realized.
Drawings
FIG. 1 is a main flow chart of a method for target detection and tracking based on improved YOLOv5 and DeepSORT according to an embodiment of the present application;
FIG. 2 is a flow chart of a system for target detection and tracking based on improved YOLOv5 and DeepSORT according to an embodiment of the present application;
FIG. 3 is a diagram showing the implementation result of a method for target detection and tracking based on improved YOLOv5 and DeepSORT according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The main execution body of the method in this embodiment is a terminal, and the terminal may be a device such as a mobile phone, a tablet computer, a PDA, a notebook or a desktop, but of course, may be another device with a similar function, and this embodiment is not limited thereto.
Referring to FIG. 1, the present application provides a method for target detection and tracking based on improved YOLOv5 and DeepSORT, applied to helmet-wearing detection for electric-vehicle riders, comprising the following steps:
Step 101: obtain images of targets to be detected, preprocess them, and split the preprocessed image set into a training set, a verification set and a test set at a ratio of 6:2:2.
Specifically, target images to be detected are obtained, groups of 4 pictures are stitched with the Mosaic method through random scaling, random cropping and random arrangement, and the preprocessed pictures are split into a training set, a verification set and a test set at a ratio of 6:2:2.
The stitching of 4 pictures with the Mosaic method comprises the following steps:
randomly selecting reference point coordinates (xc, yc) of picture stitching, and randomly selecting four pictures;
the four pictures are respectively placed at the left upper part, the right upper part, the left lower part and the right lower part of the large picture with the specified size after being subjected to size adjustment and scaling according to the datum point;
remapping the label coordinates of each picture according to that picture's scaling transformation;
and splicing the large images according to the designated abscissa and ordinate, and processing the coordinates of the detection frame exceeding the boundary.
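As a rough illustration of the Mosaic steps above, the following sketch stitches four images around a random reference point (labels omitted). The canvas size, the reference-point range and the nearest-neighbour resize are all illustrative assumptions; a real pipeline would use cv2.resize and remap the box labels as described.

```python
import random
import numpy as np

def mosaic4(images, out_size=640):
    """Minimal sketch of 4-image Mosaic stitching: pick a random
    reference point (xc, yc), then place one resized image in each of
    the four quadrants of an out_size x out_size canvas."""
    assert len(images) == 4
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    # Reference point kept away from the borders so every quadrant is non-empty.
    xc = random.randint(out_size // 4, 3 * out_size // 4)
    yc = random.randint(out_size // 4, 3 * out_size // 4)
    quads = [(0, 0, xc, yc), (xc, 0, out_size, yc),
             (0, yc, xc, out_size), (xc, yc, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, quads):
        h, w = y2 - y1, x2 - x1
        # Nearest-neighbour resize via index arrays (stand-in for cv2.resize).
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas
```

In the same pass, each source picture's label boxes would be scaled and offset into canvas coordinates, and boxes crossing a quadrant boundary clipped.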
Step 102, constructing a target detection model by improving YOLOv 5.
Specifically, the target detection model comprises an input end, a backbone network, a feature fusion network and a detector network;
the input end comprises Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling;
the backbone network comprises a focus+CPS structure and adopts a CSP2 structure;
the feature fusion network adopts a structure of combining FPN and PAN;
the detector network adopts DIoU_Loss to replace IoU_Loss and DIoU_NMS to replace NMS; this enriches feature information, improves small-target detection, strengthens feature-fusion capability, reduces computation and enhances detection robustness, meeting the expectation of lowering the missed-detection rate while improving detection speed and accuracy.
Step 103: detect the target vehicles of the current frame through the trained target detection model to obtain the helmet-wearing status and position information of the electric-vehicle riders in the current frame.
Step 104: detect and track target persons in the video through the DeepSORT algorithm, and comprehensively judge from the helmet-wearing status of a detected person over a plurality of frames whether the tracked person wears the helmet correctly.
Specifically, a target person in the video is detected and tracked through the DeepSORT algorithm: a Kalman filter predicts the person's position at the next moment, the Hungarian algorithm performs cascade matching and IoU matching between predicted tracks and detection results, and the matching degree between them is judged comprehensively to complete tracking and matching. During cascade matching, a more reliable weighted sum of the Mahalanobis distance and the cosine distance is used as the association measure, integrating appearance and motion information into the matching strategy; a χ²-distributed threshold decides whether a match succeeds, reducing the number of ID switches for targets that reappear after occlusion.
Step 105: detect and track persons who do not wear helmets as regulated in traffic video through the YOLOv5 and DeepSORT target detection and tracking model.
In this embodiment, improving YOLOv5 raises detection precision, and DeepSORT aggregates the helmet-wearing detections of a tracked person over multiple frames for comprehensive analysis, reducing the missed and false detections that a single frame may suffer from image blurring, mutual occlusion among multiple targets and the like.
For a better understanding of the above embodiment, refer to FIG. 3, a result diagram of the method for target detection and tracking based on improved YOLOv5 and DeepSORT according to an embodiment of the present application, obtained mainly on captured traffic-intersection pictures and videos (three frames taken) and related network pictures. On this basis, the application further provides a specific step-by-step flow of the method, at least comprising:
Step 201: obtain target images to be detected and preprocess them; stitch groups of 4 pictures with the Mosaic method through random scaling, random cropping and random arrangement to enrich the data set and improve robustness; split the processed pictures, as the preprocessed image set, into a training set, a verification set and a test set at a ratio of 6:2:2; train the target detection model on the training set, check the training effect on the verification set, and test actual learning ability on the test set to verify the effectiveness of the target detection model;
Step 202: the loss function in training measures the distance between predicted information and expected information; the closer they are, the smaller the loss value;
three main components are considered: rectangular-frame loss (loss_rect), confidence loss (loss_obj) and classification loss (loss_cls); the overall loss is a weighted sum of the three, with the loss function formulated as follows:
Loss = a·loss_obj + b·loss_rect + c·loss_cls
Step 203: construct the target detection model based on improved YOLOv5, comprising an input end (Input), a backbone network (Backbone), a feature fusion network (Neck) and a detector network (Head);
the input end comprises Mosaic data enhancement, adaptive anchor-frame calculation and adaptive picture scaling; the backbone network mainly comprises a Focus+CSP structure and adopts the CSP2 structure; the feature fusion network adopts a combined FPN and PAN structure; the detector network adopts DIoU_Loss to replace IoU_Loss and DIoU_NMS to replace NMS, which enriches feature information, improves small-target detection, strengthens feature fusion, reduces computation and enhances detection robustness, meeting the expectation of lowering the missed-detection rate while improving detection speed and accuracy;
Step 204: replace IoU_Loss with DIoU_Loss; IoU_Loss cannot accurately reflect the degree of overlap between the real frame and the predicted frame, can disadvantage smaller objects, and may be unstable with multiple labels, whereas the DIoU_Loss penalty term directly minimizes the distance between centre points and achieves fast regression when the real frame encloses the predicted frame;
the formulas of IoU_Loss and DIoU_Loss are as follows:
L_IoU = 1 - IoU
L_DIoU = 1 - IoU + ρ²(b, b_gt)/c², with penalty term R_DIoU = ρ²(b, b_gt)/c²
where b_gt = (x_gt, y_gt, ω_gt, h_gt) is the real frame, b = (x, y, ω, h) the predicted frame, b_gt and b in ρ(·) denote the two frames' centre points with ρ(·) their Euclidean distance, L_IoU and L_DIoU are the loss functions of IoU and DIoU respectively, R_DIoU is the penalty term, and c is the diagonal length of the minimum closure region containing both the real and predicted frames;
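Under these definitions, DIoU loss for a single box pair can be sketched in a few lines. The (x1, y1, x2, y2) box format is an assumption for illustration; a real detector computes this over batched tensors.

```python
def diou_loss(box, box_gt):
    """DIoU loss for axis-aligned boxes (x1, y1, x2, y2):
    L_DIoU = 1 - IoU + rho^2 / c^2, where rho is the distance between
    the two box centres and c the diagonal of the smallest enclosing box."""
    ix1, iy1 = max(box[0], box_gt[0]), max(box[1], box_gt[1])
    ix2, iy2 = min(box[2], box_gt[2]), min(box[3], box_gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    iou = inter / (area + area_gt - inter)
    # Squared centre distance rho^2.
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    gx, gy = (box_gt[0] + box_gt[2]) / 2, (box_gt[1] + box_gt[3]) / 2
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
    # Squared diagonal c^2 of the minimum enclosing box.
    ex1, ey1 = min(box[0], box_gt[0]), min(box[1], box_gt[1])
    ex2, ey2 = max(box[2], box_gt[2]), max(box[3], box_gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1.0 - iou + rho2 / c2
```

Unlike plain IoU loss, the rho²/c² term still produces a gradient when the predicted frame lies inside the real frame, which is the fast-regression case the text mentions.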
Step 205: replace NMS with DIoU_NMS;
YOLOv5 defaults to the NMS algorithm, which has the following drawbacks: (1) limited handling of overlap areas: since only the IoU value between two bounding boxes is considered, a detection box whose overlap with an adjacent box exceeds the threshold is removed, so a real object lying in the overlap area fails to be detected; (2) high threshold dependency: the NMS threshold is hard to determine, and an incorrect setting causes wrong deletion, false detection, missed detection and similar problems; (3) lower stability: the NMS result can depend on the input order of the boxes, and when two boxes have the same IoU value the latter is usually kept, which can make the result unstable; the DIoU_NMS formula is as follows:
s_i = s_i, if IoU(M, B_i) - R_DIoU(M, B_i) < ε; s_i = 0, otherwise
where s_i is the classification confidence, ε is the threshold, M is the candidate box with the highest confidence, and B_i traverses the remaining boxes compared against the high-confidence box;
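A greedy DIoU-NMS pass following the formula above can be sketched as below; the default ε of 0.5 is an illustrative choice, not a value from the patent.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def diou_penalty(a, b):
    """DIoU centre-distance penalty rho^2 / c^2."""
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
            + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return rho2 / c2 if c2 > 0 else 0.0

def diou_nms(boxes, scores, eps=0.5):
    """Greedy DIoU-NMS: a box survives against the current best box M
    whenever IoU(M, B_i) - R_DIoU(M, B_i) < eps, so overlapping boxes
    whose centres are far apart are kept. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order
                 if iou(boxes[m], boxes[i]) - diou_penalty(boxes[m], boxes[i]) < eps]
    return keep
```

Subtracting the centre-distance penalty is what distinguishes this from plain NMS: two boxes with high IoU but distant centres (two riders side by side) can both survive.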
Step 206: detect the target vehicles of the current frame with the trained target detection model to obtain the helmet-wearing status and position information of the electric-vehicle riders in the current frame;
Step 207: detect and track the video with the DeepSORT algorithm, and comprehensively judge from the helmet-wearing status of a detected person over a plurality of frames whether the tracked person wears the helmet correctly;
if the detected person is judged to be wearing a helmet, the tracker is released; if the person is judged not to be wearing a helmet as regulated, the detection video and the capture time of the relevant frames are recorded for later manual checking and violation processing;
specifically, when tracking a person who does not wear a helmet as regulated with the DeepSORT algorithm, the features of that person are confirmed from the detection result and the target is confirmed; recording of the person's detection track then begins, the tracked person is located with a detection frame, a Kalman filter is initialized, and the centre position, aspect ratio and height of the detection frame together with their corresponding velocities in coordinates are recorded; the detection track is matched with the track predicted by the Kalman filter, and a track is confirmed after the next frames (three consecutive frames) are matched successfully;
a life-cycle threshold is also set: if a track stays unmatched beyond this threshold, the target person is considered to have left the tracking area;
step 208, obtaining a predicted track of the position information of the target person at the next moment after processing by a Kalman filter;
The calculation formulas of the Kalman filter are as follows:
the prior state estimate is: x̂_k⁻ = A·x̂_(k-1) + B·u_(k-1), with prior covariance P_k⁻ = A·P_(k-1)·Aᵀ + Q;
the posterior state estimate is: x̂_k = x̂_k⁻ + K_k·(z_k - H·x̂_k⁻), with K_k = P_k⁻·Hᵀ·(H·P_k⁻·Hᵀ + R)⁻¹ and P_k = (I - K_k·H)·P_k⁻;
where x̂_k⁻ is the predicted state value, A the state transition matrix, x̂_k the optimal state estimate, B the control input matrix, P_k⁻ the covariance between the true and predicted values, P_k the covariance between the true value and the optimal estimate, K_k the Kalman gain matrix, H the system observation matrix, R the noise matrix of the detector, and z_k the observation of the state vector;
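The predict/update cycle can be written generically as below. This is a sketch of the standard equations, not the patent's code; DeepSORT's actual filter tracks an 8-dimensional state (box centre, aspect ratio, height and their velocities), while the test here uses a 1-D constant-velocity model.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of the standard Kalman filter.
    x, P: previous posterior state and covariance; z: new observation;
    A, H: transition and observation matrices; Q, R: process and
    measurement noise covariances. Control input B·u is omitted."""
    # Predict: prior state estimate and prior covariance.
    x_prior = A @ x
    P_prior = A @ P @ A.T + Q
    # Update: Kalman gain, posterior state and posterior covariance.
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(len(x)) - K @ H) @ P_prior
    return x_post, P_post
```

Run per frame, x_prior supplies the predicted track position used in matching, and x_post becomes the track state after a successful match.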
step 209, detecting a target person in the current frame to obtain a detection result, matching (cascade matching and IoU matching) the predicted track and the detection result by using a Hungary algorithm, and comprehensively judging the matching degree between the predicted track and the detection result, thereby completing tracking matching of the target person;
wherein, more reliable weighted values of the mahalanobis distance and the cosine distance are used as the correlation measurement when cascade matching, the appearance and the motion information are also integrated into a matching strategy, and χ is utilized 2 A distributed threshold value is used for judging whether the matching is successful or not, and the prediction in a short time is more effective because the mahalanobis distance provides possible position information of the target personnel; the cosine distance is more considered as the appearance characteristics of the prediction information and the track information, and is more effective when the displacement of the tracked object is less or the target personnel after being recovered and shielded is reconfirmed and predicted under the shielded condition, so that the Marshall distance and the apparent characteristic cosine distance are weighted and added to be used as the total matching index of the model;
priority in matching is given to tracks that have not been lost, and the tracks that have been lost the longest are matched last; through this processing, occluded targets can be recovered and the number of ID switches of occluded targets is reduced;
wherein, the Mahalanobis distance similarity measure is calculated as follows:
d^(1)(i,j) = (d_j − y_i)^T · S_i^{-1} · (d_j − y_i)
the apparent-feature cosine distance metric is as follows:
d^(2)(i,j) = min{ 1 − r_j^T · r_k^(i) | r_k^(i) ∈ R_i }
the combined cost over the Mahalanobis distance and the apparent cosine distance is as follows:
c_{i,j} = λ·d^(1)(i,j) + (1 − λ)·d^(2)(i,j)
wherein d^(1)(i,j) represents the matching degree between the j-th detection result and the i-th track, d_j represents the position of the j-th detection frame, S_i is the covariance matrix of the observation space at the current moment predicted by the track's Kalman filter, and y_i represents the predicted position of the i-th tracker track at the current moment; d^(2)(i,j) represents the minimum cosine distance between the feature vectors of the i-th track and the j-th detection, r_j represents the feature vector of the j-th detection, and r_k^(i) represents the tracked feature vectors stored for the i-th track;
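The association cost described above can be sketched as follows. The λ weight, the χ²(0.95, 4 DoF) gate value 9.4877, and the brute-force assignment (standing in for the Hungarian algorithm) are illustrative choices, not values taken from the patent:

```python
import numpy as np
from itertools import permutations

def mahalanobis_cost(y_i, S_i, d_j):
    """d^(1)(i,j): squared Mahalanobis distance between the track's predicted
    position y_i (with covariance S_i) and the detection position d_j."""
    diff = d_j - y_i
    return float(diff.T @ np.linalg.inv(S_i) @ diff)

def cosine_cost(track_feats, det_feat):
    """d^(2)(i,j): smallest cosine distance between the detection's appearance
    feature and the gallery of features stored for the track."""
    sims = track_feats @ det_feat / (
        np.linalg.norm(track_feats, axis=1) * np.linalg.norm(det_feat))
    return float(1.0 - sims.max())

def combined_cost(d1, d2, lam=0.5):
    """c_{i,j} = lam * d^(1)(i,j) + (1 - lam) * d^(2)(i,j); lam is tunable."""
    return lam * d1 + (1.0 - lam) * d2

def assign(cost, gate=9.4877):
    """Minimum-cost one-to-one assignment (brute force over a square matrix,
    for clarity; DeepSORT uses the Hungarian algorithm). Pairs whose cost
    exceeds the chi-squared gate are rejected as failed matches."""
    n = cost.shape[0]
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return [(i, j) for i, j in enumerate(best) if cost[i, j] <= gate]
```

In a real tracker, `scipy.optimize.linear_sum_assignment` would replace the brute-force search.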
step 210, after cascade matching, obtaining matched detections and tracks, unmatched tracks, and unmatched detections;
specifically, IoU matching is then performed among the tracks from cascade matching that have gone unmatched for only one frame, all unconfirmed-state tracks of the current k-th frame predicted by Kalman filtering from the (k−1)-th frame, and the detections left unmatched by cascade matching, again yielding matched detections and tracks, unmatched tracks, and unmatched detections;
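The IoU matching round relies on a standard box-overlap measure, which can be sketched as:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0]); iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2]); iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def iou_cost_matrix(track_boxes, det_boxes):
    """Cost used in the IoU matching round: 1 - IoU, so a larger overlap
    gives a lower cost for the assignment step."""
    return [[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes]
```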
step 211, performing matrix updating and subsequent processing;
the Kalman filter updates the mean and variance of each track in the new frame; tracks exceeding the life-cycle threshold are deleted, new IDs are assigned to unmatched detections, and the feature matrix is updated;
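The track bookkeeping of this step can be sketched as follows; the `Track` fields, the dictionary layout of detections, and the `max_age` value of 30 are illustrative assumptions, not details given in the patent:

```python
class Track:
    """Minimal track record; a full implementation also stores the Kalman state."""
    def __init__(self, track_id, box, feature):
        self.track_id = track_id
        self.box = box
        self.features = [feature]    # appearance gallery used by cosine matching
        self.time_since_update = 0   # frames since the last successful match

def update_tracks(tracks, matches, unmatched_dets, next_id, max_age=30):
    """Post-matching bookkeeping: refresh matched tracks, age unmatched ones,
    delete tracks past the life-cycle threshold, and assign fresh IDs to
    unmatched detections."""
    matched_idx = {t_idx for t_idx, _ in matches}
    for t_idx, det in matches:
        tracks[t_idx].box = det["box"]
        tracks[t_idx].features.append(det["feat"])
        tracks[t_idx].time_since_update = 0
    for j, trk in enumerate(tracks):
        if j not in matched_idx:
            trk.time_since_update += 1
    tracks = [t for t in tracks if t.time_since_update <= max_age]
    for det in unmatched_dets:
        tracks.append(Track(next_id, det["box"], det["feat"]))
        next_id += 1
    return tracks, next_id
```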
and step 212, finally, detecting and tracking personnel who do not wear helmets as required in the traffic video through the target detection and tracking model based on YOLOv5 and deep SORT.
On the basis of the above embodiment, as shown in fig. 2, an embodiment of the present application further provides a target detection and tracking system based on improved YOLOv5 and deep SORT, which supports the target detection and tracking method based on improved YOLOv5 and deep SORT of the above embodiment; the system includes:
the target detection stage module is used for acquiring an image and carrying out target detection on the image;
a target tracking stage module for tracking whether the matched target wears a helmet;
and the information feedback stage module is used for recording the target video, collecting evidence and integrating image information for uploading and feedback.
In this embodiment, the target detection stage module includes:
the target detection state unit is used for performing target detection on the current frame image to judge whether an electric bicycle is present, and obtaining target detection results including the positions of the detected electric bicycle riders and the positions of helmets; if a detected person's position is found not to coincide with any helmet position, the detected person is determined to be a person not correctly wearing a helmet;
the information association state unit is used for determining the position of the personnel detection frame and the position of the helmet detection frame according to the position information of the target person and the helmet in the current frame, and calculating the ratio of the overlap area between the personnel detection frame and the helmet detection frame to the area of the personnel frame; if the ratio is larger than a threshold, the personnel detection frame is matched with the helmet detection frame, indicating that the detected person is wearing a helmet;
the sample separation state unit is used for separating, from the target detection results of the information association state unit, the target personnel for whom trackers are to be built, so as to construct a corresponding tracking system; a tracker is built for each detected person not correctly wearing a helmet, which facilitates subsequent tracking and timely feedback of the detected person's position information, appearance features and the like;
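The person-helmet association performed by the information association state unit can be sketched as follows; the (x1, y1, x2, y2) box layout and the threshold value are illustrative assumptions, since the patent only states "larger than a threshold":

```python
def helmet_matched(person_box, helmet_box, thresh=0.1):
    """Overlap area between the person box and the helmet box, divided by
    the person-box area; True means the helmet is associated with this rider.
    Boxes are (x1, y1, x2, y2); thresh=0.1 is an illustrative value."""
    ix1 = max(person_box[0], helmet_box[0]); iy1 = max(person_box[1], helmet_box[1])
    ix2 = min(person_box[2], helmet_box[2]); iy2 = min(person_box[3], helmet_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    person_area = (person_box[2] - person_box[0]) * (person_box[3] - person_box[1])
    return inter / person_area > thresh
```

Dividing by the person-box area (rather than the union, as in IoU) makes the measure insensitive to the helmet box being much smaller than the rider box.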
the target tracking stage module comprises:
the target prediction state unit is used for obtaining a prediction track of the position information of the detected person at the next moment through Kalman filter processing;
the target matching state unit is used for matching the predicted track obtained through the Kalman filter with the detection track by using the Hungary algorithm and judging the matching degree between the predicted track and the detection track, so that tracking matching of detected personnel is completed;
the target checking state unit is used for checking in consecutive frames whether target personnel wear helmets correctly, thereby reducing missed detections and false detections; it can also serve as a basis for verification;
the target judging state unit is used for integrating the helmet wearing conditions of the detected personnel in all frames in the target tracking process and judging whether the detected personnel wear the helmet correctly or not;
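The multi-frame judgment of the target judging state unit can be sketched as a simple vote over a track's per-frame observations; `min_frames` and `ratio` are illustrative values, not taken from the patent:

```python
def correctly_wearing(per_frame_flags, min_frames=10, ratio=0.8):
    """Aggregate per-frame helmet observations (True = helmet detected on the
    rider) over a track's lifetime; only conclude once enough frames agree,
    which suppresses single-frame misses and false detections."""
    if len(per_frame_flags) < min_frames:
        return None  # not enough evidence yet
    worn_fraction = sum(per_frame_flags) / len(per_frame_flags)
    return worn_fraction >= ratio
```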
the information feedback stage module comprises:
the illegal recording state unit is used for automatically recording videos so as to record evidence that detected personnel wear the helmet incorrectly, and feeding back image related information to the background together so as to help traffic managers evaluate bad behaviors and process illegal behaviors;
and the violation alarm state unit is used for triggering the alarm device according to the target analysis result under the condition that the detected personnel does not wear the helmet correctly so as to remind traffic management personnel to process.
Further, the system for target detection and tracking based on improved YOLOv5 and deep sort may execute the method for target detection and tracking based on improved YOLOv5 and deep sort; for specific implementation, reference may be made to the method embodiments, and details are not repeated herein.
On the basis of the embodiment, the application further provides electronic equipment, which comprises:
the device comprises a processor and a memory, wherein the processor is in communication connection with the memory;
in this embodiment, the memory may be implemented in any suitable manner, for example: the memory can be read-only memory, mechanical hard disk, solid state disk, USB flash disk or the like; the memory is used for storing executable instructions executed by at least one of the processors;
in this embodiment, the processor may be implemented in any suitable manner; for example, the processor may take the form of a microprocessor, or of a processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and the like; the processor is configured to execute the executable instructions to implement the target detection and tracking method based on improved YOLOv5 and deep sort as described above.
On the basis of the above embodiments, the present application further provides a computer readable storage medium, in which a computer program is stored, which when executed by a processor implements the method for target detection and tracking based on improved YOLOv5 and deep sort as described above.
Those of ordinary skill in the art will appreciate that the various illustrative phase modules and method steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus, device and stage module described above may refer to the corresponding processes in the foregoing method embodiments, which are not described in detail herein.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described embodiments of the apparatus are merely illustrative, and the division of the phase modules is merely a logical function division, and there may be other manners of dividing the phase modules or state units in actual implementation, for example, multiple phase modules or state units may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or apparatuses, which may be in electrical, mechanical or other form.
The phase modules described as separate components may or may not be physically separate, and components shown as phase modules may or may not be physical state units, may be located in one place, or may be distributed over multiple network state units. Some or all of the state units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional stage module in the embodiments of the present application may be integrated in one processing stage module, or each stage module may exist separately and physically, or two or more stage modules may be integrated in one stage module.
The functions, if implemented in the form of software functional stage modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program instructions.
In addition, it should be noted that the combination of the technical features described in the present application is not limited to the combination described in the claims or the combination described in the specific embodiments, and all the technical features described in the present application may be freely combined or combined in any manner unless contradiction occurs between them.
It should be noted that the above-mentioned embodiments are merely examples of the present application, and it is obvious that the present application is not limited to the above-mentioned embodiments, and many similar variations are possible. All modifications attainable or obvious from the present disclosure set forth herein should be deemed to be within the scope of the present disclosure.
The foregoing is merely illustrative of the preferred embodiments of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (9)

1. A method for improved YOLOv5 and deep sort based target detection and tracking comprising the steps of:
acquiring target images to be detected and preprocessing them, and dividing the preprocessed image set in a ratio of 6:2:2 to obtain a training set, a verification set and a test set;
constructing a target detection model by improving YOLOv 5;
detecting a target vehicle in the current frame through the trained target detection model to obtain the helmet wearing condition and position information of the current frame riding electric vehicle personnel;
detecting and tracking target personnel in the video through a deep SORT algorithm, and comprehensively judging whether the tracked personnel wear the helmet correctly according to the helmet wearing conditions of the detected personnel in a plurality of frames in the tracking process;
and detecting and tracking, through the target detection and tracking model based on YOLOv5 and deep SORT, personnel in the traffic video who do not wear helmets as required.
2. The method for target detection and tracking based on improved YOLOv5 and deep sort according to claim 1, wherein the acquiring target images to be detected and preprocessing them, and dividing the preprocessed image set in a ratio of 6:2:2 to obtain a training set, a verification set and a test set, comprises:
acquiring images of the targets to be detected, stitching every 4 pictures by the Mosaic method through random scaling, random cropping and random arrangement, and dividing the processed pictures in a ratio of 6:2:2 to obtain a training set, a verification set and a test set.
3. The method for detecting and tracking the target based on the improved YOLOv5 and deep sort according to claim 2, wherein the method for splicing 4 pictures by random scaling, random cutting and random arrangement by using a Mosaic method comprises the following steps:
randomly selecting reference point coordinates (xc, yc) of picture stitching, and randomly selecting four pictures;
the four pictures are respectively placed at the left upper part, the right upper part, the left lower part and the right lower part of the large picture with the specified size after being subjected to size adjustment and scaling according to the datum point;
mapping relation is corresponding to the picture label according to the size conversion mode of each picture;
and splicing the large images according to the designated abscissa and ordinate, and processing the coordinates of the detection frame exceeding the boundary.
4. The method for target detection and tracking based on improved YOLOv5 and deep sort according to claim 1, wherein the target detection model constructed by improving YOLOv5 comprises four parts: an input end, a backbone network, a feature fusion network and a detector network;
the input end comprises Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling;
the backbone network comprises a Focus+CSP structure and adopts a CSP2 structure;
the feature fusion network adopts a structure of combining FPN and PAN;
the detector network uses DIoU_Loss instead of IoU_Loss and DIoU_NMS instead of NMS.
5. The method for detecting and tracking the target based on the improved YOLOv5 and deep sort according to claim 1, wherein the detecting and tracking the target person in the video by the deep sort algorithm comprehensively judges whether the tracked person wears the helmet correctly according to the helmet wearing condition of the detected person in a plurality of frames in the tracking process, and comprises the following steps:
detecting and tracking a target person in a video through a deep SORT algorithm, predicting the position information of the target person at the next moment by using a Kalman filter, performing cascade matching and IoU matching on a predicted track and a detection result by using a Hungary algorithm, comprehensively judging the matching degree between the predicted track and the detection result, and completing tracking matching of the target person.
6. A system for improved YOLOv5 and deep sort based target detection and tracking comprising:
the target detection stage module is used for acquiring an image and carrying out target detection on the image;
a target tracking stage module for tracking whether the matched target wears a helmet;
and the information feedback stage module is used for recording the target video, collecting evidence and integrating image information for uploading and feedback.
7. The improved YOLOv5 and deep sort based target detection and tracking system of claim 6, wherein the target detection phase module comprises:
the target detection state unit is used for performing target detection on the current frame image to judge whether an electric bicycle is present, and obtaining target detection results including the positions of the detected electric bicycle riders and the positions of helmets; if a detected person's position is found not to coincide with any helmet position, the detected person is determined to be a person not correctly wearing a helmet;
the information association state unit is used for determining the position of the personnel detection frame and the position of the helmet detection frame according to the position information of the target person and the helmet in the current frame, and calculating the ratio of the overlap area between the personnel detection frame and the helmet detection frame to the area of the personnel frame; if the ratio is larger than a threshold, the personnel detection frame is matched with the helmet detection frame, indicating that the detected person is wearing a helmet;
the sample separation state unit is used for separating, from the target detection results of the information association state unit, the target personnel for whom trackers are to be built, so as to construct a corresponding tracking system; a tracker is built for each detected person not correctly wearing a helmet, which facilitates subsequent tracking and timely feedback of the detected person's position information, appearance features and the like;
the target tracking stage module comprises:
the target prediction state unit is used for obtaining a prediction track of the position information of the detected person at the next moment through Kalman filter processing;
the target matching state unit is used for matching the predicted track obtained through the Kalman filter with the detection track by using the Hungary algorithm and judging the matching degree between the predicted track and the detection track, so that tracking matching of detected personnel is completed;
the target checking state unit is used for checking in consecutive frames whether target personnel wear helmets correctly, thereby reducing missed detections and false detections; it can also serve as a basis for verification;
the target judging state unit is used for integrating the helmet wearing conditions of the detected personnel in all frames in the target tracking process and judging whether the detected personnel wear the helmet correctly or not;
the information feedback stage module comprises:
the illegal recording state unit is used for automatically recording videos so as to record evidence that detected personnel wear the helmet incorrectly, and feeding back image related information to the background together so as to help traffic managers evaluate bad behaviors and process illegal behaviors;
and the violation alarm state unit is used for triggering the alarm device according to the target analysis result under the condition that the detected personnel does not wear the helmet correctly so as to remind traffic management personnel to process.
8. An electronic device, the electronic device comprising:
the device comprises a processor and a memory, wherein the memory is in communication connection with the processor;
the memory is used for storing executable instructions executed by at least one of the processors, the processor is used for executing the executable instructions to realize the method for detecting and tracking the target based on the improved YOLOv5 and deep sort according to any one of claims 1 to 5.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of improved YOLOv5 and deep sort based target detection and tracking according to any of claims 1 to 5.
CN202310424367.5A 2023-04-20 2023-04-20 Target detection and tracking method and system based on improved YOLOv5 and deep SORT Pending CN116740753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310424367.5A CN116740753A (en) 2023-04-20 2023-04-20 Target detection and tracking method and system based on improved YOLOv5 and deep SORT


Publications (1)

Publication Number Publication Date
CN116740753A true CN116740753A (en) 2023-09-12

Family

ID=87905120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310424367.5A Pending CN116740753A (en) 2023-04-20 2023-04-20 Target detection and tracking method and system based on improved YOLOv5 and deep SORT

Country Status (1)

Country Link
CN (1) CN116740753A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117994987A (en) * 2024-04-07 2024-05-07 东南大学 Traffic parameter extraction method and related device based on target detection technology

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686923A (en) * 2020-12-31 2021-04-20 浙江航天恒嘉数据科技有限公司 Target tracking method and system based on double-stage convolutional neural network
CN113160274A (en) * 2021-04-19 2021-07-23 桂林电子科技大学 Improved deep sort target detection tracking method based on YOLOv4
KR102407170B1 (en) * 2022-01-07 2022-06-10 (주)토페스 Method for monitoring violation of traffic regulations of two-wheeled vehicle, and system therefor
CN114708610A (en) * 2022-02-23 2022-07-05 浙江万里学院 Non-motor vehicle violation detection method integrating attention mechanism
CN115171022A (en) * 2022-07-19 2022-10-11 武汉理工大学 Method and system for detecting wearing of safety helmet in construction scene
CN115526285A (en) * 2021-06-25 2022-12-27 中国农业大学 Fish counting device and counting method thereof, electronic equipment and storage medium
CN115984969A (en) * 2023-02-10 2023-04-18 沈阳大学 Lightweight pedestrian tracking method in complex scene


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DARRENZHANG: "Yolo发展史(v4/v5的创新点汇总!)", 7 July 2021, pages 9-12, retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1843074> *
From the Internet: "【YOLO系列】YOLOv5超详细解读(网络详解)", 15 April 2023, pages 6-8, retrieved from the Internet <URL:https://aitechtogether.com/python/78832.html> *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination