CN110516556B - Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium - Google Patents

Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium

Info

Publication number
CN110516556B
Authority
CN
China
Prior art keywords
target
darkflow
targets
target tracking
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910701678.5A
Other languages
Chinese (zh)
Other versions
CN110516556A (en)
Inventor
王义文
郑权
王健宗
曹靖康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910701678.5A priority Critical patent/CN110516556B/en
Priority to PCT/CN2019/117801 priority patent/WO2021017291A1/en
Publication of CN110516556A publication Critical patent/CN110516556A/en
Application granted granted Critical
Publication of CN110516556B publication Critical patent/CN110516556B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application provides a Darkflow-DeepSort-based multi-target tracking detection method, device and storage medium, and relates to the technical field of intelligent decision making. The method comprises the following steps: S110, training with the YOLOv3 algorithm to obtain a Darkflow-based target detection model; S120, inputting a detection image into the trained Darkflow-based target detection model to obtain the apparent features of a plurality of targets, the detection image being obtained by decoding the monitoring video; S130, inputting the apparent features of the plurality of targets into a trained DeepSort-based target tracking model, the target tracking model being obtained by training on the multi-target detection dataset MOT16 Challenge; and S140, performing frame-by-frame data association on the monitoring video with the Kalman filter of the target tracking model to realize multi-target tracking in the monitoring video. With this method, the multi-target tracking detection speed can be increased and multi-target tracking completed without losing detection accuracy.

Description

Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium
Technical Field
The application relates to the technical field of intelligent decision making, in particular to a method and a device for multi-target tracking detection based on Darkflow-DeepSort and a storage medium.
Background
Visual target tracking methods are widely applied in fields such as human-machine interaction and unmanned systems, and tracking methods based on correlation filtering (Correlation Filter) and convolutional neural networks (CNN) already dominate the field of target tracking.
The SORT method (Simple Online and Realtime Tracking) achieves good results among existing multi-target tracking methods. Its greatest characteristic is that it efficiently realizes target detection and tracking using Kalman filtering and the Hungarian algorithm.
DeepSort is an improvement built on SORT target tracking. The original DeepSort trains a high-performance Faster-RCNN model for target detection; compared with the SORT algorithm, it reduces ID switches by 45%, and by combining deep appearance information it greatly improves the tracking of occluded targets, achieving state-of-the-art online tracking performance (at the cost of an increased FP count). However, when DeepSort tracking is performed this way, the frame rate reaches at most around 15 fps, the average reaches only around 10 fps, and real-time tracking is only stable at around 8 fps.
Therefore, a multi-target tracking detection method is needed that increases the detection speed without losing detection accuracy.
Disclosure of Invention
In order to solve the above problems, the present application provides a multi-target tracking detection method, device and storage medium based on Darkflow-DeepSort.
A Darkflow-DeepSort-based multi-target tracking detection method, applied to an electronic device, comprises the following steps:
s110, training by utilizing a YOLOv3 algorithm to obtain a target detection model based on Darkflow;
s120, inputting the detection image into a trained target detection model based on Darkflow to obtain apparent characteristics of a plurality of targets; the detection image is obtained based on decoding the monitoring video;
s130, inputting the apparent characteristics of a plurality of targets into a trained target tracking model based on DeepSort; the target tracking model is obtained through training of a multi-target detection data set MOT16 Change;
and S140, carrying out frame-by-frame data association processing on the monitoring video by using a Kalman filter of a target tracking model to realize multi-target tracking in the monitoring video.
Further, preferably, the Darkflow-based target detection model is a Python model obtained by converting the Darknet network structure through Cython.
Further, preferably, step S140 specifically includes:
S210, obtaining the motion matching degree and the apparent feature matching degree of the multiple targets; the motion matching degree is obtained by computing, with the Kalman filter, the motion similarity of the multiple targets, and the apparent feature matching degree is obtained by computing the apparent features of the multiple targets;
S220, obtaining the matching degree of the target frame by performing frame-by-frame data association on the monitoring video using the motion matching degree and the apparent feature matching degree of the multiple targets;
S230, selecting a target frame whose final matching degree reaches the preset matching parameters as the target tracking result.
Further, preferably, among the apparent features of the targets obtained in step S120, targets whose number of occurrences exceeds a set threshold are screened out and given priority through cascade matching.
Furthermore, preferably, the padding of the convolution layers in the Darkflow network structure is 1, and the pooling layers all use max pooling.
An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor for the Darkflow-DeepSort-based multi-target tracking detection method; the computer program of the Darkflow-DeepSort-based multi-target tracking detection method, when executed by the processor, implements the following steps:
s110, training by utilizing a YOLOv3 algorithm to obtain a target detection model based on Darkflow;
s120, inputting the detection image into a trained target detection model based on Darkflow to obtain apparent characteristics of a plurality of targets; the detection image is obtained based on decoding the monitoring video;
s130, inputting the apparent characteristics of a plurality of targets into a trained target tracking model based on DeepSort; the target tracking model is obtained through training of a multi-target detection data set MOT16 Change;
and S140, carrying out frame-by-frame data association processing on the monitoring video by using a Kalman filter of a target tracking model to realize multi-target tracking in the monitoring video.
Further, preferably, the Darkflow-based target detection model is a Python model obtained by converting the Darknet network structure through Cython.
Further, preferably, step S140 includes:
S210, obtaining the motion matching degree and the apparent feature matching degree of the multiple targets; the motion matching degree is obtained by computing, with the Kalman filter, the motion similarity of the multiple targets, and the apparent feature matching degree is obtained by computing the apparent features of the multiple targets;
S220, obtaining the matching degree of the target frame by performing frame-by-frame data association on the monitoring video using the motion matching degree and the apparent feature matching degree of the multiple targets;
S230, selecting a target frame whose final matching degree reaches the preset matching parameters as the target tracking result.
Further, preferably, the padding of the convolution layers in the Darkflow network structure is 1, and the pooling layers all use max pooling.
According to another aspect of the present application, there is provided a computer-readable storage medium storing a computer program, the computer program comprising a Darkflow-DeepSort-based multi-target tracking detection program which, when executed by a processor, implements the steps of the above-described Darkflow-DeepSort-based multi-target tracking detection method.
By using the Darkflow-DeepSort-based multi-target tracking detection method, device and storage medium, the following effects can be achieved:
1. The application realizes multi-target tracking in the monitoring video using the Kalman filtering of the single-hypothesis tracking method together with frame-by-frame data association. Combining the YOLOv3 algorithm with Kalman filtering not only tracks multiple targets with high accuracy, but also avoids the huge computation of multi-hypothesis algorithms, whose cost grows exponentially with the number of measurements and targets.
2. The target detection model locates the moving target in each acquired image; for continuously obtained consecutive frames, i.e. for a video, locating the moving target in every frame realizes tracking detection of moving-target behavior in the video. Because the YOLOv3 algorithm processes images quickly, a model based on it processes images faster under the same conditions than models trained with conventional convolutional neural network algorithms (for example, 1000 times faster than R-CNN and 100 times faster than Fast-RCNN).
3. The YOLOv3 algorithm is easy to port, can be implemented on various operating systems, places relatively low demands on terminal hardware, and makes it easy to run the target detection model on lightweight devices.
4. The apparent features of the target to be tracked are extracted for nearest-neighbor matching, which improves target tracking under occlusion and reduces target ID jumps.
5. When tracking targets in video with an original rate of 25 fps, the method reaches 15 fps without frame extraction; with frame extraction every three frames, it reaches more than 20 fps at best without losing the tracked target. Real-time camera tracking reaches more than 14 fps, and the detection speed is improved 100-fold while accuracy is maintained.
6. For real-time recording and broadcasting scenarios, the application achieves accurate positioning and quick identification of moving-target features at the same precision, improves the speed and precision of recognition in the video field, and reduces the delay and stuttering of the recording and broadcasting system.
To the accomplishment of the foregoing and related ends, one or more aspects of the application comprise the features hereinafter fully described. The following description and the annexed drawings set forth in detail certain illustrative aspects of the application. These aspects are indicative, however, of but a few of the various ways in which the principles of the application may be employed. Furthermore, the application is intended to include all such aspects and their equivalents.
Drawings
Other objects and attainments together with a more complete understanding of the application will become apparent and appreciated by referring to the following description taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 is a flow chart of a Darkflow-DeepSort-based multi-target tracking detection method according to an embodiment of the application;
FIG. 2 is a flow chart of a tracking method of a target tracking model according to an embodiment of the application;
FIG. 3 is a schematic diagram of a conversion flow for converting a Darknet network structure into a model structure of Python according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a Darkflow-based network architecture according to an embodiment of the application;
FIG. 5 is a schematic structural diagram of an electronic device for detecting multi-object tracking based on Darkflow-DeepSort according to an embodiment of the present application;
FIG. 6 is a framework diagram of an existing target tracking method according to an embodiment of the application.
The same reference numerals will be used throughout the drawings to refer to similar or corresponding features or functions.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of one or more embodiments. It may be evident, however, that such embodiment(s) may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to facilitate describing one or more embodiments.
The application provides a Darkflow-DeepSort-based multi-target tracking detection method, an electronic device and a storage medium. The Darkflow-DeepSort-based multi-target tracking detection method comprises a target detection stage and a target tracking stage: the Darkflow model is mainly used, after training on samples, to detect pedestrians, while the DeepSort model contributes only its tracking parts, such as the Kalman filter and track confirmation. On the basis of extracting spatial features with a convolutional neural network, the method uses a Kalman filter to learn the motion law of the target, fuses the target features, estimates the target position, and computes target similarity jointly in time and space for matching, thereby achieving the purpose of target tracking.
Fig. 1 shows the flow of the Darkflow-DeepSort-based multi-target tracking detection method according to an embodiment of the present application.
As shown in fig. 1, the Darkflow-DeepSort-based multi-target tracking detection method includes the following steps:
s110, training by utilizing a YOLOv3 algorithm to obtain a target detection model based on Darkflow; the Darkflow model is based on the YOLOv3 algorithm and is obtained by adopting defined binary cross entropy training; the loss function of the YOLOv3 algorithm is two parts, the first part is a class error, the second part is an object position error, the two parts are defined binary cross entropy, and then the difference sum of the two classes of errors is taken as a total error function.
It should be noted that YOLOv3 (You Only Look Once, version 3) is a target detection algorithm based on Darknet-53; its greatest performance advantage over other deep learning algorithms is its faster detection speed, which is why the present application adopts it for target detection within multi-target tracking detection. In the target detection stage, a Darkflow model trained with the YOLOv3 algorithm performs target detection, with the Darkflow network structure as the detection network framework; in the target tracking stage, target tracking is completed with a Python model.
S120, inputting the detection image into a trained target detection model based on Darkflow to obtain the apparent characteristics of a plurality of targets.
The detection image is obtained by decoding the monitoring video. An illustrative example: a conventional way to decode video is to decode one frame per interval, for example extracting 4 frames per second; a decoding interval in frames is set, so that if the video fps is 24 the interval is 6 frames, and the video is decoded in real time with Opencv according to this interval.
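A minimal OpenCV sketch of this interval decoding follows; the function name and the fallback fps value are illustrative assumptions:

```python
import cv2

def decode_frames(video_path, frames_per_second=4):
    """Decode one frame per interval: with video fps 24 and 4 frames
    per second, the interval is 24 / 4 = 6 frames, as in the example."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 24.0   # fall back when fps is unreadable
    interval = max(1, int(round(fps / frames_per_second)))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % interval == 0:
            yield frame                        # detection image for the model
        index += 1
    cap.release()
```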
The apparent features comprise position information and spatial features. Furthermore, the Darkflow-based target detection model is a Python model that serves as a deep feature descriptor, and the apparent features are extracted with this descriptor. As an example, the motion state is described with 8 parameters (u, v, r, h, u̇, v̇, ṙ, ḣ), where (u, v) are the center coordinates of the bounding box, r is the aspect ratio, and h is the height; the remaining four variables are the corresponding velocities in the image coordinate system. The apparent feature of a bounding box is a 128-dimensional feature obtained through a deep network.
S130, inputting the apparent features of the multiple targets into a trained DeepSort-based target tracking model, where the target tracking model is obtained by training on the multi-target detection dataset MOT16 Challenge. The target tracking model uses the DeepSort model with the target detection part removed, i.e. the Kalman filter and the subsequent cascade matching.
The target tracking model is used to determine the position information of a target in a video frame. Relatively stable statistical features, or certain invariant features, must be extracted with a corresponding target apparent-feature description method, and the response of the target candidate region obtained through a filter serves as the criterion for judging the target position.
The DeepSort-based target tracking model is trained on the public multi-target detection dataset MOT16 Challenge; the training set itself is the competition data provided by MOT16 Challenge.
Further, the DeepSort-based target tracking model is built on a Kalman filter: the Kalman filter establishes the tracking model, target detection is then realized through Darkflow, apparent features are determined for matching, and the positioning information is input into the Kalman filter for tracking.
S140, performing frame-by-frame data association on the monitoring video with the Kalman filter of the target tracking model, thereby realizing multi-target tracking in the monitoring video. Using the Kalman filtering of single-hypothesis tracking achieves high-accuracy multi-target tracking while avoiding the huge computation of multi-hypothesis algorithms, whose cost grows exponentially with the number of measurements and targets.
Fig. 2 shows the working flow of the DeepSort-based target tracking model according to an embodiment of the present application.
As shown in fig. 2, the core idea of DeepSort is a single-hypothesis tracking method that uses recursive Kalman filtering and frame-by-frame data association to implement multi-target tracking. Note that DeepSort introduces a deep learning model trained offline on a pedestrian re-identification dataset (a ReID dataset containing more than 1,100,000 images of 1,261 pedestrians, suitable for pedestrian tracking). The DeepSort-based target tracking model in the present application uses the DeepSort model with the target detection part removed.
In the target tracking stage, the visual target tracking task predicts the size and position of the target in subsequent frames, given its size and position in the initial frame of a video sequence. The flow of the visual target tracking task is divided by the framework shown in fig. 6: first the target frame is initialized, then several candidate frames are generated in the next frame and their features extracted, the candidate frames are scored, and finally the highest-scoring candidate frame is taken as the predicted target, or several predictions are fused into a better one. In the present application, the prediction of the target is achieved by a Kalman filter.
A typical example of Kalman filtering predicts the coordinates and velocity of an object's position from a finite sequence of noisy, possibly biased observations of its position. It is found in many engineering applications (e.g. radar, computer vision) and is also an important topic in control theory and control system engineering. For radar, for example, one is interested in tracking a target, but measurements of its position, velocity and acceleration are noisy at all times. Kalman filtering uses the dynamic information of the target to remove the effects of noise and obtain a good estimate of the target position: an estimate of the current position (filtering), of a future position (prediction), or of a past position (interpolation or smoothing). The apparent features of the targets obtained by the detection model are matched by nearest-neighbor matching through the Kalman filter; it should be noted that nearest-neighbor matching finds the closest feature according to the distance between features. The Kalman filter predicts the position of the feature, and a nearest-neighbor match is then made with the position of the actually detected target.
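The following sketch illustrates nearest-neighbor matching over the 128-dimensional appearance features; the use of cosine distance and L2-normalized vectors is an assumption in the spirit of DeepSort, not a verbatim part of this description:

```python
import numpy as np

def nearest_neighbor_distance(track_features, detection_feature):
    """Smallest (cosine) distance between a detection's 128-dimensional
    appearance feature and the stored features of one track. Inputs are
    assumed L2-normalized, with shapes (n, 128) and (128,)."""
    distances = 1.0 - track_features @ detection_feature
    return float(distances.min())
```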
S210, obtaining the motion matching degree and the apparent feature matching degree of the multiple targets; the motion matching degree is obtained by computing, with the Kalman filter, the motion similarity of the multiple targets, and the apparent feature matching degree is obtained by computing the apparent features of the multiple targets.
S220, obtaining the matching degree of the target frame by performing frame-by-frame data association on the monitoring video using the motion matching degree and the apparent feature matching degree of the multiple targets.
S230, selecting a target frame whose final matching degree reaches the preset matching parameters as the target tracking result. That is, targets are paired with trackers, successfully and unsuccessfully paired tracks are updated, and tracks that do not meet the conditions are deleted; the targets are then counted and their trajectories drawn, completing the tracking of the targets.
The tracked object of the target frame may be a person, an animal, or another moving object. When the tracked object is a person, the target frame may be called a human body frame.
In a specific embodiment, multi-target tracking is completed by judging the matching degree of the target frame, which consists of two parts: IOU matching and apparent feature matching. IOU matching is performed between the preceding and the current detection; apparent feature matching extracts apparent feature vectors through a network and compares the current tracked target with potential matching objects, taking the minimum of the normalized average distance between the apparent feature vectors before and after. The apparent feature matching degree = 1 - the minimum normalized average distance.
The final matching degree equals the average of the IOU match value and the apparent feature match value, that is, final matching degree = (IOU match value + apparent feature match value) / 2.
The preset matching parameters are that the final matching degree is greater than 0.5 and the IOU match value is greater than 0.5. If the preset matching parameters are reached, the match succeeds and tracking proceeds; otherwise, the match is judged unsuccessful.
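A sketch of this matching-degree computation follows; the (x1, y1, x2, y2) box format and the helper names are assumptions:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_match(iou_value, appearance_value):
    """Final matching degree = (IOU match value + apparent feature match
    value) / 2; a match requires final > 0.5 and IOU value > 0.5."""
    final = (iou_value + appearance_value) / 2.0
    return final > 0.5 and iou_value > 0.5
```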
As a specific example of updating successfully and unsuccessfully paired tracks and deleting tracks that do not meet the conditions: a standard Kalman filter is used to predict the motion state of the target, based on a constant velocity model (i.e. a model in which the velocity is constant by default, with no acceleration) and a linear observation model.
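A minimal numpy sketch of this constant-velocity model over the 8-dimensional state (u, v, r, h, u̇, v̇, ṙ, ḣ); the step size and the noise covariances Q and R are assumptions:

```python
import numpy as np

dt = 1.0                      # one frame per step (assumption)
F = np.eye(8)                 # constant velocity: position += velocity * dt
for i in range(4):
    F[i, i + 4] = dt
H = np.eye(4, 8)              # linear observation: only (u, v, r, h) is measured

def kalman_predict(x, P, Q):
    """One prediction step; Q is the process noise covariance."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, R):
    """Correct the prediction with an observed box z = (u, v, r, h)."""
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(8) - K @ H) @ P
    return x, P
```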
The result predicted by the Kalman filter is (u, v, r, h). For each tracked target, the number of frames a_k since the last detection result matched the tracking result is recorded; once the detection result of a target is correctly associated with the tracking result, the parameter a_k is reset to 0. The recording is equivalent to an external recorder, or an array that records the tracking data of every target in every frame; the Kalman filter simply depends on the input position of each target and then makes its prediction.
The predicted value of the Kalman filter is compared with the actually detected value; if the observed value and the predicted value differ too much, the prediction cannot represent the observation.
That is, Amax is an upper limit and a_k is the number of frames for which the predicted value and the observed value of the Kalman filter have not matched; if a_k exceeds Amax, the Kalman filter is no longer tracking well, the tracking process for this target is deemed to have ended, and tracking is not continued. In other words, a tracking process ends when a target has been tracked but the subsequent Kalman filter can no longer accurately predict a new position for it.
If a target in a detection result cannot be associated with any existing tracker (an existing tracker is one that has detected an existing target and is tracking it), it is judged that a new target may have appeared.
If, for 3 consecutive frames, the predicted position of a potential new tracker can be correctly associated with the detection result (a new tracker is one whose prediction and detection results associate correctly in three consecutive frames for a newly appearing target), it is confirmed that a new moving target has appeared.
If this requirement cannot be met, a false alarm is considered to have appeared and the moving target is deleted; that is, when a target produced by the detection model cannot be matched in three consecutive frames, it is considered not to be a tracked target (it may derive from a detection error) and is deleted.
A newly appearing target is matched against the existing trackers to see whether it belongs to a target already being tracked; if not, it is considered that a new target may have appeared, and a new tracker is created.
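The track bookkeeping above can be sketched as follows; the 3-frame confirmation comes from the text, while the value Amax = 30 is an assumption:

```python
class Track:
    """Minimal track lifecycle: a_k counts frames since the last match,
    a new track needs 3 consecutive matches to be confirmed, and a track
    is deleted once a_k exceeds Amax (or earlier if never confirmed)."""
    def __init__(self, a_max=30, n_init=3):
        self.a_k = 0          # frames since the last successful association
        self.hits = 0         # consecutive associated frames
        self.confirmed = False
        self.a_max, self.n_init = a_max, n_init

    def on_match(self):
        self.a_k = 0
        self.hits += 1
        if self.hits >= self.n_init:
            self.confirmed = True

    def on_miss(self):
        self.a_k += 1

    def should_delete(self):
        # an unconfirmed track that misses any frame is a likely false alarm
        return self.a_k > self.a_max or (not self.confirmed and self.a_k > 0)
```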
In a specific embodiment of the present application, among the apparent features of the multiple targets obtained in step S120, targets whose number of occurrences exceeds a set threshold are screened out and given priority through cascade matching. The threshold for the number of occurrences is generally set to 3.
Further, in the final stage of cascade matching, in order to mitigate large changes caused by apparent abrupt changes or partial occlusion, IOU-based matching may be performed on the unconfirmed tracks and the unmatched tracks of age 1.
Prioritizing frequently appearing targets through cascade matching addresses the situation in which a target is occluded for a long time: when a target is occluded for a long time, the uncertainty of the Kalman filter prediction greatly increases, and the observability in the state space greatly decreases.
If two trackers then compete for the matching right to the same detection result, the Mahalanobis distance of the track occluded longer is often smaller, so the detection result is more likely to be associated with the longer-occluded track; this undesirable effect often damages tracking persistence.
That is, assuming the original covariance matrix describes a normal distribution, continuous prediction without updates makes the variance of that distribution grow, so points farther from the mean in Euclidean distance may obtain the same Mahalanobis distance value as points that were closer under the earlier distribution. Matching Cascade is therefore used in the present application to give priority to more frequently appearing targets.
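For reference, a sketch of the squared Mahalanobis distance in question; as the prediction covariance P grows during occlusion, this distance shrinks for a fixed Euclidean offset (F and H as in the Kalman sketch above):

```python
import numpy as np

def mahalanobis_sq(z, x, P, H, R):
    """Squared Mahalanobis distance between a detection z and the
    predicted measurement H x, using the innovation covariance
    S = H P H^T + R."""
    S = H @ P @ H.T + R
    y = z - H @ x                 # innovation
    return float(y @ np.linalg.inv(S) @ y)
```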
It should be noted that cascade matching means combining several matching modes (such as IOU matching or feature matching) and matching in cascade, i.e. one matching mode followed by another; alternatively, further selection criteria are added first, and the corresponding matches are made afterwards.
In a specific embodiment, the second kind of cascade is used: selection criteria are added first and the corresponding matching actions are performed afterwards. A time-point sequence is therefore added first, targets with high occurrence frequency are selected preferentially, and only then does the matching mechanism begin, so that long-occluded targets are less likely to be matched first; that is, more frequently appearing targets are given priority.
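A sketch of this cascade: tracks are grouped by the number of frames since their last match, and the most recently matched groups get first access to the detections (the match_fn callback and the grouping bound are assumptions):

```python
def matching_cascade(tracks, detections, match_fn, a_max=30):
    """Iterate from the most recently matched tracks (a_k = 0) to the
    longest-occluded ones, so frequently appearing targets are matched
    first. match_fn pairs tracks with the remaining detections and
    returns a list of (track, detection) matches."""
    matches, unmatched = [], list(detections)
    for age in range(a_max + 1):
        candidates = [t for t in tracks if t.a_k == age]
        if candidates and unmatched:
            found = match_fn(candidates, unmatched)
            matches.extend(found)
            matched_dets = {id(d) for _, d in found}
            unmatched = [d for d in unmatched if id(d) not in matched_dets]
    return matches, unmatched
```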
For the specific algorithm, see the paper "Simple Online and Realtime Tracking with a Deep Association Metric" by Nicolai Wojke, Alex Bewley and Dietrich Paulus (University of Koblenz-Landau; Queensland University of Technology), which is not described in detail here.
FIG. 3 shows a conversion flow of converting a Darknet network structure into a model structure of Python according to an embodiment of the present application;
As shown in fig. 3, the conversion flow for converting the Darknet network structure into a Python model structure is as follows: Darkflow translates Darknet to Tensorflow. Through Cython, the original C-based Darknet network structure is converted into a Python model structure for DeepSort; meanwhile, a pb model structure used by Tensorflow can be generated for other algorithms.
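A minimal sketch of driving the converted model through darkflow's Python API; the config and weight paths are placeholders (darkflow's CLI can likewise export the Tensorflow pb model with its --savepb option):

```python
import cv2
from darkflow.net.build import TFNet

# Placeholder paths: point "model" at the YOLO cfg and "load" at the weights.
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.5}
tfnet = TFNet(options)

frame = cv2.imread("frame.jpg")
# Each detection is a dict with label, confidence, topleft and bottomright.
detections = tfnet.return_predict(frame)
```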
FIG. 4 illustrates a Darkflow-based network architecture according to an embodiment of the application;
As shown in fig. 4, the Darkflow network structure is as follows:
in the Darkflow network structure, the padding of all convolution layers is 1, and the pooling layers are the maximum pooling. Other parameters such as step size, convolution kernel size, number of filters are shown.
Initially a convolution layer with a convolution kernel (3*3) number of filters of 32; next, a step length of 2 is performed, and the largest pooling with the pooling size of 2 is performed; a convolution layer with a number of (3*3) filters of 64 followed by a maximum pooling of step size 2 with a step size of 2.
The latter network structure is similar, and a convolution layer with a convolution kernel of (3*3) and a number of filters of N is performed first, where N is twice the number of filters of the last large convolution structure. Then, a convolution with a number of (1*1) filters of N/2 is performed, a convolution layer with a number of (3*3) filters of N is further performed, and finally a maximum pooling is performed. A large convolution structure is formed. The convolution structure is carried out for 4 times, the pooling layer is removed in the last time, and two corresponding convolution layers are connected.
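The repeated structure can be sketched in Keras as below; the input size, the leaky-ReLU slope and the per-repetition filter counts are assumptions consistent with the doubling rule described above:

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv(x, filters, size):
    # "same" padding reproduces the padding of 1 described for 3x3 kernels
    x = layers.Conv2D(filters, size, padding="same", use_bias=False)(x)
    return layers.LeakyReLU(0.1)(x)

def large_block(x, n, pool=True):
    """One large convolution structure: 3x3 with N filters, 1x1 with
    N/2, 3x3 with N, then an optional max pooling."""
    x = conv(x, n, 3)
    x = conv(x, n // 2, 1)
    x = conv(x, n, 3)
    return layers.MaxPooling2D(2, strides=2)(x) if pool else x

inputs = tf.keras.Input(shape=(416, 416, 3))      # input size is an assumption
x = conv(inputs, 32, 3)                           # 3x3, 32 filters
x = layers.MaxPooling2D(2, strides=2)(x)
x = conv(x, 64, 3)                                # 3x3, 64 filters
x = layers.MaxPooling2D(2, strides=2)(x)
for i, n in enumerate([128, 256, 512, 1024]):     # N doubles each repetition
    x = large_block(x, n, pool=(i < 3))           # the last block drops the pooling
```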
Corresponding to the above Darkflow-DeepSort-based multi-target tracking detection method, the application also provides a Darkflow-DeepSort-based multi-target tracking detection system, which comprises:
a target detection model training unit for training with the YOLOv3 algorithm to obtain a Darkflow-based target detection model;
an apparent feature determining unit for inputting the detection image into the trained Darkflow-based target detection model to obtain the apparent features of a plurality of targets, the detection image being obtained by decoding the monitoring video;
a target tracking model training unit for obtaining a target tracking model through training on the multi-target detection dataset MOT16 Challenge and inputting the apparent features of the plurality of targets into the trained DeepSort-based target tracking model;
and a target acquisition unit for performing frame-by-frame data association on the monitoring video with the Kalman filter of the target tracking model to realize multi-target tracking in the monitoring video.
The specific functions of these units correspond one-to-one to the corresponding steps of the Darkflow-DeepSort-based multi-target tracking detection method described above and are not repeated here one by one.
Fig. 5 is a schematic diagram of a logic structure of an electronic device according to an embodiment of the application.
As shown in fig. 5, the electronic device 50 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and executable on the processor 51. The processor 51, when executing the computer program 53, implements the steps of the Darkflow-DeepSort-based multi-target tracking detection method of the embodiment, such as steps S110 to S140 shown in FIG. 1. Alternatively, the processor 51 performs the functions of the modules/units in the above-described device embodiments when executing the Darkflow-DeepSort-based multi-target tracking detection method.
By way of example, the computer program 53 may be divided into one or more modules/units, which are stored in the memory 52 and executed by the processor 51 to complete the present application. One or more of the modules/units may be a series of computer program instruction segments capable of performing a specific function for describing the execution of the computer program 53 in the electronic device 50. For example, the computer program 53 may be divided into a multi-target detection unit and a multi-target tracking unit, the function of which is described in detail in the embodiments, and which is not described in detail herein.
The electronic device 50 may be a computing device such as a desktop computer, a notebook computer, a palm computer, and a cloud server. The electronic device 50 may include, but is not limited to, a processor 51, a memory 52. It will be appreciated by those skilled in the art that fig. 5 is merely an example of an electronic apparatus 50 and is not intended to limit the electronic apparatus 50, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic apparatus may further include input-output devices, network access devices, buses, etc.
The processor 51 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 52 may be an internal storage unit of the electronic device 50, such as a hard disk or a memory of the electronic device 50. The memory 52 may also be an external storage device of the electronic apparatus 50, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic apparatus 50. Further, the memory 52 may also include both internal storage units and external storage devices of the electronic apparatus 50. The memory 52 is used to store computer programs and other programs and data required by the electronic device. The memory 52 may also be used to temporarily store data that has been output or is to be output.
The present embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the Darkflow-DeepSort-based multi-target tracking detection method of the embodiment; to avoid repetition, details are not repeated here. Alternatively, when executed by the processor, the computer program implements the functions of each module/unit of the above Darkflow-DeepSort-based multi-target tracking detection system; to avoid repetition, they are likewise not described here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment emphasizes different aspects; for parts not described or illustrated in a particular embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the computer readable medium may include content that is subject to appropriate increases and decreases as required by jurisdictions in which such content is subject to legislation and patent practice, such as in certain jurisdictions in which such content is not included as electrical carrier signals and telecommunication signals.
The method for detecting multi-object tracking based on Darkflow-DeepSort, the electronic device and the storage medium according to the present application are described above by way of example with reference to FIGS. 1-6. However, it should be understood by those skilled in the art that, for the method, apparatus and storage medium for detecting multi-target tracking based on Darkflow-DeepSort as set forth in the foregoing disclosure, various modifications may be made without departing from the scope of the disclosure. Accordingly, the scope of the application should be determined from the following claims.

Claims (7)

1. A multi-target tracking detection method based on Darkflow-DeepSort, applied to an electronic device, characterized by comprising the following steps:
s110, training by utilizing a YOLOv3 algorithm to obtain a target detection model based on Darkflow; the target detection model based on the Darkflow is a Python model, and the Python model is obtained by converting a Darknet network structure through Cython;
s120, inputting the detection image into a trained target detection model based on Darkflow to obtain apparent characteristics of a plurality of targets; the detection image is obtained based on decoding the monitoring video; screening targets with occurrence times exceeding a set threshold value for the apparent characteristics of the targets, and giving priority to the targets through cascade matching;
s130, inputting the apparent characteristics of a plurality of targets into a trained target tracking model based on DeepSort; the target tracking model is obtained through training of a multi-target detection data set MOT16 Change;
and S140, carrying out frame-by-frame data association processing on the monitoring video by using a Kalman filter of a target tracking model to realize multi-target tracking in the monitoring video.
2. The Darkflow-DeepSort-based multi-target tracking detection method as claimed in claim 1, wherein said step S140 comprises:
s210, obtaining the motion matching degree and the apparent feature matching degree of multiple targets; the motion matching degree is obtained by calculating the motion similarity of multiple targets obtained by a Kalman filter; the apparent feature matching degree is obtained by calculating apparent features of the plurality of targets;
s220, calculating the final matching degree of the target frame by utilizing the motion matching degree and the apparent characteristic matching degree of the multiple targets through the IOU matching value and the apparent characteristic matching value which are obtained through the frame-by-frame data association processing of the monitoring video;
s230, selecting a target frame with the final matching degree reaching the preset matching parameters as a target tracking result.
3. The Darkflow-DeepSort-based multi-target tracking detection method as claimed in claim 1, wherein the padding of the convolution layers in the Darkflow network structure is 1 and the pooling layers all use max pooling.
4. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor for the Darkflow-DeepSort-based multi-target tracking detection method; the computer program of the Darkflow-DeepSort-based multi-target tracking detection method, when executed by the processor, implements the following steps:
s110, training by utilizing a YOLOv3 algorithm to obtain a target detection model based on Darkflow; the target detection model based on the Darkflow is a Python model, and the Python model is obtained by converting a Darknet network structure through Cython;
s120, inputting the detection image into a trained target detection model based on Darkflow to obtain apparent characteristics of a plurality of targets; the detection image is obtained based on decoding the monitoring video; screening targets with occurrence times exceeding a set threshold value for the apparent characteristics of the targets, and giving priority to the targets through cascade matching;
s130, inputting the apparent characteristics of a plurality of targets into a trained target tracking model based on DeepSort; the target tracking model is obtained through training of a multi-target detection data set MOT16 Change;
and S140, carrying out frame-by-frame data association processing on the monitoring video by using a Kalman filter of a target tracking model to realize multi-target tracking in the monitoring video.
5. The electronic device of claim 4, wherein the step S140 includes:
s210, obtaining the motion matching degree and the apparent feature matching degree of multiple targets; the motion matching degree is obtained by calculating the motion similarity of multiple targets obtained by a Kalman filter; the apparent feature matching degree is obtained by calculating apparent features of the plurality of targets;
s220, calculating the final matching degree of the target frame by utilizing the motion matching degree and the apparent characteristic matching degree of the multiple targets through the IOU matching value and the apparent characteristic matching value which are obtained through the frame-by-frame data association processing of the monitoring video;
s230, selecting a target frame with the final matching degree reaching the preset matching parameters as a target tracking result.
6. The electronic device of claim 4, wherein the padding of the convolution layers in the Darkflow network structure is 1, and the pooling layers all use max pooling.
7. A computer-readable storage medium storing a computer program, the computer program comprising a Darkflow-DeepSort-based multi-target tracking detection program which, when executed by a processor, implements the steps of the Darkflow-DeepSort-based multi-target tracking detection method according to any one of claims 1 to 3.
CN201910701678.5A 2019-07-31 2019-07-31 Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium Active CN110516556B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910701678.5A CN110516556B (en) 2019-07-31 2019-07-31 Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium
PCT/CN2019/117801 WO2021017291A1 (en) 2019-07-31 2019-11-13 Darkflow-deepsort-based multi-target tracking detection method, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910701678.5A CN110516556B (en) 2019-07-31 2019-07-31 Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium

Publications (2)

Publication Number Publication Date
CN110516556A CN110516556A (en) 2019-11-29
CN110516556B true CN110516556B (en) 2023-10-31

Family

ID=68624348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910701678.5A Active CN110516556B (en) 2019-07-31 2019-07-31 Multi-target tracking detection method and device based on Darkflow-deep Sort and storage medium

Country Status (2)

Country Link
CN (1) CN110516556B (en)
WO (1) WO2021017291A1 (en)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178208B (en) * 2019-12-20 2023-08-15 华瑞新智科技(北京)有限公司 Pedestrian detection method, device and medium based on deep learning
CN111222574B (en) * 2020-01-07 2022-04-05 西北工业大学 Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN111260628A (en) * 2020-01-15 2020-06-09 北京林业大学 Large nursery stock number counting method based on video image and electronic equipment
CN111340764A (en) * 2020-02-20 2020-06-26 江苏东印智慧工程技术研究院有限公司 Automatic inhaul cable apparent disease counting method based on deep sort
CN111382704B (en) * 2020-03-10 2023-12-15 以萨技术股份有限公司 Vehicle line pressing violation judging method and device based on deep learning and storage medium
CN111259868B (en) * 2020-03-10 2023-12-12 以萨技术股份有限公司 Reverse vehicle detection method, system and medium based on convolutional neural network
CN111401285B (en) * 2020-03-23 2024-02-23 北京迈格威科技有限公司 Target tracking method and device and electronic equipment
CN111428642A (en) * 2020-03-24 2020-07-17 厦门市美亚柏科信息股份有限公司 Multi-target tracking algorithm, electronic device and computer readable storage medium
CN111428644A (en) * 2020-03-25 2020-07-17 北京以萨技术股份有限公司 Zebra crossing region monitoring method, system and medium based on deep neural network
CN111460968B (en) * 2020-03-27 2024-02-06 上海大学 Unmanned aerial vehicle identification and tracking method and device based on video
CN111723664A (en) * 2020-05-19 2020-09-29 烟台市广智微芯智能科技有限责任公司 Pedestrian counting method and system for open type area
CN112241974A (en) * 2020-05-29 2021-01-19 北京新能源汽车技术创新中心有限公司 Traffic accident detection method, processing method, system and storage medium
CN111708380B (en) * 2020-06-29 2023-11-10 北京御航智能科技有限公司 Wind turbine generator appearance defect detection method, platform, unmanned aerial vehicle and system
CN112950671B (en) * 2020-08-06 2024-02-13 中国人民解放军32146部队 Real-time high-precision parameter measurement method for moving target by unmanned aerial vehicle
CN112036271B (en) * 2020-08-18 2023-10-10 汇纳科技股份有限公司 Pedestrian re-identification method, system, medium and terminal based on Kalman filtering
CN112200021B (en) * 2020-09-22 2022-07-01 燕山大学 Target crowd tracking and monitoring method based on limited range scene
CN112329521A (en) * 2020-09-24 2021-02-05 上海品览数据科技有限公司 Multi-target tracking video shop-patrol method based on deep learning
CN112668432A (en) * 2020-12-22 2021-04-16 上海幻维数码创意科技股份有限公司 Human body detection tracking method in ground interactive projection system based on YoloV5 and Deepsort
CN112785625B (en) * 2021-01-20 2023-09-22 北京百度网讯科技有限公司 Target tracking method, device, electronic equipment and storage medium
CN112767711B (en) * 2021-01-27 2022-05-27 湖南优美科技发展有限公司 Multi-class multi-scale multi-target snapshot method and system
CN112985439B (en) * 2021-02-08 2023-10-17 青岛大学 Pedestrian blocking state prediction method based on YOLOv3 and Kalman filtering
CN112883871B (en) * 2021-02-19 2022-06-10 北京三快在线科技有限公司 Model training and unmanned vehicle motion strategy determining method and device
CN112926649A (en) * 2021-02-24 2021-06-08 北京优创新港科技股份有限公司 Method and device for recognizing repeated weighing behaviors of cigarette frame
CN113139442A (en) * 2021-04-07 2021-07-20 青岛以萨数据技术有限公司 Image tracking method and device, storage medium and electronic equipment
CN113076899B (en) * 2021-04-12 2023-04-07 华南理工大学 High-voltage transmission line foreign matter detection method based on target tracking algorithm
CN113158995A (en) * 2021-05-21 2021-07-23 西安建筑科技大学 Multi-target tracking detection method, system, equipment and storage medium
CN113341391B (en) * 2021-06-01 2022-05-10 电子科技大学 Radar target multi-frame joint detection method in unknown environment based on deep learning
CN113343836A (en) * 2021-06-02 2021-09-03 禾麦科技开发(深圳)有限公司 Floor elevator waiting crowd detection system and method based on convolutional neural network
CN113256690B (en) * 2021-06-16 2021-09-17 中国人民解放军国防科技大学 Pedestrian multi-target tracking method based on video monitoring
CN113591577A (en) * 2021-06-30 2021-11-02 安徽省国维通信工程有限责任公司 Artificial intelligence video image detecting system in communication engineering
CN113781521B (en) * 2021-07-12 2023-08-08 山东建筑大学 Bionic robot fish detection tracking method based on improved YOLO-deep
CN113470078A (en) * 2021-07-15 2021-10-01 浙江大华技术股份有限公司 Target tracking method, device and system
CN113822153A (en) * 2021-08-11 2021-12-21 桂林电子科技大学 Unmanned aerial vehicle tracking method based on improved DeepSORT algorithm
CN113962282B (en) * 2021-08-19 2024-04-16 大连海事大学 Ship cabin fire real-time detection system and method based on improved yolov5l+deep
CN114022812A (en) * 2021-11-01 2022-02-08 大连理工大学 Multi-target tracking method for Deepsort water surface floater based on lightweight SSD
CN114240996A (en) * 2021-11-16 2022-03-25 灵译脑科技(上海)有限公司 Multi-target tracking method based on target motion prediction
CN114524339B (en) * 2022-01-06 2024-02-09 广东博智林机器人有限公司 Method, device, equipment and storage medium for detecting safe operation of elevator car
CN114820699B (en) * 2022-03-29 2023-07-18 小米汽车科技有限公司 Multi-target tracking method, device, equipment and medium
CN115049924B (en) * 2022-06-06 2023-04-14 四川大学 Building earthquake damage assessment method based on non-structural member damage identification under video monitoring
CN115082526B (en) * 2022-07-26 2023-02-03 复亚智能科技(太仓)有限公司 Target tracking method and device
CN116069801B (en) * 2023-03-06 2023-06-30 山东华夏高科信息股份有限公司 Traffic video structured data generation method, device and medium
CN116777950A (en) * 2023-04-19 2023-09-19 长沙理工大学 Multi-target visual tracking method, device, equipment and medium based on camera parameters

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041519A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Target tracking device and method, and computer-readable storage medium
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN109871763A (en) * 2019-01-16 2019-06-11 清华大学 A kind of specific objective tracking based on YOLO

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11144761B2 (en) * 2016-04-04 2021-10-12 Xerox Corporation Deep data association for online multi-class multi-object tracking
CN110047095B (en) * 2019-03-06 2023-07-21 平安科技(深圳)有限公司 Tracking method and device based on target detection and terminal equipment
CN109977818A (en) * 2019-03-14 2019-07-05 上海极链网络科技有限公司 A kind of action identification method and system based on space characteristics and multi-target detection

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041519A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Target tracking device and method, and computer-readable storage medium
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN109871763A (en) * 2019-01-16 2019-06-11 清华大学 A kind of specific objective tracking based on YOLO

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Simple Online and Realtime Tracking with a Deep Association Metrlc";WOJKE, Nicolai等;https://arxiv.org/abs/1703.07402v1;第1-5页 *

Also Published As

Publication number Publication date
CN110516556A (en) 2019-11-29
WO2021017291A1 (en) 2021-02-04

Similar Documents

Publication Publication Date Title
CN110516556B (en) Multi-target tracking detection method and device based on Darkflow-DeepSort and storage medium
Ciaparrone et al. Deep learning in video multi-object tracking: A survey
JP6944598B2 (en) Target tracking method and device, storage medium
CN107851318B (en) System and method for object tracking
Milan et al. Multi-target tracking by discrete-continuous energy minimization
Huang et al. Robust object tracking by hierarchical association of detection responses
CN111080673B (en) Anti-occlusion target tracking method
EP1542155A1 (en) Object detection
CN113284168A (en) Target tracking method and device, electronic equipment and storage medium
CN110288627B (en) Online multi-target tracking method based on deep learning and data association
CN111626194B (en) Pedestrian multi-target tracking method using depth correlation measurement
Al-Shakarji et al. Multi-object tracking cascade with multi-step data association and occlusion handling
EP1542153A1 (en) Object detection
CN112651995A (en) On-line multi-target tracking method based on multifunctional aggregation and tracking simulation training
EP1542152A1 (en) Object detection
EP1542154A2 (en) Object detection
Lee et al. Detection and tracking of multiple moving vehicles with a UAV
Kim et al. Multiple player tracking in soccer videos: an adaptive multiscale sampling approach
US20230095568A1 (en) Object tracking device, object tracking method, and program
CN111161325A (en) Three-dimensional multi-target tracking method based on Kalman filtering and LSTM
CN112800944A (en) Crowd behavior detection method and device, electronic equipment and storage medium
KR101913648B1 (en) Method for tracking multiple objects
Bashar et al. Multiple object tracking in recent times: A literature review
Stadler et al. Bytev2: Associating more detection boxes under occlusion for improved multi-person tracking
CN113379795A (en) Multi-target tracking and segmenting method based on conditional convolution and optical flow characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant