CN112614156A

CN112614156A - Training method and device for multi-target tracking network model and related equipment

Info

Publication number: CN112614156A
Application number: CN202011488458.8A
Authority: CN
Inventors: 任玉蒙; 闫潇宁; 陈晓艳
Original assignee: Shenzhen Anruan Technology Co Ltd
Current assignee: Shenzhen Anruan Technology Co Ltd
Priority date: 2020-12-16
Filing date: 2020-12-16
Publication date: 2021-04-06

Abstract

The invention relates to the technical field of target tracking, and provides a training method, a device and related equipment of a multi-target tracking network model, wherein the method comprises the following steps: constructing a training data set; constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method; training the initial tracking network model based on the training data set, and performing multi-target detection on the video data in the training data set through the multi-target detection algorithm; and tracking the video frames of the multi-target object through the preset feature extraction network and the attention mechanism method, and performing matching test on the adjacent video frames to output a multi-target tracking network model. The invention can reduce the parameter quantity, improve the target detection speed and simultaneously reduce the hardware equipment for supporting the calculated quantity and the time cost of the operation process.

Description

Training method and device for multi-target tracking network model and related equipment

Technical Field

The invention relates to the technical field of target tracking, in particular to a training method and a training device for a multi-target tracking network model and related equipment.

Background

With the development of smart cities and the wide application of intelligent monitoring, the tracking of the pedestrian and vehicle targets plays an important role in numerous fields such as intelligent video monitoring, auxiliary detection, automatic driving, unmanned supermarkets and the like. However, with the target tracking method based on the convolutional neural network, the solution of tracking the target of the pedestrian and the vehicle by continuously increasing the number of layers of the convolutional neural network becomes the mainstream direction, and this method has a certain improvement on the tracking accuracy of the target tracking, but on the contrary, it brings huge calculation amount and brings huge pressure to hardware equipment. Therefore, in the prior art, the problems of large calculation amount and large hardware equipment pressure exist in the identification of the pedestrian and the vehicle.

Disclosure of Invention

The embodiment of the invention provides a training method of a multi-target tracking network model, which can reduce parameters and improve target detection speed while ensuring target detection precision.

In a first aspect, an embodiment of the present invention provides a method for training a multi-target tracking network model, including the following steps:

constructing a training data set, wherein the training data set comprises video data;

constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method;

training the initial tracking network model based on the training data set, and performing multi-target detection on the video data in the training data set through the multi-target detection algorithm;

and tracking the video frames of the multi-target objects detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method, and performing matching test on the adjacent video frames to output a multi-target tracking network model.

Optionally, the preset feature extraction network includes a MobilenetV3 feature extraction network, and the step of performing video frame tracking on the multi-target object detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method includes:

under the monitoring of the attention mechanism method, feature information extraction is carried out on the detected multi-target object through the MobilenetV3 feature extraction network, real-time tracking is carried out on the basis of the feature information, and the feature information comprises various types of feature information.

Optionally, the training the initial tracking network model based on the training data set, and the multi-target detecting the video data in the training data set by the multi-target detecting algorithm includes:

carrying out data set processing on the training data set, wherein the data set processing comprises data classification to obtain a plurality of groups of sub data sets of different types;

and training the initial tracking network model based on the subdata sets respectively, and performing multi-target detection on the video data output from the subdata sets through the multi-target detection algorithm in the initial tracking network model.

Optionally, the step of performing a matching test on the adjacent video frames includes:

tracking each frame of the video frame of the multi-target object through a preset local data association algorithm, and labeling the tracked coordinates of the multi-target object through a labeling frame to obtain coordinates of the labeling frame;

performing local data association between adjacent frames on the coordinates of the marking frame to output optimal tracking matched multi-target object number tracking data;

and filtering the multi-target object tracking data through a preset filtering algorithm to output the multi-target tracking network model.

In a second aspect, an embodiment of the present invention provides a multi-target tracking identification method, including the following steps:

acquiring video data to be identified;

inputting the video data to be identified into a multi-target tracking model in any embodiment to perform multi-target tracking identification;

and outputting a multi-target identification result, and judging whether the target data exists in the data to be detected according to the multi-target output result.

In a third aspect, an embodiment of the present invention provides a training apparatus for a multi-target tracking network model, including:

a first construction module for constructing a training data set, the training data set comprising video data;

the second construction module is used for constructing an initial tracking network model, the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method;

the target detection module is used for training the initial tracking network model based on the training data set and carrying out multi-target detection on the video data in the training data set through the multi-target detection algorithm;

and the tracking identification module is used for tracking the video frames of the multi-target objects detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method, and performing matching test on the adjacent video frames to output a multi-target tracking network model.

In a fourth aspect, an embodiment of the present invention further provides a multi-target tracking and identifying apparatus, including:

the acquisition module is used for acquiring video data to be identified;

the detection and identification module is used for inputting the video data to be identified into a multi-target tracking model in any embodiment to perform multi-target tracking identification;

and the output module is used for outputting a multi-target identification result and judging whether the target data exists in the data to be detected or not according to the multi-target output result.

In a fifth aspect, an embodiment of the present invention further provides an electronic device, including: the training method comprises the steps of storing a training program, and executing a computer program stored on the storage and capable of running on the processor, wherein the processor implements the steps of the training method of the multi-target tracking network model provided by the embodiment when executing the computer program.

In a sixth aspect, a computer readable storage medium has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps in the training method for the multi-target tracking network model provided by the embodiments.

In the embodiment of the invention, a training data set is constructed, wherein the training data set comprises video data; constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method; training the initial tracking network model based on the training data set, and performing multi-target detection on the video data in the training data set through the multi-target detection algorithm; and tracking the video frames of the multi-target objects detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method, and performing matching test on the adjacent video frames to output a multi-target tracking network model. The invention ensures the extraction capability of the preset feature extraction network on feature information by providing the preset feature extraction network (MobilenetV3 feature extraction network) and adding an attention mechanism method; the multi-target object detected by the multi-target detection algorithm is subjected to video frame tracking through a preset feature extraction network, and adjacent video frames are subjected to matching test, so that the tracking accuracy of a plurality of targets can be ensured; therefore, the multi-target tracking network model output after multiple times of training can reduce the parameter quantity while ensuring the target detection precision, improve the target detection speed and simultaneously reduce the hardware equipment for supporting the calculated quantity and the time cost in the operation process.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart of a training method for a multi-target tracking network model according to an embodiment of the present invention;

FIG. 2 is a flowchart of another method for training a multi-target tracking network model according to an embodiment of the present invention;

FIG. 2a is a flowchart of another method for training a multi-target tracking network model according to an embodiment of the present invention;

FIG. 3 is a flow chart of a multi-target tracking identification method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a training apparatus for a multi-target tracking network model according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another training apparatus for a multi-target tracking network model according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of another training apparatus for a multi-target tracking network model according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of a multi-target tracking and identifying apparatus according to an embodiment of the present invention;

fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprising" and "having," and any variations thereof, in the description and claims of this application and the description of the figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

As shown in fig. 1, fig. 1 is a flowchart of a training method for a multi-target tracking network model according to an embodiment of the present invention, where the training method for the multi-target tracking network model includes the following steps:

101. a training data set is constructed, the training data set including video data.

In this embodiment, the training method of the multi-target tracking network model may be applied to various monitoring systems that need to track and identify vehicles and pedestrians. Of course, the tracking recognition of an animal or the like may be performed in addition to the tracking recognition of a vehicle or a pedestrian. The video data may be acquired by an image acquisition device, which may be a camera, or a camera equipped with a camera, or the like. The video data in the training data set may be video data obtained by recording the video data at a street, a station, or the like through a camera. And the electronic equipment on which the training method of the multi-target tracking network model operates can acquire video data in a training data set in a wired connection mode or a wireless connection mode and is used for data transmission and the like in the training process.

It should be noted that the Wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi (Wireless-Fidelity) connection, a bluetooth connection, a wimax (worldwide Interoperability for Microwave access) connection, a Zigbee (low power local area network protocol), a uwb (ultra wideband) connection, and other Wireless connection means now known or developed in the future.

The data training set may be a data set composed of a large amount of video data collected by a camera. The video data may include image data of pedestrians, vehicles, and the like, and the video data may be data collected over a period of time. The video is composed of images of one frame and one frame, so that image data capture can be realized in a video frame extraction mode. More specifically, the captured video data may include different types of pedestrians and/or vehicles, such as: including girls, boys, the elderly, children, cars, buses, vans, motorcycles, etc. Therefore, the targets such as pedestrians and/or vehicles in the video frames can be identified, and the targets such as pedestrians and/or vehicles can be marked and distinguished.

102. And constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method.

The initial tracking network model is a model which is not trained yet, the initial tracking network model is constructed, a framework of the initial tracking network model can be constructed firstly, the multi-target detection algorithm and the multi-target tracking algorithm are added into the framework to form a complete initial tracking network model, and then the constructed training data set is input into the initial tracking network model to train the model for multiple times so as to realize model optimization.

The multi-target detection algorithm can realize the simultaneous detection of a plurality of targets and the extraction of the characteristic information of the plurality of targets. The simultaneous detection of multiple targets may include detection of multiple targets of the same type, or detection of multiple targets of different types, and the detected pedestrians and/or vehicles may also be marked, for example: the method comprises the steps of detecting a plurality of pedestrians in the video data or detecting various types of objects such as pedestrians and vehicles in the video data. The multi-target detection algorithm is not limited to a unique algorithm in the embodiment, and is applicable to the embodiment of the invention as long as multi-target detection can be realized. The multi-target tracking algorithm can track a plurality of pedestrians and/or vehicles, grasp the current state change of the pedestrians and/or vehicles in real time, and continuously optimize and train the initial tracking network model through the multi-target detection algorithm and the multi-target tracking algorithm, so that the recognition capability of the model can be enhanced.

Specifically, the multi-target detection algorithm includes a preset feature extraction network and an attention mechanism method. Wherein, the preset feature extraction network may be a MobilenetV3 feature extraction network. The MobilenetV3 has the characteristics of high accuracy of feature extraction, small calculation amount and no increase of time consumption. In addition, an attention mechanism method is added, and when the feature information of detected pedestrians and/or vehicles is extracted by the MobilenetV3 feature extraction network, the capability of the MobilenetV3 feature extraction network for extracting the feature information of the targets can be ensured.

103. Training the initial tracking network model based on the training data set, and performing multi-target detection on the video data in the training data set through a multi-target detection algorithm.

The constructed training data set is input to an initial tracking network model according to a certain sequence for training, and real-time multi-target detection is performed on video data in the multi-input training data set through a multi-target detection method, for example: the video data is data collected at the station exit, so that the pedestrians at the station exit can be detected in real time through a multi-target detection algorithm, the pedestrians of various age groups passing through the station exit are identified and detected, and attribute (long-hair, short-hair and the like) differentiation can be performed on the similar pedestrians according to the attributes.

104. And tracking the video frames of the multi-target object detected by the multi-target detection algorithm through a preset feature extraction network and an attention mechanism method, and performing matching test on adjacent video frames to output a multi-target tracking network model.

Under the monitoring of the attention mechanism method, a plurality of pedestrians and/or vehicles can be identified and tracked through a MobilenetV3 feature extraction network based on feature information of the pedestrians and/or vehicles, and then matching tests are carried out on video frames adjacent to the same target in the tracking and identification process, wherein the feature information comprises a plurality of types of feature information. If the pedestrian is detected, the pedestrian detection method can comprise wearing, accessory, human face, human body and the like; if a vehicle is detected, the vehicle type, license plate, color, driving route, etc. may be included. The above-described match test may include calculating a similarity value and/or calculating the correlation of local data between adjacent frames. The training data set comprises a large amount of video data, the initial tracking network model is subjected to multiple times of cyclic training through the large amount of video data, the model is continuously optimized on the previous training result, and finally the multi-target tracking network model can be output.

Optionally, the multi-target tracking network model may be deployed at a camera board end to realize landing of an algorithm and perform quick and accurate identification on targets such as pedestrians and/or vehicles in the video data acquired in real time.

In the embodiment of the invention, a training data set is constructed, wherein the training data set comprises video data; constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method; training an initial tracking network model based on a training data set, and performing multi-target detection on video data in the training data set through a multi-target detection algorithm; and tracking the video frames of the multi-target object detected by the multi-target detection algorithm through a preset feature extraction network and an attention mechanism method, and performing matching test on adjacent video frames to output a multi-target tracking network model. The embodiment of the invention reduces the parameter number by providing the MobilenetV3 feature extraction network. And an attention mechanism method is added, so that the feature information extraction capability of the MobilenetV3 feature extraction network can be ensured. And the video frame tracking is carried out on the multi-target object detected by the multi-target detection algorithm through the MobilenetV3 characteristic extraction network, the matching test is carried out on the adjacent video frames, and the tracking accuracy of a plurality of targets can be ensured. Therefore, after the multi-target tracking network model output after multiple times of training is arranged on the ground, the parameter quantity can be reduced while the target detection precision is ensured, the target detection speed is improved, and meanwhile, hardware equipment for supporting calculated quantity and time cost in the operation process can be reduced.

As shown in fig. 2, fig. 2 is a flowchart of another method for training a multi-target tracking network model according to an embodiment of the present invention, which specifically includes the following steps:

201. a training data set is constructed, the training data set including video data.

202. And constructing an initial tracking network model, wherein the initial tracking network model comprises a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm comprises a preset feature extraction network and an attention mechanism method.

203. And carrying out data set processing on the training data set, wherein the data set processing comprises data classification so as to obtain a plurality of groups of sub data sets of different types.

Referring to fig. 2a, fig. 2a is a flowchart of another training method for a multi-target tracking network model according to an embodiment of the present invention. The data set processing can be to mark pedestrians and/or vehicles in the video data and classify the pedestrians and/or vehicles according to different types, and the pedestrians and/or vehicles of the same type can be classified into one type, so that a plurality of types of sub data sets can be obtained, systematization can be facilitated during training according to the sub data sets, and the identification efficiency of the targets of the same type can be improved. When targets of different types are marked, the targets can be marked through the mark symbols of different shapes, and the targets can be directly divided according to different mark symbols during classification. And the plurality of subdata sets can be concentrated, different classes of the subdata sets are divided into a training set, a verification set and a test set, so that a model can be trained on the training set, the model is evaluated on the verification set, and the model is tested on the test set for the last time, so that the finally trained multi-target tracking network model has stronger identification capability.

204. And training the initial tracking network model based on the subdata sets respectively, and performing multi-target detection on the video data output from the subdata sets through a multi-target detection algorithm in the initial tracking network model.

The data in the subdata set are video data, the classified video data are subjected to processes of training, checking, testing and the like on the initial tracking network model for multiple times respectively, multiple pedestrians and/or vehicles in the video data are identified, whether the identification is accurate or not is checked, whether the test output is correct or not is tested, and the like.

205. The method comprises the steps of carrying out video frame tracking on a multi-target object detected by a multi-target detection algorithm through a preset feature extraction network and an attention mechanism method, tracking each frame of video frame of the multi-target object through a preset local data association algorithm, and marking the coordinates of the tracked multi-target object through a marking frame to obtain the coordinates of the marking frame.

The preset local data association algorithm can be a Hungarian algorithm in a multi-target tracking algorithm. The Hungarian algorithm is the most common algorithm for matching partial graphs, and the core of the algorithm is an algorithm for searching an augmented path and using the augmented path to obtain maximum matching of bipartite graphs. In this embodiment, each frame of video frames of multiple pedestrians and/or vehicles can be tracked in real time through the hungarian algorithm, and the detected multiple pedestrians and/or vehicles are labeled in real time through the labeling boxes to obtain the coordinates of the labeling boxes. Due to real-time detection, the coordinates of the labeling frame change along with the position change of the pedestrian and/or the vehicle in the video. The shape of the marking frame may be rectangular or circular.

206. And performing local data association between adjacent frames on the coordinates of the labeling frame to output the tracking data of the optimal tracking matching multi-target object number.

The local data can comprise coordinates of a labeling frame, the coordinates of the labeling frame obtained through real-time labeling by a Hungarian algorithm are used as coordinates of pedestrians and/or vehicles, data association is carried out on coordinate data between adjacent frames of the same target, and multi-target object number tracking data matched with optimal tracking are output. The maximum matching of each pedestrian and/or vehicle is realized, and the accuracy of real-time tracking and identification of each pedestrian and/or vehicle is improved.

207. And filtering the multi-target object tracking data through a preset filtering algorithm to output a multi-target tracking network model.

The preset similarity algorithm may be a Kalman filtering algorithm (Kalman filtering). A typical example of kalman filtering is to predict the coordinates and velocity of the position of an object from a finite set of observations, including noise, of the position of the object, possibly with a bias. For example, for cameras, radars, etc., kalman filtering is commonly used for target tracking. However, the measured values of the position, the speed and the acceleration of the target are always noisy at any time, and the Kalman filtering can be used for trying to remove the influence of noise by utilizing the real-time dynamic information of the tracking data (multi-target object tracking data) of the pedestrians and/or vehicles so as to more accurately estimate the corresponding position information of the pedestrians and/or vehicles tracked in real time. Of course, the position estimate may be an estimate (filter) of the position information of the present pedestrian and/or vehicle, an estimate (prediction) of the position information of the future pedestrian and/or vehicle, or an estimate (interpolation or smoothing) of the position information of the pedestrian and/or vehicle. After continuous training is carried out in the mode, the optimal multi-target tracking network model can be output, and the model is deployed at the camera board end, so that target tracking with higher precision can be realized.

In the embodiment of the invention, the parameter quantity is reduced by providing the MobilenetV3 feature extraction network; and an attention mechanism method is added, so that the feature information extraction capability of the MobilenetV3 feature extraction network can be ensured. In addition, video frame tracking is carried out on the multi-target object detected by the multi-target detection algorithm through a MobilenetV3 feature extraction network, local data association between adjacent frames is carried out through the Hungarian algorithm, maximum matching is achieved for each pedestrian and/or vehicle, and accuracy of real-time tracking and identification of each pedestrian and/or vehicle is improved. And a Kalman filtering algorithm is added, and the Kalman filtering method tries to remove the influence of noise by utilizing the real-time dynamic information of the tracking data (multi-target object tracking data) of the pedestrians and/or vehicles, so that the position information corresponding to the pedestrians and/or vehicles tracked in real time is more accurately estimated. Therefore, after the multi-target tracking network model output after multiple times of training is arranged on the ground, the parameter quantity can be reduced while the target detection precision is ensured, the target detection speed is improved, and meanwhile, hardware equipment for supporting calculated quantity and time cost in the operation process can be reduced.

As shown in fig. 3, fig. 3 is a flowchart of a multi-target tracking and identifying method provided in the embodiment of the present invention, which specifically includes the following steps:

301. and acquiring video data to be identified, wherein the video data to be identified comprises target data and data to be detected.

The video data to be identified may include pedestrian and/or vehicle data collected by a camera in real time, or may be pre-stored and directly input video data. Other obstacle data, such as roadside signs, green lanes, etc., may also be included in the video data to be identified. The target data can be pedestrian and/or vehicle data which are specified by an upper layer and need to be tracked, and the data to be detected can be video data of pedestrians and/or vehicles which are collected in real time through a camera.

302. And inputting the video data to be identified into a multi-target tracking model in any embodiment to perform multi-target tracking identification.

The multi-target tracking model is a trained optimal model and can be deployed at the board end of the camera, after target data and data to be detected are obtained, pedestrians and/or vehicles in the data to be detected can be detected and tracked and identified through the multi-target tracking model, matching calculation is carried out on the pedestrians and/or vehicles in the target data in real time, and whether the pedestrians and/or vehicles in the target data and the data to be detected are the same or not is judged. The matching calculation comprises similarity calculation, similarity matching can be carried out according to the extracted characteristic information of the pedestrians and/or vehicles, and finally the similarity values of the characteristic information of the same pedestrian and/or the same vehicle are integrated to carry out mean value calculation; of course, the feature information with high recognition accuracy can be matched with a higher weight, so that more accurate judgment can be realized, for example: a pupil.

303. And outputting a multi-target identification result, and judging whether target data exist in the data to be detected or not according to the multi-target output result.

And after the similarity value is calculated, outputting a multi-target identification result, and judging whether target data exist in the data to be detected or not according to the multi-target output result, wherein the multi-target identification result comprises the similarity value. Therefore, whether target data exists in the data to be detected or not can be determined according to the size of the similarity value, namely whether the pedestrians and/or vehicles in the video data collected by the camera are consistent or similar is detected. When the multi-target tracking and identifying method provided by the embodiment is applied to numerous fields such as intelligent video monitoring, auxiliary detection, automatic driving, unmanned supermarkets and the like, the accuracy and the speed of identification can be increased.

In the embodiment of the invention, the multi-target tracking identification method can be applied to the multi-target tracking network model in any embodiment. The multi-target tracking network model provided by the embodiment of the invention reduces the parameter quantity by providing the MobilenetV3 feature extraction network. And an attention mechanism method is added, so that the feature information extraction capability of the MobilenetV3 feature extraction network can be ensured. And the video frame tracking is carried out on the multi-target object detected by the multi-target detection algorithm through the MobilenetV3 characteristic extraction network, the matching test is carried out on the adjacent video frames, and the tracking accuracy of a plurality of targets can be ensured. Therefore, after the multi-target tracking network model output after multiple times of training is arranged on the ground, the parameter quantity can be reduced while the target detection precision is ensured, the target detection speed is improved, and meanwhile, hardware equipment for supporting calculated quantity and time cost in the operation process can be reduced. Therefore, the multi-target tracking identification method can also achieve the generated technical effects, and is not repeated herein to avoid repetition.

As shown in fig. 4, fig. 4 is a schematic structural diagram of a training apparatus for a multi-target tracking network model according to an embodiment of the present invention, where the training apparatus 400 for a multi-target tracking network model includes:

a first construction module 401, configured to construct a training data set, where the training data set includes video data;

a second constructing module 402, configured to construct an initial tracking network model, where the initial tracking network model includes a multi-target detection algorithm and a multi-target tracking algorithm, and the multi-target detection algorithm includes a preset feature extraction network and an attention mechanism method;

a target detection module 403, configured to train the initial tracking network model based on the training data set, and perform multi-target detection on the video data in the training data set through a multi-target detection algorithm;

and the tracking identification module 404 is configured to perform video frame tracking on the multi-target object detected by the multi-target detection algorithm through a preset feature extraction network and an attention mechanism method, and perform matching test on adjacent video frames to output a multi-target tracking network model.

Optionally, the preset feature extraction network includes a MobilenetV3 feature extraction network, and the tracking identification module 404 is further configured to extract feature information of the detected multi-target object through the MobilenetV3 feature extraction network under the monitoring of the attention mechanism method, and perform real-time tracking based on the feature information, where the feature information includes multiple types of feature information.

Optionally, as shown in fig. 5, fig. 5 is a schematic structural diagram of another training apparatus for a multi-target tracking network model according to an embodiment of the present invention, where the target detection module 403 includes:

a classification unit 4031, configured to perform data set processing on the training data set, where the data set processing includes data classification to obtain multiple sets of different types of sub data sets;

and the checking unit 4032 is configured to train the initial tracking network models based on the sub data sets, and perform multi-target detection on the video data output from the sub data sets through a multi-target detection algorithm in the initial tracking network models.

Optionally, as shown in fig. 6, fig. 6 is a schematic structural diagram of another training apparatus for a multi-target tracking network model according to an embodiment of the present invention, where the tracking identification module 404 includes:

the labeling unit 4041 is configured to track each frame of video frames of the multi-target object through a preset local data association algorithm, and label coordinates of the tracked multi-target object through a labeling frame;

the data association unit 4042 is used for performing local data association between adjacent frames on the coordinates of the labeling frame to output multi-target object number tracking data matched with optimal tracking;

the filtering unit 4043 is configured to filter the multi-target object tracking data through a preset filtering algorithm to output a multi-target tracking network model.

The training device for the multi-target tracking network model provided by the embodiment of the invention can realize each implementation mode in the training method embodiment of the multi-target tracking network model and corresponding beneficial effects, and is not repeated here for avoiding repetition.

Optionally, as shown in fig. 7, fig. 7 is a schematic structural diagram of a multi-target tracking and identifying apparatus according to an embodiment of the present invention, where a multi-target tracking and identifying apparatus 700 includes:

an obtaining module 701, configured to obtain video data to be identified;

a detection and identification module 702, configured to input video data to be identified into a multi-target tracking model in any embodiment for multi-target tracking identification;

the output module 703 is configured to output a multi-target identification result, and determine whether target data exists in the data to be detected according to the multi-target output result.

The multi-target tracking identification device provided by the embodiment of the invention can realize each implementation mode in the multi-target tracking identification method embodiment and corresponding beneficial effects, and is not repeated here for avoiding repetition.

As shown in fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 800 includes: the system comprises a processor 801, a memory 802, a network interface 803 and a computer program which is stored on the memory 802 and can run on the processor 801, wherein the processor 801 executes the computer program to realize the steps of the training method of the multi-target tracking network model provided by the embodiment.

Specifically, the processor 801 is configured to perform the following steps:

training an initial tracking network model based on a training data set, and performing multi-target detection on video data in the training data set through a multi-target detection algorithm;

and tracking the video frames of the multi-target object detected by the multi-target detection algorithm through a preset feature extraction network and an attention mechanism method, and performing matching test on adjacent video frames to output a multi-target tracking network model.

Optionally, the preset feature extraction network includes a MobilenetV3 feature extraction network, and the step of performing, by the processor 801, video frame tracking on the multi-target object detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method includes:

under the monitoring of an attention mechanism method, feature information extraction is carried out on the detected multi-target object through a MobilenetV3 feature extraction network, real-time tracking is carried out on the basis of the feature information, and the feature information comprises various types of feature information.

Optionally, the training of the initial tracking network model based on the training data set performed by the processor 801, and the step of performing multi-target detection on the video data in the training data set by using the multi-target detection algorithm includes:

and training the initial tracking network model based on the subdata sets respectively, and performing multi-target detection on the video data output from the subdata sets through a multi-target detection algorithm in the initial tracking network model.

Optionally, the step of performing matching test on adjacent video frames by the processor 801 includes:

tracking each frame of video frame of the multi-target object through a preset local data association algorithm, and labeling the coordinates of the tracked multi-target object through a labeling frame to obtain coordinates of the labeling frame;

and filtering the multi-target object tracking data through a preset filtering algorithm to output a multi-target tracking network model.

The electronic device 800 provided by the embodiment of the present invention can implement each implementation manner in the training method embodiment of the multi-target tracking network model, and has corresponding beneficial effects, and for avoiding repetition, details are not repeated here.

It is noted that 801 and 803 with components are shown, but it is understood that not all of the shown components are required and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the electronic device 800 is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and the hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Programmable gate array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The electronic device 800 includes, but is not limited to, a desktop computer, a notebook, a palm top computer, a cloud server, and other computing devices. The electronic equipment can be in man-machine interaction with a client in a keyboard, a mouse, a remote controller, a touch panel or a voice control device and the like.

The memory 802 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 802 may be an internal storage unit of the electronic device 800, such as a hard disk or a memory of the electronic device 800. In other embodiments, the memory 802 may also be an external storage device of the electronic device 800, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the electronic device 800. Of course, the memory 802 may also include both internal and external memory units of the electronic device 800. In this embodiment, the memory 802 is generally used for storing an operating system and various application software installed in the electronic device 800, such as: program codes of the training method of the multi-target tracking network model and the like. In addition, the memory 802 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 801 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 801 is generally configured to control the overall operation of the electronic device 800. In this embodiment, the processor 801 is configured to run program code stored in the memory 802 or process data, such as program code for running a training method of a multi-objective tracking network model.

The network interface 803 may include a wireless network interface or a wired network interface, and the network interface 803 is generally used to establish a communication connection between an electronic device and other electronic devices.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by the processor 801, the computer program implements each process in the training method for a multi-target tracking network model provided in the embodiment, and can achieve the same technical effect, and is not described herein again to avoid repetition.

It will be understood by those skilled in the art that all or part of the processes in the training method for implementing the multi-target tracking network model according to the embodiments may be implemented by a computer program instructing associated hardware, and the program may be stored in a computer-readable storage medium, and when executed, may include processes such as the embodiments of the methods. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present invention, and it is therefore to be understood that the invention is not limited by the scope of the appended claims.

Claims

1. A training method of a multi-target tracking model is characterized by comprising the following steps:

2. The training method of the multi-target tracking network model according to claim 1, wherein the preset feature extraction network comprises a MobilenetV3 feature extraction network, and the step of performing video frame tracking on the multi-target objects detected by the multi-target detection algorithm through the preset feature extraction network and the attention mechanism method comprises:

3. The method for training the multi-target tracking network model according to claim 1, wherein the training of the initial tracking network model based on the training dataset comprises the steps of performing multi-target detection on the video data in the training dataset through the multi-target detection algorithm, and the multi-target detection comprises the steps of:

4. The method for training the multi-target tracking network model according to claim 1, wherein the step of performing the matching test on the adjacent video frames comprises:

5. A multi-target tracking identification method is characterized by comprising the following steps:

acquiring video data to be identified;

inputting the video data to be identified into a multi-target tracking model according to any one of claims 1 to 4 for multi-target tracking identification;

6. A training device for a multi-target tracking network model is characterized by comprising:

7. The training device of the multi-target tracking network model according to claim 6, wherein the tracking recognition module is further configured to extract feature information of the detected multi-target object through the MobilenetV3 feature extraction network under the monitoring of the attention mechanism method, and perform real-time tracking based on the feature information, where the feature information includes multiple types of feature information.

8. A multi-target tracking recognition apparatus, comprising:

the acquisition module is used for acquiring video data to be identified;

a detection identification module, for inputting the video data to be identified into a multi-target tracking model of any one of claims 1-4 for multi-target tracking identification;

9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the training method of the multi-target tracking network model according to any one of claims 1 to 4 when executing the computer program.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the method for training a multi-target tracking network model according to any one of claims 1 to 4.