CN115690545A - Training target tracking model and target tracking method and device

Training target tracking model and target tracking method and device

Info

Publication number
CN115690545A
Authority
CN
China
Prior art keywords
target tracking
tracking model
frame
video frame
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211424529.7A
Other languages
Chinese (zh)
Inventor
倪烽
王冠中
党青青
邓凯鹏
赖宝华
刘其文
于佃海
胡晓光
马艳军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211424529.7A
Publication of CN115690545A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Abstract

The disclosure provides a method and apparatus for training a target tracking model and for target tracking, and relates to the field of artificial intelligence, in particular to deep learning. The specific implementation scheme is as follows: acquire a sample set, where each sample comprises a video frame and a real frame; construct a target tracking model whose head includes an intersection-over-union (IoU) head for calculating an IoU loss value; perform the following training steps: select samples from the sample set; input the video frame of a selected sample into the target tracking model and output a prediction frame; calculate an original loss value and an IoU loss value according to the difference between the real frame in the selected sample and the prediction frame; if the weighted sum of the original loss value and the IoU loss value is smaller than a predetermined threshold, determine that training of the target tracking model is finished; otherwise, adjust the network parameters of the target tracking model and continue the training steps. This embodiment can improve the tracking accuracy and speed of the generated target tracking model.

Description

Training target tracking model and target tracking method and device
Cross Reference to Related Applications
This application is a divisional application of the Chinese patent application with application No. 202111464709.3, filed on December 3, 2021 and entitled "Method and apparatus for training a target tracking model and target tracking".
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the field of deep learning, and specifically relates to a method and a device for training a target tracking model and target tracking.
Background
Multi-Object Tracking (MOT) is a technique that, given a sequence of video images, locates multiple objects of interest and maintains and records each individual's ID information and trajectory across successive frames. Multi-target tracking is one of the most important and complex tasks in the field of computer vision, and is applied in fields such as autonomous driving, security inspection, and smart cities.
Compared with target detection, which only outputs the location of a target at a single static moment, multi-target tracking adds one more dimension of information: the individual ID of each target. With this ID information, associations can be established between frames, so that the same object can be identified across adjacent frames. The difference between the two tasks is easier to understand from the perspective of application scenarios. Object detection scenarios such as rebar counting, industrial quality inspection, power inspection, and ear detection only need to detect the state of an object at a certain static moment. Target tracking scenarios such as intelligent traffic, medical analysis, livestock inventory, and military reconnaissance all need to continuously track the continuous motion state of an object, so these tasks cannot be replaced by target detection.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, storage medium and computer program product for training a target tracking model and target tracking.
According to a first aspect of the present disclosure, there is provided a method of training a target tracking model, comprising: obtaining a sample set, wherein samples in the sample set comprise video frames and real frames for labeling target objects in the video frames; constructing a target tracking model, wherein the head of the target tracking model comprises an intersection-over-union (IoU) head for calculating an IoU loss value; and performing the following training steps: selecting a sample from the sample set; inputting a video frame in the selected sample into the target tracking model, and outputting a prediction frame; calculating an original loss value and an IoU loss value according to the difference between a real frame in the selected sample and the prediction frame; if the weighted sum of the original loss value and the IoU loss value is smaller than a predetermined threshold, determining that training of the target tracking model is finished; otherwise, adjusting the network parameters of the target tracking model, and continuing to perform the training steps.
According to a second aspect of the present disclosure, there is provided a target tracking method, comprising: acquiring a set of video frames to be detected; inputting the set of video frames into a target tracking model trained according to the method of the first aspect, and outputting at least one detection box in each video frame; for each video frame, dividing the detection boxes in the video frame into a high-score box set and a low-score box set according to the scores of the detection boxes; and for each video frame, performing first matching between the high-score box set of the video frame and the previously determined tracking tracks, and performing second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
According to a third aspect of the present disclosure, there is provided an apparatus for training a target tracking model, comprising: an obtaining unit configured to obtain a sample set, wherein samples in the sample set include a video frame and a real frame for labeling a target object in the video frame; a construction unit configured to construct a target tracking model, wherein the head of the target tracking model comprises an intersection-over-union (IoU) head for calculating an IoU loss value; a training unit configured to perform the following training steps: selecting a sample from the sample set; inputting a video frame in the selected sample into the target tracking model, and outputting a prediction frame; calculating an original loss value and an IoU loss value according to the difference between a real frame in the selected sample and the prediction frame; and, if the weighted sum of the original loss value and the IoU loss value is smaller than a predetermined threshold, determining that training of the target tracking model is finished; and an adjusting unit configured to otherwise adjust the network parameters of the target tracking model and continue to perform the training steps.
According to a fourth aspect of the present disclosure, there is provided a target tracking apparatus, comprising: an acquisition unit configured to acquire a set of video frames to be detected; a detection unit configured to input the set of video frames into a target tracking model trained by the apparatus of the third aspect, and output at least one detection box in each video frame; a grouping unit configured to, for each video frame, divide the detection boxes in the video frame into a high-score box set and a low-score box set according to the scores of the detection boxes; and a matching unit configured to, for each video frame, perform first matching between the high-score box set of the video frame and the previously determined tracking tracks, and perform second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first or second aspect.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the first or second aspect.
The method and apparatus for training a target tracking model and for target tracking according to the present disclosure can achieve an ideal multi-target tracking effect on different cloud deployment devices, and can better handle problems such as occlusion, frequent disappearance, drastic changes in object scale, hard-to-recognize pose changes, and the inability to extend to multiple categories.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a target tracking model according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method of training a target tracking model according to the present disclosure;
FIG. 4 is a flow diagram for one embodiment of a method of target tracking according to the present disclosure;
FIG. 5 is a schematic diagram illustrating the architecture of one embodiment of an apparatus for training a target tracking model according to the present disclosure;
FIG. 6 is a schematic block diagram of one embodiment of an apparatus for target tracking according to the present disclosure;
FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which a method of training a target tracking model, an apparatus to train a target tracking model, a method of target tracking, or an apparatus of target tracking of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminals 101, 102, a network 103, a database server 104, and a server 105. The network 103 serves as a medium for providing communication links between the terminals 101, 102, the database server 104 and the server 105. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user 110 may use the terminals 101, 102 to interact with the server 105 over the network 103 to receive or send messages or the like. The terminals 101 and 102 may have various client applications installed thereon, such as a model training application, a target tracking application, a shopping application, a payment application, a web browser, an instant messenger, and the like.
Here, the terminals 101 and 102 may be hardware or software. When the terminals 101 and 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), laptop portable computers, desktop computers, and the like. When the terminals 101 and 102 are software, they can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
When the terminals 101, 102 are hardware, an image capturing device may be further mounted thereon. The image acquisition device can be various devices capable of realizing the function of acquiring images, such as a camera, a sensor and the like. The user 110 may capture a video using an image capture device on the terminal 101, 102.
Database server 104 may be a database server that provides various services. For example, a database server may have a sample set stored therein. The sample set contains a large number of samples. Wherein a sample may comprise a video frame and a real box for labeling a target object in the video frame. In this way, the user 110 may also select samples from a set of samples stored by the database server 104 via the terminals 101, 102. Various original target tracking models and various network structures can be stored in the database server and used for recombining and constructing the target tracking models on the basis of the original target tracking models.
The server 105 may also be a server providing various services, such as a background server providing support for various applications displayed on the terminals 101, 102. The background server may train the initial model using samples in the sample set sent by the terminals 101 and 102, and may send the training result (e.g., the generated target tracking model) to the terminals 101 and 102. In this way, the user can apply the generated target tracking model for target tracking.
Here, the database server 104 and the server 105 may be hardware or software. When they are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When they are software, they may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein. Database server 104 and server 105 may also be servers of a distributed system or servers that incorporate a blockchain. Database server 104 and server 105 may also be cloud servers, or smart cloud computing servers or smart cloud hosts with artificial intelligence technology.
It should be noted that the method for training the target tracking model or the method for target tracking provided by the embodiments of the present disclosure is generally performed by the server 105. Accordingly, a device for training a target tracking model or a device for target tracking is also generally provided in the server 105.
It is noted that database server 104 may not be provided in system architecture 100, as server 105 may perform the relevant functions of database server 104.
It should be understood that the number of terminals, networks, database servers, and servers in fig. 1 are merely illustrative. There may be any number of terminals, networks, database servers, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a target tracking model according to the present disclosure is shown. The method for training the target tracking model can comprise the following steps:
Step 201, a sample set is obtained.
In this embodiment, the executing entity of the method of training the target tracking model (e.g., the server 105 shown in fig. 1) may obtain the sample set in a variety of ways. For example, the executing entity may obtain an existing sample set stored in a database server (e.g., database server 104 shown in fig. 1) via a wired or wireless connection. As another example, a user may collect samples via a terminal (e.g., terminals 101, 102 shown in fig. 1). In this way, the executing entity may receive the samples collected by the terminal and store them locally, thereby generating the sample set.
Here, the sample set may include at least one sample. Each sample includes a video frame and a real frame for labeling a target object in the video frame. There may be one or multiple target objects in a video frame. If a sample set with multiple target objects is used, a multi-target tracking model can be trained. The video frames in the sample set are clipped from complete videos. The real frames annotated in consecutive video frames can form the trajectory of a target object.
Step 202, a target tracking model is constructed.
In this embodiment, the target tracking model can be constructed by improving an existing target tracking model. The initial model may be chosen according to the computing capability of the terminal device on which the model will run. For example, if the computing capability is strong (greater than a first predetermined capability), a model of the SDE (Separate Detection and Embedding) type may be adopted; such algorithms completely separate the detection and embedding steps, the most representative being the DeepSORT algorithm. If the computing capability is weak (less than a second predetermined capability, where the second predetermined capability is less than or equal to the first predetermined capability), a model of the JDE (Joint Detection and Embedding) type may be adopted; such algorithms learn detection and embedding simultaneously in a shared neural network and set the loss function using a multi-task learning approach. Representative algorithms are CenterTrack and FairMOT. This design balances accuracy and speed, and can realize high-accuracy real-time multi-target tracking.
The method adds an intersection-over-union (IoU) head on top of the head of the initial target tracking model for calculating an IoU loss value, as shown in fig. 3. IoU measures the degree of overlap between a bounding box and the ground truth. The IoU loss captures the similarity between the frame predicted by the model and the real frame, yielding more accurate detection box positions; it improves detection performance from the perspective of IoU, the direct evaluation metric of detection, thereby raising the upper limit of multi-target tracking accuracy.
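For illustration only, the IoU between a prediction box and a real box, and an IoU loss of the 1 - IoU form mentioned below in step 205, might be computed as in the following minimal Python sketch (the [x1, y1, x2, y2] box format is an assumption):

```python
def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes [x1, y1, x2, y2]."""
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_boxes, gt_boxes):
    """Sum of per-object IoU losses for one video frame (1 - IoU variant)."""
    return sum(1.0 - box_iou(p, g) for p, g in zip(pred_boxes, gt_boxes))
```

With this definition, a perfect prediction gives an IoU of 1 and a loss of 0.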
In step 203, a sample is selected from the sample set.
In this embodiment, the executing entity may select a sample from the sample set obtained in step 201, and perform the training steps from step 203 to step 207. The selection manner and the number of samples are not limited in the present disclosure. For example, at least one sample may be randomly selected, or samples whose video frames have better sharpness (i.e., higher resolution) may be preferred. It is also possible to use a very large batch size and increase the number of training epochs if GPU memory permits.
Step 204, inputting the video frame in the selected sample into the target tracking model, and outputting a prediction frame.
In this embodiment, the executing entity may input the video frame of the sample selected in step 203 into the target tracking model. By detecting and analyzing the video frame, prediction frames giving the position and classification of the target objects can be output. The target tracking model can detect multiple target objects as well as a single target object. The target object may be a pedestrian, a vehicle, or the like. A prediction frame indicates not only the position of the target object but also its category and a prediction score (i.e., the confidence of the detection result).
Step 205, calculating an original loss value and an IoU loss value according to the difference between the real frame and the prediction frame in the selected sample, and calculating a total loss value as a weighted sum of the original loss value and the IoU loss value.
In the present embodiment, the original loss value refers to the loss value of the original target tracking model, and may include a heatmap loss, an offset and size loss, and an identity embedding loss. The heatmap loss is computed by the heatmap head in fig. 3, the offset and size losses by the box size head and the center offset head, and the identity embedding loss by the Re-ID head. The calculation of the original loss value is prior art and is therefore not repeated here. The present application introduces an IoU head to calculate the IoU loss. IoU is the intersection-over-union of the prediction frame and the real frame: the larger the IoU, the smaller the IoU loss value should be. If there are multiple target objects, the IoU corresponding to each target object is calculated, and the results are accumulated as the IoU of the video frame. The IoU loss may be defined as 1/IoU, 1 - IoU, or any similar relationship such that a larger IoU yields a smaller IoU loss value.
The weighted sum of the original loss value and the IoU loss value is taken as the total loss value. The weights of the different loss terms may be set individually based on experience.
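A minimal sketch of the total loss computation described in this step (the loss terms are those named above; the weight values are placeholders, since the disclosure only states that the weights may be set empirically):

```python
def total_loss(heatmap_loss, offset_size_loss, reid_loss, iou_loss,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the original losses and the IoU loss (step 205)."""
    w_hm, w_os, w_id, w_iou = weights
    original = w_hm * heatmap_loss + w_os * offset_size_loss + w_id * reid_loss
    return original + w_iou * iou_loss
```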
Step 206, if the total loss value is smaller than a predetermined threshold, determining that training of the target tracking model is finished.
In this embodiment, when the total loss value falls below the predetermined threshold, the prediction frame may be considered close to or approximating the real frame, indicating that training of the target tracking model is complete. The predetermined threshold may be set according to actual requirements.
It should be noted that if a plurality of (at least two) samples are selected in step 203, the executing entity may compare the total loss value of each sample with the predetermined threshold respectively, thereby determining whether the total loss value of each sample reaches the predetermined threshold.
Step 207, if the total loss value is not less than the predetermined threshold, adjusting the network parameters of the target tracking model, and continuing to execute steps 203-207.
In this embodiment, if the executing entity determines that the target tracking model has not finished training, it may adjust the relevant parameters in the target tracking model, for example by modifying the weights of each convolution layer using back propagation. It may then return to step 203 to re-select samples from the sample set, so that steps 203-207 can be performed continuously.
It should be noted that the selection mode is not limited in the present disclosure. For example, when there are a large number of samples in the sample set, the executing entity may select previously unselected samples from the sample set.
In the method for training the target tracking model of this embodiment, the structure of the original target tracking model is improved by introducing an IoU loss: the similarity between the model's prediction frame and the real frame is measured to obtain more accurate detection box positions, and detection performance is improved from the perspective of IoU, the direct evaluation metric of detection, thereby raising the upper limit of multi-target tracking accuracy.
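Assembled into one loop, steps 203-207 might look like the following sketch (PyTorch-flavored; the model interface, the data source, and the stopping threshold are hypothetical stand-ins for the components described above):

```python
import torch

def train(model, sample_set, optimizer, threshold, max_steps=100_000):
    """Training steps 203-207: select, predict, score, then stop or update."""
    for _ in range(max_steps):
        frames, real_boxes = sample_set.select()     # step 203: pick samples
        losses = model(frames, real_boxes)           # steps 204-205: predict and
        loss = losses["original"] + losses["iou"]    # compute the weighted losses
        if loss.item() < threshold:                  # step 206: training finished
            break
        optimizer.zero_grad()                        # step 207: adjust parameters
        loss.backward()                              # via back propagation
        optimizer.step()
    return model
```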
In some optional implementations of this embodiment, constructing the target tracking model includes: obtaining an original target tracking model; acquiring the computing capability of the terminal on which the target tracking model will run; and if the computing capability is greater than a first predetermined capability, replacing the backbone network in the original target tracking model with HarDNet-85 to obtain the constructed target tracking model. The computing capability of the terminal is related to hardware factors such as the peak compute performance and memory bandwidth of the GPU. When cloud computing power is high, HarDNet-85 can be substituted as a higher-accuracy backbone network; it significantly improves accuracy in the CenterNet object detection network, thereby improving tracking accuracy without affecting tracking speed.
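The capability-based backbone choice in this and the following implementation could be sketched as below (the thresholds, the capability measure, and the fallback are illustrative assumptions; DLA-34 is FairMOT's original default backbone):

```python
def choose_backbone(capability, first_preset=100.0, second_preset=50.0):
    """Pick a backbone name from the terminal's compute capability.
    The unit (say, TFLOPS) and the threshold values are illustrative only."""
    if capability > first_preset:
        return "HarDNet-85"    # high cloud compute: higher-accuracy backbone
    if capability < second_preset:
        return "HRNetV2-W18"   # low compute: lightweight backbone + DLA-FPN neck
    return "DLA-34"            # otherwise keep the original FairMOT backbone
```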
In some optional implementations of this embodiment, constructing the target tracking model includes: obtaining an original target tracking model; acquiring the computing capability of the terminal on which the target tracking model will run; and if the computing capability is less than a second predetermined capability, replacing the backbone network in the original target tracking model with HRNetV2-W18 and replacing the neck in the original target tracking model with a deep-fusion feature pyramid structure to obtain the constructed target tracking model. When cloud computing power is low, HRNetV2-W18 can be chosen as a lightweight backbone network; thanks to its structure, which fuses deep and shallow features, the network performs well in scenes with small or dense targets. If HRNetV2-W18 is selected as the lightweight backbone, several layers of features from the network need to be fused, so a DLA-FPN (Deep Layer Aggregation-Feature Pyramid Network) structure, similar to a DLA structure and continuously stacked and fused from high layers to shallow layers, is designed; this fuses the features better and is more conducive to detecting and tracking targets of different scales.
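The high-to-shallow stacking and fusion could be approximated by a generic top-down fusion such as the following sketch (a plain FPN-style reduction in PyTorch, not the exact DLA-FPN of the disclosure):

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Fuse multi-scale features from the deepest level down to the shallowest."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        # 1x1 convs project every backbone level to a common channel count.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):          # feats ordered shallow -> deep
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        fused = laterals[-1]           # start from the deepest feature map
        for lat in reversed(laterals[:-1]):
            fused = lat + F.interpolate(fused, size=lat.shape[-2:], mode="nearest")
        return fused                   # fused map at the shallowest resolution
```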
In some optional implementations of this embodiment, constructing the target tracking model includes: removing the deformable convolutions from the target tracking model. This optimizes the FairMOT neck: in the original network structure, to strengthen the information contained in shallow features, the outputs of different levels are up-sampled into feature maps at 1/4 of the input resolution, and the up-sampling uses deformable convolutions to further strengthen the semantic representation of the features. This design makes prediction time-consuming when actually deployed on a cloud device, so when cloud computing power is low the deformable convolutions are removed, reducing prediction time and improving the real-time performance of target tracking.
In some optional implementations of this embodiment, constructing the target tracking model includes: replacing the normal convolutions in the head of the target tracking model with depthwise separable convolutions. In the head, the original ordinary convolutions can be replaced with LiteConv depthwise separable convolutions, which improves accuracy somewhat while slightly reducing the parameter count and computation.
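A standard depthwise separable replacement for an ordinary 3x3 convolution is sketched below (a common construction, not necessarily the exact LiteConv block):

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv (groups = channels) followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```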
In some optional implementations of this embodiment, a video frame in a sample of the sample set includes at least one target object; and constructing the target tracking model includes: setting fully connected (FC) layers of different dimensions for different categories in the re-identification head of the target tracking model. To extend to multi-class tracking, the ReID head can be partitioned by class, with each class given its own fully connected layer of the appropriate dimension, so that the Re-ID loss during training is not confused across classes and converges stably within each class.
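One way to realize per-class fully connected layers is a module dictionary keyed by class, as in the sketch below (the embedding dimension and the identity counts are made-up examples):

```python
import torch.nn as nn

class MultiClassReIDHead(nn.Module):
    """One FC layer per category, so Re-ID losses stay separated by class."""
    def __init__(self, embed_dim, ids_per_class):
        super().__init__()
        self.fc = nn.ModuleDict({
            name: nn.Linear(embed_dim, num_ids)
            for name, num_ids in ids_per_class.items()})

    def forward(self, embedding, class_name):
        return self.fc[class_name](embedding)

# e.g. head = MultiClassReIDHead(128, {"pedestrian": 500, "vehicle": 300})
```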
In some optional implementations of this embodiment, adjusting the network parameters of the target tracking model includes: adjusting the network parameters of the target tracking model using synchronized batch normalization and a moving average. The training part adds two methods, Synchronized Batch Normalization and EMA (Exponential Moving Average), to FairMOT, and uses a very large batch size and more training epochs where GPU memory permits. Cross-card synchronized batch normalization computes statistics over the global batch, which effectively enlarges the batch size and makes the training effect independent of the number of GPUs. The EMA estimates the local mean of a variable, so that updates to the variable are related to its historical values over a period of time. The EMA can be regarded as the average of a variable's values over a recent window: compared with direct assignment, the moving-average value is smoother and less jittery, and is not strongly affected by an occasional abnormal value.
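A minimal weight-EMA sketch is given below (the decay value is illustrative, and buffers are ignored for brevity); cross-card synchronized batch normalization itself is available off the shelf, e.g. torch.nn.SyncBatchNorm in PyTorch:

```python
import copy
import torch

class ModelEMA:
    """Exponential moving average of model parameters."""
    def __init__(self, model, decay=0.9998):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # ema = decay * ema + (1 - decay) * current, parameter by parameter.
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```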
These strategies can significantly improve the accuracy of the model without increasing the time consumed by model prediction.
With further reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for training the target tracking model according to this embodiment. In the application scenario of fig. 3, the network receives an input video sequence, which passes sequentially through the backbone network (Backbone) and the neck (Neck); detection information and Re-ID information are then obtained by the detection head and the Re-ID head respectively, and merged into a Tracker (matching module) for data association, finally producing the tracks. Part a shows the optimizations of the model structure, including the backbone optimizations using HarDNet-85 and HRNetV2-W18, DLA-FPN, the head, and the loss. Part b shows the optimization of trajectory matching. The optimizations of the training strategy are largely independent of the structure and post-processing strategies and are not shown in the diagram. Part c shows the optimization of the detection head, including the newly designed IoU head. Part d is a diagram of the multi-class extension of the Re-ID head.
Referring to fig. 4, a flowchart 400 of one embodiment of a target tracking method provided by the present disclosure is shown. The target tracking method may include the steps of:
Step 401, a video frame set to be detected is obtained.
In this embodiment, the executing entity of the target tracking method (for example, the server 105 shown in fig. 1) may acquire the set of video frames to be detected in various ways. For example, the executing entity may obtain video stored in a database server (e.g., database server 104 shown in fig. 1) via a wired or wireless connection. As another example, the executing entity may receive video captured by a terminal (e.g., terminals 101, 102 shown in fig. 1) or another device. The video may be surveillance video shot by surveillance cameras.
Step 402, inputting the video frame set into the target tracking model, and outputting at least one detection box in each video frame.
In this embodiment, the execution subject may input the video acquired in step 401 into the target tracking model, thereby outputting the detection frame. The detection box may be information for describing a target object in the image. For example, the detection box may identify whether a target object is detected in the image, and a location, a category, and the like in a case where the target object is detected.
In this embodiment, the target tracking model may be generated using the method described above in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
Step 403, for each video frame, dividing the detection boxes in the video frame into a high-score box set and a low-score box set according to the scores of the detection boxes.
In this embodiment, multiple target objects can be detected and marked with detection boxes, but each detection box has a different score; the higher the score, the more credible the detection box. The target tracking model has an output threshold: a detection box whose score is below this threshold is regarded as an invalid detection and is not output. The output threshold may be set lower than the conventional value; for example, the threshold is conventionally set to 0.5, while the target tracking model of the present application sets it to 0.3. Detection boxes with scores greater than or equal to the original output threshold of 0.5 may then be treated as high-score boxes, and detection boxes below the original threshold but above the lowered threshold of 0.3 as low-score boxes.
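The split described above might be implemented as follows (the thresholds follow the 0.5/0.3 example values):

```python
def split_detections(boxes, scores, high_thresh=0.5, low_thresh=0.3):
    """Divide one frame's detections into high-score and low-score box sets."""
    high = [b for b, s in zip(boxes, scores) if s >= high_thresh]
    low = [b for b, s in zip(boxes, scores) if low_thresh <= s < high_thresh]
    return high, low
```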
Step 404, for each video frame, performing first matching between the high-score box set of the video frame and the previously determined tracking tracks, and performing second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
In this embodiment, if no matched tracking tracks exist yet, matching starts from the second frame: the second frame is matched against the first frame to generate the initial tracking tracks. If matched tracking tracks already exist, the high-score boxes in the current video frame are first matched against the determined tracking tracks, and successfully matched high-score boxes are associated with those tracks to extend them. The tracking tracks that failed the first matching are then matched against the low-score box set of the video frame, and successfully matched low-score boxes are associated with those tracks to obtain updated tracking tracks. The matching methods are prior art, such as the SORT algorithm or IOU-Tracker, and are therefore not detailed here. SORT first predicts future target positions using Kalman filtering, computes the overlap between the predicted positions and the targets in the next frame, and finally matches targets with the Hungarian algorithm to perform tracking. IOU-Tracker directly associates detections in adjacent frames by their spatial overlap, without Kalman filtering, to achieve tracking.
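A minimal sketch of this two-stage association is given below (tracks are represented by their predicted boxes, e.g. from Kalman filtering; box_iou is the helper sketched under step 202, the match threshold is an assumption, and scipy's Hungarian solver performs the assignment):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(track_boxes, det_boxes, iou_thresh=0.3):
    """Hungarian matching on a 1 - IoU cost matrix; boxes are [x1, y1, x2, y2] lists."""
    if not track_boxes or not det_boxes:
        return [], list(range(len(track_boxes))), list(range(len(det_boxes)))
    cost = np.array([[1.0 - box_iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_thresh]
    unmatched_tracks = [i for i in range(len(track_boxes))
                        if i not in {r for r, _ in matches}]
    unmatched_dets = [j for j in range(len(det_boxes))
                      if j not in {c for _, c in matches}]
    return matches, unmatched_tracks, unmatched_dets

def associate(track_boxes, high_boxes, low_boxes):
    """First matching on high-score boxes, second matching on low-score boxes."""
    first, leftover, _ = match(track_boxes, high_boxes)
    # Tracks that failed the first matching get a second chance on low-score boxes.
    second, _, _ = match([track_boxes[i] for i in leftover], low_boxes)
    return first, second
```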
It should be noted that the target tracking method in this embodiment may be used to test the target tracking model generated by the above embodiments, and the target tracking model can then be further optimized according to the test results. The method may also be a practical application of the target tracking model generated by the above embodiments. Using that target tracking model for target tracking helps improve tracking performance, for example by finding more target-object tracks and finding more accurate target-object tracks.
In some optional implementations of this embodiment, the first matching and the second matching include IoU matching, and the threshold at which the target tracking model outputs detection boxes is smaller than the threshold of the original target tracking model. After the network outputs the detection box information and the embedding features, an IoU matching stage is added to the original matching module, and the score threshold for outputting detection boxes is lowered. The target bounding boxes of the current image are detected, another group of target bounding boxes is predicted using Kalman filtering, and the two groups of bounding boxes are matched by IoU score to complete multi-target tracking. This ensures that the output track results have high recall without increasing false positives (FP).
With continuing reference to FIG. 5, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of an apparatus for training a target tracking model. The embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices.
As shown in fig. 5, the apparatus 500 for training a target tracking model according to the present embodiment may include: an acquisition unit 501, a construction unit 502, a training unit 503 and an adjustment unit 504. The acquisition unit 501 is configured to acquire a sample set, where samples in the sample set include a video frame and a real frame for labeling a target object in the video frame; the construction unit 502 is configured to construct a target tracking model, where the head of the target tracking model includes an IoU head for calculating an IoU loss value; the training unit 503 is configured to perform the following training steps: selecting a sample from the sample set; inputting a video frame in the selected sample into the target tracking model, and outputting a prediction frame; calculating an original loss value and an IoU loss value according to the difference between a real frame in the selected sample and the prediction frame; and, if the weighted sum of the original loss value and the IoU loss value is smaller than a predetermined threshold, determining that training of the target tracking model is finished; the adjustment unit 504 is configured to otherwise adjust the network parameters of the target tracking model and continue to perform the training steps.
In some optional implementations of this embodiment, the construction unit 502 is further configured to: obtaining an original target tracking model; acquiring the computing capacity of a terminal applying the target tracking model; and if the operational capability is greater than the first preset capability, replacing a backbone network in the original target tracking model by HarDNet-85 to obtain the constructed target tracking model.
In some optional implementations of this embodiment, the construction unit 502 is further configured to: obtaining an original target tracking model; acquiring the computing capacity of a terminal applying the target tracking model; and if the operational capability is smaller than a second preset capability, replacing a backbone network in the original target tracking model by using HRNetV2-W18, and replacing a neck in the original target tracking model by using a deep fusion characteristic pyramid structure to obtain the constructed target tracking model.
In some optional implementations of this embodiment, the construction unit 502 is further configured to: removing the deformable convolution in the target tracking model.
In some optional implementations of this embodiment, the construction unit 502 is further configured to: replacing a normal convolution in a header in the target tracking model with a depth separable convolution.
In some optional implementations of this embodiment, a video frame in a sample of the sample set includes at least one target object; and the construction unit 502 is further configured to: set fully connected layers of different dimensions for different categories in the ReID head of the target tracking model.
In some optional implementations of this embodiment, the adjusting unit 504 is further configured to: adjust the network parameters of the target tracking model using synchronized batch normalization and a moving average.
With continued reference to FIG. 6, the present disclosure provides one embodiment of a target tracking device as an implementation of the methods illustrated in the above figures. This apparatus embodiment corresponds to the method embodiment shown in fig. 4, and the apparatus can be applied to various electronic devices.
As shown in fig. 6, the target tracking apparatus 600 of the present embodiment may include: an acquisition unit 601, a detection unit 602, a grouping unit 603, and a matching unit 604. The acquisition unit 601 is configured to acquire a set of video frames to be detected; the detection unit 602 is configured to input the set of video frames into a target tracking model trained by the apparatus 500, and output at least one detection box in each video frame; the grouping unit 603 is configured to, for each video frame, divide the detection boxes in the video frame into a high-score box set and a low-score box set according to their scores; and the matching unit 604 is configured to, for each video frame, perform first matching between the high-score box set of the video frame and the previously determined tracking tracks, and perform second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
In some optional implementations of this embodiment, the first matching and the second matching include IoU matching, and the threshold at which the target tracking model outputs detection boxes is smaller than the threshold of the original target tracking model.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of flows 200 or 400.
A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of flow 200 or 400.
A computer program product comprising a computer program which, when executed by a processor, implements the method of flow 200 or 400.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
A number of components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of various general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the method of training the target tracking model. For example, in some embodiments, the method of training the target tracking model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method of training a target tracking model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of training the target tracking model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (14)

1. A method of training a target tracking model, comprising:
obtaining a sample set, wherein samples in the sample set comprise video frames and real frames for labeling target objects in the video frames;
constructing a target tracking model, wherein the head of the target tracking model comprises an intersection-over-union (IoU) head for calculating an IoU loss value;
the following training steps are performed: selecting a sample from the sample set; inputting a video frame in the selected sample into the target tracking model, and outputting a prediction frame; calculating an original loss value and an IoU loss value according to the difference between a real frame in the selected sample and the prediction frame; if the weighted sum of the original loss value and the IoU loss value is smaller than a predetermined threshold, determining that training of the target tracking model is finished;
otherwise, adjusting the network parameters of the target tracking model, and continuing to execute the training step;
wherein the constructing of the target tracking model comprises:
obtaining an original target tracking model;
acquiring the computing capacity of a terminal applying the target tracking model;
if the operational capability is larger than the first preset capability, replacing a backbone network in the original target tracking model by HarDNet-85 to obtain a constructed target tracking model;
replacing the normal convolutions in the head of the target tracking model with depthwise separable convolutions.
2. The method of claim 1, wherein the constructing a target tracking model comprises:
obtaining an original target tracking model;
acquiring the computing capacity of a terminal applying the target tracking model;
and if the operational capability is smaller than a second preset capability, replacing a backbone network in the original target tracking model by using HRNetV2-W18, and replacing a neck in the original target tracking model by using a deep fusion characteristic pyramid structure to obtain the constructed target tracking model.
3. The method of claim 2, wherein the constructing a target tracking model comprises:
removing the deformable convolution in the target tracking model.
4. The method of claim 1, wherein said adjusting network parameters of said target tracking model comprises:
and adjusting the network parameters of the target tracking model in a synchronous batch normalization mode and a moving average mode.
5. A target tracking method, comprising:
acquiring a video frame set to be detected;
inputting the set of video frames into a target tracking model trained according to the method of any one of claims 1-4, and outputting at least one detection box in each video frame;
for each video frame, dividing the detection boxes in the video frame into a high-score box set and a low-score box set according to the scores of the detection boxes;
and for each video frame, performing a first matching between the high-score box set of the video frame and previously determined tracking tracks, and performing a second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
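A minimal NumPy sketch of the two-stage association in claim 5 follows. The greedy matcher stands in for an assignment algorithm such as the Hungarian method (the claim does not fix one), the score and IoU thresholds are assumptions, and all inputs are assumed to be NumPy arrays of (x1, y1, x2, y2) boxes:

    import numpy as np

    def pairwise_iou(a, b):
        # a: (N, 4) track boxes, b: (M, 4) detection boxes -> (N, M) IoU matrix.
        tl = np.maximum(a[:, None, :2], b[None, :, :2])
        br = np.minimum(a[:, None, 2:], b[None, :, 2:])
        inter = np.prod(np.clip(br - tl, 0, None), axis=2)
        area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
        area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
        return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-7)

    def greedy_match(track_boxes, det_boxes, iou_thr=0.3):
        # Greedily pair tracks and detections from highest IoU down.
        pairs, matched_t, matched_d = [], set(), set()
        if len(track_boxes) and len(det_boxes):
            iou = pairwise_iou(track_boxes, det_boxes)
            order = np.column_stack(
                np.unravel_index(np.argsort(-iou, axis=None), iou.shape))
            for t, d in order:
                if iou[t, d] < iou_thr:
                    break
                if t in matched_t or d in matched_d:
                    continue
                pairs.append((int(t), int(d)))
                matched_t.add(t)
                matched_d.add(d)
        unmatched = [t for t in range(len(track_boxes)) if t not in matched_t]
        return pairs, unmatched

    def two_stage_match(track_boxes, det_boxes, det_scores, high_thr=0.6):
        # First matching: tracks vs. high-score boxes; second matching:
        # tracks that failed the first matching vs. low-score boxes.
        high = det_boxes[det_scores >= high_thr]
        low = det_boxes[det_scores < high_thr]
        first, leftover = greedy_match(track_boxes, high)
        second, _ = greedy_match(track_boxes[leftover], low)
        second = [(leftover[t], d) for t, d in second]  # restore track indices
        return first, second

Because only tracks left over from the first matching see the low-score boxes, a target whose detection score drops (for example under occlusion or motion blur) can still extend its track instead of being terminated.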
6. The method of claim 5, wherein the first matching and the second matching comprise intersection-over-union (IoU) matching, and a score threshold used by the target tracking model to output detection boxes is lower than a score threshold used by the original target tracking model.
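The lowered output threshold in claim 6 is what gives the second matching of claim 5 something to work with: if the detector discarded every box below a conventional threshold, no low-score boxes would survive to be matched. A minimal sketch, with both threshold values being illustrative assumptions:

    ORIGINAL_OUTPUT_THR = 0.5   # hypothetical default of the original model
    TRACKING_OUTPUT_THR = 0.1   # lowered so low-score boxes reach the tracker

    def filter_detections(boxes, scores, thr=TRACKING_OUTPUT_THR):
        # Keep every box at or above the (lowered) output threshold; the
        # tracker, not the detector, decides the fate of low-score boxes.
        keep = scores >= thr
        return boxes[keep], scores[keep]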
7. An apparatus for training a target tracking model, comprising:
an obtaining unit configured to obtain a sample set, wherein each sample in the sample set comprises a video frame and a real box labeling a target object in the video frame;
a construction unit configured to construct a target tracking model, wherein a head of the target tracking model comprises an intersection-over-union (IoU) head for calculating an IoU loss value;
a training unit configured to perform the following training step: selecting a sample from the sample set; inputting the video frame in the selected sample into the target tracking model, and outputting a predicted box; calculating an original loss value and an IoU loss value according to the difference between the real box in the selected sample and the predicted box; and if the weighted sum of the original loss value and the IoU loss value is smaller than a preset threshold, determining that training of the target tracking model is completed;
an adjusting unit configured to, if the weighted sum is not smaller than the preset threshold, adjust the network parameters of the target tracking model and continue to perform the training step;
wherein the construction unit is further configured to:
obtaining an original target tracking model;
acquiring a computing capability of a terminal on which the target tracking model is to be applied;
if the computing capability is greater than a first preset capability, replacing the backbone network in the original target tracking model with HarDNet-85 to obtain the constructed target tracking model; and
replacing an ordinary convolution in the head of the target tracking model with a depthwise separable convolution.
8. The apparatus of claim 7, wherein the construction unit is further configured to:
obtaining an original target tracking model;
acquiring a computing capability of a terminal on which the target tracking model is to be applied;
and if the computing capability is less than a second preset capability, replacing the backbone network in the original target tracking model with HRNetV2-W18 and replacing the neck in the original target tracking model with a deep-fusion feature pyramid structure, to obtain the constructed target tracking model.
9. The apparatus of claim 8, wherein the construction unit is further configured to:
removing the deformable convolution in the target tracking model.
10. The apparatus of claim 7, wherein the adjustment unit is further configured to:
and adjusting the network parameters of the target tracking model using synchronized batch normalization and a moving average.
11. A target tracking apparatus, comprising:
an acquisition unit configured to acquire a set of video frames to be detected;
a detection unit configured to input the set of video frames into a target tracking model trained by the apparatus according to any one of claims 7-10, and output at least one detection box in each video frame;
a grouping unit configured to divide, for each video frame, the detection boxes in the video frame into a high-score box set and a low-score box set according to the scores of the detection boxes;
and a matching unit configured to, for each video frame, perform a first matching between the high-score box set of the video frame and previously determined tracking tracks, and perform a second matching between the tracking tracks that failed the first matching and the low-score box set of the video frame, to obtain updated tracking tracks.
12. The apparatus of claim 11, wherein the first matching and the second matching comprise intersection-over-union (IoU) matching, and a score threshold used by the target tracking model to output detection boxes is lower than a score threshold used by the original target tracking model.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202211424529.7A 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device Pending CN115690545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211424529.7A CN115690545A (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111464709.3A CN114169425B (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device
CN202211424529.7A CN115690545A (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN202111464709.3A Division CN114169425B (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device

Publications (1)

Publication Number Publication Date
CN115690545A true CN115690545A (en) 2023-02-03

Family

ID=80482729

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111464709.3A Active CN114169425B (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device
CN202211424529.7A Pending CN115690545A (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111464709.3A Active CN114169425B (en) 2021-12-03 2021-12-03 Training target tracking model and target tracking method and device

Country Status (1)

Country Link
CN (2) CN114169425B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309696B (en) * 2022-12-23 2023-12-01 苏州驾驶宝智能科技有限公司 Multi-category multi-target tracking method and device based on improved generalized cross-over ratio
CN115908498B (en) * 2022-12-27 2024-01-02 清华大学 Multi-target tracking method and device based on category optimal matching

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055854B2 (en) * 2018-08-23 2021-07-06 Seoul National University R&Db Foundation Method and system for real-time target tracking based on deep learning
CN111626350B (en) * 2020-05-25 2021-05-18 腾讯科技(深圳)有限公司 Target detection model training method, target detection method and device
CN113034548B (en) * 2021-04-25 2023-05-26 安徽科大擎天科技有限公司 Multi-target tracking method and system suitable for embedded terminal
CN113378760A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method and device for detecting target

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109903312A (en) * 2019-01-25 2019-06-18 北京工业大学 Method for counting the running distance of football players based on video multi-object tracking
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN112288770A (en) * 2020-09-25 2021-01-29 航天科工深圳(集团)有限公司 Video real-time multi-target detection and tracking method and device based on deep learning
CN112232411A (en) * 2020-10-15 2021-01-15 浙江凌图科技有限公司 Optimization method of HarDNet-Lite on embedded platform
CN112883819A (en) * 2021-01-26 2021-06-01 恒睿(重庆)人工智能技术研究院有限公司 Multi-target tracking method, device, system and computer readable storage medium
CN112926410A (en) * 2021-02-03 2021-06-08 深圳市维海德技术股份有限公司 Target tracking method and device, storage medium and intelligent video system
CN112836657A (en) * 2021-02-08 2021-05-25 中国电子科技集团公司第三十八研究所 Pedestrian detection method and system based on lightweight YOLOv3
CN112837297A (en) * 2021-02-08 2021-05-25 福建医科大学附属协和医院 Progressive multi-scale craniofacial bone fracture detection method
CN113033661A (en) * 2021-03-25 2021-06-25 桂林电子科技大学 Target detection method based on embedded platform characteristic improvement
CN113034545A (en) * 2021-03-26 2021-06-25 河海大学 Vehicle tracking method based on CenterNet multi-target tracking algorithm
CN113052876A (en) * 2021-04-25 2021-06-29 合肥中科类脑智能技术有限公司 Video relay tracking method and system based on deep learning
CN113537106A (en) * 2021-07-23 2021-10-22 仲恺农业工程学院 Fish feeding behavior identification method based on YOLOv5
CN113495575A (en) * 2021-08-18 2021-10-12 北京航空航天大学 Unmanned aerial vehicle autonomous landing visual guidance method based on attention mechanism
CN113724293A (en) * 2021-08-23 2021-11-30 上海电科智能系统股份有限公司 Vision-based intelligent internet public transport scene target tracking method and system
CN113674321A (en) * 2021-08-25 2021-11-19 燕山大学 Cloud-based multi-target tracking method under surveillance video

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIANCHENG ZOU et al.: "An Improved Object Detection Algorithm Based on CenterNet", ARTIFICIAL INTELLIGENCE AND SECURITY, 9 July 2021 (2021-07-09), pages 455-467, XP047602879, DOI: 10.1007/978-3-030-78609-0_39 *
YIFU ZHANG et al.: "ByteTrack: Multi-Object Tracking by Associating Every Detection Box", ARXIV, 14 October 2021 (2021-10-14), pages 1-13 *
YIFU ZHANG et al.: "FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking", ARXIV, 20 October 2021 (2021-10-20), pages 1-19 *
LIU HAIYING et al.: "Transmission Line Equipment Detection Method Based on Orientation-Adaptive Detectors" (in Chinese), POWER SYSTEM TECHNOLOGY (电网技术), 20 April 2021 (2021-04-20), pages 1-9 *
XU YANLEI et al.: "Object Detection Algorithm for Aerial Images Based on Improved CenterNet" (in Chinese), LASER & OPTOELECTRONICS PROGRESS (激光与光电子学进展), vol. 58, no. 20, 25 October 2021 (2021-10-25), pages 1-2 *
QI RONG et al.: "Lightweight Object Detection Network Based on YOLOv3" (in Chinese), COMPUTER APPLICATIONS AND SOFTWARE (计算机应用与软件), vol. 37, no. 10, 12 October 2020 (2020-10-12), pages 208-213 *

Also Published As

Publication number Publication date
CN114169425B (en) 2023-02-03
CN114169425A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
EP4044117A1 (en) Target tracking method and apparatus, electronic device, and computer-readable storage medium
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
US20180114071A1 (en) Method for analysing media content
CN113221677B (en) Track abnormality detection method and device, road side equipment and cloud control platform
CN114169425B (en) Training target tracking model and target tracking method and device
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN109902681B (en) User group relation determining method, device, equipment and storage medium
CN111985374A (en) Face positioning method and device, electronic equipment and storage medium
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN113313053A (en) Image processing method, apparatus, device, medium, and program product
CN112862005A (en) Video classification method and device, electronic equipment and storage medium
Han et al. Dr. vic: Decomposition and reasoning for video individual counting
CN113947188A (en) Training method of target detection network and vehicle detection method
CN116129328A (en) Method, device, equipment and storage medium for detecting carryover
CN112489077A (en) Target tracking method and device and computer system
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN106934339B (en) Target tracking and tracking target identification feature extraction method and device
CN115937950A (en) Multi-angle face data acquisition method, device, equipment and storage medium
CN115311680A (en) Human body image quality detection method and device, electronic equipment and storage medium
CN113569912A (en) Vehicle identification method and device, electronic equipment and storage medium
CN114419428A (en) Target detection method, target detection device and computer readable storage medium
CN113344121A (en) Method for training signboard classification model and signboard classification
CN112183431A (en) Real-time pedestrian number statistical method and device, camera and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination