CN111161311A - Visual multi-target tracking method and device based on deep learning - Google Patents
- Publication number
- CN111161311A CN111161311A CN201911252433.5A CN201911252433A CN111161311A CN 111161311 A CN111161311 A CN 111161311A CN 201911252433 A CN201911252433 A CN 201911252433A CN 111161311 A CN111161311 A CN 111161311A
- Authority
- CN
- China
- Prior art keywords
- tracking
- target
- image
- cross
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The embodiment of the invention provides a visual multi-target tracking method and device based on deep learning. The method comprises the following steps: sequentially acquiring candidate detection frames of the tracking targets in the current video frame through a target detection network model, recording their coordinate position information, and acquiring the corresponding template images; acquiring the image of each frame except the 1st frame in the video as the image of the region to be searched; and inputting each template image together with the image of the region to be searched into a target tracking network model constructed from a twin (Siamese) convolutional neural network to obtain the tracking result of each tracking target. Because the template images acquired by the target detection network model and the images of the region to be searched are fed into the twin-network tracker separately for each target, the computational cost is low, and real-time, accurate multi-target tracking is achieved.
Description
Technical Field
The invention relates to the technical field of computer vision, in particular to a visual multi-target tracking method and device based on deep learning.
Background
Visual target tracking is a hot problem in the field of computer vision research. With the rapid development of computer technology, target tracking technology has also improved greatly, and with the rapid rise of artificial intelligence in recent years, research on target tracking is receiving more and more attention.
Deep learning technology has strong feature representation capability and achieves better results than traditional methods in applications such as image classification, object recognition, and natural language processing, so it has gradually become the mainstream technology of image and video research. Tracking methods based on deep learning form an important branch of target tracking: by exploiting the end-to-end training of deep convolutional networks, the model automatically learns the appearance and motion characteristics of the target, realizing high-quality, robust tracking.
In recent years, work on multi-target tracking has also been reported. However, the multi-target tracking methods disclosed in the prior art generally involve a large amount of computation and cannot achieve real-time tracking, so the tracking effect is poor.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a visual multi-target tracking method and apparatus based on deep learning.
In a first aspect, an embodiment of the present invention provides a visual multi-target tracking method based on deep learning, including: sequentially acquiring candidate detection frames of a tracking target in the current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information, wherein there are one or more tracking targets; acquiring the image of each frame except the 1st frame in the video, and taking these images as the images of the region to be searched; respectively inputting each template image and the image of the region to be searched into a target tracking network model constructed from a twin convolutional neural network; and acquiring the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
Further, the target detection network model is a YOLOv3 network model.
Further, the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model includes: respectively extracting features of the template image and the image of the region to be searched to obtain a template feature image and a feature image of the region to be searched; performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map; obtaining the feature map row with the highest class probability from the cross-correlation result feature map, and performing channel-transformation convolution operations using the feature map row to obtain a classification branch response map and a regression branch response map respectively; and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map.
Further, the performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map includes: sliding the template feature image over the feature image of the region to be searched and performing the cross-correlation operation channel by channel to obtain the cross-correlation result feature map.
Further, the cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map. The performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map comprises: performing a convolution operation on the template feature image to obtain two classification branch feature maps, and performing a convolution operation on the feature image of the region to be searched to obtain two regression branch feature maps; and pairing each classification branch feature map with one regression branch feature map and performing the cross-correlation operation on each pair to obtain the first cross-correlation result feature map and the second cross-correlation result feature map. The obtaining of the feature map row with the highest class probability from the cross-correlation result feature map, and performing channel-transformation convolution operations using the feature map row to obtain the classification branch response map and the regression branch response map respectively, comprises: obtaining a first feature map row with the highest class probability from the first cross-correlation result feature map, and performing a channel-transformation convolution operation using the first feature map row to obtain the classification branch response map; and obtaining a second feature map row with the highest class probability from the second cross-correlation result feature map, and performing a channel-transformation convolution operation using the second feature map row to obtain the regression branch response map.
Further, the obtaining of the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map includes: sorting a plurality of target detection frames corresponding to the tracking target through the classification branch response map; and predicting the bounding box of each target detection frame through the regression branch response map, and obtaining the bounding box corresponding to the tracking result by using a preset screening algorithm.
Further, the sorting of the plurality of target detection frames corresponding to the tracking target through the classification branch response map includes: screening out a plurality of target detection frames corresponding to the tracking target through the classification branch response map, and sorting the target detection frames through a cosine window and a scale penalty. The preset screening algorithm is a non-maximum suppression algorithm.
In a second aspect, an embodiment of the present invention provides a visual multi-target tracking device based on deep learning, including: a template image acquisition module configured to: sequentially acquire candidate detection frames of a tracking target in the current video frame through a target detection network model according to the frame sequence of the video, record coordinate position information of the candidate detection frames, and acquire template images corresponding to the candidate detection frames according to the coordinate position information, wherein there are one or more tracking targets; a region-to-be-searched image acquisition module configured to: acquire the image of each frame except the 1st frame in the video, and take these images as the images of the region to be searched; and a tracking result acquisition module configured to: respectively input each template image and the image of the region to be searched into a target tracking network model constructed from a twin convolutional neural network, and acquire the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method provided in the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the visual multi-target tracking method and device based on deep learning provided by the embodiment of the invention, the candidate detection frames of the tracking targets are obtained in real time by the target detection network model, and the corresponding template images are obtained from them. The template image corresponding to each tracking target and the image of the region to be searched are input into the target tracking network model constructed from the twin convolutional neural network, and the tracking result of the tracking target corresponding to each template image is acquired from the output of the target tracking network model. The computational cost is therefore low, and real-time, accurate multi-target tracking is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a deep learning-based visual multi-target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic processing flow diagram of a target tracking network model in the deep learning-based visual multi-target tracking method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a deep learning-based visual multi-target tracking device according to an embodiment of the present invention;
fig. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a visual multi-target tracking method based on deep learning according to an embodiment of the present invention. As shown in fig. 1, the method includes:
101, sequentially acquiring candidate detection frames of a tracking target in the current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information;
102, acquiring the image of each frame except the 1st frame in the video, and taking these images as the images of the region to be searched;
103, respectively inputting each template image and the image of the region to be searched into a target tracking network model constructed from a twin convolutional neural network, and acquiring the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
The target detection network model performs target detection on the preset tracking targets for each frame of image in the video. As time goes by, the tracking targets in the video frames change: some tracking targets disappear, and new tracking targets appear. Therefore, performing target detection on every frame through the target detection network model enables real-time updating of the set of tracking targets.
Specifically, in the process of target detection, the visual multi-target tracking device based on deep learning sequentially obtains the candidate detection frames of the tracking targets in the current video frame through the target detection network model according to the frame sequence of the video, records the coordinate position information of the candidate detection frames, and obtains the template images corresponding to the candidate detection frames according to the coordinate position information. If the current video frame contains tracking targets, there is at least one, and there may be several; each candidate detection frame corresponds to one tracking target.
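The template-extraction step above can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the (x1, y1, x2, y2) box format and the absence of any context margin around the target are assumptions — the patent only states that the template image is cut out using the recorded coordinate position information.

```python
import numpy as np

def crop_template(frame, box):
    """Crop a template image for one candidate detection box.

    `frame` is an H x W x 3 array; `box` is (x1, y1, x2, y2) in pixel
    coordinates, as recorded by the detection stage (format assumed).
    """
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    # Clip the box to the frame so out-of-bounds detections stay valid.
    x1, x2 = max(0, int(x1)), min(w, int(x2))
    y1, y2 = max(0, int(y1)), min(h, int(y2))
    return frame[y1:y2, x1:x2]

# One template image per candidate detection frame in the current frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
boxes = [(10, 20, 110, 220), (600, 400, 700, 500)]  # second box exceeds frame
templates = [crop_template(frame, b) for b in boxes]
```

Each template would then be resized to the fixed template size (127 × 127 in Fig. 2) before being fed to the tracking network.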
The visual multi-target tracking device based on deep learning acquires the image of each frame except the 1st frame in the video and takes it as the image of the region to be searched; that is, the tracking targets are found and tracked in the image of the region to be searched.
After the visual multi-target tracking device based on deep learning obtains the template images and the images of the region to be searched, each template image and the image of the region to be searched are input into the target tracking network model constructed from the twin (Siamese) convolutional neural network. This model comprises two networks that share weights; the template image and the image of the region to be searched are input into the two networks respectively, and the tracking result is obtained through a correlation calculation.
The embodiment of the invention can remove target objects that disappear from the video image. For a target object that newly appears in the video, the target detection network detects it and stores its position-coordinate detection frame information, and the target tracking network model continuously acquires this detection frame information and automatically tracks the object, thereby ensuring the accuracy and real-time performance of multi-target tracking.
According to the embodiment of the invention, the candidate detection frames of the tracking targets are obtained in real time by the target detection network model, and the corresponding template images are obtained from them. The template image corresponding to each tracking target and the image of the region to be searched are input into the target tracking network model constructed from the twin convolutional neural network, and the tracking result of the tracking target corresponding to each template image is acquired from the output of the target tracking network model. The computational cost is therefore low, and real-time, accurate multi-target tracking is realized.
Further, based on the above embodiment, the target detection network model is a YOLOv3 network model.
The YOLOv3 algorithm achieves good accuracy and speed in object detection and recognition, so the embodiment of the invention adopts the YOLOv3 network model to detect target objects. YOLOv3 follows an end-to-end approach and is trained with the Darknet framework: the model takes the whole image as the network input and, using a regression method, directly regresses the positions and categories of bounding boxes at the output layer to recognize target objects, and the coordinate position information of the candidate frames of the target objects is stored.
On the basis of the above embodiment, the method and the device provided by the embodiment of the invention improve the accuracy of tracking target identification in multi-target tracking by adopting the YOLOv3 network model for target detection.
Fig. 2 is a schematic processing flow diagram of the target tracking network model in the deep learning-based visual multi-target tracking method according to an embodiment of the present invention. As shown in Fig. 2, the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model includes: respectively extracting features of the template image and the image of the region to be searched to obtain a template feature image and a feature image of the region to be searched; performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map; obtaining the feature map row with the highest class probability from the cross-correlation result feature map, and performing channel-transformation convolution operations using the feature map row to obtain a classification branch response map and a regression branch response map respectively; and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map.
Specifically, the process of obtaining the tracking result of the tracked target using the target tracking network model is as follows. Features are extracted from the template image and from the image of the region to be searched, respectively, to obtain a template feature image and a feature image of the region to be searched. Since the image of the region to be searched is taken from the entire video frame while the template image is taken from a tracking target within the frame, the template image is generally smaller than the image of the region to be searched, and the template feature image is correspondingly smaller than the feature image of the region to be searched.
As shown in Fig. 2, the image of size 127 × 127 × 3 is the template image, and the image of size 255 × 255 × 3 is the image of the region to be searched. The numbers indicate the dimensions of the image: in 127 × 127 × 3, 127 × 127 is the height × width of the image and 3 is the number of channels (RGB). Features are then extracted by the target tracking network model to obtain the respective feature images: 15 × 15 × 256 denotes the template feature image obtained by feature extraction from the template image, and 31 × 31 × 256 denotes the feature image of the region to be searched obtained by feature extraction from the image of the region to be searched. Here g_θ denotes the feature extraction operation performed by the twin neural network.
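The sizes quoted from Fig. 2 (a 127 × 127 template mapping to 15 × 15 features, a 255 × 255 search region mapping to 31 × 31) are consistent with a padding-free backbone of total stride 8. The patent does not give the layer configuration, so the three-layer stack below is purely an assumed example chosen to reproduce that size arithmetic:

```python
def conv_out(n, k, s):
    """Spatial output size of an unpadded ("valid") convolution."""
    return (n - k) // s + 1

def backbone_size(n):
    # Three unpadded 3x3 convolutions with stride 2 (total stride 8);
    # this exact configuration is an assumption, chosen only so that
    # the output sizes match Fig. 2 of the patent.
    for _ in range(3):
        n = conv_out(n, 3, 2)
    return n

print(backbone_size(127))  # template image -> 15  (15 x 15 x 256 features)
print(backbone_size(255))  # search image   -> 31  (31 x 31 x 256 features)
```

Because both branches share the same weights g_θ, the same size rule applies to the template and to the region to be searched.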
A cross-correlation operation (denoted by ★ in Fig. 2) is then performed on the template feature image and the feature image of the region to be searched: the template feature image is slid over the feature image of the region to be searched to obtain the cross-correlation result feature map (17 × 17 × 256). During the cross-correlation calculation, the operation is performed channel by channel as the template feature image slides, so the number of channels remains unchanged.
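The channel-by-channel (depthwise) cross-correlation described above can be sketched directly in NumPy. This is an illustrative re-implementation using the shapes from Fig. 2, not the patent's code:

```python
import numpy as np

def depthwise_xcorr(z, x):
    """Channel-by-channel cross-correlation of template features `z`
    (hz, wz, c) slid over search-region features `x` (hx, wx, c).

    Each channel of `z` is correlated only with the same channel of `x`,
    so the channel count is preserved, as described in the patent.
    """
    hz, wz, c = z.shape
    hx, wx, _ = x.shape
    ho, wo = hx - hz + 1, wx - wz + 1
    out = np.empty((ho, wo, c), dtype=np.float64)
    for i in range(ho):
        for j in range(wo):
            # Sum over the spatial window, separately for every channel.
            out[i, j] = (x[i:i + hz, j:j + wz] * z).sum(axis=(0, 1))
    return out

# Shapes from Fig. 2: a 15x15x256 template slid over 31x31x256 search
# features yields a 17x17x256 cross-correlation result feature map.
z = np.random.rand(15, 15, 256)
x = np.random.rand(31, 31, 256)
r = depthwise_xcorr(z, x)
print(r.shape)  # (17, 17, 256)
```

In a real tracker this would be done on the GPU (e.g. as a grouped convolution), but the arithmetic is the same.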
The feature map row with the highest class probability is then obtained from the cross-correlation result feature map; "highest class probability" means the highest fitting confidence over the whole cross-correlation result feature map. After the cross-correlation operation, a 17 × 17 × 256 feature map is obtained, and the feature map row is the feature cube with the highest class probability in this map (for example, a 1 × 1 × 256 feature map). The cross-correlation result feature map is connected to two branches; each branch passes through two layers of 1 × 1 channel-transformation convolution, which leaves the spatial size of the feature map unchanged, yielding a classification branch response map (17 × 17 × 2k in Fig. 2) and a regression branch response map (17 × 17 × 4k in Fig. 2) respectively. b_σ and S_φ denote convolution operations. k is the number of target detection frames, that is, the number of detection frames of different sizes corresponding to each position. The classification branch response map is used to screen target detection frames by score, and the regression branch response map lets the network learn to regress object positions so that more accurate bounding-box predictions are obtained. The tracking result of the tracking target corresponding to the template image is therefore acquired from the classification branch response map and the regression branch response map, completing the tracking of the tracked target.
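The channel-transformation convolutions above are 1 × 1 convolutions, i.e. per-position linear maps over the channel dimension that leave the 17 × 17 spatial size unchanged. A shape-level sketch follows; the value k = 5 and the random weights are assumptions, used only to show the 2k/4k channel counts:

```python
import numpy as np

def conv1x1(feat, weight):
    """1x1 convolution = independent channel transformation per pixel.

    `feat` is (h, w, c_in); `weight` is (c_in, c_out).  The spatial
    size is unchanged, as for the channel-transformation convolutions
    that produce the 17x17x2k and 17x17x4k response maps.
    """
    return np.einsum('hwc,cd->hwd', feat, weight)

k = 5                               # detection frames per position (assumed)
feat = np.random.rand(17, 17, 256)  # cross-correlation result feature map
cls_map = conv1x1(feat, np.random.rand(256, 2 * k))  # classification branch
reg_map = conv1x1(feat, np.random.rand(256, 4 * k))  # regression branch
print(cls_map.shape, reg_map.shape)
```

The 2k channels hold a foreground/background score per detection frame, and the 4k channels hold the four box-offset values per detection frame.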
On the basis of the above embodiment, tracking of the tracked target is realized with the target tracking network model through feature extraction, the cross-correlation operation, and the acquisition of the classification branch response map and the regression branch response map, improving the accuracy of multi-target tracking.
Further, based on the above embodiment, the performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map includes: sliding the template feature image over the feature image of the region to be searched and performing the cross-correlation operation channel by channel to obtain the cross-correlation result feature map.
On the basis of the above embodiment, in the embodiment of the present invention, the template feature image is slid over the feature image of the region to be searched and the cross-correlation operation is performed channel by channel, so the number of channels remains unchanged.
Further, based on the above embodiment, the cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map. The performing a cross-correlation operation on the template feature image and the feature image of the region to be searched to obtain a cross-correlation result feature map includes: performing a convolution operation on the template feature image to obtain two classification branch feature maps, and performing a convolution operation on the feature image of the region to be searched to obtain two regression branch feature maps; and pairing each classification branch feature map with one regression branch feature map and performing the cross-correlation operation on each pair to obtain the first cross-correlation result feature map and the second cross-correlation result feature map. The obtaining of the feature map row with the highest class probability from the cross-correlation result feature map, and performing channel-transformation convolution operations using the feature map row to obtain the classification branch response map and the regression branch response map respectively, includes: obtaining a first feature map row with the highest class probability from the first cross-correlation result feature map, and performing a channel-transformation convolution operation using the first feature map row to obtain the classification branch response map; and obtaining a second feature map row with the highest class probability from the second cross-correlation result feature map, and performing a channel-transformation convolution operation using the second feature map row to obtain the regression branch response map.
The cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map. A convolution operation is performed on the template feature image to obtain two identical classification branch feature maps, and a convolution operation is performed on the feature image of the region to be searched to obtain two identical regression branch feature maps. Each classification branch feature map is paired with one regression branch feature map for the cross-correlation operation: one classification branch feature map is paired with one regression branch feature map, and the other classification branch feature map is paired with the other regression branch feature map, yielding the first cross-correlation result feature map and the second cross-correlation result feature map respectively.
obtaining a first feature map row with highest class probability according to the feature map of the first cross-correlation operation result, wherein the first feature map row is a feature cube (such as a feature map of 1 × 256) with highest class probability in the feature map of the first cross-correlation operation result; performing channel transformation convolution operation by using the first characteristic diagram row, and setting relevant labels of the classification branches to obtain a response diagram of the classification branches
A second feature map row with the highest class probability is obtained from the second cross-correlation result feature map; this row is the feature cube (e.g., a 1 × 256 feature map) with the highest class probability in the second cross-correlation result feature map. A channel transformation convolution operation is performed on the second feature map row, and the relevant labels of the regression branch are set, to obtain the regression branch response map.
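The branch pairing and the selection of the highest-probability "feature map row" described above can be sketched in numpy. All shapes, the channel-sum scoring rule, and the 1×1 "channel transformation" weight matrices `W_cls`/`W_reg` are illustrative assumptions, not the patent's exact configuration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def xcorr(kernel, feat):
    """Channel-preserving cross-correlation of a kernel (C,kh,kw) over features (C,fh,fw)."""
    C, kh, kw = kernel.shape
    win = sliding_window_view(feat, (kh, kw), axis=(1, 2))  # (C, oh, ow, kh, kw)
    return np.einsum('cijkl,ckl->cij', win, kernel)

rng = np.random.default_rng(0)
cls1, cls2 = rng.random((2, 256, 4, 4))     # two classification branch feature maps (template side)
reg1, reg2 = rng.random((2, 256, 20, 20))   # two regression branch feature maps (search side)

first = xcorr(cls1, reg1)                   # first cross-correlation result feature map
second = xcorr(cls2, reg2)                  # second cross-correlation result feature map

def best_row(result):
    """Pick the 1 x 256 'feature map row' at the highest-scoring spatial location
    (channel sum used here as a stand-in for the class probability)."""
    score = result.sum(axis=0)
    i, j = np.unravel_index(score.argmax(), score.shape)
    return result[:, i, j]

W_cls = rng.random((10, 256))               # assumed 1x1 channel-transformation weights
W_reg = rng.random((20, 256))               # assumed 1x1 channel-transformation weights
cls_response = W_cls @ best_row(first)      # classification branch response (channels assumed)
reg_response = W_reg @ best_row(second)     # regression branch response (channels assumed)
print(first.shape, cls_response.shape)      # (256, 17, 17) (10,)
```

The point of the sketch is the structure: two independent template/search pairs produce two independent result maps, one feeding each branch head.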
On the basis of the above embodiment, the embodiment of the invention performs convolution operations on the template feature image and on the feature image of the area to be searched to obtain two pairs of classification branch and regression branch feature maps, and performs the cross-correlation operation on each pair to obtain the cross-correlation result feature maps. This improves the accuracy of the cross-correlation result and, in turn, the accuracy of classification and tracking.
Further, based on the above embodiment, obtaining the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map includes: screening out a plurality of target detection boxes corresponding to the tracking target by sorting the classification branch response map; predicting the bounding box of each target detection box through the regression branch; and obtaining the bounding box corresponding to the tracking result using a preset screening algorithm.
When the tracking result of the tracking target corresponding to the template image is obtained according to the classification branch response map and the regression branch response map, a plurality of target detection boxes corresponding to the tracking target are first screened out through the classification branch response map, and these detection boxes are sorted through a cosine window and a scale penalty; the plurality of target detection boxes corresponding to the tracking target are thus screened out by sorting the classification branch response map. The bounding box of each target detection box is then predicted through the regression branch, and the bounding box corresponding to the tracking result is obtained using a preset screening algorithm (e.g., a non-maximum suppression algorithm).
During prediction, the top k candidate targets are screened out from the classification branch, re-ranked through a cosine window and a scale penalty, the bounding box of each candidate is obtained from the regression branch, and the final result is obtained with a non-maximum suppression algorithm.
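The prediction step above can be sketched as follows. The box format, the window influence weight, the penalty constant, and the IoU threshold are illustrative assumptions, and the scores and boxes are random stand-ins for the network outputs:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, thresh=0.5):
    """Plain non-maximum suppression; returns kept indices, best first."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        mask = np.array([iou(boxes[i], boxes[j]) < thresh for j in order[1:]], dtype=bool)
        order = order[1:][mask]
    return keep

rng = np.random.default_rng(1)
score_map = rng.random((17, 17))                       # classification branch scores
hann = np.outer(np.hanning(17), np.hanning(17))        # cosine window
w = 0.3                                                # assumed window influence
scores = ((1 - w) * score_map + w * hann).ravel()      # cosine-window re-ranking

k = 5
top = np.argsort(scores)[::-1][:k]                     # top-k candidate locations
boxes = rng.random((17 * 17, 4)) * 50                  # hypothetical regressed boxes
boxes[:, 2:] += boxes[:, :2] + 10                      # guarantee x2 > x1, y2 > y1

sizes = (boxes[top, 2] - boxes[top, 0]) * (boxes[top, 3] - boxes[top, 1])
prev = sizes.mean()                                    # stand-in for the previous frame's box size
change = np.maximum(sizes / prev, prev / sizes)        # relative size change
penalized = scores[top] * np.exp(-0.1 * (change - 1))  # scale penalty (constant 0.1 assumed)

keep = nms(boxes[top], penalized, thresh=0.5)          # final boxes after NMS
print(len(keep))
```

The cosine window biases candidates toward the centre of the search region, and the scale penalty discounts candidates whose size changed abruptly between frames; NMS then removes overlapping duplicates.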
On the basis of the above embodiment, the embodiment of the invention screens out a plurality of target detection boxes corresponding to the tracking target by sorting the classification branch response map, predicts the bounding box of each target detection box through the regression branch, and obtains the bounding box corresponding to the tracking result using a preset screening algorithm, thereby ensuring the reliability of multi-target tracking. By selecting suitable sorting and screening algorithms for the target detection boxes and for the bounding boxes, the accuracy of multi-target tracking is improved.
The embodiment of the invention provides a multi-target tracking method that combines target detection with a deep-learning-based target tracking algorithm. It can accurately identify and track target objects, and because training is performed offline, network inference is fast and real-time performance can be achieved.
Fig. 3 is a schematic structural diagram of a deep-learning-based visual multi-target tracking device according to an embodiment of the present invention. As shown in Fig. 3, the device includes a template image obtaining module 10, an image obtaining module 20 of the area to be searched, and a tracking result obtaining module 30.
The template image obtaining module 10 is configured to: sequentially acquire candidate detection boxes of a tracking target in the current video frame through a target detection network model according to the frame order of the video, record the coordinate position information of the candidate detection boxes, and acquire the template images corresponding to the candidate detection boxes according to the coordinate position information, wherein there are one or more tracking targets.
The image obtaining module 20 of the area to be searched is configured to: acquire the image of each frame in the video except the 1st frame and take it as the image of the area to be searched.
The tracking result obtaining module 30 is configured to: input each template image and the image of the area to be searched respectively into a target tracking network model constructed from a twin (siamese) convolutional neural network, and acquire the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
In the embodiment of the invention, the candidate detection boxes of the tracking targets are obtained in real time using the target detection network model, from which the corresponding template images are obtained. The template image corresponding to each tracking target and the image of the area to be searched are then input respectively into the target tracking network model constructed from the twin convolutional neural network, and the tracking result of the tracking target corresponding to each template image is obtained from the output of the target tracking network model. The computational cost is therefore low, and real-time, accurate tracking of multiple targets is achieved.
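The per-frame data flow described above can be sketched schematically. The detector and the siamese tracker are stood in by trivial stubs (`detect_targets`, `siamese_track` are hypothetical names, not APIs from the patent), and for simplicity the sketch detects once on the first frame only:

```python
import numpy as np

def detect_targets(frame):
    """Stub for the detection network (e.g. YOLOv3): returns candidate boxes (x, y, w, h)."""
    return [(10, 10, 20, 20), (40, 40, 20, 20)]

def crop_template(frame, box):
    """Crop a template image from the frame using the recorded box coordinates."""
    x, y, w, h = box
    return frame[y:y+h, x:x+w]

def siamese_track(template, search_image):
    """Stub for the twin-CNN tracker: returns the tracked box in the search image."""
    h, w = template.shape[:2]
    return (0, 0, w, h)  # placeholder result

video = [np.zeros((100, 100)) for _ in range(3)]   # dummy 3-frame video

first = video[0]
templates = [crop_template(first, b) for b in detect_targets(first)]  # one template per target

results = []
for frame in video[1:]:                 # every frame after the 1st is a search region
    results.append([siamese_track(t, frame) for t in templates])

print(len(results), len(results[0]))    # 2 search frames, 2 tracked targets each
```

Each template/search pair is an independent forward pass of the same tracking network, which is how the method scales from single-target to multi-target tracking.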
Further, based on the above embodiment, the target detection network model is a YOLOv3 network model.
On the basis of the above embodiment, the method and device provided by the embodiment of the invention adopt the YOLOv3 network model for target detection, which improves the accuracy of tracking-target identification in multi-target tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to obtain the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model, it is specifically configured to: extract features from the template image and from the image of the area to be searched, respectively, to obtain a template feature image and a feature image of the area to be searched; perform the cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a cross-correlation result feature map; obtain the feature map row with the highest class probability from the cross-correlation result feature map and perform a channel transformation convolution operation on it to obtain a classification branch response map and a regression branch response map, respectively; and obtain the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map.
On the basis of the above embodiment, the target tracking network model completes the tracking of the tracked object through feature extraction, the cross-correlation operation, and the acquisition of the classification branch response map and the regression branch response map, which improves the accuracy of multi-target tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to perform the cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a cross-correlation result feature map, it is specifically configured to: slide the template feature image over the feature image of the area to be searched and perform the cross-correlation operation channel by channel to obtain the cross-correlation result feature map.
On the basis of the above embodiment, sliding the template feature image over the feature image of the area to be searched and performing the cross-correlation operation channel by channel ensures that the number of channels remains unchanged.
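A minimal numpy sketch of this channel-by-channel ("depthwise") cross-correlation: the template feature map slides over the search-region feature map and each channel is correlated independently, so the channel count of the result equals that of the inputs. The 256-channel and 6×6/22×22 spatial sizes are illustrative assumptions:

```python
import numpy as np

def depthwise_xcorr(template, search):
    """template: (C, th, tw), search: (C, sh, sw) -> (C, sh-th+1, sw-tw+1)."""
    C, th, tw = template.shape
    _, sh, sw = search.shape
    oh, ow = sh - th + 1, sw - tw + 1
    out = np.empty((C, oh, ow))
    for c in range(C):                      # one channel at a time
        for i in range(oh):                 # slide the template over the search features
            for j in range(ow):
                out[c, i, j] = np.sum(template[c] * search[c, i:i+th, j:j+tw])
    return out

t = np.random.rand(256, 6, 6)    # hypothetical template features
s = np.random.rand(256, 22, 22)  # hypothetical search-region features
r = depthwise_xcorr(t, s)
print(r.shape)                   # (256, 17, 17) - channel count unchanged
```

In practice this loop would be a grouped convolution in a deep learning framework; the explicit loops are only to make the channel-wise sliding visible.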
Further, based on the above embodiment, the cross-correlation result feature map includes a first cross-correlation result feature map and a second cross-correlation result feature map. When the tracking result obtaining module 30 is configured to perform the cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a cross-correlation result feature map, it is specifically configured to: perform a convolution operation on the template feature image to obtain two classification branch feature maps, and perform a convolution operation on the feature image of the area to be searched to obtain two regression branch feature maps; and pair each classification branch feature map with one of the regression branch feature maps and perform the cross-correlation operation on each pair to obtain the first cross-correlation result feature map and the second cross-correlation result feature map.
When the tracking result obtaining module 30 is configured to obtain the feature map row with the highest class probability from the cross-correlation result feature map and perform a channel transformation convolution operation on it to obtain a classification branch response map and a regression branch response map respectively, it is specifically configured to: obtain a first feature map row with the highest class probability from the first cross-correlation result feature map and perform a channel transformation convolution operation on it to obtain the classification branch response map; and obtain a second feature map row with the highest class probability from the second cross-correlation result feature map and perform a channel transformation convolution operation on it to obtain the regression branch response map.
On the basis of the above embodiment, the embodiment of the invention performs convolution operations on the template feature image and on the feature image of the area to be searched to obtain two pairs of classification branch and regression branch feature maps, and performs the cross-correlation operation on each pair to obtain the cross-correlation result feature maps, thereby improving the accuracy of the cross-correlation result and, in turn, the accuracy of classification and tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to obtain the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map, it is specifically configured to: sort and screen out a plurality of target detection boxes corresponding to the tracking target through the classification branch response map; predict the bounding box of each target detection box through the regression branch; and obtain the bounding box corresponding to the tracking result using a preset screening algorithm.
On the basis of the above embodiment, the embodiment of the invention screens out a plurality of target detection boxes corresponding to the tracking target by sorting the classification branch response map, predicts the bounding box of each detection box through the regression branch, and obtains the bounding box corresponding to the tracking result using a preset screening algorithm, thereby ensuring the reliability of multi-target tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to screen out a plurality of target detection boxes corresponding to the tracking target by sorting the classification branch response map, it is specifically configured to: screen out a plurality of target detection boxes corresponding to the tracking target through the classification branch response map and sort the detection boxes through a cosine window and a scale penalty. The preset screening algorithm is a non-maximum suppression algorithm.
On the basis of the above embodiment, the embodiment of the invention improves the accuracy of multi-target tracking by selecting suitable sorting and screening algorithms for the target detection boxes and for the bounding boxes.
The apparatus provided in the embodiment of the present invention is used to perform the above method; for its specific functions, reference may be made to the method flow described above, which is not repeated here.
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 4, the electronic device may include a processor 410, a communication interface 420, a memory 430, and a communication bus 440, where the processor 410, the communication interface 420, and the memory 430 communicate with one another via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the following method: sequentially acquiring candidate detection boxes of a tracking target in the current video frame through a target detection network model according to the frame order of the video, recording the coordinate position information of the candidate detection boxes, and acquiring template images corresponding to the candidate detection boxes according to the coordinate position information, wherein there are one or more tracking targets; acquiring the image of each frame in the video except the 1st frame and taking it as the image of the area to be searched; inputting each template image and the image of the area to be searched respectively into a target tracking network model constructed from a twin convolutional neural network; and acquiring the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the method provided by the foregoing embodiments, for example, including: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more; acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched; respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A visual multi-target tracking method based on deep learning is characterized by comprising the following steps:
sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more;
acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched;
respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
2. The deep learning based visual multi-target tracking method according to claim 1, wherein the target detection network model is a YOLOv3 network model.
3. The deep learning-based visual multi-target tracking method according to claim 1, wherein the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model comprises:
respectively extracting the characteristics of the template image and the image of the area to be searched to obtain a template characteristic image and a characteristic image of the area to be searched;
performing cross-correlation operation on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram;
obtaining a feature graph row with the highest class probability according to the feature graph of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature graph row to respectively obtain a classification branch response graph and a regression branch response graph;
and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response diagram and the regression branch response diagram.
4. The visual multi-target tracking method based on deep learning of claim 3, wherein the cross-correlation operation is performed on the template feature image and the feature image of the area to be searched to obtain a cross-correlation operation result feature map, and the method comprises the following steps:
and sliding the template characteristic image on the characteristic image of the area to be searched, and performing cross-correlation operation channel by channel to obtain a cross-correlation operation result characteristic image.
5. The deep learning-based visual multi-target tracking method according to claim 3, wherein the cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map; the cross-correlation operation is performed on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram, and the method comprises the following steps:
performing convolution operation on the template characteristic image to obtain two classification branch characteristic graphs, and performing convolution operation on the characteristic image of the area to be searched to obtain two regression branch characteristic graphs; respectively combining the classification branch feature graph and the other regression branch feature graph pairwise to perform cross-correlation operation to obtain a first cross-correlation operation result feature graph and a second cross-correlation operation result feature graph;
the method for obtaining the feature map row with the highest class probability according to the feature map of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature map row to respectively obtain a classification branch response map and a regression branch response map includes:
obtaining a first characteristic diagram row with the highest class probability according to the characteristic diagram of the first cross-correlation operation result, and performing channel transformation convolution operation by using the first characteristic diagram row to obtain the classification branch response diagram; and obtaining a second characteristic diagram row with the highest class probability according to the second cross-correlation operation result characteristic diagram, and performing channel transformation convolution operation by using the second characteristic diagram row to obtain the regression branch response diagram.
6. The deep learning-based visual multi-target tracking method according to claim 3, wherein the obtaining of the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map comprises:
screening out a plurality of target detection frames corresponding to the tracking target through the sorting of the classification branch response graph;
and acquiring the boundary frame of each target detection frame through the regression branch response graph, and acquiring the boundary frame corresponding to the tracking result by using a preset screening algorithm.
7. The deep learning based visual multi-target tracking method according to claim 6, wherein the screening out of a plurality of target detection frames corresponding to the tracking target through the sorting of the classification branch response graph comprises:
screening out a plurality of target detection frames corresponding to the tracking target through the classification branch response graph, and sorting the target detection frames through a cosine window and a scale penalty; the preset screening algorithm is a non-maximum suppression algorithm.
8. A visual multi-target tracking device based on deep learning is characterized by comprising:
a template image acquisition module to: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more;
the image acquisition module of the area to be searched is used for: acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched;
a tracking result obtaining module configured to: respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the deep learning based visual multi-target tracking method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the deep learning based visual multi-target tracking method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911252433.5A CN111161311A (en) | 2019-12-09 | 2019-12-09 | Visual multi-target tracking method and device based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911252433.5A CN111161311A (en) | 2019-12-09 | 2019-12-09 | Visual multi-target tracking method and device based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111161311A true CN111161311A (en) | 2020-05-15 |
Family
ID=70556616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911252433.5A Pending CN111161311A (en) | 2019-12-09 | 2019-12-09 | Visual multi-target tracking method and device based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111161311A (en) |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111724409A (en) * | 2020-05-18 | 2020-09-29 | 浙江工业大学 | Target tracking method based on densely connected twin neural network |
CN111797716A (en) * | 2020-06-16 | 2020-10-20 | 电子科技大学 | Single target tracking method based on Siamese network |
CN111882580A (en) * | 2020-07-17 | 2020-11-03 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN111915644A (en) * | 2020-07-09 | 2020-11-10 | 苏州科技大学 | Real-time target tracking method of twin guiding anchor frame RPN network |
CN111932579A (en) * | 2020-08-12 | 2020-11-13 | 广东技术师范大学 | Method and device for adjusting equipment angle based on motion trail of tracked target |
CN112001252A (en) * | 2020-07-22 | 2020-11-27 | 北京交通大学 | Multi-target tracking method based on heteromorphic graph network |
CN112037254A (en) * | 2020-08-11 | 2020-12-04 | 浙江大华技术股份有限公司 | Target tracking method and related device |
CN112215080A (en) * | 2020-09-16 | 2021-01-12 | 电子科技大学 | Target tracking method using time sequence information |
CN112257527A (en) * | 2020-10-10 | 2021-01-22 | 西南交通大学 | Mobile phone detection method based on multi-target fusion and space-time video sequence |
CN112464769A (en) * | 2020-11-18 | 2021-03-09 | 西北工业大学 | High-resolution remote sensing image target detection method based on consistent multi-stage detection |
CN112489081A (en) * | 2020-11-30 | 2021-03-12 | 北京航空航天大学 | Visual target tracking method and device |
CN112598739A (en) * | 2020-12-25 | 2021-04-02 | 哈尔滨工业大学(深圳) | Mobile robot infrared target tracking method and system based on space-time characteristic aggregation network and storage medium |
CN112614159A (en) * | 2020-12-22 | 2021-04-06 | 浙江大学 | Cross-camera multi-target tracking method for warehouse scene |
CN112633078A (en) * | 2020-12-02 | 2021-04-09 | 西安电子科技大学 | Target tracking self-correcting method, system, medium, equipment, terminal and application |
CN112651994A (en) * | 2020-12-18 | 2021-04-13 | 零八一电子集团有限公司 | Ground multi-target tracking method |
CN112816474A (en) * | 2021-01-07 | 2021-05-18 | 武汉大学 | Target perception-based depth twin network hyperspectral video target tracking method |
CN112950675A (en) * | 2021-03-18 | 2021-06-11 | 深圳市商汤科技有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN112967289A (en) * | 2021-02-08 | 2021-06-15 | 上海西井信息科技有限公司 | Security check package matching method, system, equipment and storage medium |
CN112967315A (en) * | 2021-03-02 | 2021-06-15 | 北京百度网讯科技有限公司 | Target tracking method and device and electronic equipment |
CN113112525A (en) * | 2021-04-27 | 2021-07-13 | 北京百度网讯科技有限公司 | Target tracking method, network model, and training method, device, and medium thereof |
CN113160272A (en) * | 2021-03-19 | 2021-07-23 | 苏州科达科技股份有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN113344932A (en) * | 2021-06-01 | 2021-09-03 | 电子科技大学 | Semi-supervised single-target video segmentation method |
CN113705588A (en) * | 2021-10-28 | 2021-11-26 | 南昌工程学院 | Twin network target tracking method and system based on convolution self-attention module |
CN113763415A (en) * | 2020-06-04 | 2021-12-07 | 北京达佳互联信息技术有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN114170271A (en) * | 2021-11-18 | 2022-03-11 | 安徽清新互联信息科技有限公司 | Multi-target tracking method with self-tracking consciousness, equipment and storage medium |
WO2022116868A1 (en) * | 2020-12-03 | 2022-06-09 | Ping An Technology (Shenzhen) Co., Ltd. | Method, device, and computer program product for deep lesion tracker for monitoring lesions in four-dimensional longitudinal imaging |
CN115359240A (en) * | 2022-07-15 | 2022-11-18 | 北京中科思创云智能科技有限公司 | Small target detection method, device and equipment based on multi-frame image motion characteristics |
CN115661207A (en) * | 2022-11-14 | 2023-01-31 | 南昌工程学院 | Target tracking method and system based on space consistency matching and weight learning |
CN115984332A (en) * | 2023-02-14 | 2023-04-18 | 北京卓翼智能科技有限公司 | Unmanned aerial vehicle tracking method and device, electronic equipment and storage medium |
CN116977902A (en) * | 2023-08-14 | 2023-10-31 | 长春工业大学 | Target tracking method and system for on-board photoelectric stabilized platform of coastal defense |
WO2023207276A1 (en) * | 2022-04-29 | 2023-11-02 | 京东方科技集团股份有限公司 | Area location update method, security and protection system, and computer-readable storage medium |
WO2023216572A1 (en) * | 2022-05-07 | 2023-11-16 | 深圳先进技术研究院 | Cross-video target tracking method and system, and electronic device and storage medium |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104574445A (en) * | 2015-01-23 | 2015-04-29 | 北京航空航天大学 | Target tracking method and device |
US20170286774A1 (en) * | 2016-04-04 | 2017-10-05 | Xerox Corporation | Deep data association for online multi-class multi-object tracking |
CN107403175A (en) * | 2017-09-21 | 2017-11-28 | 昆明理工大学 | Visual tracking method and visual tracking system for moving backgrounds |
CN109191491A (en) * | 2018-08-03 | 2019-01-11 | 华中科技大学 | Target tracking method and system based on a fully convolutional twin network with multi-layer feature fusion |
CN109325967A (en) * | 2018-09-14 | 2019-02-12 | 腾讯科技(深圳)有限公司 | Target tracking method, apparatus, medium and device |
CN109376572A (en) * | 2018-08-09 | 2019-02-22 | 同济大学 | Deep-learning-based real-time vehicle detection and trajectory tracking method for traffic video |
CN109785385A (en) * | 2019-01-22 | 2019-05-21 | 中国科学院自动化研究所 | Visual target tracking method and system |
CN109948611A (en) * | 2019-03-14 | 2019-06-28 | 腾讯科技(深圳)有限公司 | Method for determining an information region, information display method, and device |
CN109978921A (en) * | 2019-04-01 | 2019-07-05 | 南京信息工程大学 | Real-time video target tracking algorithm based on a multi-layer attention mechanism |
CN110096960A (en) * | 2019-04-03 | 2019-08-06 | 罗克佳华科技集团股份有限公司 | Object detection method and device |
CN110097575A (en) * | 2019-04-28 | 2019-08-06 | 电子科技大学 | Target tracking method based on local features and scale pooling |
CN110111363A (en) * | 2019-04-28 | 2019-08-09 | 深兰科技(上海)有限公司 | Tracking method and device based on target detection |
CN110210551A (en) * | 2019-05-28 | 2019-09-06 | 北京工业大学 | Visual target tracking method based on adaptive subject sensitivity |
CN110298404A (en) * | 2019-07-02 | 2019-10-01 | 西南交通大学 | Target tracking method based on triplet twin hash network learning |
CN110335290A (en) * | 2019-06-04 | 2019-10-15 | 大连理工大学 | Target tracking method using an attention-based twin region proposal network |
- 2019-12-09: Application filed in China as CN201911252433.5A; published as CN111161311A, legal status Pending
Non-Patent Citations (1)
Title |
---|
Zhang Qinyi: "Research on Person and Vehicle Detection and Tracking Algorithms Based on Deep Convolutional Networks", pages 9 - 18 *
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111724409A (en) * | 2020-05-18 | 2020-09-29 | 浙江工业大学 | Target tracking method based on densely connected twin neural network |
CN113763415A (en) * | 2020-06-04 | 2021-12-07 | 北京达佳互联信息技术有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN113763415B (en) * | 2020-06-04 | 2024-03-08 | 北京达佳互联信息技术有限公司 | Target tracking method, device, electronic equipment and storage medium |
CN111797716A (en) * | 2020-06-16 | 2020-10-20 | 电子科技大学 | Single target tracking method based on Siamese network |
CN111797716B (en) * | 2020-06-16 | 2022-05-03 | 电子科技大学 | Single target tracking method based on Siamese network |
CN111915644A (en) * | 2020-07-09 | 2020-11-10 | 苏州科技大学 | Real-time target tracking method of twin guiding anchor frame RPN network |
CN111915644B (en) * | 2020-07-09 | 2023-07-04 | 苏州科技大学 | Real-time target tracking method of twin guide anchor frame RPN network |
CN111882580A (en) * | 2020-07-17 | 2020-11-03 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN111882580B (en) * | 2020-07-17 | 2023-10-24 | 元神科技(杭州)有限公司 | Video multi-target tracking method and system |
CN112001252A (en) * | 2020-07-22 | 2020-11-27 | 北京交通大学 | Multi-target tracking method based on heterogeneous graph network |
CN112001252B (en) * | 2020-07-22 | 2024-04-12 | 北京交通大学 | Multi-target tracking method based on heterogeneous graph network |
CN112037254A (en) * | 2020-08-11 | 2020-12-04 | 浙江大华技术股份有限公司 | Target tracking method and related device |
CN111932579A (en) * | 2020-08-12 | 2020-11-13 | 广东技术师范大学 | Method and device for adjusting equipment angle based on motion trail of tracked target |
CN112215080B (en) * | 2020-09-16 | 2022-05-03 | 电子科技大学 | Target tracking method using time sequence information |
CN112215080A (en) * | 2020-09-16 | 2021-01-12 | 电子科技大学 | Target tracking method using time sequence information |
CN112257527B (en) * | 2020-10-10 | 2022-09-02 | 西南交通大学 | Mobile phone detection method based on multi-target fusion and space-time video sequence |
CN112257527A (en) * | 2020-10-10 | 2021-01-22 | 西南交通大学 | Mobile phone detection method based on multi-target fusion and space-time video sequence |
CN112464769A (en) * | 2020-11-18 | 2021-03-09 | 西北工业大学 | High-resolution remote sensing image target detection method based on consistent multi-stage detection |
CN112489081A (en) * | 2020-11-30 | 2021-03-12 | 北京航空航天大学 | Visual target tracking method and device |
CN112633078A (en) * | 2020-12-02 | 2021-04-09 | 西安电子科技大学 | Target tracking self-correcting method, system, medium, equipment, terminal and application |
CN112633078B (en) * | 2020-12-02 | 2024-02-02 | 西安电子科技大学 | Target tracking self-correction method, system, medium, equipment, terminal and application |
WO2022116868A1 (en) * | 2020-12-03 | 2022-06-09 | Ping An Technology (Shenzhen) Co., Ltd. | Method, device, and computer program product for deep lesion tracker for monitoring lesions in four-dimensional longitudinal imaging |
CN112651994A (en) * | 2020-12-18 | 2021-04-13 | 零八一电子集团有限公司 | Ground multi-target tracking method |
CN112614159A (en) * | 2020-12-22 | 2021-04-06 | 浙江大学 | Cross-camera multi-target tracking method for warehouse scene |
CN112598739B (en) * | 2020-12-25 | 2023-09-01 | 哈尔滨工业大学(深圳) | Mobile robot infrared target tracking method, system and storage medium based on space-time characteristic aggregation network |
CN112598739A (en) * | 2020-12-25 | 2021-04-02 | 哈尔滨工业大学(深圳) | Mobile robot infrared target tracking method, system and storage medium based on space-time characteristic aggregation network |
CN112816474B (en) * | 2021-01-07 | 2022-02-01 | 武汉大学 | Target perception-based depth twin network hyperspectral video target tracking method |
CN112816474A (en) * | 2021-01-07 | 2021-05-18 | 武汉大学 | Target perception-based depth twin network hyperspectral video target tracking method |
CN112967289A (en) * | 2021-02-08 | 2021-06-15 | 上海西井信息科技有限公司 | Security check package matching method, system, equipment and storage medium |
CN112967315A (en) * | 2021-03-02 | 2021-06-15 | 北京百度网讯科技有限公司 | Target tracking method and device and electronic equipment |
CN112967315B (en) * | 2021-03-02 | 2022-08-02 | 北京百度网讯科技有限公司 | Target tracking method and device and electronic equipment |
CN112950675A (en) * | 2021-03-18 | 2021-06-11 | 深圳市商汤科技有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN113160272B (en) * | 2021-03-19 | 2023-04-07 | 苏州科达科技股份有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN113160272A (en) * | 2021-03-19 | 2021-07-23 | 苏州科达科技股份有限公司 | Target tracking method and device, electronic equipment and storage medium |
CN113112525B (en) * | 2021-04-27 | 2023-09-01 | 北京百度网讯科技有限公司 | Target tracking method, network model, and training method, device, and medium thereof |
CN113112525A (en) * | 2021-04-27 | 2021-07-13 | 北京百度网讯科技有限公司 | Target tracking method, network model, and training method, device, and medium thereof |
CN113344932A (en) * | 2021-06-01 | 2021-09-03 | 电子科技大学 | Semi-supervised single-target video segmentation method |
CN113705588B (en) * | 2021-10-28 | 2022-01-25 | 南昌工程学院 | Twin network target tracking method and system based on convolution self-attention module |
CN113705588A (en) * | 2021-10-28 | 2021-11-26 | 南昌工程学院 | Twin network target tracking method and system based on convolution self-attention module |
CN114170271A (en) * | 2021-11-18 | 2022-03-11 | 安徽清新互联信息科技有限公司 | Multi-target tracking method with self-tracking consciousness, equipment and storage medium |
CN114170271B (en) * | 2021-11-18 | 2024-04-12 | 安徽清新互联信息科技有限公司 | Multi-target tracking method with self-tracking consciousness, equipment and storage medium |
WO2023207276A1 (en) * | 2022-04-29 | 2023-11-02 | 京东方科技集团股份有限公司 | Area location update method, security and protection system, and computer-readable storage medium |
WO2023216572A1 (en) * | 2022-05-07 | 2023-11-16 | 深圳先进技术研究院 | Cross-video target tracking method and system, and electronic device and storage medium |
CN115359240B (en) * | 2022-07-15 | 2024-03-15 | 北京中科思创云智能科技有限公司 | Small target detection method, device and equipment based on multi-frame image motion characteristics |
CN115359240A (en) * | 2022-07-15 | 2022-11-18 | 北京中科思创云智能科技有限公司 | Small target detection method, device and equipment based on multi-frame image motion characteristics |
CN115661207A (en) * | 2022-11-14 | 2023-01-31 | 南昌工程学院 | Target tracking method and system based on space consistency matching and weight learning |
CN115984332A (en) * | 2023-02-14 | 2023-04-18 | 北京卓翼智能科技有限公司 | Unmanned aerial vehicle tracking method and device, electronic equipment and storage medium |
CN116977902B (en) * | 2023-08-14 | 2024-01-23 | 长春工业大学 | Target tracking method and system for on-board photoelectric stabilized platform of coastal defense |
CN116977902A (en) * | 2023-08-14 | 2023-10-31 | 长春工业大学 | Target tracking method and system for on-board photoelectric stabilized platform of coastal defense |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111161311A (en) | Visual multi-target tracking method and device based on deep learning | |
JP7236545B2 (en) | Video target tracking method and apparatus, computer apparatus, program | |
Wang et al. | Detect globally, refine locally: A novel approach to saliency detection | |
US11842487B2 (en) | Detection model training method and apparatus, computer device and storage medium | |
CN107895367B (en) | Bone age identification method and system and electronic equipment | |
CN112052787B (en) | Target detection method and device based on artificial intelligence and electronic equipment | |
CN109446889B (en) | Object tracking method and device based on twin matching network | |
KR101640998B1 (en) | Image processing apparatus and image processing method | |
CN105844283A (en) | Method for identifying category of image, image search method and image search device | |
CN111401293B (en) | Gesture recognition method based on Head lightweight Mask Scoring R-CNN |
CN112712546A (en) | Target tracking method based on twin neural network | |
CN105303163B (en) | Target detection method and detection device |
CN110827312A (en) | Learning method based on cooperative visual attention neural network | |
CN112102929A (en) | Medical image labeling method and device, storage medium and electronic equipment | |
WO2021103474A1 (en) | Image processing method and apparatus, storage medium and electronic apparatus | |
Meng et al. | Globally measuring the similarity of superpixels by binary edge maps for superpixel clustering | |
CN113780145A (en) | Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium | |
CN115862119B (en) | Attention mechanism-based face age estimation method and device | |
CN111539390A (en) | Small target image identification method, equipment and system based on Yolov3 | |
CN110956157A (en) | Deep learning remote sensing image target detection method and device based on candidate frame selection | |
CN116246161A (en) | Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge | |
CN114743045B (en) | Small sample target detection method based on double-branch area suggestion network | |
CN110633630A (en) | Behavior identification method and device and terminal equipment | |
Nugroho et al. | Comparison of deep learning-based object classification methods for detecting tomato ripeness | |
CN115527050A (en) | Image feature matching method, computer device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||