CN111161311A - Visual multi-target tracking method and device based on deep learning - Google Patents


Info

Publication number
CN111161311A
Authority
CN
China
Prior art keywords
tracking
target
image
cross
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911252433.5A
Other languages
Chinese (zh)
Inventor
田寅
温博阁
唐海川
咸哓雨
李欣旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRRC Industry Institute Co Ltd
Original Assignee
CRRC Industry Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRRC Industry Institute Co Ltd filed Critical CRRC Industry Institute Co Ltd
Priority to CN201911252433.5A
Publication of CN111161311A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The embodiment of the invention provides a visual multi-target tracking method and device based on deep learning. The method comprises: sequentially acquiring candidate detection frames of tracking targets in the current video frame through a target detection network model, recording their coordinate position information, and acquiring corresponding template images; acquiring the image of each frame of the video except the 1st frame as a region-to-be-searched image; and inputting each template image together with the region-to-be-searched image into a target tracking network model constructed from a twin convolutional neural network to obtain the tracking result of each tracking target. Because the template image of each tracking target and the region-to-be-searched image, both acquired via the target detection network model, are input separately into the target tracking network model constructed from the twin convolutional neural network, the tracking result corresponding to each template image is obtained with a low amount of computation, achieving real-time and accurate multi-target tracking.

Description

Visual multi-target tracking method and device based on deep learning
Technical Field
The invention relates to the technical field of computer vision, in particular to a visual multi-target tracking method and device based on deep learning.
Background
Visual target tracking is a hot problem in the field of computer vision research, and with the rapid development of computer technology, target tracking technology has also improved greatly. With the rapid rise of artificial intelligence in recent years, research on target tracking technology is receiving more and more attention.
Deep learning technology has strong feature representation capability and obtains better results than traditional methods in applications such as image classification, object recognition and natural language processing, and has thereby gradually become the mainstream technology of image and video research. Tracking methods based on deep learning are an important branch of target tracking; by exploiting the advantage of end-to-end training of deep convolutional networks, the model automatically learns the appearance features and motion features of the target and tracks it, achieving high-quality robust tracking.
In recent years, there have also been reports on multi-target tracking. However, the multi-target tracking methods disclosed in the prior art generally involve a large amount of computation and cannot achieve real-time tracking, so the tracking effect is poor.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a visual multi-target tracking method and apparatus based on deep learning.
In a first aspect, an embodiment of the present invention provides a visual multi-target tracking method based on deep learning, including: sequentially acquiring candidate detection frames of a tracking target in the current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information, wherein there are one or more tracking targets; acquiring the image of each frame of the video except the 1st frame, and taking it as a region-to-be-searched image; respectively inputting each template image and the region-to-be-searched image into a target tracking network model constructed by a twin convolutional neural network; and acquiring the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
Further, the target detection network model is a YOLOv3 network model.
Further, the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model includes: respectively extracting the characteristics of the template image and the image of the area to be searched to obtain a template characteristic image and a characteristic image of the area to be searched; performing cross-correlation operation on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram; obtaining a feature graph row with the highest class probability according to the feature graph of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature graph row to respectively obtain a classification branch response graph and a regression branch response graph; and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response diagram and the regression branch response diagram.
Further, the performing a cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a feature map of a cross-correlation operation result includes: sliding the template feature image on the feature image of the area to be searched, and performing cross-correlation operation channel by channel to obtain a cross-correlation operation result feature map.
Further, the cross-correlation operation result characteristic diagram comprises a first cross-correlation operation result characteristic diagram and a second cross-correlation operation result characteristic diagram; the cross-correlation operation is performed on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram, and the method comprises the following steps: performing convolution operation on the template characteristic image to obtain two classification branch characteristic graphs, and performing convolution operation on the characteristic image of the area to be searched to obtain two regression branch characteristic graphs; respectively combining each classification branch feature graph with one regression branch feature graph pairwise to perform cross-correlation operation to obtain a first cross-correlation operation result feature graph and a second cross-correlation operation result feature graph; the method for obtaining the feature map row with the highest class probability according to the feature map of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature map row to respectively obtain a classification branch response map and a regression branch response map includes: obtaining a first characteristic diagram row with the highest class probability according to the characteristic diagram of the first cross-correlation operation result, and performing channel transformation convolution operation by using the first characteristic diagram row to obtain the classification branch response diagram; and obtaining a second characteristic diagram row with the highest class probability according to the second cross-correlation operation result characteristic diagram, and performing channel transformation convolution operation by using the second characteristic diagram row to obtain the regression branch response diagram.
Further, the obtaining a tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map includes: sorting a plurality of target detection frames corresponding to the tracking target through the classification branch response graph; and predicting the boundary frame of each target detection frame through the regression branch response graph, and obtaining the boundary frame corresponding to the tracking result by using a preset screening algorithm.
Further, screening out and sorting a plurality of target detection frames corresponding to the tracking target through the classification branch response graph includes: screening out a plurality of target detection frames corresponding to the tracking target through the classification branch response graph, and sorting the target detection frames through a cosine window and a scale penalty; the preset screening algorithm is a non-maximum suppression algorithm.
In a second aspect, an embodiment of the present invention provides a visual multi-target tracking device based on deep learning, including: a template image acquisition module to: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more; the image acquisition module of the area to be searched is used for: acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched; a tracking result obtaining module configured to: respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the visual multi-target tracking method and device based on deep learning provided by the embodiment of the invention, the candidate detection frame of the tracking target is obtained in real time by utilizing the target detection network model, so that the corresponding template image is obtained, the template image corresponding to each tracking target and the image of the area to be searched are respectively input into the target tracking network model constructed by the twin convolutional neural network, and the tracking result of the tracking target corresponding to the template image is obtained according to the output of the target tracking network model, so that the calculation amount is low, and the real-time and accurate tracking of the multi-target is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a deep learning-based visual multi-target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic processing flow diagram of a target tracking network model in the deep learning-based visual multi-target tracking method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a deep learning-based visual multi-target tracking device according to an embodiment of the present invention;
fig. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a visual multi-target tracking method based on deep learning according to an embodiment of the present invention. As shown in fig. 1, the method includes:
Step 101, sequentially acquiring candidate detection frames of tracking targets in the current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information, wherein there are one or more tracking targets;
Step 102, acquiring the image of each frame of the video except the 1st frame, and taking it as a region-to-be-searched image;
Step 103, respectively inputting each template image and the region-to-be-searched image into a target tracking network model constructed by a twin convolutional neural network, and acquiring the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
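The three steps above can be sketched end to end. The snippet below is only a toy illustration: the hypothetical placeholder functions `detect_targets` and `track` stand in for the YOLOv3 detector and the twin-network tracker, which the patent does not specify in code.

```python
import numpy as np

def detect_targets(frame):
    """Placeholder for the target detection network model: returns
    candidate detection frames as (x, y, w, h) pixel coordinates."""
    return [(10, 10, 20, 20), (40, 30, 20, 20)]

def crop_template(frame, box):
    """Crop a template image from the frame using the recorded
    coordinate position information of a candidate detection frame."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def track(template, search_region):
    """Placeholder for the target tracking network model: returns a
    dummy box for the target that the template image depicts."""
    th, tw = template.shape[:2]
    return (0, 0, tw, th)

video = [np.zeros((100, 100, 3), dtype=np.uint8) for _ in range(3)]

# Step 101: detect targets and build a template image per target.
templates = [crop_template(video[0], b) for b in detect_targets(video[0])]
# Step 102: every frame after the 1st is a region-to-be-searched image.
search_regions = video[1:]
# Step 103: feed each (template, search region) pair to the tracker.
results = [[track(t, s) for t in templates] for s in search_regions]
print(len(results), len(results[0]))
```

The nested loop makes the multi-target aspect explicit: each template image is matched against the same region-to-be-searched image independently.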
The target detection network model performs target detection on the preset tracking targets for each frame of image in the video. As time goes by, the tracking targets in the video frames change; for example, some tracking targets disappear and new tracking targets appear. Performing target detection on each frame of image through the target detection network model therefore keeps the set of tracking targets up to date in real time.
Specifically, during target detection, the visual multi-target tracking device based on deep learning sequentially obtains candidate detection frames of the tracked targets in the current video frame through the target detection network model according to the frame sequence of the video, records the coordinate position information of the candidate detection frames, and obtains template images corresponding to the candidate detection frames according to that coordinate position information. If the current video frame contains tracking targets, there is at least one and there may be several; each candidate detection frame corresponds to one tracking target.
The visual multi-target tracking device based on deep learning acquires images of each frame except the 1 st frame in the video and takes the images as images of a region to be searched. Namely, the tracking target is found and tracked in the image of the area to be searched.
After the visual multi-target tracking device based on the deep learning obtains the template images and the images of the areas to be searched, each template image and the images of the areas to be searched are respectively input into a target tracking network model constructed by a twin convolutional neural network. The target tracking network model constructed by the twin convolutional neural network comprises two networks sharing weight, the template image and the image of the area to be searched can be respectively input into the two networks, and a tracking result is obtained through correlation calculation.
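A minimal numpy illustration of the weight-sharing idea, not the patent's actual network: the same set of weights embeds both the template and the search region, so the two feature embeddings live in the same space and can be compared by correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
shared_weights = rng.standard_normal((3, 8))  # one set of weights

def embed(x, w=shared_weights):
    """Toy shared-weight feature extractor (a single linear layer):
    both branches of the twin network call this same function."""
    return x.dot(w)

template = rng.standard_normal((4, 3))   # stand-in for the template image
search = rng.standard_normal((6, 3))     # stand-in for the search region
print(embed(template).shape, embed(search).shape)
```

Because the weights are shared, training signals from either branch update the same parameters, which is what makes the subsequent correlation between the two embeddings meaningful.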
According to the embodiment of the invention, target objects that disappear from the video image can be removed; for a target object that newly appears in the video, the target detection network detects it and stores its detection-frame coordinate information, and the target tracking network model continuously acquires this detection-frame information and tracks the object automatically, ensuring the accuracy and real-time performance of multi-target tracking.
According to the embodiment of the invention, the candidate detection frame of the tracking target is obtained in real time by utilizing the target detection network model, so that the corresponding template image is obtained, the template image corresponding to each tracking target and the image of the area to be searched are respectively input into the target tracking network model constructed by the twin convolutional neural network, and the tracking result of the tracking target corresponding to the template image is obtained according to the output of the target tracking network model, so that the calculation amount is low, and the real-time and accurate tracking of multiple targets is realized.
Further, based on the above embodiment, the target detection network model is a YOLOv3 network model.
The YOLOv3 algorithm performs well in both the accuracy and the speed of object detection and recognition, so the embodiment of the invention adopts the YOLOv3 network model to detect target objects. YOLOv3 follows an end-to-end approach and is trained with the Darknet framework. The model takes the whole image as network input and, using a regression method, directly regresses the position of each boundary frame and its category at the output layer to recognize target objects, and the coordinate position information of each candidate frame of a target object is stored.
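For reference, published descriptions of YOLOv3 decode each predicted boundary frame from per-cell offsets roughly as follows; this is a sketch of that general formula, not necessarily the exact decoding used in this patent, and the grid cell, anchor, and stride values below are illustrative.

```python
import math

def decode_yolo_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode a YOLOv3-style box prediction: sigmoid offsets within
    grid cell (cx, cy) plus exponential scaling of anchor (pw, ph)."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = (sig(tx) + cx) * stride   # box centre x in pixels
    by = (sig(ty) + cy) * stride   # box centre y in pixels
    bw = pw * math.exp(tw)         # box width in pixels
    bh = ph * math.exp(th)         # box height in pixels
    return bx, by, bw, bh

# Zero offsets place the box at the centre of cell (3, 4) with the
# anchor's own size.
print(decode_yolo_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=32, ph=32, stride=16))
```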
On the basis of the above embodiment, the method and the device provided by the embodiment of the invention improve the accuracy of tracking target identification in multi-target tracking by adopting the YOLOv3 network model for target detection.
Fig. 2 is a schematic processing flow diagram of a target tracking network model in the deep learning-based visual multi-target tracking method according to an embodiment of the present invention. As shown in fig. 2, the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model includes: respectively extracting the characteristics of the template image and the image of the area to be searched to obtain a template characteristic image and a characteristic image of the area to be searched; performing cross-correlation operation on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram; obtaining a feature graph row with the highest class probability according to the feature graph of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature graph row to respectively obtain a classification branch response graph and a regression branch response graph; and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response diagram and the regression branch response diagram.
Specifically, the process of obtaining the tracking result of the tracked object with the target tracking network model is as follows. Features are extracted from the template image and the region-to-be-searched image respectively to obtain a template feature image and a region-to-be-searched feature image. Since the region-to-be-searched image is obtained from the entire video frame while the template image is obtained from a tracking object within the frame, the template image is generally smaller than the region-to-be-searched image, and the template feature image obtained from it is correspondingly smaller than the region-to-be-searched feature image.
As shown in fig. 2, the image of size 127 × 127 × 3 is the template image, and the image of size 255 × 255 × 3 is the region-to-be-searched image. The numbers indicate the dimensions of the image: in 127 × 127 × 3, for example, 127 × 127 is the length × width of the image and 3 is the number of (RGB) channels. Feature extraction through the target tracking network model then yields the feature images: 15 × 15 × 256 denotes the template feature image obtained from the template image, and 31 × 31 × 256 denotes the region-to-be-searched feature image obtained from the region-to-be-searched image. Here gθ denotes the feature extraction operation performed by the twin neural network.
A cross-correlation operation (denoted ⋆d in fig. 2) is performed between the template feature image and the region-to-be-searched feature image: the template feature image is slid over the region-to-be-searched feature image and cross-correlated channel by channel, so that the number of channels is kept unchanged, yielding a cross-correlation operation result feature map (17 × 17 × 256).
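The channel-by-channel (depthwise) cross-correlation can be written directly in numpy as a sketch: with a 15 × 15 × 256 template feature image and a 31 × 31 × 256 search feature image it yields the 17 × 17 × 256 result map described above. A real implementation would use an optimized grouped-convolution routine rather than explicit loops.

```python
import numpy as np

def depthwise_xcorr(template, search):
    """Slide the template feature image over the region-to-be-searched
    feature image and correlate channel by channel, so the number of
    channels is preserved."""
    th, tw, c = template.shape
    sh, sw, _ = search.shape
    oh, ow = sh - th + 1, sw - tw + 1
    out = np.empty((oh, ow, c))
    for i in range(oh):
        for j in range(ow):
            window = search[i:i + th, j:j + tw, :]
            # Per-channel correlation: sum over the spatial dims only.
            out[i, j, :] = np.sum(window * template, axis=(0, 1))
    return out

template = np.random.rand(15, 15, 256)
search = np.random.rand(31, 31, 256)
print(depthwise_xcorr(template, search).shape)  # (17, 17, 256)
```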
A feature map row with the highest class probability is then obtained from the cross-correlation operation result feature map; the highest class probability means the highest fitting confidence over the whole cross-correlation operation result feature map. After the cross-correlation operation a 17 × 17 × 256 feature map is obtained, and the feature map row is the feature cube (for example, a 1 × 1 × 256 feature map) with the highest class probability within that 17 × 17 × 256 result. The cross-correlation operation result feature map is connected to two branches; each branch passes through two layers of 1 × 1 channel-transform convolution, which leaves the spatial size of the feature map unchanged, yielding respectively a classification branch response map (17 × 17 × 2k in fig. 2) and a regression branch response map (17 × 17 × 4k in fig. 2). bσ and Sφ denote convolution operations. k is the number of target detection frames, that is, the number of detection frames of different sizes corresponding to each position. The classification branch response map screens target detection frames by score, and the regression branch response map lets the network learn to regress object positions so that more accurate boundary frame predictions (box) are obtained; the tracking result of the tracking target corresponding to the template image is thus obtained from the classification branch response map and the regression branch response map, completing the tracking of the tracked object.
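The 1 × 1 channel-transform convolution that maps the 256-channel correlation map to 2k classification channels and 4k regression channels can be sketched as a per-pixel linear map. This uses a single layer with random weights for illustration; the patent describes two trained layers per branch.

```python
import numpy as np

def channel_transform_conv(feat, weights):
    """A 1x1 convolution only remixes channels: each output pixel is a
    linear map of that pixel's input channels, so the spatial size of
    the feature map is unchanged."""
    h, w, c_in = feat.shape
    c_out = weights.shape[1]            # weights: (c_in, c_out)
    return feat.reshape(h * w, c_in).dot(weights).reshape(h, w, c_out)

k = 5                                   # detection frames per position
feat = np.random.rand(17, 17, 256)      # cross-correlation result map
cls = channel_transform_conv(feat, np.random.rand(256, 2 * k))
reg = channel_transform_conv(feat, np.random.rand(256, 4 * k))
print(cls.shape, reg.shape)             # (17, 17, 10) (17, 17, 20)
```

The 2k classification channels hold a foreground/background score pair per detection frame, and the 4k regression channels hold four box offsets per detection frame, which matches the 17 × 17 × 2k and 17 × 17 × 4k response maps in fig. 2.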
On the basis of the embodiment, the tracking of the tracked object is realized by utilizing the target tracking network model through operations of feature extraction, cross-correlation operation, acquisition of the classification branch response diagram and the regression branch response diagram and the like, and the accuracy of multi-target tracking is improved.
Further, based on the above embodiment, the performing a cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a feature map of a cross-correlation operation result includes: and sliding the template characteristic image on the characteristic image of the area to be searched, and performing cross-correlation operation channel by channel to obtain a cross-correlation operation result characteristic image.
On the basis of the above embodiment, in the embodiment of the present invention, the template feature image is slid on the feature image of the region to be searched, and cross-correlation operation is performed channel by channel, so that the number of channels is kept unchanged.
Further, based on the above embodiment, the cross-correlation operation result feature map includes a first cross-correlation operation result feature map and a second cross-correlation operation result feature map; the cross-correlation operation is performed on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram, and the method comprises the following steps: performing convolution operation on the template characteristic image to obtain two classification branch characteristic graphs, and performing convolution operation on the characteristic image of the area to be searched to obtain two regression branch characteristic graphs; respectively combining each classification branch feature graph with one regression branch feature graph pairwise to perform cross-correlation operation to obtain a first cross-correlation operation result feature graph and a second cross-correlation operation result feature graph; the method for obtaining the feature map row with the highest class probability according to the feature map of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature map row to respectively obtain a classification branch response map and a regression branch response map includes: obtaining a first characteristic diagram row with the highest class probability according to the characteristic diagram of the first cross-correlation operation result, and performing channel transformation convolution operation by using the first characteristic diagram row to obtain the classification branch response diagram; and obtaining a second characteristic diagram row with the highest class probability according to the second cross-correlation operation result characteristic diagram, and performing channel transformation convolution operation by using the second characteristic diagram row to obtain the regression branch response diagram.
The cross-correlation operation result feature map includes a first cross-correlation operation result feature map and a second cross-correlation operation result feature map. A convolution operation is performed on the template feature image to obtain two identical classification branch feature maps, and a convolution operation is performed on the region-to-be-searched feature image to obtain two identical regression branch feature maps. Each classification branch feature map is then paired with one regression branch feature map for cross-correlation: one classification branch feature map is combined with one regression branch feature map, and the other classification branch feature map is combined with the other regression branch feature map, yielding the first and the second cross-correlation operation result feature maps respectively.
A first feature map row with the highest class probability is obtained from the first cross-correlation operation result feature map; the first feature map row is the feature cube (for example, a 1 × 1 × 256 feature map) with the highest class probability in the first cross-correlation operation result feature map. A channel transformation convolution operation is performed using the first feature map row, and the classification branch labels are set, to obtain the classification branch response map.
Obtaining a second feature map row with highest class probability according to the second cross-correlation operation result feature map, where the second feature map row is a feature cube (for example, a 1 × 1 × 256 feature map) with highest class probability in the second cross-correlation operation result feature map; and performing channel transformation convolution operation by using the second feature map row, and setting the regression branch correlation label, to obtain the regression branch response map.
On the basis of the above embodiment, the embodiment of the invention performs convolution operations on the template feature image and on the feature image of the area to be searched to obtain two pairs of classification-branch and regression-branch feature maps, and performs a cross-correlation operation on each pair to obtain the cross-correlation result feature maps. This improves the accuracy of the cross-correlation results and, in turn, the accuracy of classification and tracking.
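The pairing of branch feature maps described above can be sketched as follows. The `conv_*` arguments stand in for the (unspecified) branch convolutions, and all shapes are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np

def xcorr(kernel, feat):
    """Full cross-correlation: slide `kernel` (C, kh, kw) over `feat`
    (C, fh, fw), summing over all channels, giving one response map."""
    C, kh, kw = kernel.shape
    _, fh, fw = feat.shape
    out = np.zeros((fh - kh + 1, fw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(feat[:, i:i + kh, j:j + kw] * kernel)
    return out

def two_branch_xcorr(template_feat, search_feat,
                     conv_cls_t, conv_reg_t, conv_cls_s, conv_reg_s):
    """Pair each template-side branch map with the matching search-side
    branch map: one pair gives the first cross-correlation result feature
    map, the other gives the second."""
    first = xcorr(conv_cls_t(template_feat), conv_cls_s(search_feat))
    second = xcorr(conv_reg_t(template_feat), conv_reg_s(search_feat))
    return first, second
```

With identity stand-ins for the branch convolutions, a 2 × 2 template slid over a 4 × 4 search map yields two 3 × 3 result maps, one per branch pair.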
Further, based on the above embodiment, the obtaining a tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map includes: sorting a plurality of target detection frames corresponding to the tracking target through the classification branch response graph; and predicting the boundary frame of each target detection frame through the regression branch, and obtaining the boundary frame corresponding to the tracking result by using a preset screening algorithm.
When the tracking result of the tracking target corresponding to the template image is obtained from the classification branch response map and the regression branch response map, a plurality of target detection frames corresponding to the tracking target are first screened out through the classification branch response map and ranked using a cosine window and a scale penalty. The bounding box of each target detection frame is then predicted through the regression branch, and the bounding box corresponding to the tracking result is obtained with a preset screening algorithm (for example, the non-maximum suppression algorithm).
During prediction, the top k targets are selected in the classification branch and ranked using a cosine window and a scale penalty; the bounding box of each target is obtained from the regression branch, and the final result is obtained with the non-maximum suppression algorithm.
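A minimal sketch of this ranking and screening step, assuming the conventional cosine-window weighting and non-maximum suppression (the patent names these components but gives no formulas; the scale penalty, which would multiply the scores in the same way, is omitted for brevity):

```python
import numpy as np

def cosine_window(n):
    """Hanning-based window that favours detections near the previous
    target position when multiplied into the flattened score map."""
    w = np.hanning(n)
    return np.outer(w, w).ravel()

def iou(a, b):
    """Intersection-over-union of boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Non-maximum suppression: keep the highest-scoring boxes and drop
    any remaining box that overlaps a kept box too strongly."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[[iou(boxes[i], boxes[j]) < iou_thresh for j in rest]]
    return keep
```

For example, of three candidate boxes where the first two overlap heavily, `nms` keeps the higher-scoring one of the pair plus the disjoint third box.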
On the basis of the above embodiment, the embodiment of the invention screens out a plurality of target detection frames corresponding to the tracking target by ranking the classification branch response map, predicts the bounding box of each target detection frame through the regression branch, and obtains the bounding box corresponding to the tracking result with a preset screening algorithm, thereby ensuring the reliability of multi-target tracking. By selecting suitable ranking and screening algorithms for the target detection frames and for the bounding boxes, the accuracy of multi-target tracking is improved.
The embodiment of the invention provides a multi-target tracking method that combines target detection with a deep-learning-based target tracking algorithm. It can accurately identify and track a target object, and because the training process runs offline, network inference is fast and real-time performance can be achieved.
Fig. 3 is a schematic structural diagram of a deep learning-based visual multi-target tracking device according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes a template image obtaining module 10, an image obtaining module 20 of a region to be searched, and a tracking result obtaining module 30, wherein: the template image acquisition module 10 is configured to: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more; the image obtaining module 20 of the area to be searched is configured to: acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched; the tracking result obtaining module 30 is configured to: respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
According to the embodiment of the invention, the candidate detection frames of the tracking targets are obtained in real time with the target detection network model, and the corresponding template images are obtained from them. The template image of each tracking target and the image of the area to be searched are then input into the target tracking network model constructed from a twin (Siamese) convolutional neural network, and the tracking result of the tracking target corresponding to each template image is obtained from the model output. The computational cost is low, and real-time, accurate tracking of multiple targets is realized.
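The cooperation of the three modules can be sketched as a detect-then-track loop. Here `detect`, `crop`, and `track` are hypothetical stand-ins for the YOLOv3 detector, template extraction from the recorded coordinates, and the Siamese tracker; this sketch does not implement any of them.

```python
def multi_target_tracking(frames, detect, crop, track):
    """Skeleton of the detect-then-track loop.

    frames: iterable of video frames in frame order
    detect(frame) -> list of candidate detection boxes in that frame
    crop(frame, box) -> template image for one tracking target
    track(template, frame) -> tracked box for that target in `frame`
    Returns, for every frame except the 1st, the list of tracked boxes.
    """
    frames = list(frames)
    first = frames[0]
    # one template per tracking target, from the detector's candidates
    templates = [crop(first, box) for box in detect(first)]
    results = []
    for frame in frames[1:]:  # every frame except the 1st is a search region
        results.append([track(t, frame) for t in templates])
    return results
```

With stub callables, three input frames produce two per-frame result lists, one for each frame after the first.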
Further, based on the above embodiment, the target detection network model is a YOLOv3 network model.
On the basis of the above embodiment, the method and the device provided by the embodiment of the invention improve the accuracy of tracking target identification in multi-target tracking by adopting the YOLOv3 network model for target detection.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to obtain the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model, specifically: respectively extracting the characteristics of the template image and the image of the area to be searched to obtain a template characteristic image and a characteristic image of the area to be searched; performing cross-correlation operation on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram; obtaining a feature graph row with the highest class probability according to the feature graph of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature graph row to respectively obtain a classification branch response graph and a regression branch response graph; and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response diagram and the regression branch response diagram.
On the basis of the embodiment, the tracking of the tracked object is completed by utilizing the target tracking network model through operations of feature extraction, cross-correlation operation, acquisition of the classification branch response diagram and the regression branch response diagram and the like, and the accuracy of multi-target tracking is improved.
Further, based on the above embodiment, when the tracking result obtaining module 30 is configured to perform cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain a feature map of a cross-correlation operation result, specifically: and sliding the template characteristic image on the characteristic image of the area to be searched, and performing cross-correlation operation channel by channel to obtain a cross-correlation operation result characteristic image.
On the basis of the embodiment, the embodiment of the invention ensures that the number of channels is unchanged by sliding the template characteristic image on the characteristic image of the area to be searched and performing cross-correlation operation channel by channel.
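The channel-by-channel sliding cross-correlation can be sketched in NumPy as follows. The feature shapes are illustrative assumptions, and a real implementation would use a batched depthwise convolution rather than explicit loops.

```python
import numpy as np

def depthwise_xcorr(template, search):
    """Slide a template feature map over a search-region feature map,
    correlating each channel independently so that the number of channels
    is preserved in the output.

    template: (C, th, tw) feature map used as a per-channel kernel
    search:   (C, sh, sw) feature map of the area to be searched
    returns:  (C, sh - th + 1, sw - tw + 1) response map
    """
    C, th, tw = template.shape
    _, sh, sw = search.shape
    out = np.zeros((C, sh - th + 1, sw - tw + 1))
    for c in range(C):  # one independent correlation per channel
        for i in range(sh - th + 1):
            for j in range(sw - tw + 1):
                out[c, i, j] = np.sum(
                    search[c, i:i + th, j:j + tw] * template[c])
    return out
```

For a 2-channel, 2 × 2 template slid over a 2-channel, 5 × 5 search map, the output keeps both channels and has spatial size 4 × 4.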
Further, based on the above embodiment, the cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map. When the tracking result obtaining module 30 performs the cross-correlation operation on the template feature image and the feature image of the area to be searched to obtain the cross-correlation result feature maps, it is specifically configured to: perform a convolution operation on the template feature image to obtain two classification-branch feature maps, and perform a convolution operation on the feature image of the area to be searched to obtain two regression-branch feature maps; and pair each classification-branch feature map with one regression-branch feature map for cross-correlation, obtaining the first and second cross-correlation result feature maps. When the tracking result obtaining module 30 obtains the feature map row with the highest class probability from a cross-correlation result feature map and performs the channel-transformation convolution operation on it to obtain the classification branch response map and the regression branch response map, it is specifically configured to: obtain a first feature map row with the highest class probability from the first cross-correlation result feature map, and perform the channel-transformation convolution operation on it to obtain the classification branch response map; and obtain a second feature map row with the highest class probability from the second cross-correlation result feature map, and perform the channel-transformation convolution operation on it to obtain the regression branch response map.
On the basis of the above embodiment, the embodiment of the invention performs convolution operations on the template feature image and on the feature image of the area to be searched to obtain two pairs of classification-branch and regression-branch feature maps, and performs a cross-correlation operation on each pair to obtain the cross-correlation result feature maps, thereby improving the accuracy of the cross-correlation results and, in turn, the accuracy of classification and tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 obtains the tracking result of the tracking target corresponding to the template image from the classification branch response map and the regression branch response map, it is specifically configured to: screen out and rank a plurality of target detection frames corresponding to the tracking target through the classification branch response map; and predict the bounding box of each target detection frame through the regression branch, obtaining the bounding box corresponding to the tracking result with a preset screening algorithm.
On the basis of the above embodiment, the embodiment of the invention screens out a plurality of target detection frames corresponding to the tracking target by ranking the classification branch response map, predicts the bounding box of each target detection frame through the regression branch, and obtains the bounding box corresponding to the tracking result with a preset screening algorithm, thereby ensuring the reliability of multi-target tracking.
Further, based on the above embodiment, when the tracking result obtaining module 30 screens out a plurality of target detection frames corresponding to the tracking target by ranking the classification branch response map, it is specifically configured to: screen out the target detection frames through the classification branch response map and rank them using a cosine window and a scale penalty; the preset screening algorithm is the non-maximum suppression algorithm.
On the basis of the above embodiment, the embodiment of the invention improves the accuracy of multi-target tracking by selecting suitable ranking and screening algorithms for the target detection frames and for the bounding boxes.
The apparatus provided in the embodiment of the present invention is used for the method, and specific functions may refer to the method flow described above, which is not described herein again.
Fig. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 4, the electronic device may include: a processor (processor)410, a communication Interface 420, a memory (memory)430 and a communication bus 440, wherein the processor 410, the communication Interface 420 and the memory 430 are communicated with each other via the communication bus 440. The processor 410 may call logic instructions in the memory 430 to perform the following method: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more; acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched; respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented by a processor to perform the method provided by the foregoing embodiments, for example, including: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more; acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched; respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A visual multi-target tracking method based on deep learning is characterized by comprising the following steps:
sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more;
acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched;
respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
2. The deep learning based visual multi-target tracking method according to claim 1, wherein the target detection network model is a YOLOv3 network model.
3. The deep learning-based visual multi-target tracking method according to claim 1, wherein the obtaining of the tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model comprises:
respectively extracting the characteristics of the template image and the image of the area to be searched to obtain a template characteristic image and a characteristic image of the area to be searched;
performing cross-correlation operation on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram;
obtaining a feature graph row with the highest class probability according to the feature graph of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature graph row to respectively obtain a classification branch response graph and a regression branch response graph;
and acquiring the tracking result of the tracking target corresponding to the template image according to the classification branch response diagram and the regression branch response diagram.
4. The visual multi-target tracking method based on deep learning of claim 3, wherein the cross-correlation operation is performed on the template feature image and the feature image of the area to be searched to obtain a cross-correlation operation result feature map, and the method comprises the following steps:
and sliding the template characteristic image on the characteristic image of the area to be searched, and performing cross-correlation operation channel by channel to obtain a cross-correlation operation result characteristic image.
5. The deep learning-based visual multi-target tracking method according to claim 3, wherein the cross-correlation result feature map comprises a first cross-correlation result feature map and a second cross-correlation result feature map; the cross-correlation operation is performed on the template characteristic image and the characteristic image of the area to be searched to obtain a cross-correlation operation result characteristic diagram, and the method comprises the following steps:
performing convolution operation on the template characteristic image to obtain two classification branch characteristic graphs, and performing convolution operation on the characteristic image of the area to be searched to obtain two regression branch characteristic graphs; respectively combining the classification branch feature graph and the other regression branch feature graph pairwise to perform cross-correlation operation to obtain a first cross-correlation operation result feature graph and a second cross-correlation operation result feature graph;
the method for obtaining the feature map row with the highest class probability according to the feature map of the cross-correlation operation result, and performing channel transformation convolution operation by using the feature map row to respectively obtain a classification branch response map and a regression branch response map includes:
obtaining a first characteristic diagram row with the highest class probability according to the characteristic diagram of the first cross-correlation operation result, and performing channel transformation convolution operation by using the first characteristic diagram row to obtain the classification branch response diagram; and obtaining a second characteristic diagram row with the highest class probability according to the second cross-correlation operation result characteristic diagram, and performing channel transformation convolution operation by using the second characteristic diagram row to obtain the regression branch response diagram.
6. The deep learning-based visual multi-target tracking method according to claim 3, wherein the obtaining of the tracking result of the tracking target corresponding to the template image according to the classification branch response map and the regression branch response map comprises:
sorting a plurality of target detection frames corresponding to the tracking target through the classification branch response graph;
and acquiring the boundary frame of each target detection frame through the regression branch response graph, and acquiring the boundary frame corresponding to the tracking result by using a preset screening algorithm.
7. The deep learning based visual multi-target tracking method according to claim 6, wherein the screening out of a plurality of target detection boxes corresponding to the tracking target through the sorting of the classification branch response graph comprises:
screening out a plurality of target detection frames corresponding to the tracking target through the classification branch response graph, and sequencing the target detection frames through a cosine window and a scale punishment; the preset screening algorithm is a non-maximum suppression algorithm.
8. A visual multi-target tracking device based on deep learning is characterized by comprising:
a template image acquisition module to: sequentially acquiring candidate detection frames of a tracking target in a current video frame through a target detection network model according to the frame sequence of the video, recording coordinate position information of the candidate detection frames, and acquiring template images corresponding to the candidate detection frames according to the coordinate position information; wherein the tracking targets are one or more;
the image acquisition module of the area to be searched is used for: acquiring images of each frame except the 1 st frame in the video, and taking the images as images of a region to be searched;
a tracking result obtaining module configured to: respectively inputting each template image and the image of the area to be searched into a target tracking network model constructed by a twin convolutional neural network; and acquiring a tracking result of the tracking target corresponding to the template image according to the output of the target tracking network model.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the deep learning based visual multi-target tracking method according to any one of claims 1 to 7 when executing the computer program.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs the steps of the deep learning based visual multi-target tracking method according to any one of claims 1 to 7.
CN201911252433.5A 2019-12-09 2019-12-09 Visual multi-target tracking method and device based on deep learning Pending CN111161311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911252433.5A CN111161311A (en) 2019-12-09 2019-12-09 Visual multi-target tracking method and device based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252433.5A CN111161311A (en) 2019-12-09 2019-12-09 Visual multi-target tracking method and device based on deep learning

Publications (1)

Publication Number Publication Date
CN111161311A true CN111161311A (en) 2020-05-15

Family

ID=70556616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252433.5A Pending CN111161311A (en) 2019-12-09 2019-12-09 Visual multi-target tracking method and device based on deep learning

Country Status (1)

Country Link
CN (1) CN111161311A (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724409A (en) * 2020-05-18 2020-09-29 浙江工业大学 Target tracking method based on densely connected twin neural network
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network
CN111882580A (en) * 2020-07-17 2020-11-03 元神科技(杭州)有限公司 Video multi-target tracking method and system
CN111915644A (en) * 2020-07-09 2020-11-10 苏州科技大学 Real-time target tracking method of twin guiding anchor frame RPN network
CN111932579A (en) * 2020-08-12 2020-11-13 广东技术师范大学 Method and device for adjusting equipment angle based on motion trail of tracked target
CN112001252A (en) * 2020-07-22 2020-11-27 北京交通大学 Multi-target tracking method based on heteromorphic graph network
CN112037254A (en) * 2020-08-11 2020-12-04 浙江大华技术股份有限公司 Target tracking method and related device
CN112215080A (en) * 2020-09-16 2021-01-12 电子科技大学 Target tracking method using time sequence information
CN112257527A (en) * 2020-10-10 2021-01-22 西南交通大学 Mobile phone detection method based on multi-target fusion and space-time video sequence
CN112464769A (en) * 2020-11-18 2021-03-09 西北工业大学 High-resolution remote sensing image target detection method based on consistent multi-stage detection
CN112489081A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Visual target tracking method and device
CN112598739A (en) * 2020-12-25 2021-04-02 哈尔滨工业大学(深圳) Mobile robot infrared target tracking method and system based on space-time characteristic aggregation network and storage medium
CN112614159A (en) * 2020-12-22 2021-04-06 浙江大学 Cross-camera multi-target tracking method for warehouse scene
CN112633078A (en) * 2020-12-02 2021-04-09 西安电子科技大学 Target tracking self-correcting method, system, medium, equipment, terminal and application
CN112651994A (en) * 2020-12-18 2021-04-13 零八一电子集团有限公司 Ground multi-target tracking method
CN112816474A (en) * 2021-01-07 2021-05-18 武汉大学 Target perception-based depth twin network hyperspectral video target tracking method
CN112950675A (en) * 2021-03-18 2021-06-11 深圳市商汤科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN112967289A (en) * 2021-02-08 2021-06-15 上海西井信息科技有限公司 Security check package matching method, system, equipment and storage medium
CN112967315A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Target tracking method and device and electronic equipment
CN113112525A (en) * 2021-04-27 2021-07-13 北京百度网讯科技有限公司 Target tracking method, network model, and training method, device, and medium thereof
CN113160272A (en) * 2021-03-19 2021-07-23 苏州科达科技股份有限公司 Target tracking method and device, electronic equipment and storage medium
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN113763415A (en) * 2020-06-04 2021-12-07 北京达佳互联信息技术有限公司 Target tracking method and device, electronic equipment and storage medium
CN114170271A (en) * 2021-11-18 2022-03-11 安徽清新互联信息科技有限公司 Multi-target tracking method with self-tracking consciousness, equipment and storage medium
WO2022116868A1 (en) * 2020-12-03 2022-06-09 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and computer program product for deep lesion tracker for monitoring lesions in four-dimensional longitudinal imaging
CN115359240A (en) * 2022-07-15 2022-11-18 北京中科思创云智能科技有限公司 Small target detection method, device and equipment based on multi-frame image motion characteristics
CN115661207A (en) * 2022-11-14 2023-01-31 南昌工程学院 Target tracking method and system based on space consistency matching and weight learning
CN115984332A (en) * 2023-02-14 2023-04-18 北京卓翼智能科技有限公司 Unmanned aerial vehicle tracking method and device, electronic equipment and storage medium
CN116977902A (en) * 2023-08-14 2023-10-31 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense
WO2023207276A1 (en) * 2022-04-29 2023-11-02 京东方科技集团股份有限公司 Area location update method, security and protection system, and computer-readable storage medium
WO2023216572A1 (en) * 2022-05-07 2023-11-16 深圳先进技术研究院 Cross-video target tracking method and system, and electronic device and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574445A (en) * 2015-01-23 2015-04-29 北京航空航天大学 Target tracking method and device
US20170286774A1 (en) * 2016-04-04 2017-10-05 Xerox Corporation Deep data association for online multi-class multi-object tracking
CN107403175A (en) * 2017-09-21 2017-11-28 昆明理工大学 Visual tracking method and visual tracking system under a moving background
CN109191491A (en) * 2018-08-03 2019-01-11 华中科技大学 Target tracking method and system based on a fully convolutional Siamese network with multi-layer feature fusion
CN109325967A (en) * 2018-09-14 2019-02-12 腾讯科技(深圳)有限公司 Target tracking method, apparatus, medium and device
CN109376572A (en) * 2018-08-09 2019-02-22 同济大学 Real-time vehicle detection and trajectory tracking method in traffic video based on deep learning
CN109785385A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Visual target tracking method and system
CN109948611A (en) * 2019-03-14 2019-06-28 腾讯科技(深圳)有限公司 Method and device for determining an information region and displaying information
CN109978921A (en) * 2019-04-01 2019-07-05 南京信息工程大学 Real-time video target tracking algorithm based on a multi-layer attention mechanism
CN110096960A (en) * 2019-04-03 2019-08-06 罗克佳华科技集团股份有限公司 Object detection method and device
CN110097575A (en) * 2019-04-28 2019-08-06 电子科技大学 Target tracking method based on local features and scale pooling
CN110111363A (en) * 2019-04-28 2019-08-09 深兰科技(上海)有限公司 Tracking method and device based on object detection
CN110210551A (en) * 2019-05-28 2019-09-06 北京工业大学 Visual target tracking method based on adaptive subject sensitivity
CN110298404A (en) * 2019-07-02 2019-10-01 西南交通大学 Target tracking method based on triplet Siamese hash network learning
CN110335290A (en) * 2019-06-04 2019-10-15 大连理工大学 Target tracking method using an attention-based Siamese region proposal network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张沁怡 (ZHANG Qinyi): "Research on human and vehicle detection and tracking algorithms based on deep convolutional networks", pages 9-18 *

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111724409A (en) * 2020-05-18 2020-09-29 浙江工业大学 Target tracking method based on densely connected twin neural network
CN113763415A (en) * 2020-06-04 2021-12-07 北京达佳互联信息技术有限公司 Target tracking method and device, electronic equipment and storage medium
CN113763415B (en) * 2020-06-04 2024-03-08 北京达佳互联信息技术有限公司 Target tracking method, device, electronic equipment and storage medium
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network
CN111797716B (en) * 2020-06-16 2022-05-03 电子科技大学 Single target tracking method based on Siamese network
CN111915644A (en) * 2020-07-09 2020-11-10 苏州科技大学 Real-time target tracking method using a Siamese guided-anchor RPN network
CN111915644B (en) * 2020-07-09 2023-07-04 苏州科技大学 Real-time target tracking method using a Siamese guided-anchor RPN network
CN111882580A (en) * 2020-07-17 2020-11-03 元神科技(杭州)有限公司 Video multi-target tracking method and system
CN111882580B (en) * 2020-07-17 2023-10-24 元神科技(杭州)有限公司 Video multi-target tracking method and system
CN112001252A (en) * 2020-07-22 2020-11-27 北京交通大学 Multi-target tracking method based on heterogeneous graph network
CN112001252B (en) * 2020-07-22 2024-04-12 北京交通大学 Multi-target tracking method based on heterogeneous graph network
CN112037254A (en) * 2020-08-11 2020-12-04 浙江大华技术股份有限公司 Target tracking method and related device
CN111932579A (en) * 2020-08-12 2020-11-13 广东技术师范大学 Method and device for adjusting equipment angle based on motion trail of tracked target
CN112215080B (en) * 2020-09-16 2022-05-03 电子科技大学 Target tracking method using time sequence information
CN112215080A (en) * 2020-09-16 2021-01-12 电子科技大学 Target tracking method using time sequence information
CN112257527B (en) * 2020-10-10 2022-09-02 西南交通大学 Mobile phone detection method based on multi-target fusion and space-time video sequence
CN112257527A (en) * 2020-10-10 2021-01-22 西南交通大学 Mobile phone detection method based on multi-target fusion and space-time video sequence
CN112464769A (en) * 2020-11-18 2021-03-09 西北工业大学 High-resolution remote sensing image target detection method based on consistent multi-stage detection
CN112489081A (en) * 2020-11-30 2021-03-12 北京航空航天大学 Visual target tracking method and device
CN112633078A (en) * 2020-12-02 2021-04-09 西安电子科技大学 Target tracking self-correcting method, system, medium, equipment, terminal and application
CN112633078B (en) * 2020-12-02 2024-02-02 西安电子科技大学 Target tracking self-correction method, system, medium, equipment, terminal and application
WO2022116868A1 (en) * 2020-12-03 2022-06-09 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and computer program product for deep lesion tracker for monitoring lesions in four-dimensional longitudinal imaging
CN112651994A (en) * 2020-12-18 2021-04-13 零八一电子集团有限公司 Ground multi-target tracking method
CN112614159A (en) * 2020-12-22 2021-04-06 浙江大学 Cross-camera multi-target tracking method for warehouse scene
CN112598739B (en) * 2020-12-25 2023-09-01 哈尔滨工业大学(深圳) Mobile robot infrared target tracking method, system and storage medium based on space-time characteristic aggregation network
CN112598739A (en) * 2020-12-25 2021-04-02 哈尔滨工业大学(深圳) Mobile robot infrared target tracking method and system based on space-time characteristic aggregation network and storage medium
CN112816474B (en) * 2021-01-07 2022-02-01 武汉大学 Target perception-based depth twin network hyperspectral video target tracking method
CN112816474A (en) * 2021-01-07 2021-05-18 武汉大学 Target perception-based depth twin network hyperspectral video target tracking method
CN112967289A (en) * 2021-02-08 2021-06-15 上海西井信息科技有限公司 Security check package matching method, system, equipment and storage medium
CN112967315A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Target tracking method and device and electronic equipment
CN112967315B (en) * 2021-03-02 2022-08-02 北京百度网讯科技有限公司 Target tracking method and device and electronic equipment
CN112950675A (en) * 2021-03-18 2021-06-11 深圳市商汤科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113160272B (en) * 2021-03-19 2023-04-07 苏州科达科技股份有限公司 Target tracking method and device, electronic equipment and storage medium
CN113160272A (en) * 2021-03-19 2021-07-23 苏州科达科技股份有限公司 Target tracking method and device, electronic equipment and storage medium
CN113112525B (en) * 2021-04-27 2023-09-01 北京百度网讯科技有限公司 Target tracking method, network model, training method, training device and training medium thereof
CN113112525A (en) * 2021-04-27 2021-07-13 北京百度网讯科技有限公司 Target tracking method, network model, and training method, device, and medium thereof
CN113344932A (en) * 2021-06-01 2021-09-03 电子科技大学 Semi-supervised single-target video segmentation method
CN113705588B (en) * 2021-10-28 2022-01-25 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN113705588A (en) * 2021-10-28 2021-11-26 南昌工程学院 Twin network target tracking method and system based on convolution self-attention module
CN114170271A (en) * 2021-11-18 2022-03-11 安徽清新互联信息科技有限公司 Multi-target tracking method with self-tracking consciousness, equipment and storage medium
CN114170271B (en) * 2021-11-18 2024-04-12 安徽清新互联信息科技有限公司 Multi-target tracking method, equipment and storage medium with self-tracking consciousness
WO2023207276A1 (en) * 2022-04-29 2023-11-02 京东方科技集团股份有限公司 Area location update method, security and protection system, and computer-readable storage medium
WO2023216572A1 (en) * 2022-05-07 2023-11-16 深圳先进技术研究院 Cross-video target tracking method and system, and electronic device and storage medium
CN115359240B (en) * 2022-07-15 2024-03-15 北京中科思创云智能科技有限公司 Small target detection method, device and equipment based on multi-frame image motion characteristics
CN115359240A (en) * 2022-07-15 2022-11-18 北京中科思创云智能科技有限公司 Small target detection method, device and equipment based on multi-frame image motion characteristics
CN115661207A (en) * 2022-11-14 2023-01-31 南昌工程学院 Target tracking method and system based on space consistency matching and weight learning
CN115984332A (en) * 2023-02-14 2023-04-18 北京卓翼智能科技有限公司 Unmanned aerial vehicle tracking method and device, electronic equipment and storage medium
CN116977902B (en) * 2023-08-14 2024-01-23 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense
CN116977902A (en) * 2023-08-14 2023-10-31 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense

Similar Documents

Publication Publication Date Title
CN111161311A (en) Visual multi-target tracking method and device based on deep learning
JP7236545B2 (en) Video target tracking method and apparatus, computer apparatus, program
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
US11842487B2 (en) Detection model training method and apparatus, computer device and storage medium
CN107895367B (en) Bone age identification method and system and electronic equipment
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN109446889B (en) Object tracking method and device based on twin matching network
KR101640998B1 (en) Image processing apparatus and image processing method
CN105844283A (en) Method for identifying category of image, image search method and image search device
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN112712546A (en) Target tracking method based on twin neural network
CN105303163B (en) A kind of method and detection device of target detection
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN112102929A (en) Medical image labeling method and device, storage medium and electronic equipment
WO2021103474A1 (en) Image processing method and apparatus, storage medium and electronic apparatus
Meng et al. Globally measuring the similarity of superpixels by binary edge maps for superpixel clustering
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN115862119B (en) Attention mechanism-based face age estimation method and device
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN110956157A (en) Deep learning remote sensing image target detection method and device based on candidate frame selection
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network
CN110633630A (en) Behavior identification method and device and terminal equipment
Nugroho et al. Comparison of deep learning-based object classification methods for detecting tomato ripeness
CN115527050A (en) Image feature matching method, computer device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination