CN114140494A - Single-target tracking system and method in complex scene, electronic device and storage medium - Google Patents
- Publication number
- CN114140494A (application number CN202110742736.6A)
- Authority
- CN
- China
- Prior art keywords
- tracking
- target
- frame
- candidate
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Abstract
The invention discloses a single-target tracking system and method in a complex scene, an electronic device, and a storage medium. The tracking system comprises: a preprocessing module configured to: process an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region, and pass the target template region into a re-identification network to obtain the initial features of the target; a tracking pre-screening module configured to: obtain the template region and search region from the preprocessing module, pass them into a deep-learning single-target tracking algorithm, screen out the top 10 candidate tracking frames with the highest confidence via the non-maximum suppression (NMS) algorithm, and pass them to a feature comparison module; a feature comparison module configured to: compute the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and select the best target tracking frame according to the similarity; and a threshold linkage module configured to: adjust the size of the candidate template's search region in the preprocessing module according to the confidence of the output candidate frame.
Description
Technical Field
The invention relates to a single-target tracking system and method in a complex scene, an electronic device, and a storage medium, and belongs to the technical field of video surveillance and security.
Background
Target tracking is an important component of computer vision research and is in great demand in fields such as surveillance and security, autonomous driving, and precision guidance. Its application scenes can be divided into the civil field and the military field, each with its own characteristics. In the civil field, since the time of appearance and the duration of a target are uncertain, a video surveillance system must run for long periods with high stability; in the military field, the flight speed of a highly maneuverable target can exceed Mach 5, and the tracking system must guarantee real-time performance and accuracy in a complex battlefield environment. Under these circumstances, manually identifying and marking the target to be tracked cannot meet the requirements that practical applications place on a tracking system, so research into target tracking algorithms that replace manual methods is of great significance.
In recent years, single-target tracking algorithms based on the deep-learning twin (Siamese) network family have improved greatly, but in practical scenes the interference encountered by a target is far more complicated, which greatly degrades tracking performance.
Patent 1: CN201910882990.9, "A robust single-target tracking method based on deep learning". This method determines whether template updating is started by setting a threshold, updates the template using the confidence, and updates features in time as the target changes, so as to avoid erroneous tracking caused by template updates. Patent 2: CN110807794A, "A single-target tracking method based on multiple features". This design uses a correlation-filter tracking method to perform correlation operations on convolution features and difference-image features respectively; after the resulting response maps are fused, the fusion result is used as the basis for dynamically correcting the target coordinates during tracking.
Disclosure of Invention
The prior art has the following disadvantages. Patent 1, "A robust single-target tracking method based on deep learning": its main defect is that it sets a threshold on the tracking score to update the target template, but once several targets with the same attributes are nearby, another target may be tracked while still yielding high confidence, so the target is completely lost when the next template update goes wrong. Patent 2, "A single-target tracking method based on multiple features": its main design defect is that it still uses a traditional target tracking algorithm; when the target deforms greatly, the illumination changes, or the target is occluded, the target can still be lost, leading to subsequent false alarms.
The invention aims to overcome the technical defects of the prior art, solve the above technical problems, and provide a single-target tracking system and method in a complex scene, an electronic device, and a storage medium.
The invention specifically adopts the following technical scheme: a single-target tracking system in a complex scene comprises:
a preprocessing module configured to: process an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; pass the target template region into a re-identification network to obtain the initial features of the target;
a tracking pre-screening module configured to: obtain the template region and search region from the preprocessing module, pass them into a deep-learning single-target tracking algorithm, screen out the top 10 candidate tracking frames with the highest confidence via the non-maximum suppression (NMS) algorithm, and pass them to the feature comparison module;
a feature comparison module configured to: compute the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and select the best target tracking frame according to the similarity;
a threshold linkage module configured to: adjust the sizes of the candidate template region and the search region in the preprocessing module according to the confidence of the output candidate frame.
The invention also provides a single-target tracking method in a complex scene, comprising:
a preprocessing step, specifically comprising: processing an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the initial features of the target;
a tracking pre-screening step, specifically comprising: obtaining the template region and search region from the preprocessing step, passing them into a deep-learning single-target tracking algorithm, screening out the top 10 candidate tracking frames with the highest confidence via the NMS algorithm, and passing them to the feature comparison step;
a feature comparison step, specifically comprising: computing the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and selecting the best target tracking frame according to the similarity;
a threshold linkage step, specifically comprising: adjusting the sizes of the candidate template region and the search region in the preprocessing step according to the confidence of the output candidate frame.
As a preferred embodiment, the preprocessing step specifically includes:
Step SS11: detect the video stream with a target detection algorithm, or manually select, in the current frame Frame_init of the video stream, the frame B_init of the tracked target, and crop the template area to obtain the object O_crop;
Step SS12: obtain the means (R_mean, G_mean, B_mean) of the three RGB channels of the images in the video stream; the RGB three-channel mean is RGB_mean = (R_mean + G_mean + B_mean)/3;
Step SS13: compute the sizes of the template area and the search area according to formula (1) and formula (2), respectively. Taking the center of frame B_init as the center point, cut a square of side z_sz from the original image and resize it to (127, 127) with an interpolation algorithm to obtain the template image z_crop; similarly, cut a square of side x_sz from the current frame, centered on the frame passed in from the previous frame's tracking frame, and resize it to (271, 271) to obtain the search area x_crop. If the crop exceeds the image boundary, fill the missing pixels with the mean RGB_mean from step SS12 so that the cropped area lies within the image;
x_sz = z_sz * 271/127 (2)
wherein x is the horizontal coordinate of the center point of the initial frame, y is the vertical coordinate of the center of the initial frame, w is the width of the initial frame, and h is the height of the initial frame.
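The cropping of steps SS11–SS13 can be sketched as follows. Since formula (1) is not reproduced in the text, the standard SiamFC/SiamCAR context padding z_sz = sqrt((w + p)(h + p)) with p = (w + h)/2 is assumed here; formula (2) is taken verbatim, and out-of-bounds pixels are filled with the RGB mean of step SS12. All function names are illustrative.

```python
import numpy as np

def crop_sizes(w, h):
    """Template (z_sz) and search (x_sz) crop side lengths.

    Formula (1) is assumed to be the standard SiamFC/SiamCAR context
    padding; formula (2) is x_sz = z_sz * 271 / 127 as in the text.
    """
    p = (w + h) / 2.0
    z_sz = np.sqrt((w + p) * (h + p))   # assumed formula (1)
    x_sz = z_sz * 271.0 / 127.0         # formula (2)
    return z_sz, x_sz

def crop_and_pad(img, cx, cy, side, out_size):
    """Cut a square of length `side` centred at (cx, cy); pixels that
    fall outside the image are filled with the per-channel RGB mean,
    then the patch is resized to (out_size, out_size)."""
    mean = img.reshape(-1, 3).mean(axis=0)
    half = side / 2.0
    x0, y0 = int(round(cx - half)), int(round(cy - half))
    x1, y1 = x0 + int(round(side)), y0 + int(round(side))
    h_img, w_img = img.shape[:2]
    patch = np.empty((y1 - y0, x1 - x0, 3), dtype=img.dtype)
    patch[:] = mean                     # mean-fill, per step SS13
    sx0, sy0 = max(x0, 0), max(y0, 0)
    sx1, sy1 = min(x1, w_img), min(y1, h_img)
    patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = img[sy0:sy1, sx0:sx1]
    # nearest-neighbour resize keeps the sketch dependency-free
    ys = (np.arange(out_size) * patch.shape[0] / out_size).astype(int)
    xs = (np.arange(out_size) * patch.shape[1] / out_size).astype(int)
    return patch[ys][:, xs]
```

In practice a bilinear or bicubic interpolation (e.g. OpenCV) would replace the nearest-neighbour resize; the mean-fill and the 271/127 ratio are the load-bearing parts.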
As a preferred embodiment, the tracking prescreening step specifically includes:
step SS 21: collect the data sets required for single-target tracking, comprising nine data sets (COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB, and UAV123) used to train the neural network;
step SS 22: training to obtain a single target tracking model based on deep learning;
Step SS23: pass the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate frames and their corresponding confidences; then sort the candidate frames by confidence from large to small via the NMS algorithm and select the first 10 to obtain the candidate set of tracking frames {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}, where B denotes the coordinates of a frame and Score_tracking denotes the corresponding confidence score;
Step SS24: crop the 10 frames of step SS23 from the original image of the video frame to obtain the candidate crop areas.
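The candidate selection of step SS23 (sort by confidence, keep the top 10) can be sketched minimally as below; the function name and the (box, score) pairing are assumptions, and the IoU suppression inside NMS is omitted here.

```python
def top_k_candidates(boxes, scores, k=10):
    """Sort candidate frames by confidence (largest first) and keep
    the top k, as in step SS23.  `boxes` holds frame coordinates."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(boxes[i], scores[i]) for i in order[:k]]
```

The returned pairs correspond to the candidate set {(B_1, Score_tracking1), ..., (B_10, Score_tracking10)} and feed directly into step SS24's cropping.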
As a preferred embodiment, the threshold of the NMS algorithm in said step SS23 is 0.4.
As a preferred embodiment, the feature comparison step specifically includes:
step SS 31: collect a pedestrian re-identification data set and train on it to obtain a deep metric-learning network; the network learns to measure the cosine distance between objects and then assigns each object to the cluster at the nearest distance;
step SS 32: adjusting the size of each of the O _ crop obtained in the step SS11 in the preprocessing step and the 10 candidate regions obtained in the step SS24 in the tracking preliminary screening step to 64 (width) × 128 (height);
Step SS33: pass the regions cropped in step SS32 into the deep metric-learning network of step SS31 to obtain their respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then compute the respective cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
Step SS34: fuse the tracking score with the score of the candidate frame according to formula (3) to obtain the final tracking score Score_final;
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
Step SS35: sort Score_final from large to small and output the frame corresponding to the highest confidence.
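Formula (3) and the selection of steps SS33–SS35 can be sketched as follows; the function names are illustrative, and the short vectors below stand in for the 128-dimensional outputs of the re-identification network.

```python
import numpy as np

def cosine_score(f1, f2):
    """Cosine similarity between two feature vectors (step SS33)."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def fuse_and_select(track_scores, reid_scores, boxes, w=0.4):
    """Formula (3): Score_final = w * Score_tracking + (1 - w) * Score_reid;
    return the frame with the highest fused score (steps SS34-SS35)."""
    final = [w * t + (1 - w) * r for t, r in zip(track_scores, reid_scores)]
    best = int(np.argmax(final))
    return boxes[best], final[best]
```

With w = 0.4 as in the preferred embodiment, a candidate with a mediocre tracking score but near-perfect re-ID similarity can outrank a high-confidence distractor, which is the mechanism the patent relies on against same-class interference.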
In a preferred embodiment, w in step SS34 is 0.4.
As a preferred embodiment, the threshold linking step specifically includes:
Step SS41: judge whether Score_final is greater than the threshold Score_threshold; if so, do not enter this module and exit; if it is less than the threshold, go to step SS42;
Step SS42: readjust the size of the search area according to formula (4), and use it for cropping when processing the next frame;
the enlarged search area is:
x_sz = 1.5 * z_sz * 271/127 (4).
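The threshold linkage of steps SS41–SS42, combining formulas (2) and (4), can be sketched as below; the concrete value of Score_threshold is not given in the text, so the 0.5 default is only an assumption.

```python
def adjust_search_size(score_final, z_sz, score_threshold=0.5):
    """Threshold linkage (steps SS41-SS42): when the fused score falls
    to or below the threshold, enlarge the search region by 1.5x per
    formula (4); otherwise keep the normal size of formula (2).
    score_threshold = 0.5 is an assumption -- the text gives no value."""
    if score_final > score_threshold:
        return z_sz * 271.0 / 127.0        # formula (2)
    return 1.5 * z_sz * 271.0 / 127.0      # formula (4)
```

The enlarged window widens the next frame's crop, which is what lets a temporarily disappeared target be found again once it re-enters the picture.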
The invention also proposes an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method are implemented.
The invention also proposes a storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method are implemented.
The invention achieves the following beneficial effects: 1. high-precision, stable, real-time tracking can be achieved for any framed target larger than 10 pixels; a richer training data set yields a better-performing tracking model, and tracking is completed even when the target undergoes scale deformation and illumination changes; 2. when the target passes through complex interference from objects of the same class, the system still tracks stably and does not lose the target; 3. the fusion of feature re-comparison and the deep-learning tracking algorithm greatly reduces the probability of tracking failure, and re-comparing features of the tracking candidate frames further improves tracking reliability; 4. when the target disappears from the picture for a period of time and reappears, the method can still accurately re-identify and re-track it.
Drawings
FIG. 1 is a schematic diagram of a principle topology of a deep learning-based single-target tracking system in a complex scenario of the present invention;
FIG. 2 is a flowchart of a single-target tracking method based on deep learning in a complex scene.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1: the invention provides a single-target tracking system under a complex scene, which comprises:
a preprocessing module configured to: process an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; pass the target template region into a re-identification network to obtain the initial features of the target;
a tracking pre-screening module configured to: obtain the template region and search region from the preprocessing module; the tracking module adopts SiamCAR, a recent deep-learning-based single-target tracking algorithm, screens out the top 10 candidate tracking frames with the highest confidence via the non-maximum suppression (NMS) algorithm, and passes them to the feature comparison module;
a feature comparison module configured to: compute the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and select the best target tracking frame according to the similarity;
a threshold linkage module configured to: adjust the size of the candidate template's search region in the preprocessing module according to the confidence of the output candidate frame, so that a target that disappears and later reappears can still be recognized and tracked.
The single-target tracking system in a complex scene can greatly improve tracking precision and robustness under various complex conditions, consumes little time, and fully achieves real-time performance.
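Assembled from the four modules above, a per-frame loop might look like the following skeleton; the tracker and re-identification network are placeholder callables, since the text names SiamCAR and a pedestrian re-ID network but specifies no programming interface.

```python
class SingleTargetTracker:
    """Skeleton of the four-module loop described above.  The module
    callables are stand-ins: the patent names SiamCAR and a re-ID
    network but gives no API, so their signatures are assumptions."""

    def __init__(self, tracker, reid_net, w=0.4, threshold=0.5):
        self.tracker = tracker        # deep-learning tracking model
        self.reid_net = reid_net      # metric-learning re-ID network
        self.w = w                    # fusion weight of formula (3)
        self.threshold = threshold    # Score_threshold of step SS41
        self.template_feat = None
        self.enlarge = 1.0            # threshold-linkage factor

    def init(self, template_crop):
        # preprocessing module: initial features from the re-ID network
        self.template_feat = self.reid_net(template_crop)

    def update(self, search_crop):
        # tracking pre-screening: top-10 (frame, score, crop) candidates
        candidates = self.tracker(search_crop)[:10]
        # feature comparison: fuse tracking score with re-ID similarity
        best_box, best_score = None, -1.0
        for box, score, crop in candidates:
            sim = self._cosine(self.reid_net(crop), self.template_feat)
            fused = self.w * score + (1 - self.w) * sim
            if fused > best_score:
                best_box, best_score = box, fused
        # threshold linkage: widen the search area if confidence is low
        self.enlarge = 1.0 if best_score > self.threshold else 1.5
        return best_box, best_score

    @staticmethod
    def _cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den
```

The `enlarge` factor would be consumed by the preprocessing module when cropping the next frame's search region, closing the linkage loop of Example 1.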
Example 2: the invention also provides a single-target tracking method in a complex scene, which comprises the following steps:
the preprocessing step, specifically comprising: processing an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the initial features of the target;
the tracking pre-screening step, specifically comprising: obtaining the template region and search region from the preprocessing step, screening out the top 10 candidate tracking frames with the highest confidence via the NMS algorithm, and passing them to the feature comparison step;
the feature comparison step, specifically comprising: computing the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and selecting the best target tracking frame according to the similarity;
the threshold linkage step, specifically comprising: adjusting the sizes of the candidate template region and the search region in the preprocessing step according to the confidence of the output candidate frame.
Preferably, the pretreatment step specifically comprises:
Step SS11: detect the video stream with a target detection algorithm (either a traditional detection algorithm or a deep-learning algorithm), or manually select, in the current frame Frame_init of the video stream, the frame B_init of the tracked target, and crop the template area to obtain the object O_crop;
Step SS12: obtain the means (R_mean, G_mean, B_mean) of the three RGB channels of the images in the video stream; the RGB three-channel mean is RGB_mean = (R_mean + G_mean + B_mean)/3;
Step SS13: compute the sizes of the template area and the search area according to formula (1) and formula (2), respectively. Taking the center of frame B_init as the center point, cut a square of side z_sz from the original image and resize it to (127, 127) with an interpolation algorithm to obtain the template image z_crop; similarly, cut a square of side x_sz from the current frame, centered on the frame passed in from the previous frame's tracking frame, and resize it to (271, 271) to obtain the search area x_crop. If the crop exceeds the image boundary, fill the missing pixels with the mean RGB_mean from step SS12 so that the cropped area lies within the image;
x_sz = z_sz * 271/127 (2)
wherein x is the horizontal coordinate of the center point of the initial frame, y is the vertical coordinate of the center of the initial frame, w is the width of the initial frame, and h is the height of the initial frame.
Preferably, the tracking prescreening step specifically includes:
step SS 21: collect the data sets required for single-target tracking, comprising nine data sets (COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB, and UAV123) used to train the neural network; the richer data in these sets further improves robustness to target variation;
step SS 22: train a deep-learning-based single-target tracking model according to the single-target tracking algorithm SiamCAR;
Step SS23: pass the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate frames and their corresponding confidences; then sort the candidate frames by confidence from large to small via the NMS algorithm and select the first 10 to obtain the candidate set of tracking frames {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}, where B denotes the coordinates of a frame and Score_tracking denotes the corresponding confidence score;
Step SS24: crop the 10 frames of step SS23 from the video frame to obtain the candidate crop areas.
Preferably, the threshold of the NMS algorithm in step SS23 is 0.4; experiments show that this value works well for classes such as pedestrians.
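The NMS step with the 0.4 threshold can be illustrated with a minimal greedy implementation; the (x1, y1, x2, y2) frame format and the interpretation that frames overlapping a kept frame by IoU > 0.4 are suppressed are assumptions, as the text does not spell out the NMS variant.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.4):
    """Greedy non-maximum suppression with the 0.4 threshold of step
    SS23: keep the highest-scoring frame, drop any frame whose IoU
    with a kept frame exceeds the threshold, repeat; returns kept
    indices in descending-score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

A lower threshold suppresses more aggressively; 0.4 keeps pedestrian candidates that overlap only moderately, which matches the stated experimental finding.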
Preferably, the feature comparison step specifically includes:
step SS 31: according to Table 1, collect a pedestrian re-identification data set and train on it to obtain a deep metric-learning network; the network learns to measure the cosine distance between objects and then assigns each object to the cluster at the nearest distance;
step SS 32: adjusting the size of each of the O _ crop obtained in the step SS11 in the preprocessing step and the 10 candidate regions obtained in the step SS24 in the tracking preliminary screening step to 64 (width) × 128 (height);
Step SS33: pass the regions cropped in step SS32 into the deep metric-learning network of step SS31 to obtain their respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then compute the respective cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
Step SS34: fuse the tracking score with the score of the candidate frame according to formula (3) to obtain the final tracking score Score_final;
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
Step SS35: sort Score_final from large to small and output the frame corresponding to the highest confidence.
Table 1: architecture of the deep cosine metric-learning network based on deep learning.
Preferably, w in step SS34 is 0.4; experiments show that the tracking effect is best when w = 0.4.
Preferably, the threshold linkage step specifically comprises: setting a threshold on the output confidence; when the score is smaller than the threshold, a linkage mechanism is started that acts back on the preprocessing module to enlarge the search area, so as to find the temporarily disappeared target;
Step SS41: judge whether Score_final is greater than the threshold Score_threshold; if so, do not enter this module and exit; if it is less than the threshold, go to step SS42;
Step SS42: readjust the size of the search area according to formula (4), and use it for cropping when processing the next frame;
the enlarged search area is:
x_sz = 1.5 * z_sz * 271/127 (4).
The invention also proposes an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of the method are implemented.
The invention also proposes a storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method are implemented.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the invention, and these modifications and variations should also be regarded as falling within the protection scope of the invention.
Claims (10)
1. A single-target tracking system in a complex scene, characterized in that it comprises:
a preprocessing module configured to: process an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; pass the target template region into a re-identification network to obtain the initial features of the target;
a tracking pre-screening module configured to: obtain the template region and search region from the preprocessing module, pass them into a deep-learning single-target tracking algorithm, screen out the top 10 candidate tracking frames with the highest confidence via the non-maximum suppression (NMS) algorithm, and pass them to the feature comparison module;
a feature comparison module configured to: compute the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and select the best target tracking frame according to the similarity;
a threshold linkage module configured to: adjust the size of the candidate template's search region in the preprocessing module according to the confidence of the output candidate frame.
2. A single-target tracking method in a complex scene, characterized in that it comprises:
a preprocessing step, specifically comprising: processing an initial target frame and subsequent video frames according to the incoming template frame of the violating target to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the initial features of the target;
a tracking pre-screening step, specifically comprising: obtaining the template region and search region from the preprocessing step, passing them into a deep-learning single-target tracking algorithm, screening out the top 10 candidate tracking frames with the highest confidence via the NMS algorithm, and passing them to the feature comparison step;
a feature comparison step, specifically comprising: computing the cosine similarity between each of the top 10 candidate tracking frames and the initial features of the target, and selecting the best target tracking frame according to the similarity;
a threshold linkage step, specifically comprising: adjusting the size of the candidate template's search region in the preprocessing step according to the confidence of the output candidate frame.
3. The method for tracking the single target under the complex scene according to claim 2, wherein the preprocessing step specifically comprises:
step SS 11: detecting the video stream by using a target detection algorithm, or manually selecting a current Frame of the video streaminitTo obtain the frame B of the tracked targetinitThe template area is cut to obtain an object O _ crop;
step SS 12: RGB three channels (R) for obtaining images in video streammean,Gmean,Bmean) The RGB three-channel mean value is: RGB (Red, Green, blue) color filtermean=(Rmean+Gmean+Bmean)/3;
Step SS 13: respectively calculating the sizes of the template area and the search area according to a formula (1) and a formula (2); with a frame BinitThe center of the template picture is taken as a central point, a square with the length and the width both being Z _ sz is cut out from an original picture, then the size of the square is adjusted to (127 ) through an interpolation algorithm to obtain a template picture Z _ crop, similarly, the square with the length and the width both being X _ sz is cut out from a current frame according to the center of a frame transmitted by a previous frame tracking frame, and then the square is adjusted to (271 ) through the interpolation algorithm to obtain a search area X _ crop; once clipping exceeds the image boundary, the mean RGB in step SS12 is usedmeanFilling pixels to ensure that the area obtained by cutting is positioned in the image;
x_sz=z_sz*271/127 (2)
wherein, x is the horizontal coordinate of the center point of the initial frame, y is the vertical coordinate of the center of the initial frame, w is the width of the initial frame, and h is the height of the initial frame.
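As an illustrative sketch (not part of the claims), the mean-padded cropping of steps SS12-SS13 and formula (2) can be expressed in Python; the interpolation here is plain nearest-neighbour resizing, and z_sz is taken as an input because formula (1) is not reproduced above:

```python
import numpy as np

def crop_with_mean_padding(img, cx, cy, size, out_size):
    """Crop a size x size square centered at (cx, cy) from img (H, W, 3).
    Pixels falling outside the image are filled with the per-image RGB mean
    (the channel means of step SS12), then the square is resized to
    (out_size, out_size) by nearest-neighbour interpolation."""
    mean = img.reshape(-1, img.shape[-1]).mean(axis=0)   # (R_mean, G_mean, B_mean)
    half = size // 2
    h, w = img.shape[:2]
    canvas = np.tile(mean, (size, size, 1)).astype(img.dtype)  # mean-filled square
    x0, y0 = cx - half, cy - half                        # top-left of the crop
    sx0, sy0 = max(0, x0), max(0, y0)                    # valid region in the image
    sx1, sy1 = min(w, x0 + size), min(h, y0 + size)
    canvas[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = img[sy0:sy1, sx0:sx1]
    idx = np.arange(out_size) * size // out_size         # nearest-neighbour indices
    return canvas[idx][:, idx]

def search_size(z_sz):
    """Formula (2): x_sz = z_sz * 271 / 127."""
    return z_sz * 271 / 127
```

Under these assumptions, Z_crop would be `crop_with_mean_padding(frame, cx, cy, z_sz, 127)` and X_crop the analogous (271, 271) crop of size `search_size(z_sz)`.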
4. The single-target tracking method under the complex scene according to claim 2, wherein the tracking preliminary screening step specifically comprises:
step SS 21: collecting the data sets required for single-target tracking, namely the nine data sets COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB and UAV123, for training the neural network;
step SS 22: training a deep-learning single-target tracking model based on the single-target tracking algorithm SiamCAR;
step SS 23: respectively feeding the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate frames and corresponding confidences, then sorting the candidate frames from large to small by confidence through the NMS algorithm and selecting the first 10 frames to obtain the candidate set of tracking frames {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}; wherein B represents the coordinates of a frame and Score_tracking represents the corresponding confidence score;
step SS 24: cropping the 10 corresponding frames of step SS23 from the original image of the video frame to obtain the candidate crop areas.
5. The single-target tracking method under the complex scene according to claim 4, wherein the threshold of the NMS algorithm in the step SS23 is 0.4.
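A minimal sketch of the NMS screening of step SS23 with the claimed threshold of 0.4, assuming greedy suppression over axis-aligned (x1, y1, x2, y2) boxes (the claims do not fix the box format):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms_top_k(boxes, scores, iou_thresh=0.4, k=10):
    """Greedy NMS as in step SS23: walk boxes in descending confidence,
    keep a box only if its IoU with every already-kept box is below
    iou_thresh (0.4 per claim 5), and stop once k boxes are kept."""
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
            if len(kept) == k:
                break
    return [(boxes[i], scores[i]) for i in kept]
```

The kept (box, score) pairs correspond to the candidate set {(B_i, Score_tracking_i)} passed on to step SS24.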
6. The method for tracking a single target under a complex scene according to claim 2, wherein the feature comparison step specifically comprises:
step SS 31: collecting a pedestrian re-identification data set and training a deep-learning metric learning network, wherein the metric learning network learns the cosine distance between objects and assigns each object to the cluster at the nearest distance;
step SS 32: adjusting the size of the O_crop obtained in step SS11 of the preprocessing step and of each of the 10 candidate regions obtained in step SS24 of the tracking preliminary screening step to 64 (width) × 128 (height);
step SS 33: feeding the regions cropped in step SS32 into the deep-learning metric learning network of step SS31 to obtain the respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then respectively calculating the cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
step SS 34: fusing the tracking score with the score of each candidate frame according to formula (3) to obtain the final tracking score Score_final;
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
step SS 35: sorting Score_final from large to small and outputting the frame corresponding to the highest confidence.
7. The method for tracking the single target under the complex scene as recited in claim 6, wherein w in the step SS34 is selected to be 0.4.
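The cosine comparison of step SS33 and the score fusion of formula (3) with w = 0.4 can be sketched as follows; the candidate features and tracking confidences are taken as assumed inputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (128-d in step SS33)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuse(score_tracking, score_reid, w=0.4):
    """Formula (3): Score_final = w * Score_tracking + (1 - w) * Score_reid,
    with w = 0.4 per claim 7."""
    return w * score_tracking + (1 - w) * score_reid

def best_candidate(tracking_scores, candidate_feats, feat_init, w=0.4):
    """Steps SS33-SS35: score each candidate against the initial target
    feature, fuse with the tracking confidence, and return the index of
    the best candidate together with all fused scores."""
    finals = [fuse(s, cosine_similarity(f, feat_init), w)
              for s, f in zip(tracking_scores, candidate_feats)]
    return int(np.argmax(finals)), finals
```

With w = 0.4, the re-identification similarity carries more weight (0.6) than the raw tracking confidence, which matches the claim's emphasis on appearance comparison in complex scenes.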
8. The single-target tracking method under the complex scene according to claim 2, wherein the threshold value linkage step specifically comprises:
step SS 41: judging whether Score_final is greater than the threshold Score_threshold; if so, this step is not entered and the process exits; if it is less than the threshold, proceeding to step SS 42;
step SS 42: readjusting the size of the search area according to formula (4), and then performing the crop processing on the next frame;
the enlarged search area is: x_sz = 1.5 * z_sz * 271/127 (4).
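Steps SS41-SS42 amount to a simple threshold check on the fused score; Score_threshold = 0.5 below is an assumed illustrative value, as the claims do not fix it:

```python
def adjust_search_size(score_final, z_sz, score_threshold=0.5):
    """Steps SS41-SS42: if the fused score exceeds the threshold, keep the
    normal search area of formula (2); otherwise enlarge it for the next
    frame per formula (4): x_sz = 1.5 * z_sz * 271 / 127.
    score_threshold = 0.5 is an assumed value, not fixed by the claims."""
    if score_final > score_threshold:
        return z_sz * 271 / 127        # formula (2): normal search area
    return 1.5 * z_sz * 271 / 127      # formula (4): enlarged search area
```

Enlarging the search window after a low-confidence frame gives the tracker a chance to re-acquire a target that moved quickly or was occluded, which is the stated purpose of the threshold linkage step.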
9. Electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 2 to 8 are implemented when the processor executes the program.
10. Storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 2 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110742736.6A CN114140494A (en) | 2021-06-30 | 2021-06-30 | Single-target tracking system and method in complex scene, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114140494A true CN114140494A (en) | 2022-03-04 |
Family
ID=80394204
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110742736.6A Pending CN114140494A (en) | 2021-06-30 | 2021-06-30 | Single-target tracking system and method in complex scene, electronic device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114140494A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084829A (en) * | 2019-03-12 | 2019-08-02 | 上海阅面网络科技有限公司 | Method for tracking target, device, electronic equipment and computer readable storage medium |
CN110555870A (en) * | 2019-09-09 | 2019-12-10 | 北京理工大学 | DCF tracking confidence evaluation and classifier updating method based on neural network |
CN110647836A (en) * | 2019-09-18 | 2020-01-03 | 中国科学院光电技术研究所 | Robust single-target tracking method based on deep learning |
CN110853076A (en) * | 2019-11-08 | 2020-02-28 | 重庆市亿飞智联科技有限公司 | Target tracking method, device, equipment and storage medium |
TW202026940A (en) * | 2019-01-09 | 2020-07-16 | 圓展科技股份有限公司 | Target tracking method |
CN112001946A (en) * | 2020-07-14 | 2020-11-27 | 浙江大华技术股份有限公司 | Target object tracking method, computer equipment and device |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114694184A (en) * | 2022-05-27 | 2022-07-01 | 电子科技大学 | Pedestrian re-identification method and system based on multi-template feature updating |
CN114694184B (en) * | 2022-05-27 | 2022-10-14 | 电子科技大学 | Pedestrian re-identification method and system based on multi-template feature updating |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111462200B (en) | Cross-video pedestrian positioning and tracking method, system and equipment | |
CN104598883B (en) | Target knows method for distinguishing again in a kind of multiple-camera monitoring network | |
CN112215155A (en) | Face tracking method and system based on multi-feature fusion | |
CN110189375B (en) | Image target identification method based on monocular vision measurement | |
CN108197604A (en) | Fast face positioning and tracing method based on embedded device | |
CN113223045B (en) | Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation | |
CN108564598B (en) | Improved online Boosting target tracking method | |
CN105160649A (en) | Multi-target tracking method and system based on kernel function unsupervised clustering | |
CN105718882A (en) | Resolution adaptive feature extracting and fusing for pedestrian re-identification method | |
CN110570453A (en) | Visual odometer method based on binocular vision and closed-loop tracking characteristics | |
CN110443279B (en) | Unmanned aerial vehicle image vehicle detection method based on lightweight neural network | |
CN111160212A (en) | Improved tracking learning detection system and method based on YOLOv3-Tiny | |
CN106599918B (en) | vehicle tracking method and system | |
CN115841649A (en) | Multi-scale people counting method for urban complex scene | |
CN114708300A (en) | Anti-blocking self-adaptive target tracking method and system | |
CN114926859A (en) | Pedestrian multi-target tracking method in dense scene combined with head tracking | |
CN116109950A (en) | Low-airspace anti-unmanned aerial vehicle visual detection, identification and tracking method | |
CN114140494A (en) | Single-target tracking system and method in complex scene, electronic device and storage medium | |
Yu et al. | A unified transformer based tracker for anti-uav tracking | |
CN113781523A (en) | Football detection tracking method and device, electronic equipment and storage medium | |
CN113096016A (en) | Low-altitude aerial image splicing method and system | |
CN115731287B (en) | Moving target retrieval method based on aggregation and topological space | |
CN107730535B (en) | Visible light infrared cascade video tracking method | |
CN115880643A (en) | Social distance monitoring method and device based on target detection algorithm | |
CN114283199B (en) | Dynamic scene-oriented dotted line fusion semantic SLAM method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||