CN114140494A - Single-target tracking system and method in complex scene, electronic device and storage medium


Info

Publication number
CN114140494A
Authority
CN
China
Prior art keywords
tracking
target
frame
candidate
template
Prior art date
Legal status
Pending
Application number
CN202110742736.6A
Other languages
Chinese (zh)
Inventor
苏晋鹏
曹颂
钟星
Current Assignee
Hangzhou Turing Video Technology Co ltd
Original Assignee
Hangzhou Turing Video Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Turing Video Technology Co ltd filed Critical Hangzhou Turing Video Technology Co ltd
Priority to CN202110742736.6A priority Critical patent/CN114140494A/en
Publication of CN114140494A publication Critical patent/CN114140494A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person


Abstract

The invention discloses a single-target tracking system and method in complex scenes, an electronic device, and a storage medium. The tracking system comprises: a pre-processing module to perform: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region, and passing the target template region into a re-identification network to obtain the target's initial features; a tracking pre-screening module to perform: acquiring the template region and the search region from the pre-processing module, passing them into a deep learning single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and passing them to the feature comparison module; a feature comparison module to perform: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity; a threshold linkage module to perform: adjusting the size of the candidate template's search region in the pre-processing module according to the confidence of the output candidate box.

Description

Single-target tracking system and method in complex scene, electronic device and storage medium
Technical Field
The invention relates to a single-target tracking system and method in complex scenes, an electronic device, and a storage medium, and belongs to the technical field of video monitoring and security.
Background
Target tracking is an important component of computer vision research and is in great demand in fields such as surveillance security, autonomous driving, and precision guidance. Its application scenarios can be divided into the civil field and the military field, each with its own characteristics. In the civil field, since the time of appearance and the duration of a target are uncertain, a video surveillance system must run for long periods with high stability; in the military field, a high-speed maneuvering target can exceed Mach 5, and the tracking system must guarantee real-time performance and accuracy in a complex battlefield environment. Under these conditions, manually identifying and marking the target to be tracked cannot meet the requirements that practical applications place on a tracking system, so research on target tracking algorithms that replace manual methods is of great significance.
In recent years, deep learning single-target tracking algorithms based on the Siamese network family have improved greatly, but in practical scenes the interference a target encounters is far more complicated, which sharply degrades tracking performance.
Patent 1: CN 201910882990.9, "A robust single-target tracking method based on deep learning". The method sets a threshold to decide whether template updating is triggered, updates the template using the confidence, and updates the features in time as the target changes, thereby avoiding erroneous tracking caused by template updates. Patent 2: CN 110807794A, "A single-target tracking method based on multiple features". This design adopts a correlation-filter tracking method to run correlation operations separately on convolution features and difference-image features; after the resulting response maps are fused, the fusion result is used as the basis for dynamically correcting the target coordinates.
Disclosure of Invention
The prior art has the following disadvantages. Patent 1, the robust single-target tracking method based on deep learning: its main defect is that the tracking score is thresholded to decide when to update the target template, but once several targets with the same attributes are nearby, the tracker may lock onto another target while still reporting high confidence, so an erroneous template update loses the target completely. Patent 2, the multi-feature-based single-target tracking method: its main design defect is that it still uses a traditional target tracking algorithm; when the target deforms heavily, the illumination changes, or occlusion occurs, the target can still be lost, leading to subsequent false alarms.
The invention aims to overcome the above technical defects in the prior art, solve the above technical problems, and provide a single-target tracking system and method in complex scenes, an electronic device, and a storage medium.
The invention specifically adopts the following technical scheme: a single-target tracking system in a complex scene, comprising:
a pre-processing module to perform: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
a tracking pre-screening module to perform: acquiring the template region and the search region from the pre-processing module, passing them into a deep learning single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and passing them to the feature comparison module;
a feature comparison module to perform: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
a threshold linkage module to perform: adjusting the sizes of the candidate template region and the search region in the pre-processing module according to the confidence of the output candidate box.
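As an illustrative reading of this module decomposition, a skeletal Python composition might look as follows; all class and method names here are assumptions for exposition, not part of the patent.

```python
class SingleTargetTrackingSystem:
    """Four cooperating modules, mirroring the decomposition above."""

    def __init__(self, preprocessor, prescreener, comparator, linkage):
        self.preprocessor = preprocessor  # template/search regions + initial re-ID features
        self.prescreener = prescreener    # tracking model + NMS top-10 screening
        self.comparator = comparator      # cosine similarity + best-box selection
        self.linkage = linkage            # confidence-driven search resizing

    def step(self, frame):
        template, search = self.preprocessor.regions(frame)
        candidates = self.prescreener.top_candidates(template, search)
        best_box, confidence = self.comparator.select(candidates)
        # Threshold linkage feeds back into pre-processing for the next frame.
        self.preprocessor.search_size = self.linkage.adjust(confidence)
        return best_box
```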
The invention also provides a single-target tracking method in a complex scene, which comprises the following steps:
the preprocessing step, specifically comprising: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
the tracking pre-screening step, specifically comprising: acquiring the template region and the search region from the preprocessing step, passing them into a deep learning single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and passing them to the feature comparison step;
the feature comparison step, specifically comprising: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
the threshold linkage step, specifically comprising: adjusting the sizes of the candidate template region and the search region in the preprocessing step according to the confidence of the output candidate box.
As a preferred embodiment, the preprocessing step specifically includes:
Step SS11: detect the video stream with a target detection algorithm, or manually select the target in the current frame Frame_init of the video stream, to obtain the tracked target's box B_init, and crop the template area to obtain the target object O_crop;
Step SS12: obtain the per-channel means (R_mean, G_mean, B_mean) of the RGB channels of the images in the video stream; the three-channel RGB mean is: RGB_mean = (R_mean + G_mean + B_mean)/3;
Step SS13: calculate the sizes of the template area and the search area according to formula (1) and formula (2), respectively. Taking the center of box B_init as the center point, crop a square with side length z_sz from the original image, then resize it to (127, 127) with an interpolation algorithm to obtain the template image z_crop; similarly, crop a square with side length x_sz from the current frame, centered on the box passed in from the previous frame's tracking box, then resize it to (271, 271) with the interpolation algorithm to obtain the search area x_crop. Whenever a crop exceeds the image boundary, fill the missing pixels with the mean RGB_mean from step SS12 so that the cropped area lies entirely within the image;
z_sz = sqrt((w + (w + h)/2) * (h + (w + h)/2)) (1)
x_sz = z_sz * 271/127 (2)
wherein x is the horizontal coordinate of the center point of the initial box, y is the vertical coordinate of the center of the initial box, w is the width of the initial box, and h is the height of the initial box.
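For illustration, the following is a minimal Python sketch of steps SS11 to SS13, assuming formula (1) is the context-padded sizing used by SiamFC-style trackers; the function names, the OpenCV usage, and the example box values are assumptions, not part of the patent.

```python
import cv2
import numpy as np

def template_and_search_sizes(w, h):
    """Formulas (1) and (2): template side z_sz with a (w+h)/2 context
    margin, and search side x_sz scaled by 271/127."""
    z_sz = np.sqrt((w + (w + h) / 2.0) * (h + (w + h) / 2.0))
    x_sz = z_sz * 271.0 / 127.0
    return z_sz, x_sz

def crop_square(frame, center, side, out_size, fill):
    """Crop a square of side `side` centered at `center`; pad out-of-image
    pixels with `fill` (the RGB mean from step SS12), then resize."""
    cx, cy = center
    x1 = int(round(cx - side / 2.0))
    y1 = int(round(cy - side / 2.0))
    x2, y2 = x1 + int(round(side)), y1 + int(round(side))
    h, w = frame.shape[:2]
    pad = max(0, -x1, -y1, x2 - w, y2 - h)
    if pad:
        frame = cv2.copyMakeBorder(frame, pad, pad, pad, pad,
                                   cv2.BORDER_CONSTANT, value=fill)
        x1, y1, x2, y2 = x1 + pad, y1 + pad, x2 + pad, y2 + pad
    return cv2.resize(frame[y1:y2, x1:x2], (out_size, out_size))

# Step SS12: a single scalar mean over all three channels.
frame = cv2.imread("frame_init.jpg")
rgb_mean = float(frame.mean())
fill = (rgb_mean, rgb_mean, rgb_mean)

# Steps SS11/SS13 with an illustrative B_init = (x, y, w, h).
x, y, bw, bh = 320.0, 240.0, 40.0, 80.0
z_sz, x_sz = template_and_search_sizes(bw, bh)
z_crop = crop_square(frame, (x, y), z_sz, 127, fill)  # template image
x_crop = crop_square(frame, (x, y), x_sz, 271, fill)  # search region
```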
As a preferred embodiment, the tracking pre-screening step specifically includes:
Step SS21: collect the data sets required for single-target tracking, comprising nine data sets: COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB, and UAV123, used for training the neural network;
Step SS22: train a deep-learning-based single-target tracking model;
Step SS23: pass the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate boxes and their confidences, then sort the candidate boxes from highest to lowest confidence through the NMS algorithm and select the top 10 to obtain the candidate set of tracking boxes {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}, where B denotes the box coordinates and Score_tracking denotes the corresponding confidence score;
Step SS24: crop the 10 boxes of step SS23 from the original video frame to obtain the candidate crop regions.
As a preferred embodiment, the threshold of the NMS algorithm in said step SS23 is 0.4.
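A minimal sketch of the step SS23 screening, assuming plain greedy NMS with the stated IoU threshold of 0.4 and boxes in (x1, y1, x2, y2) format; the SiamCAR model that produces the raw candidates is abstracted away.

```python
import numpy as np

def nms_top_k(boxes, scores, iou_thresh=0.4, k=10):
    """Greedy non-maximum suppression, keeping at most the k
    highest-confidence surviving boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0 and len(keep) < k:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU between the kept box and every remaining candidate.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[iou < iou_thresh]
    return boxes[keep], scores[keep]
```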
As a preferred embodiment, the feature comparison step specifically includes:
Step SS31: collect a pedestrian re-identification data set and train on it to obtain a deep metric learning network, which learns to measure the cosine distance between objects and then assigns an object to the cluster with the nearest distance;
Step SS32: resize the O_crop obtained in step SS11 of the preprocessing step and each of the 10 candidate regions obtained in step SS24 of the tracking pre-screening step to 64 (width) × 128 (height);
Step SS33: pass the regions cropped in step SS32 into the deep metric learning network of step SS31 to obtain the respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then compute the respective cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
Step SS34: fuse the tracking score with the candidate boxes' scores according to formula (3) to obtain the final tracking score Score_final:
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
Step SS 35: will ScorefinalAnd sorting from large to small, and outputting the frame corresponding to the highest confidence coefficient.
In a preferred embodiment, w in step SS34 is 0.4.
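A sketch of steps SS33 to SS35, assuming the 128-dimensional embeddings have already been produced by the metric learning network; the fusion follows formula (3) with the preferred w = 0.4.

```python
import numpy as np

def fuse_and_select(feat_init, feat_candidates, boxes, score_tracking, w=0.4):
    """Step SS33: cosine similarity between the initial feature and each
    candidate feature; steps SS34-SS35: fuse per formula (3) and return
    the box with the highest fused score."""
    a = feat_init / (np.linalg.norm(feat_init) + 1e-9)
    b = feat_candidates / (np.linalg.norm(feat_candidates, axis=1,
                                          keepdims=True) + 1e-9)
    score_reid = b @ a                                         # Score_reid
    score_final = w * score_tracking + (1 - w) * score_reid    # formula (3)
    best = int(np.argmax(score_final))
    return boxes[best], float(score_final[best])
```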
As a preferred embodiment, the threshold linkage step specifically includes:
Step SS41: judge whether Score_final is greater than the threshold Score_threshold; if so, do not enter this module and exit; if it is less than the threshold, go to step SS42;
Step SS42: readjust the size of the search area according to formula (4), and use it for the crop when the next frame arrives;
the new search area is:
x_sz = 1.5 * z_sz * 271/127 (4).
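A sketch of the linkage logic in steps SS41 and SS42; Score_threshold is a tunable parameter whose value is not fixed by the patent.

```python
def next_search_size(score_final, z_sz, score_threshold):
    """Steps SS41-SS42: keep the normal search size while the fused score
    stays above the threshold; otherwise enlarge it per formula (4)."""
    if score_final > score_threshold:
        return z_sz * 271.0 / 127.0       # formula (2): normal search size
    return 1.5 * z_sz * 271.0 / 127.0     # formula (4): enlarged search size
```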
the invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the method being implemented when the processor executes the program.
The invention also proposes a storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method.
The invention achieves the following beneficial effects: 1: high-precision, stable, real-time tracking can be achieved for any framed target larger than 10 pixels; a richer training data set yields a better-performing tracking model, and tracking can be completed even when the target undergoes distance-induced deformation and illumination changes; 2: when the target passes through complex interference from objects of the same attribute class, the method still tracks stably and does not lose the target; 3: the fusion of feature re-comparison with the deep learning tracking algorithm greatly reduces the probability of tracking failure, and re-comparing features on the tracking candidate boxes further improves tracking reliability; 4: when the target disappears from the picture for a period of time and reappears, the method can still accurately re-identify and re-track it.
Drawings
FIG. 1 is a schematic diagram of a principle topology of a deep learning-based single-target tracking system in a complex scenario of the present invention;
FIG. 2 is a flowchart of a single-target tracking method based on deep learning in a complex scene.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Example 1: the invention provides a single-target tracking system in a complex scene, comprising:
a pre-processing module to perform: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
a tracking pre-screening module to perform: acquiring the template region and the search region from the pre-processing module, where the tracking module adopts SiamCAR, a current deep-learning-based single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and then passing them to the feature comparison module;
a feature comparison module to perform: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
a threshold linkage module to perform: adjusting the size of the candidate template's search region in the pre-processing module according to the confidence of the output candidate box, so that a target that disappears and later reappears can still be recognized and tracked.
This single-target tracking system in complex scenes greatly improves tracking precision and robustness for targets under a variety of complex conditions, runs quickly, and fully achieves real-time performance.
Example 2: the invention also provides a single-target tracking method in a complex scene, which comprises the following steps:
the preprocessing step, specifically comprising: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
the tracking pre-screening step, specifically comprising: acquiring the template region and the search region from the preprocessing step, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and then passing them to the feature comparison step;
the feature comparison step, specifically comprising: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
the threshold linkage step, specifically comprising: adjusting the sizes of the candidate template region and the search region in the preprocessing step according to the confidence of the output candidate box.
Preferably, the preprocessing step specifically includes:
Step SS11: detect the video stream with a target detection algorithm (either a traditional detection algorithm or a deep learning algorithm), or manually select the target in the current frame Frame_init of the video stream, to obtain the tracked target's box B_init, and crop the template area to obtain the target object O_crop;
Step SS12: obtain the per-channel means (R_mean, G_mean, B_mean) of the RGB channels of the images in the video stream; the three-channel RGB mean is: RGB_mean = (R_mean + G_mean + B_mean)/3;
Step SS13: calculate the sizes of the template area and the search area according to formula (1) and formula (2), respectively. Taking the center of box B_init as the center point, crop a square with side length z_sz from the original image, then resize it to (127, 127) with an interpolation algorithm to obtain the template image z_crop; similarly, crop a square with side length x_sz from the current frame, centered on the box passed in from the previous frame's tracking box, then resize it to (271, 271) with the interpolation algorithm to obtain the search area x_crop. Whenever a crop exceeds the image boundary, fill the missing pixels with the mean RGB_mean from step SS12 so that the cropped area lies entirely within the image;
z_sz = sqrt((w + (w + h)/2) * (h + (w + h)/2)) (1)
x_sz = z_sz * 271/127 (2)
wherein x is the horizontal coordinate of the center point of the initial box, y is the vertical coordinate of the center of the initial box, w is the width of the initial box, and h is the height of the initial box.
Preferably, the tracking pre-screening step specifically includes:
Step SS21: collect the data sets required for single-target tracking, comprising nine data sets: COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB, and UAV123, used for training the neural network; these data sets contain richer data, which further improves robustness to the target;
Step SS22: train a deep-learning-based single-target tracking model according to the single-target tracking algorithm SiamCAR;
Step SS23: pass the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate boxes and their confidences, then sort the candidate boxes from highest to lowest confidence through the NMS algorithm and select the top 10 to obtain the candidate set of tracking boxes {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}, where B denotes the box coordinates and Score_tracking denotes the corresponding confidence score;
Step SS24: crop the 10 boxes of step SS23 from the video frame to obtain the candidate crop regions.
Preferably, the threshold of the NMS algorithm in step SS23 is 0.4; experiments show that this value works well for classes such as pedestrians.
Preferably, the feature comparison step specifically includes:
Step SS31: collect a pedestrian re-identification data set and, following Table 1, train on it to obtain a deep metric learning network, which measures the cosine distance between objects and then assigns an object to the cluster with the nearest distance;
Step SS32: resize the O_crop obtained in step SS11 of the preprocessing step and each of the 10 candidate regions obtained in step SS24 of the tracking pre-screening step to 64 (width) × 128 (height);
Step SS33: pass the regions cropped in step SS32 into the deep metric learning network of step SS31 to obtain the respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then compute the respective cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
Step SS34: fuse the tracking score with the candidate boxes' scores according to formula (3) to obtain the final tracking score Score_final:
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
Step SS35: sort Score_final from highest to lowest and output the box corresponding to the highest confidence.
Table one: and learning a network architecture based on deep learning depth cosine measurement.
Preferably, w in step SS34 is 0.4; experiments show that w = 0.4 gives the best results.
Preferably, the threshold linkage step specifically includes: setting a threshold on the output confidence; when the score falls below the threshold, a linkage mechanism acts back on the pre-processing module to enlarge the search area and thereby re-find the temporarily disappeared target;
Step SS41: judge whether Score_final is greater than the threshold Score_threshold; if so, do not enter this module and exit; if it is less than the threshold, go to step SS42;
Step SS42: readjust the size of the search area according to formula (4), and use it for the crop when the next frame arrives;
the new search area is:
x_sz = 1.5 * z_sz * 271/127 (4).
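Pulling the four modules together, the following is a hedged end-to-end sketch that reuses the helpers from the earlier snippets; `tracker.propose` and `reid_net.embed` are hypothetical interfaces standing in for the trained SiamCAR model and the metric learning network, and the candidate boxes are assumed to be already mapped back to full-frame coordinates.

```python
import cv2
import numpy as np

def embed_patch(reid_net, frame, box):
    """Step SS32: crop the box, resize to 64x128, then embed.
    `reid_net.embed` is a hypothetical call returning a 128-d vector."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    patch = cv2.resize(frame[max(y1, 0):y2, max(x1, 0):x2], (64, 128))
    return reid_net.embed(patch)

def track_sequence(frames, init_box, tracker, reid_net, score_threshold):
    cx, cy, bw, bh = init_box
    z_sz, x_sz = template_and_search_sizes(bw, bh)        # formulas (1)-(2)
    rgb_mean = float(frames[0].mean())
    fill = (rgb_mean, rgb_mean, rgb_mean)
    z_crop = crop_square(frames[0], (cx, cy), z_sz, 127, fill)
    feat_init = embed_patch(reid_net, frames[0],
                            (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2))
    results = []
    for frame in frames[1:]:
        x_crop = crop_square(frame, (cx, cy), x_sz, 271, fill)
        raw_boxes, raw_scores = tracker.propose(z_crop, x_crop)    # SiamCAR
        boxes, scores = nms_top_k(raw_boxes, raw_scores, 0.4, 10)  # step SS23
        feats = np.stack([embed_patch(reid_net, frame, b) for b in boxes])
        best_box, score_final = fuse_and_select(feat_init, feats,
                                                boxes, scores, w=0.4)
        x_sz = next_search_size(score_final, z_sz, score_threshold)  # SS41-SS42
        cx = (best_box[0] + best_box[2]) / 2.0
        cy = (best_box[1] + best_box[3]) / 2.0
        results.append(best_box)
    return results
```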
the invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the method being implemented when the processor executes the program.
The invention also proposes a storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A single-target tracking system in a complex scene, characterized by comprising:
a pre-processing module to perform: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
a tracking pre-screening module to perform: acquiring the template region and the search region from the pre-processing module, passing them into a deep learning single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and then passing them to the feature comparison module;
a feature comparison module to perform: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
a threshold linkage module to perform: adjusting the size of the candidate template's search region in the pre-processing module according to the confidence of the output candidate box.
2. A single-target tracking method in a complex scene, characterized by comprising the following steps:
the preprocessing step, specifically comprising: processing the initial target box and subsequent video frames according to the incoming violation template frame to obtain a template region and a search region; passing the target template region into a re-identification network to obtain the target's initial features;
the tracking pre-screening step, specifically comprising: acquiring the template region and the search region from the preprocessing step, passing them into a deep learning single-target tracking algorithm, screening out the 10 candidate tracking boxes with the highest confidence through an NMS (non-maximum suppression) algorithm, and passing them to the feature comparison step;
the feature comparison step, specifically comprising: computing the cosine similarity between each of the 10 candidate tracking boxes and the target's initial features, and selecting the best target tracking box according to the similarity;
the threshold linkage step, specifically comprising: adjusting the size of the candidate template's search region in the preprocessing step according to the confidence of the output candidate box.
3. The single-target tracking method in a complex scene according to claim 2, wherein the preprocessing step specifically comprises:
Step SS11: detect the video stream with a target detection algorithm, or manually select the target in the current frame Frame_init of the video stream, to obtain the tracked target's box B_init, and crop the template area to obtain the target object O_crop;
Step SS12: obtain the per-channel means (R_mean, G_mean, B_mean) of the RGB channels of the images in the video stream; the three-channel RGB mean is: RGB_mean = (R_mean + G_mean + B_mean)/3;
Step SS13: calculate the sizes of the template area and the search area according to formula (1) and formula (2), respectively. Taking the center of box B_init as the center point, crop a square with side length z_sz from the original image, then resize it to (127, 127) with an interpolation algorithm to obtain the template image z_crop; similarly, crop a square with side length x_sz from the current frame, centered on the box passed in from the previous frame's tracking box, then resize it to (271, 271) with the interpolation algorithm to obtain the search area x_crop. Whenever a crop exceeds the image boundary, fill the missing pixels with the mean RGB_mean from step SS12 so that the cropped area lies entirely within the image;
z_sz = sqrt((w + (w + h)/2) * (h + (w + h)/2)) (1)
x_sz = z_sz * 271/127 (2)
wherein x is the horizontal coordinate of the center point of the initial box, y is the vertical coordinate of the center of the initial box, w is the width of the initial box, and h is the height of the initial box.
4. The single-target tracking method in a complex scene according to claim 2, wherein the tracking pre-screening step specifically comprises:
Step SS21: collect the data sets required for single-target tracking, comprising nine data sets: COCO, GOT10K, VOT2020, LASOT, TrackingNet, VID, DET, YOUTUBEBB, and UAV123, used for training the neural network;
Step SS22: train a deep-learning-based single-target tracking model according to the single-target tracking algorithm SiamCAR;
Step SS23: pass the template region and the search region into the single-target tracking model of step SS22 to obtain a series of candidate boxes and their confidences, then sort the candidate boxes from highest to lowest confidence through the NMS algorithm and select the top 10 to obtain the candidate set of tracking boxes {(B_1, Score_tracking1), (B_2, Score_tracking2), ..., (B_10, Score_tracking10)}, where B denotes the box coordinates and Score_tracking denotes the corresponding confidence score;
Step SS24: crop the 10 boxes of step SS23 from the original video frame to obtain the candidate crop regions.
5. The single-target tracking method in a complex scene according to claim 4, wherein the threshold of the NMS algorithm in step SS23 is 0.4.
6. The single-target tracking method in a complex scene according to claim 2, wherein the feature comparison step specifically comprises:
Step SS31: collect a pedestrian re-identification data set and train on it to obtain a deep metric learning network, which learns to measure the cosine distance between objects and then assigns an object to the cluster with the nearest distance;
Step SS32: resize the O_crop obtained in step SS11 of the preprocessing step and each of the 10 candidate regions obtained in step SS24 of the tracking pre-screening step to 64 (width) × 128 (height);
Step SS33: pass the regions cropped in step SS32 into the deep metric learning network of step SS31 to obtain the respective 128-dimensional feature vectors Feature_init and Feature_candidate = {Feature_candidate1, Feature_candidate2, ..., Feature_candidate10}, and then compute the respective cosine similarity scores Score_reid = {Score_reid1, Score_reid2, ..., Score_reid10};
Step SS34: fuse the tracking score with the candidate boxes' scores according to formula (3) to obtain the final tracking score Score_final:
Score_final = w * Score_tracking + (1 - w) * Score_reid (3)
Step SS35: sort Score_final from highest to lowest and output the box corresponding to the highest confidence.
7. The single-target tracking method in a complex scene according to claim 6, wherein w in step SS34 is set to 0.4.
8. The single-target tracking method in a complex scene according to claim 2, wherein the threshold linkage step specifically comprises:
Step SS41: judge whether Score_final is greater than the threshold Score_threshold; if so, do not enter this module and exit; if it is less than the threshold, go to step SS42;
Step SS42: readjust the size of the search area according to formula (4), and use it for the crop when the next frame arrives;
the new search area is: x_sz = 1.5 * z_sz * 271/127 (4).
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 2 to 8.
10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, carries out the steps of the method according to any one of claims 2 to 8.
CN202110742736.6A 2021-06-30 2021-06-30 Single-target tracking system and method in complex scene, electronic device and storage medium Pending CN114140494A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110742736.6A CN114140494A (en) 2021-06-30 2021-06-30 Single-target tracking system and method in complex scene, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN114140494A true CN114140494A (en) 2022-03-04

Family

ID=80394204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110742736.6A Pending CN114140494A (en) 2021-06-30 2021-06-30 Single-target tracking system and method in complex scene, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114140494A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW202026940A (en) * 2019-01-09 2020-07-16 圓展科技股份有限公司 Target tracking method
CN110084829A (en) * 2019-03-12 2019-08-02 上海阅面网络科技有限公司 Method for tracking target, device, electronic equipment and computer readable storage medium
CN110555870A (en) * 2019-09-09 2019-12-10 北京理工大学 DCF tracking confidence evaluation and classifier updating method based on neural network
CN110647836A (en) * 2019-09-18 2020-01-03 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110853076A (en) * 2019-11-08 2020-02-28 重庆市亿飞智联科技有限公司 Target tracking method, device, equipment and storage medium
CN112001946A (en) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 Target object tracking method, computer equipment and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114694184A (en) * 2022-05-27 2022-07-01 电子科技大学 Pedestrian re-identification method and system based on multi-template feature updating
CN114694184B (en) * 2022-05-27 2022-10-14 电子科技大学 Pedestrian re-identification method and system based on multi-template feature updating


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination