WO2022193456A1 - Target tracking method and apparatus, electronic device and storage medium - Google Patents

Target tracking method and apparatus, electronic device and storage medium

Info

Publication number
WO2022193456A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
tracking
target object
current image
tracking result
Application number
PCT/CN2021/100558
Inventor
周靖皓 (Zhou Jinghao)
乔磊 (Qiao Lei)
李搏 (Li Bo)
Original Assignee
上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Application filed by 上海商汤智能科技有限公司 (Shanghai SenseTime Intelligent Technology Co., Ltd.)
Priority to US17/880,592 (published as US20220383517A1)
Publication of WO2022193456A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular, to a target tracking method and device, an electronic device and a storage medium.
  • target tracking based on image processing technology plays an increasingly important role in fields such as intelligent monitoring, autonomous driving and image annotation, so increasingly high requirements are being placed on target tracking.
  • in target tracking, an initial box is usually given in a certain frame (such as the first frame) of a video frame sequence to specify the target object to be tracked, and the specified target object is then tracked in subsequent frames. Because of interference such as occlusion, illumination changes and scale changes, target tracking has always been highly challenging.
  • the present disclosure provides a technical solution for target tracking.
  • a target tracking method comprising:
  • the obtaining the first tracking parameter from the template image of the target object includes:
  • a first image feature of the template image is extracted as a first tracking parameter.
  • the tracking of the target object on the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image includes:
  • a first predicted tracking result of the current image is determined based on the first tracking parameter and the second image feature.
  • the extracting the first image feature of the template image as the first tracking parameter includes: performing feature extraction on the template image through at least two layers of different depths of the first preset network to obtain at least two levels of first image features of the template image, and using the at least two levels of first image features as the first tracking parameter;
  • the extracting the second image feature of the current image includes: performing feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image;
  • the determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature includes: for any level of the at least two levels of first image features and the at least two levels of second image features, determining the intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fusing the at least two intermediate prediction results corresponding to the at least two levels of first image features and second image features to obtain the first predicted tracking result of the current image.
  • the determining of the second tracking parameter based on the template image and the historical image of the target object includes:
  • An updated second tracking parameter is obtained based on the initial second tracking parameter and the fourth image feature of the historical image.
  • the determining the initial second tracking parameter based on the third image feature includes: initializing an online module of the second preset network based on the third image feature to obtain the initial second tracking parameter; the obtaining the updated second tracking parameter based on the initial second tracking parameter and the fourth image feature of the historical image includes: inputting the initial second tracking parameter and the fourth image feature of the historical image into the online module, and obtaining the updated second tracking parameter via the online module.
  • the historical image is an image area pre-cut from historical video frames, and the probability that the historical image belongs to the target object is greater than or equal to a first threshold.
  • the obtaining the third image feature of the template image includes:
  • a weighted sum of the at least two levels of first image features is determined to obtain a third image feature of the template image.
  • the tracking of the target object on the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image includes:
  • a second predicted tracking result for the current image is determined based on the second tracking parameter and the fifth image feature.
  • the obtaining the fifth image feature of the current image includes:
  • a weighted sum of the at least two levels of second image features is determined to obtain a fifth image feature of the current image.
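The third and fifth image features are each described as a weighted sum of multi-level features. The text does not say how the weights are chosen or how features are laid out, so the following is only a minimal sketch, assuming single-channel 2-D feature maps that already share one size (for example after interpolation) and caller-supplied scalar weights; `weighted_feature_sum` is a hypothetical name.

```python
from typing import List, Sequence

Feature = List[List[float]]  # assumed: one single-channel 2-D feature map


def weighted_feature_sum(levels: Sequence[Feature], weights: Sequence[float]) -> Feature:
    """Return sum_l weights[l] * levels[l] for equally sized feature maps."""
    if len(levels) != len(weights):
        raise ValueError("one weight per level is required")
    h, w = len(levels[0]), len(levels[0][0])
    if any(len(f) != h or len(f[0]) != w for f in levels):
        raise ValueError("all levels must share one size (e.g. after interpolation)")
    out = [[0.0] * w for _ in range(h)]
    for feat, weight in zip(levels, weights):
        for r in range(h):
            for c in range(w):
                out[r][c] += weight * feat[r][c]
    return out
```

For example, `weighted_feature_sum([[[1, 2], [3, 4]], [[0, 0], [1, 1]]], [0.5, 2.0])` combines two 2x2 levels into `[[0.5, 1.0], [3.5, 4.0]]`.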
  • the obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result includes:
  • determining, according to a third weight and a fourth weight, a weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image;
  • determining the tracking result of the target object in the current image according to the third predicted tracking result.
  • the determining the tracking result of the target object in the current image according to the third predicted tracking result includes:
  • determining, according to the third predicted tracking result, a first bounding box in the current image with the highest probability of belonging to the target object;
  • determining, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box;
  • determining a detection box of the target object in the current image according to the first bounding box and the second bounding box.
  • the determining the detection box of the target object in the current image according to the first bounding box and the second bounding box includes:
  • determining a weighted sum of the first bounding box and the second bounding box to obtain the detection box of the target object in the current image.
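The fusion and bounding-box steps above can be sketched as follows. The document does not define the data layout of a predicted tracking result, so this sketch assumes each result is a list of per-candidate scores aligned with one shared list of candidate boxes `(x1, y1, x2, y2)`, as an RPN-style head would produce. Choosing the single best-scoring overlapping box as the second bounding box, and the weight `w_first`, are also assumptions; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed corner format (x1, y1, x2, y2)


def fuse_scores(s1: Sequence[float], s2: Sequence[float], w3: float, w4: float) -> List[float]:
    """Third predicted tracking result: w3 * first + w4 * second, per candidate."""
    if len(s1) != len(s2):
        raise ValueError("both predicted results must cover the same candidates")
    return [w3 * a + w4 * b for a, b in zip(s1, s2)]


def overlaps(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def detection_box(boxes: Sequence[Box], scores: Sequence[float], w_first: float = 0.5) -> Box:
    """Weighted sum of the top-scoring box and its best-scoring overlapping neighbour."""
    best = max(range(len(boxes)), key=lambda i: scores[i])
    first = boxes[best]
    neighbours = [i for i in range(len(boxes)) if i != best and overlaps(boxes[i], first)]
    if not neighbours:
        return first  # no overlapping second box: keep the first bounding box
    second = boxes[max(neighbours, key=lambda i: scores[i])]
    return tuple(w_first * f + (1.0 - w_first) * s for f, s in zip(first, second))
```

With candidates `(0, 0, 10, 10)`, `(2, 2, 12, 12)` and `(50, 50, 60, 60)` and fused scores `[0.75, 0.625, 0.375]`, the first box wins, the second candidate is its overlapping neighbour, and the detection box is their average `(1.0, 1.0, 11.0, 11.0)`.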
  • a target tracking device comprising:
  • an obtaining module for obtaining the first tracking parameter from the template image of the target object
  • a first target tracking module configured to track a target object on the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image
  • a determining module for determining a second tracking parameter based on the template image and a historical image of the target object, wherein the historical image represents an image that precedes the current image and contains the target object;
  • a second target tracking module configured to track a target object on the current image based on the second tracking parameter, to obtain a second predicted tracking result of the current image
  • a fusion module configured to obtain a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.
  • the obtaining module is used for:
  • a first image feature of the template image is extracted as a first tracking parameter.
  • the first target tracking module is used for:
  • a first predicted tracking result of the current image is determined based on the first tracking parameter and the second image feature.
  • the obtaining module is configured to: perform feature extraction on the template image through at least two layers of different depths of the first preset network to obtain at least two levels of first image features of the template image, and use the at least two levels of first image features as the first tracking parameter;
  • the first target tracking module is configured to: perform feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image; for any level of the at least two levels of first image features and the at least two levels of second image features, determine the intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fuse the at least two intermediate prediction results corresponding to the at least two levels of first image features and second image features to obtain the first predicted tracking result of the current image.
  • the determining module is used for:
  • An updated second tracking parameter is obtained based on the initial second tracking parameter and the fourth image feature of the historical image.
  • the determining module is used for:
  • the initial second tracking parameter and the fourth image feature of the historical image are input into the online module, and the updated second tracking parameter is obtained via the online module.
  • the historical image is an image area pre-cut from historical video frames, and the probability that the historical image belongs to the target object is greater than or equal to a first threshold.
  • the determining module is used for:
  • a weighted sum of the at least two levels of first image features is determined to obtain a third image feature of the template image.
  • the second target tracking module is used for:
  • a second predicted tracking result for the current image is determined based on the second tracking parameter and the fifth image feature.
  • the second target tracking module is used for:
  • a weighted sum of the at least two levels of second image features is determined to obtain a fifth image feature of the current image.
  • the fusion module is used for:
  • determining, according to a third weight and a fourth weight, a weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image;
  • determining the tracking result of the target object in the current image according to the third predicted tracking result.
  • the fusion module is used for:
  • determining, according to the third predicted tracking result, a first bounding box in the current image with the highest probability of belonging to the target object;
  • determining, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box;
  • determining a detection box of the target object in the current image according to the first bounding box and the second bounding box.
  • the fusion module is used for:
  • a weighted sum of the first bounding box and the second bounding box is determined to obtain the detection box of the target object in the current image.
  • an electronic device comprising: one or more processors; and a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the above method.
  • a computer-readable storage medium having computer program instructions stored thereon, the computer program instructions implementing the above method when executed by a processor.
  • a computer program product comprising computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the above method.
  • the first tracking parameter is obtained from the template image of the target object, and the target object is tracked on the current image based on the first tracking parameter, so as to obtain the first predicted tracking result of the current image.
  • in this way, a highly accurate first predicted tracking result can be obtained; by determining the second tracking parameter based on the template image and the historical image of the target object and tracking the target object on the current image based on the second tracking parameter, a second predicted tracking result of the current image is obtained.
  • the ability to discriminate similar objects can be improved during the tracking process, so that the success rate of tracking the target objects can be improved when interference from similar objects is encountered.
  • FIG. 1 shows a flowchart of a target tracking method provided by an embodiment of the present disclosure.
  • FIG. 2 shows a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • FIG. 3 shows a block diagram of a target tracking apparatus provided by an embodiment of the present disclosure.
  • FIG. 4 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure.
  • FIG. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure.
  • in the related art, target tracking methods usually track and localize the target in subsequent frames based on the template image of the first frame.
  • Such methods discriminate poorly between similar objects during tracking and are prone to tracking failure when interference from similar objects is encountered.
  • the embodiments of the present disclosure provide a target tracking method and apparatus, electronic device and storage medium. A first tracking parameter is obtained from a template image of the target object, and the target object is tracked on the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image, so that a highly accurate first predicted tracking result can be obtained; a second tracking parameter is determined based on the template image and historical images of the target object, and the target object is tracked on the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image, so that a more robust second predicted tracking result can be obtained by incorporating information from the historical images of the target object.
  • by obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result, a tracking result that is both accurate and robust can be obtained.
  • the ability to discriminate similar objects can be improved during the tracking process, so that the success rate of tracking the target objects can be improved when interference from similar objects is encountered.
  • FIG. 1 shows a flowchart of a target tracking method provided by an embodiment of the present disclosure.
  • the target tracking method may be executed by a terminal device or a server or other processing device.
  • the terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the target tracking method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1, the target tracking method includes steps S11 to S15.
  • in step S11, the first tracking parameter is obtained from the template image of the target object.
  • in step S12, the target object is tracked on the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image.
  • in step S13, a second tracking parameter is determined based on the template image and a historical image of the target object, wherein the historical image represents an image that precedes the current image and contains the target object.
  • in step S14, the target object is tracked on the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image.
  • in step S15, the tracking result of the target object in the current image is obtained based on the first predicted tracking result and the second predicted tracking result.
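As a reading aid, steps S11 to S15 can be arranged as a small skeleton. Every field (`extract_first_param`, `track_first`, `determine_second_param`, `track_second`, `fuse`) is a hypothetical placeholder for the networks and modules described in this document, not an API defined by the patent; the sketch only fixes the order in which the two branches are evaluated and fused.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class TwoBranchTracker:
    """Hypothetical skeleton of steps S11-S15; each field is a placeholder callable."""
    extract_first_param: Callable[[Any], Any]                    # S11: template -> first tracking parameter
    track_first: Callable[[Any, Any], Any]                       # S12: (param, current) -> first predicted result
    determine_second_param: Callable[[Any, Sequence[Any]], Any]  # S13: (template, history) -> second parameter
    track_second: Callable[[Any, Any], Any]                      # S14: (param, current) -> second predicted result
    fuse: Callable[[Any, Any], Any]                              # S15: (pred1, pred2) -> tracking result

    def track(self, template: Any, history: Sequence[Any], current: Any) -> Any:
        first_param = self.extract_first_param(template)               # S11
        pred1 = self.track_first(first_param, current)                 # S12
        second_param = self.determine_second_param(template, history)  # S13
        pred2 = self.track_second(second_param, current)               # S14
        return self.fuse(pred1, pred2)                                 # S15
```

Because the template image does not change between frames, a practical implementation would probably compute the first tracking parameter once and cache it; the skeleton recomputes it only to mirror the step order of FIG. 1.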
  • the target object may represent an object that needs to be tracked.
  • the target tracking method provided by the embodiment of the present disclosure may be respectively executed for each target object.
  • the type of the target object can be a person, an object, or an animal.
  • the template image of the target object may be an image containing the target object.
  • the template image of the target object may be an image of a specified area in a certain frame (eg, the first frame) of the target video, or may not be an image in the target video.
  • the image in the designated area selected by the user in the first frame of the target video may be used as the template image of the target object.
  • an image in a designated area framed by the user in other videos may be used as the template image of the target object.
  • the image uploaded or selected by the user may be used as the template image of the target object.
  • the first tracking parameter may represent a tracking parameter obtained from a template image.
  • information can be extracted from the template image to obtain the first tracking parameter.
  • the first tracking parameter may contain information of the template image.
  • the first tracking parameter may include at least one of feature information, color information, texture information, etc. of the template image.
  • the obtaining the first tracking parameter from the template image of the target object includes: extracting a first image feature of the template image as the first tracking parameter.
  • the first image feature represents an image feature of the template image.
  • the first image feature may be one level or at least two levels, and the first tracking parameter may include one level or at least two levels of first image features.
  • the first predicted tracking result may represent a tracking result predicted in the current image according to the first tracking parameter.
  • the probability that each pixel in the current image belongs to the target object may be represented by a probability value or a heatmap value.
  • the tracking the target object on the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image includes: extracting a second image feature of the current image; and determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature.
  • the second image feature represents an image feature of the current image.
  • the second image feature may be one level or at least two levels.
  • the extracting the first image feature of the template image as the first tracking parameter includes: performing feature extraction on the template image through at least two layers of different depths of the first preset network to obtain at least two levels of first image features of the template image, and using the at least two levels of first image features as the first tracking parameter;
  • the extracting the second image feature of the current image includes: performing feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image;
  • the determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature includes: for any level of the at least two levels of first image features and the at least two levels of second image features, determining the intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fusing the at least two intermediate prediction results corresponding to the at least two levels of first image features and second image features to obtain the first predicted tracking result of the current image.
  • the first preset network may be a Siamese network, such as SiamRPN++.
  • SiamRPN++ performs classification and positioning based on RPN (Region Proposal Network), which is conducive to obtaining more accurate positioning coordinates.
  • the first image feature may include 3 levels, namely the image features of the template image output by block 2 (block2), block 3 (block3) and block 4 (block4) of SiamRPN++;
  • the second image feature may include 3 levels, namely the image features of the current image output by block 2, block 3 and block 4 of SiamRPN++.
  • when three levels are used, the at least two levels of first image features include a first-level, a second-level and a third-level first image feature, and the at least two levels of second image features include a first-level, a second-level and a third-level second image feature;
  • the first-level first image feature and the first-level second image feature can be convolved through a depthwise separable correlation layer to obtain the intermediate prediction result corresponding to the first level;
  • the second-level first image feature and the second-level second image feature can be convolved through the depthwise separable correlation layer to obtain the intermediate prediction result corresponding to the second level;
  • the third-level first image feature and the third-level second image feature can be convolved through the depthwise separable correlation layer to obtain the intermediate prediction result corresponding to the third level;
  • the intermediate prediction results corresponding to the first, second and third levels can then be fused to obtain the first predicted tracking result of the current image.
  • before the second-level first image feature and the second-level second image feature are convolved, they may be interpolated to the same size as the first-level first image feature and the first-level second image feature; likewise, before the third-level first image feature and the third-level second image feature are convolved, they may be interpolated to the same size as the first-level first image feature and the first-level second image feature.
  • the outputs of block 3 and block 4 of SiamRPN++ can be interpolated so that the interpolated feature maps have the same size as the feature map output by block 2; this can improve the receptive field of the first preset network and thereby further improve the accuracy of target tracking performed with the first preset network.
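The interpolation method is not named in the text. A minimal sketch, assuming bilinear interpolation with aligned corners on single-channel feature maps, shows how a smaller block 3 or block 4 map could be resized to the block 2 size before correlation; `resize_bilinear` is a hypothetical helper, not part of SiamRPN++.

```python
from typing import List, Sequence

Map = List[List[float]]


def resize_bilinear(feat: Sequence[Sequence[float]], out_h: int, out_w: int) -> Map:
    """Bilinear resize with aligned corners (corner samples map onto corner samples)."""
    in_h, in_w = len(feat), len(feat[0])
    out: Map = []
    for r in range(out_h):
        # source row coordinate and its two neighbouring rows
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            # source column coordinate and its two neighbouring columns
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
            bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out
```

For instance, resizing `[[0, 2], [4, 6]]` to 3x3 yields `[[0, 1, 2], [2, 3, 4], [4, 5, 6]]`; a multi-channel feature would be resized channel by channel in the same way.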
  • the first predicted tracking result is determined from at least two levels of first image features of the template image and at least two levels of second image features of the current image: for any level, the intermediate prediction result of that level is determined from the first image feature and the second image feature of that level, and the at least two intermediate prediction results corresponding to the at least two levels are fused to obtain the first predicted tracking result of the current image. In this way, richer image information of the template image and the current image can be used, so that the potential region of the target object can be extracted from the current image quickly and efficiently while interference information is initially filtered out and redundant computation is reduced; and because first image features and second image features are compared at the same level, the accuracy of the first predicted tracking result can be improved.
  • Equation 1 can be used to determine the first predicted tracking result of the current image:
    $$S_1(z, x_i) = \sum_{l} \alpha_l \left( \varphi_l(z) \star \varphi_l(x_i) \right) \qquad (1)$$
  • where $S_1(z, x_i)$ denotes the first predicted tracking result, z represents the template image, and $x_i$ represents the current image.
  • $\varphi_l(\cdot)$ represents the output of the $l$-th block of the first preset network: $\varphi_l(z)$ represents the first image feature of the template image z output by the $l$-th block after z is input into the first preset network, and $\varphi_l(x_i)$ represents the second image feature of the current image $x_i$ output by the $l$-th block after $x_i$ is input into the first preset network.
  • $\star$ denotes correlation: the correlation $\varphi_l(z) \star \varphi_l(x_i)$ between the first image feature and the second image feature of the same level can be used as the intermediate prediction result of that level.
  • $\alpha_l$ represents the weight corresponding to $\varphi_l(z) \star \varphi_l(x_i)$, and $\alpha_l$ can be trained together with the other parameters of the first preset network.
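Equation 1 can be illustrated in plain Python. This sketch assumes each level is a list of single-channel 2-D maps (one per channel) already interpolated to a common size, and it uses a "valid" depthwise cross-correlation for the correlation operator. As a simplification, the channels of each level are collapsed by summation where SiamRPN++ would instead feed the depthwise response into learned classification and regression heads. All function names are hypothetical.

```python
from typing import List, Sequence

Map = List[List[float]]


def xcorr2d(kernel: Map, search: Map) -> Map:
    """'Valid' 2-D cross-correlation: slide the template kernel over the search map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(search) - kh + 1, len(search[0]) - kw + 1
    return [[sum(kernel[i][j] * search[r + i][c + j] for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def depthwise_xcorr(tmpl: Sequence[Map], search: Sequence[Map]) -> List[Map]:
    """Correlate channel k of the template feature only with channel k of the search feature."""
    if len(tmpl) != len(search):
        raise ValueError("template and search features need the same channel count")
    return [xcorr2d(t, s) for t, s in zip(tmpl, search)]


def first_predicted_result(tmpl_levels: Sequence[Sequence[Map]],
                           search_levels: Sequence[Sequence[Map]],
                           alphas: Sequence[float]) -> Map:
    """Sketch of Equation 1: sum over levels l of alpha_l * (phi_l(z) correlated with phi_l(x_i))."""
    total: Map = []
    for tmpl, search, alpha in zip(tmpl_levels, search_levels, alphas):
        channels = depthwise_xcorr(tmpl, search)
        # Assumption: collapse channels by summation instead of a learned prediction head.
        level_map = [[sum(ch[r][c] for ch in channels) for c in range(len(channels[0][0]))]
                     for r in range(len(channels[0]))]
        if not total:
            total = [[alpha * v for v in row] for row in level_map]
        else:
            total = [[t + alpha * v for t, v in zip(trow, vrow)]
                     for trow, vrow in zip(total, level_map)]
    return total
```

With a 2x2 identity template correlated against a 3x3 search map holding 1 to 9, each level gives `[[6, 8], [12, 14]]`, and weights `[1.0, 2.0]` over two identical levels give `[[18, 24], [36, 42]]`.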
  • At least two levels of first image features may be fused to obtain a first fusion feature; at least two levels of second image features may be fused to obtain a second fusion feature; according to the first fusion feature and The second fusion feature obtains the first predicted tracking result of the current image.
  • the first image feature of the template image and the second image feature of the current image may each have only one level; that is, the first predicted tracking result of the current image may be determined from the single-level first image feature of the template image and the single-level second image feature of the current image.
  • the second tracking parameter may represent a tracking parameter determined based on the template image and the historical image.
  • the second tracking parameter may be determined based on the information of the template image and the historical image. That is, the second tracking parameter may contain both template image and historical image information.
  • the second tracking parameter may be determined based on the template image and the historical images in the support set. Wherein, in the process of target tracking, the historical images in the support set may be updated, and correspondingly, the second tracking parameter may be updated in response to the update of the historical images in the support set.
  • the second tracking parameter is determined based on the template image and the historical image, and the target object is tracked on the current image based on the second tracking parameter; this improves robustness against interference from similar objects and thus yields a robust second predicted tracking result.
  • the second predicted tracking result may represent the tracking result predicted in the current image according to the second tracking parameter.
  • a probability value, a heatmap value or the like may be used to represent the probability that each pixel in the current image belongs to the target object.
  • the determining the second tracking parameter based on the template image and the historical image of the target object includes: obtaining a third image feature of the template image; determining an initial second tracking parameter based on the third image feature; and obtaining an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the historical image.
  • the third image feature is an image feature of the template image.
  • at least two levels of first image features of the template image may be fused to obtain the third image feature of the template image.
  • the third image feature of the template image may be the same as the first image feature of the template image.
  • the second tracking parameter may be determined based on the template image and the respective historical images in the support set.
  • the support set can be updated in the process of target tracking. For example, if the probability that any image area in the current image belongs to the target object is greater than or equal to the first threshold, the image area in the current image may be added to the support set as a new historical image. In one example, the number of historical images in the support set is less than or equal to the second threshold.
  • to keep the number of historical images in the support set within the second threshold, the historical image that was added to the support set earliest may be deleted.
  • the template image is not included in the support set; that is, the second tracking parameter is determined based not only on the template image but also on the information of the target object in historical images other than the template image.
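The support-set maintenance described above can be sketched with a bounded queue. The class `SupportSet`, and the use of `collections.deque` with `maxlen` so that appending to a full queue evicts the earliest-added image, are illustrative choices rather than the patent's implementation; the template image is deliberately kept outside the set, as stated above.

```python
from collections import deque
from typing import Any, Deque


class SupportSet:
    """Hypothetical support set of historical image regions (template kept elsewhere)."""

    def __init__(self, first_threshold: float, second_threshold: int) -> None:
        self.first_threshold = first_threshold  # minimum probability of belonging to the target
        # maxlen = second threshold: appending to a full deque drops the oldest entry
        self.images: Deque[Any] = deque(maxlen=second_threshold)

    def maybe_add(self, region: Any, prob: float) -> bool:
        """Add a region from the current image if it is confident enough."""
        if prob >= self.first_threshold:
            self.images.append(region)
            return True
        return False
```

For example, with a first threshold of 0.5 and a second threshold of 2, offering regions `a` (0.9), `b` (0.4), `c` (0.6) and `d` (0.5) in order leaves `c` and `d` in the set: `b` is rejected and `a` is evicted as the earliest-added image.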
  • the initial value of the second tracking parameter may be the third image feature, and may be updated with the update of historical images.
  • an initial second tracking parameter is determined based on the third image feature, and the updated second tracking parameter is obtained based on the initial second tracking parameter and the fourth image feature of the historical image, so that during target tracking the second tracking parameter can be continuously updated as the historical images are updated, thereby enhancing the ability to resist interference from similar objects.
  • the determining the initial second tracking parameter based on the third image feature includes: initializing an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter; and the obtaining the updated second tracking parameter based on the initial second tracking parameter and the fourth image feature of the historical image includes: inputting the initial second tracking parameter and the fourth image feature of the historical image into the online module, and obtaining the updated second tracking parameter via the online module.
  • the second tracking parameter may be updated through an online module of the second preset network.
  • the initial second tracking parameter (i.e., the third image feature) and the fourth image feature of the historical image may be input into the online module to obtain the updated second tracking parameter.
  • the current second tracking parameter and the fourth image feature of each historical image in the current support set can be input into the online module to obtain the updated second tracking parameter. That is, the second tracking parameter can be updated in real-time in response to an update of historical images in the support set.
  • the initial second tracking parameter is obtained by initializing the online module of the second preset network based on the third image feature, and the initial second tracking parameter and the fourth image feature of the historical image are input into the online module to obtain the updated second tracking parameter. In this way, during target tracking, the online module of the second preset network can continuously update the second tracking parameter as the historical images are updated, enhancing the ability to resist interference from similar objects.
  • the historical image is an image area pre-cut from a historical video frame, and the probability that the historical image belongs to the target object is greater than or equal to a first threshold.
  • the historical video frame may represent a video frame in the target video that precedes the current image.
  • the support set can be expressed as $\mathcal{S}=\{(x_j, y_j)\}_{j=1}^{M}$, where M represents the number of historical images in the support set, $x_j$ represents the j-th historical image in the support set, and $y_j$ represents the pseudo-label of $x_j$.
  • the pseudo labels of the historical images in the support set can be determined according to the Gaussian distribution of the probability that each position in the historical images belongs to the target object.
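  • As a hedged sketch of Gaussian pseudo-label generation, the function below builds a label map that peaks at an assumed target center and decays with a Gaussian profile. The function name, the `(row, col)` center convention, and the bandwidth `sigma` are illustrative assumptions; the disclosure does not specify how the center or the spread is chosen.

```python
import numpy as np


def gaussian_pseudo_label(height: int, width: int, center: tuple, sigma: float) -> np.ndarray:
    """Return a (height, width) map equal to 1.0 at center and decaying as a Gaussian.

    center is an assumed (row, col) target position; sigma is a hypothetical bandwidth.
    """
    rows = np.arange(height)[:, None]  # column vector of row indices
    cols = np.arange(width)[None, :]   # row vector of column indices
    cy, cx = center
    dist2 = (rows - cy) ** 2 + (cols - cx) ** 2  # squared distance to the center
    return np.exp(-dist2 / (2.0 * sigma ** 2))


label = gaussian_pseudo_label(5, 7, center=(2, 3), sigma=1.0)
print(label.shape)  # (5, 7)
```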
  • the fourth image feature and the current second tracking parameter of each historical image in the support set can be input into the online module, and the predicted probability that each historical image belongs to the target object is output via the online module.
  • according to the predicted probability that each historical image belongs to the target object and the pseudo-label of each historical image, the loss function corresponding to the second tracking parameter can be obtained; based on the loss function, the second tracking parameter can be updated by gradient descent.
  • during this update, the internal parameters of the second preset network may be kept fixed, which improves computational efficiency.
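  • To make the update step concrete, the sketch below refines a tracking parameter by plain gradient descent on a squared-error loss between predicted maps and pseudo-labels. It assumes, for illustration only, that the tracking parameter is a 1×1 filter with one weight per channel and that the prediction for each historical image is a channel-weighted sum of its fourth image feature. The actual online module of the second preset network is not specified at this level of detail; here it has no internal parameters at all, which mirrors keeping them fixed. The names `predict`, `squared_error`, and `online_update`, the learning rate, and the step count are hypothetical.

```python
import numpy as np


def predict(w: np.ndarray, feature: np.ndarray) -> np.ndarray:
    """Channel-weighted sum: (C,) filter applied to a (C, H, W) feature -> (H, W) map."""
    return np.tensordot(w, feature, axes=1)


def squared_error(w, features, labels) -> float:
    """Mean over images of 0.5 * mean((prediction - pseudo_label) ** 2)."""
    return float(np.mean([0.5 * np.mean((predict(w, x) - y) ** 2)
                          for x, y in zip(features, labels)]))


def online_update(w, features, labels, lr: float = 0.1, steps: int = 10) -> np.ndarray:
    """Refine the tracking parameter w by gradient descent on squared_error."""
    w = np.asarray(w, dtype=float).copy()  # do not modify the caller's parameter
    for _ in range(steps):
        grad = np.zeros_like(w)
        for x, y in zip(features, labels):
            residual = predict(w, x) - y  # (H, W)
            # d/dw[c] of 0.5 * mean(residual ** 2) equals mean(residual * x[c])
            grad += np.tensordot(x, residual, axes=([1, 2], [0, 1])) / residual.size
        w -= lr * grad / len(features)
    return w


# Usage with synthetic features whose pseudo-labels come from a known filter
rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 6, 6)) for _ in range(3)]
w_true = np.array([1.0, -0.5, 0.25, 2.0])
labs = [predict(w_true, f) for f in feats]
w0 = np.zeros(4)
w1 = online_update(w0, feats, labs, lr=0.5, steps=50)
```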
  • the obtaining the third image feature of the template image includes: obtaining at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with the at least two levels of first image features; and determining, according to the at least two first weights, the weighted sum of the at least two levels of first image features to obtain the third image feature of the template image. Determining the second tracking parameter based on the third image feature determined in this way can further improve the robustness of tracking the target object on the current image.
  • the third image feature may also be determined according to an average value of at least two levels of the first image features.
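  • The weighted fusion of multi-level features used to obtain the third (and, analogously, the fifth) image feature can be sketched as below. It assumes the levels have already been brought to a common shape; the function name `fuse_levels` and the example weights are hypothetical. Passing equal weights of 1/L reproduces the averaging alternative.

```python
from typing import Sequence

import numpy as np


def fuse_levels(features: Sequence[np.ndarray], weights: Sequence[float]) -> np.ndarray:
    """Weighted sum of same-shaped multi-level feature maps."""
    if len(features) != len(weights):
        raise ValueError("expected one weight per feature level")
    fused = np.zeros_like(features[0], dtype=float)
    for feature, weight in zip(features, weights):
        fused += weight * feature
    return fused


levels = [np.full((2, 3, 3), v) for v in (1.0, 2.0, 3.0)]
third_feature = fuse_levels(levels, [0.2, 0.3, 0.5])  # every entry approximately 2.3
averaged = fuse_levels(levels, [1 / 3] * 3)           # every entry approximately 2.0
```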
  • the tracking the target object on the current image based on the second tracking parameter to obtain the second predicted tracking result of the current image includes: obtaining a fifth image feature of the current image; and determining the second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature.
  • the fifth image feature is an image feature of the current image.
  • the fifth image feature and the second tracking parameter may be convolved through the up-channel correlation layer, which raises the channel dimension, to obtain the second predicted tracking result.
  • in this way, the accuracy of the determined second predicted tracking result can be improved.
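  • The sketch below shows the basic operation behind a correlation-based prediction: sliding a tracking parameter (used as a kernel) over a feature map and summing elementwise products into a response map. It is a simplified plain cross-correlation under stated assumptions; the channel-raising convolution of the up-channel correlation layer is not modeled, and the name `cross_correlate` is hypothetical.

```python
import numpy as np


def cross_correlate(feature: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid cross-correlation of a (C, h, w) kernel over a (C, H, W) feature map.

    Products are summed over channels, giving a single (H-h+1, W-w+1) response map.
    """
    C, H, W = feature.shape
    c, h, w = kernel.shape
    if C != c:
        raise ValueError("feature and kernel must have the same channel count")
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            response[i, j] = np.sum(feature[:, i:i + h, j:j + w] * kernel)
    return response


kernel = np.ones((2, 2, 2))
feature = np.zeros((2, 5, 5))
feature[:, 1:3, 2:4] = 1.0  # plant a kernel-shaped patch with top-left corner at (1, 2)
response = cross_correlate(feature, kernel)
peak = np.unravel_index(response.argmax(), response.shape)  # peak location: row 1, column 2
```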
  • the obtaining the fifth image feature of the current image includes: obtaining at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with the at least two levels of second image features; and determining, according to the at least two second weights, the weighted sum of the at least two levels of second image features to obtain the fifth image feature of the current image. Determining the second predicted tracking result based on the fifth image feature determined in this way can further improve its robustness.
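  • Equation 2 can be used to determine the second predicted tracking result $\hat{y}_i^{(2)}$ of the current image $x_i$: $\hat{y}_i^{(2)} = w \star \sum_{l=1}^{3} \beta_l\,\varphi_l(x_i)$ (2),
  • where $w$ denotes the second tracking parameter and $\star$ denotes the up-channel correlation; $\varphi_l(x_i)$ denotes the level-$l$ second image feature of $x_i$, $\beta_l$ represents the corresponding weight, $\beta_l\,\varphi_l(x_i)$ denotes that feature weighted by $\beta_l$, and $\sum_{l=1}^{3}\beta_l\,\varphi_l(x_i)$ represents the weighted sum of the three levels of second image features extracted from the current image $x_i$ by three blocks of the first preset network (three network blocks of different depths).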
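  • Equation 3 can be used to determine the second tracking parameter $w$: $w \leftarrow \mathcal{F}\Big(w,\ \big\{\big(\sum_{l=1}^{3}\alpha_l\,\varphi_l(x_j),\ y_j\big)\big\}_{j=1}^{M};\ \theta\Big)$ (3),
  • where the support set includes M historical images, $x_j$ represents the j-th historical image in the support set, and $y_j$ represents the pseudo-label of $x_j$;
  • $\alpha_l$ represents the weight corresponding to the level-$l$ image feature $\varphi_l(x_j)$, so that $\sum_{l=1}^{3}\alpha_l\,\varphi_l(x_j)$ is the weighted image feature of $x_j$;
  • $\mathcal{F}$ denotes the online module, and $\theta$ denotes the internal parameters of the online module.
  • as the support set is updated, the second tracking parameter $w$ is updated accordingly.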
  • the fourth image feature and the current second tracking parameter of the M historical images in the support set can be input into the online module, and the predicted probability that each historical image belongs to the target object is output via the online module. According to the predicted probability that each historical image belongs to the target object and the pseudo-label of each historical image, the loss function corresponding to the second tracking parameter can be obtained; based on the loss function, the second tracking parameter can be updated by gradient descent to obtain the updated second tracking parameter.
  • the obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result includes: obtaining a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result; determining, according to the third weight and the fourth weight, the weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image; and determining the tracking result of the target object in the current image according to the third predicted tracking result.
  • the third weight and the fourth weight may each be a hyperparameter.
  • the sum of the third weight and the fourth weight may be equal to 1, with each of the third weight and the fourth weight greater than 0 and less than 1; of course, the sum of the third weight and the fourth weight need not equal 1.
  • the third predicted tracking result may be determined according to the weighted sum of the first predicted tracking result and the second predicted tracking result.
  • by obtaining the third predicted tracking result of the current image and determining the tracking result of the target object in the current image according to it, the resulting tracking result can be both accurate and robust.
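  • Equation 4 can be used to determine the tracking result of the target object in the current image: $\hat{y}_i = (1-\lambda)\,\hat{y}_i^{(1)} + \lambda\,\hat{y}_i^{(2)}$ (4),
  • where $\hat{y}_i^{(1)}$ and $\hat{y}_i^{(2)}$ are the first and second predicted tracking results of the current image $x_i$, $\lambda$ represents the fourth weight corresponding to the second predicted tracking result,
  • and $1-\lambda$ represents the third weight corresponding to the first predicted tracking result.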
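  • When the third and fourth weights sum to 1, fusing the two predicted tracking results is a convex combination. The following minimal sketch assumes that case, with `lam` standing in for the fourth weight and `1 - lam` for the third weight; the function name `fuse_predictions` is hypothetical.

```python
import numpy as np


def fuse_predictions(first_pred, second_pred, lam: float) -> np.ndarray:
    """Third predicted tracking result = (1 - lam) * first_pred + lam * second_pred."""
    if not 0.0 < lam < 1.0:
        raise ValueError("this sketch assumes 0 < lam < 1")
    first_pred = np.asarray(first_pred, dtype=float)
    second_pred = np.asarray(second_pred, dtype=float)
    return (1.0 - lam) * first_pred + lam * second_pred


fused = fuse_predictions([[0.9, 0.1]], [[0.2, 0.8]], lam=0.25)  # approximately [[0.725, 0.275]]
```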
  • the determining the tracking result of the target object in the current image according to the third predicted tracking result includes: determining, according to the third predicted tracking result, a first bounding box in the current image with the highest probability of belonging to the target object; determining, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box; and determining the detection frame of the target object in the current image according to the first bounding box and the second bounding box.
  • bounding box regression may be performed based on the third prediction tracking result to obtain multiple candidate boxes of the target object in the current image.
  • the candidate box with the highest probability of belonging to the target object may be used as the first bounding box, and the candidate box having an overlapping area with the first bounding box may be used as the second bounding box.
  • the number of the second bounding boxes may be one or more.
  • because not only the first bounding box with the highest probability of belonging to the target object but also the second bounding boxes overlapping it are used, the information of multiple candidate boxes is combined to obtain a more accurate detection frame.
  • the determining the detection frame of the target object in the current image according to the first bounding box and the second bounding box includes: determining the intersection over union (IoU) of the second bounding box and the first bounding box; determining, according to the IoU, a fifth weight corresponding to the second bounding box; and determining, based on the fifth weight, a weighted sum of the first bounding box and the second bounding box to obtain the detection frame of the target object in the current image.
  • the weight corresponding to the first bounding box may be 1, and the fifth weight corresponding to any second bounding box may be equal to the IoU of that second bounding box and the first bounding box.
  • alternatively, the weight corresponding to the first bounding box may be positively correlated with the probability that the first bounding box belongs to the target object, and the fifth weight corresponding to any second bounding box may be positively correlated with both the IoU of that second bounding box and the first bounding box and the probability that the second bounding box belongs to the target object.
  • the weighted sum of the first bounding box and each second bounding box may be determined; the weight corresponding to the first bounding box and the fifth weights corresponding to the second bounding boxes may be summed to obtain a weight sum; and the ratio of the weighted sum to the weight sum may be used as the detection frame of the target object in the current image.
  • by determining the fifth weight corresponding to the second bounding box according to the IoU, and determining the weighted sum of the first bounding box and the second bounding box based on the fifth weight to obtain the detection frame of the target object in the current image, the stability of the tracking result can be improved.
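  • The bounding-box fusion rule described above (weight 1 for the highest-probability first bounding box, the IoU with the first bounding box as the fifth weight of each overlapping second bounding box, then division by the weight sum) can be sketched as follows. Boxes are assumed to be `(x1, y1, x2, y2)` tuples, each candidate is assumed to carry a scalar probability, the function names are hypothetical, and candidate generation by bounding-box regression is not shown.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(boxes: List[Box], probs: Sequence[float]) -> Box:
    """IoU-weighted fusion of candidates around the highest-probability box."""
    best = max(range(len(boxes)), key=lambda k: probs[k])
    first = boxes[best]
    weighted = [float(v) for v in first]  # the first bounding box has weight 1
    weight_sum = 1.0
    for k, box in enumerate(boxes):
        if k == best:
            continue
        fifth_weight = iou(box, first)
        if fifth_weight > 0:  # only second bounding boxes that overlap the first one
            weighted = [acc + fifth_weight * v for acc, v in zip(weighted, box)]
            weight_sum += fifth_weight
    return tuple(v / weight_sum for v in weighted)


candidates = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 20.0), (50.0, 50.0, 60.0, 60.0)]
probs = [0.9, 0.5, 0.8]
detection = fuse_boxes(candidates, probs)  # approximately (0, 0, 10, 13.33)
```

The third candidate does not overlap the first bounding box, so it receives no weight even though its probability is high.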
  • the fifth weights corresponding to the second bounding boxes may be the same.
  • the average of the second bounding boxes may be calculated, and the average of that value and the first bounding box may be used as the detection frame of the target object in the current image.
  • the first bounding box can also be directly used as the detection frame of the target object.
  • the first tracking parameter is obtained from the template image of the target object, and the target object is tracked on the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image, so that a first predicted tracking result with high accuracy can be obtained.
  • by determining the second tracking parameter based on the template image and the historical images of the target object and tracking the target object on the current image based on the second tracking parameter, the ability to discriminate similar objects can be improved during tracking, so that the success rate of tracking the target object can be improved when interference from similar objects is encountered.
  • the target tracking method provided by the embodiments of the present disclosure can be applied to tracking tasks such as single target tracking or multi-target tracking.
  • FIG. 2 shows a schematic diagram of an application scenario provided by an embodiment of the present disclosure.
  • the application scenario provides a target tracker, where the target tracker includes a first preset network and a second preset network, and the second preset network includes an online module.
  • the input of the first preset network can be the template image z and the current image $x_i$ of the target object, and the output can be the first predicted tracking result.
  • the input of the second preset network can be the third image feature of the template image z, the fourth image feature of each historical image $x_j$ in the support set, and the fifth image feature of the current image $x_i$, and the output can be the second predicted tracking result. By calculating the weighted sum of the first predicted tracking result and the second predicted tracking result, the final tracking result of the target object in the current image $x_i$ can be obtained.
  • the first preset network and the second preset network are respectively introduced below.
  • the first preset network can use SiamRPN++.
  • the template image z is input into SiamRPN++, and the first-level, second-level, and third-level first image features of the template image z can be output via block 2, block 3, and block 4 of SiamRPN++, respectively.
  • the current image $x_i$ is input into SiamRPN++, and the first-level, second-level, and third-level second image features of the current image $x_i$ can be output via block 2, block 3, and block 4 of SiamRPN++, respectively.
  • the correlation between the first-level first image feature and the first-level second image feature can be calculated through a depthwise correlation layer (DW-C, depthwise correlation) to obtain the intermediate prediction result corresponding to the first level;
  • the correlation between the second-level first image feature and the second-level second image feature can be calculated through the depthwise correlation layer to obtain the intermediate prediction result corresponding to the second level;
  • the correlation between the third-level first image feature and the third-level second image feature can be calculated through the depthwise correlation layer to obtain the intermediate prediction result corresponding to the third level.
  • the first prediction tracking result can be obtained by calculating the weighted sum of the three-level intermediate prediction results.
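  • A minimal NumPy sketch of per-level depthwise correlation followed by a weighted sum of the intermediate results is given below. It assumes each level provides a `(C, H, W)` search feature and a `(C, h, w)` template feature, and it omits the classification and regression heads that SiamRPN++ applies after correlation; the function names and weights are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def depthwise_correlate(search: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Per-channel valid cross-correlation: (C,H,W) with (C,h,w) -> (C,H-h+1,W-w+1)."""
    C, H, W = search.shape
    _, h, w = template.shape
    out = np.empty((C, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            # channels are kept separate: sum only over the spatial window
            out[:, i, j] = np.sum(search[:, i:i + h, j:j + w] * template, axis=(1, 2))
    return out


def first_prediction(search_feats: List[np.ndarray], template_feats: List[np.ndarray],
                     level_weights: Sequence[float]) -> np.ndarray:
    """Weighted sum of the per-level intermediate prediction results."""
    maps = [depthwise_correlate(s, t) for s, t in zip(search_feats, template_feats)]
    return sum(a * m for a, m in zip(level_weights, maps))


search = np.ones((2, 6, 6))
template = np.ones((2, 3, 3))
pred = first_prediction([search] * 3, [template] * 3, [0.2, 0.3, 0.5])
```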
  • the outputs of block 3 and block 4 of SiamRPN++ can also be interpolated, so that the size of the feature map obtained by interpolation is the same as the size of the feature map output by block 2, thereby improving the performance of the first preset network.
  • the receptive field can be further enlarged, thereby further improving the accuracy with which the first preset network tracks the target object.
  • a first prediction tracking result with higher accuracy can be obtained, that is, the accuracy of the position of the target object obtained through the first preset network regression is higher.
  • the online module of the second preset network can be used to update the second tracking parameter.
  • the initial value of the second tracking parameter may be the third image feature of the template image z.
  • the third image feature of the template image z and the fourth image feature of the historical images in the support set can be input into the online module to obtain the updated second tracking parameter.
  • subsequently, the current second tracking parameter and the fourth image feature of each historical image in the support set can be input into the online module to obtain the updated second tracking parameter.
  • the second predicted tracking result can be obtained by calculating, through an up-channel correlation layer (UP-C, up-channel correlation), the correlation between the fifth image feature of the current image $x_i$ and the latest second tracking parameter.
  • a tracking result having both accuracy and robustness can be obtained.
  • the interfering objects and the target object can be accurately distinguished, which makes the tracking result more accurate.
  • for example, the target object may be occluded by pavilions, bridges, buildings, and the like; when the target object reappears, the target tracking method provided by the embodiments of the present disclosure can retrieve it efficiently and accurately.
  • the target tracking method provided by the embodiments of the present disclosure can also be applied to automatic labeling, producing more accurate automatically labeled data.
  • the target tracking method provided by the embodiments of the present disclosure has high classification accuracy, high regression accuracy, and high stability, is well suited to long-term target tracking tasks, and has a relatively fast tracking speed, enabling real-time tracking.
  • the present disclosure also provides target tracking devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any target tracking method provided by the present disclosure.
  • FIG. 3 shows a block diagram of a target tracking apparatus provided by an embodiment of the present disclosure.
  • the target tracking device includes:
  • an obtaining module 31 configured to obtain a first tracking parameter from a template image of a target object;
  • a first target tracking module 32 configured to track the target object in the current image based on the first tracking parameter, to obtain a first predicted tracking result of the current image;
  • a determining module 33 configured to determine a second tracking parameter based on the template image and a historical image of the target object, wherein the historical image represents an image that precedes the current image and contains the target object;
  • a second target tracking module 34 configured to track the target object in the current image based on the second tracking parameter, to obtain a second predicted tracking result of the current image; and
  • a fusion module 35 configured to obtain the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.
  • the obtaining module 31 is configured to extract a first image feature of the template image as the first tracking parameter.
  • the first target tracking module 32 is configured to: extract a second image feature of the current image; and determine a first predicted tracking result of the current image based on the first tracking parameter and the second image feature.
  • the obtaining module 31 is configured to: perform feature extraction on the template image through at least two layers of different depths of a first preset network to obtain at least two levels of first image features of the template image, and use the at least two levels of first image features as the first tracking parameter;
  • the first target tracking module 32 is configured to: perform feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image; for any level of the at least two levels of first image features and the at least two levels of second image features, determine the intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fuse the at least two intermediate prediction results corresponding to the at least two levels to obtain the first predicted tracking result of the current image.
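  • the determining module 33 is configured to: obtain a third image feature of the template image; determine an initial second tracking parameter based on the third image feature; and obtain an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the historical image.
  • the determining module 33 is configured to: initialize an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter; and input the initial second tracking parameter and the fourth image feature of the historical image into the online module to obtain the updated second tracking parameter via the online module.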
  • the historical image is an image area pre-cut from historical video frames, and the probability that the historical image belongs to the target object is greater than or equal to a first threshold.
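  • the determining module 33 is configured to: obtain at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with them; and determine, according to the at least two first weights, a weighted sum of the at least two levels of first image features to obtain the third image feature of the template image.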
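  • the second target tracking module 34 is configured to: obtain a fifth image feature of the current image; and determine a second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature.
  • the second target tracking module 34 is configured to: obtain at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with them; and determine, according to the at least two second weights, a weighted sum of the at least two levels of second image features to obtain the fifth image feature of the current image.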
  • the fusion module 35 is configured to: obtain a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result; determine, according to the third weight and the fourth weight, the weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image; and determine, according to the third predicted tracking result, the tracking result of the target object in the current image.
  • the fusion module 35 is configured to: determine, according to the third predicted tracking result, a first bounding box in the current image with the highest probability of belonging to the target object; determine, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box; and determine a detection frame of the target object in the current image according to the first bounding box and the second bounding box.
  • the fusion module 35 is configured to: determine the intersection over union (IoU) of the second bounding box and the first bounding box; determine, according to the IoU, a fifth weight corresponding to the second bounding box; and determine, based on the fifth weight, a weighted sum of the first bounding box and the second bounding box to obtain the detection frame of the target object in the current image.
  • the functions or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation and technical effects, reference may be made to the descriptions of the above method embodiments, which, for brevity, are not repeated here.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
  • Embodiments of the present disclosure further provide a computer program comprising computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the above method.
  • Embodiments of the present disclosure also provide a computer program product comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code is executed in an electronic device, a processor in the electronic device executes the above method.
  • Embodiments of the present disclosure further provide an electronic device, including: one or more processors; and a memory for storing executable instructions; wherein the one or more processors are configured to invoke the executable instructions stored in the memory to execute the above method.
  • the electronic device may be provided as a terminal, server or other form of device.
  • FIG. 4 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure.
  • electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
  • the processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 802 can include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operation of electronic device 800. Examples of such data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. Memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • Power supply assembly 806 provides power to various components of electronic device 800.
  • Power supply assembly 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800.
  • Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 810 is configured to output and/or input audio signals.
  • audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 800 is in operating modes, such as calling mode, recording mode, and voice recognition mode.
  • the received audio signal may be further stored in memory 804 or transmitted via communication component 816.
  • audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of electronic device 800.
  • the sensor assembly 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and the keypad of the electronic device 800; the sensor assembly 814 can also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) image sensor, for use in imaging applications.
  • The sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • The electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, second-generation mobile communication technology (2G), third-generation mobile communication technology (3G), fourth-generation mobile communication technology (4G)/Long Term Evolution (LTE), fifth-generation mobile communication technology (5G), or a combination thereof.
  • The communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel.
  • The communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • The NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • The electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • A non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions that are executable by the processor 820 of the electronic device 800 to perform the above method.
  • FIG. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure.
  • The electronic device 1900 may be provided as a server.
  • The electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
  • An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions.
  • The processing component 1922 is configured to execute the instructions to perform the above method.
  • The electronic device 1900 may also include a power supply assembly 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system Unix™, the free and open-source Unix-like operating system Linux™, the open-source Unix-like operating system FreeBSD™, or the like.
  • A non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above method.
  • The present disclosure may be a system, a method, and/or a computer program product.
  • The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
  • The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove with instructions stored thereon, and any suitable combination of the foregoing.
  • A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through wires.
  • The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
  • Computer program instructions for carrying out operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Custom electronic circuits, such as programmable logic circuits, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized using state information of the computer-readable program instructions, and these electronic circuits can execute the computer-readable program instructions to implement various aspects of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, programmable data processing apparatus, and/or other devices to operate in a specific manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executing on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • The functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions.
  • The computer program product may be implemented by hardware, software, or a combination thereof.
  • In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, it is embodied as a software product, such as a software development kit (SDK).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

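The present disclosure relates to a target tracking method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a first tracking parameter from a template image of a target object; tracking the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image; determining a second tracking parameter based on the template image and a history image of the target object, where the history image is an image that precedes the current image and contains the target object; tracking the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image; and obtaining a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.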

Description

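Target Tracking Method and Apparatus, Electronic Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 202110292542.0, filed with the China Patent Office on March 18, 2021 and entitled "Target Tracking Method and Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer vision, and in particular to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
With the development of image processing technology, target tracking based on image processing plays an increasingly important role in fields such as intelligent surveillance, autonomous driving, and image annotation, so target tracking faces ever higher requirements.
In target tracking, an initial box is usually given in one frame (for example, the first frame) of a video frame sequence to specify the target object to be tracked, and that target object is then tracked from that point on. Because of interference such as occlusion, illumination change, and scale change, target tracking remains challenging.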
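Summary
The present disclosure provides a target tracking solution.
According to an aspect of the present disclosure, a target tracking method is provided, including:
obtaining a first tracking parameter from a template image of a target object;
tracking the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image;
determining a second tracking parameter based on the template image and a history image of the target object, where the history image is an image that precedes the current image and contains the target object;
tracking the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image;
obtaining a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.
In a possible implementation, obtaining the first tracking parameter from the template image of the target object includes:
extracting a first image feature of the template image as the first tracking parameter.
In a possible implementation, tracking the target object in the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image includes:
extracting a second image feature of the current image;
determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature.
In a possible implementation,
extracting the first image feature of the template image as the first tracking parameter includes: performing feature extraction on the template image through at least two layers of different depths in a first preset network to obtain at least two levels of first image features of the template image, and using the at least two levels of first image features as the first tracking parameter;
extracting the second image feature of the current image includes: performing feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image;
determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature includes: for any level of the at least two levels of first image features and the at least two levels of second image features, determining an intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fusing the at least two intermediate prediction results corresponding to the at least two levels of first image features and the at least two levels of second image features to obtain the first predicted tracking result of the current image.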
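In a possible implementation, determining the second tracking parameter based on the template image and the history image of the target object includes:
obtaining a third image feature of the template image;
determining an initial second tracking parameter based on the third image feature;
obtaining an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the history image.
In a possible implementation, determining the initial second tracking parameter based on the third image feature includes: initializing an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter; and obtaining the updated second tracking parameter based on the initial second tracking parameter and the fourth image feature of the history image includes: inputting the initial second tracking parameter and the fourth image feature of the history image into the online module, and obtaining the updated second tracking parameter via the online module.
In a possible implementation, the history image is an image region cropped in advance from a history video frame, and the probability that the history image belongs to the target object is greater than or equal to a first threshold.
In a possible implementation, obtaining the third image feature of the template image includes:
obtaining at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with the at least two levels of first image features;
determining a weighted sum of the at least two levels of first image features according to the at least two first weights to obtain the third image feature of the template image.
In a possible implementation, tracking the target object in the current image based on the second tracking parameter to obtain the second predicted tracking result of the current image includes:
obtaining a fifth image feature of the current image;
determining the second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature.
In a possible implementation, obtaining the fifth image feature of the current image includes:
obtaining at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with the at least two levels of second image features;
determining a weighted sum of the at least two levels of second image features according to the at least two second weights to obtain the fifth image feature of the current image.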
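In a possible implementation, obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result includes:
obtaining a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result;
determining a weighted sum of the first predicted tracking result and the second predicted tracking result according to the third weight and the fourth weight to obtain a third predicted tracking result of the current image;
determining the tracking result of the target object in the current image according to the third predicted tracking result.
In a possible implementation, determining the tracking result of the target object in the current image according to the third predicted tracking result includes:
determining, according to the third predicted tracking result, a first bounding box in the current image that has the highest probability of belonging to the target object;
determining, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box;
determining a detection box of the target object in the current image according to the first bounding box and the second bounding box.
In a possible implementation, determining the detection box of the target object in the current image according to the first bounding box and the second bounding box includes:
determining the intersection over union (IoU) of the second bounding box and the first bounding box;
determining a fifth weight corresponding to the second bounding box according to the IoU;
determining a weighted sum of the first bounding box and the second bounding box based on the fifth weight to obtain the detection box of the target object in the current image.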
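According to an aspect of the present disclosure, a target tracking apparatus is provided, including:
an obtaining module, configured to obtain a first tracking parameter from a template image of a target object;
a first target tracking module, configured to track the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image;
a determining module, configured to determine a second tracking parameter based on the template image and a history image of the target object, where the history image is an image that precedes the current image and contains the target object;
a second target tracking module, configured to track the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image;
a fusion module, configured to obtain a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.
In a possible implementation, the obtaining module is configured to:
extract a first image feature of the template image as the first tracking parameter.
In a possible implementation, the first target tracking module is configured to:
extract a second image feature of the current image;
determine the first predicted tracking result of the current image based on the first tracking parameter and the second image feature.
In a possible implementation,
the obtaining module is configured to: perform feature extraction on the template image through at least two layers of different depths in a first preset network to obtain at least two levels of first image features of the template image, and use the at least two levels of first image features as the first tracking parameter;
the first target tracking module is configured to: perform feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image; for any level of the at least two levels of first image features and the at least two levels of second image features, determine an intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fuse the at least two intermediate prediction results corresponding to the at least two levels to obtain the first predicted tracking result of the current image.
In a possible implementation, the determining module is configured to:
obtain a third image feature of the template image;
determine an initial second tracking parameter based on the third image feature;
obtain an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the history image.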
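In a possible implementation, the determining module is configured to:
initialize an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter;
input the initial second tracking parameter and the fourth image feature of the history image into the online module, and obtain the updated second tracking parameter via the online module.
In a possible implementation, the history image is an image region cropped in advance from a history video frame, and the probability that the history image belongs to the target object is greater than or equal to a first threshold.
In a possible implementation, the determining module is configured to:
obtain at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with the at least two levels of first image features;
determine a weighted sum of the at least two levels of first image features according to the at least two first weights to obtain the third image feature of the template image.
In a possible implementation, the second target tracking module is configured to:
obtain a fifth image feature of the current image;
determine the second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature.
In a possible implementation, the second target tracking module is configured to:
obtain at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with the at least two levels of second image features;
determine a weighted sum of the at least two levels of second image features according to the at least two second weights to obtain the fifth image feature of the current image.
In a possible implementation, the fusion module is configured to:
obtain a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result;
determine a weighted sum of the first predicted tracking result and the second predicted tracking result according to the third weight and the fourth weight to obtain a third predicted tracking result of the current image;
determine the tracking result of the target object in the current image according to the third predicted tracking result.
In a possible implementation, the fusion module is configured to:
determine, according to the third predicted tracking result, a first bounding box in the current image that has the highest probability of belonging to the target object;
determine, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box;
determine a detection box of the target object in the current image according to the first bounding box and the second bounding box.
In a possible implementation, the fusion module is configured to:
determine the IoU of the second bounding box and the first bounding box;
determine a fifth weight corresponding to the second bounding box according to the IoU;
determine a weighted sum of the first bounding box and the second bounding box based on the fifth weight to obtain the detection box of the target object in the current image.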
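According to an aspect of the present disclosure, an electronic device is provided, including: one or more processors; and a memory configured to store executable instructions, where the one or more processors are configured to invoke the executable instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, a computer-readable storage medium is provided, storing computer program instructions that, when executed by a processor, implement the above method.
According to an aspect of the present disclosure, a computer program product is provided, including computer-readable code or a non-volatile computer-readable storage medium carrying computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device performs the above method.
In the embodiments of the present disclosure, a first tracking parameter is obtained from the template image of the target object, and the target object is tracked in the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image, which yields a first predicted tracking result of relatively high accuracy. A second tracking parameter is determined based on the template image and the history images of the target object, and the target object is tracked in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image, so that information from the history images of the target object yields a second predicted tracking result of relatively high robustness. The tracking result of the target object in the current image is then obtained from the first and second predicted tracking results, so that it is both accurate and robust. The target tracking method provided by the embodiments of the present disclosure improves the ability to distinguish similar objects during tracking, and thereby raises the tracking success rate when similar objects cause interference.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.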
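Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the present disclosure.
FIG. 1 is a flowchart of a target tracking method according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
FIG. 3 is a block diagram of a target tracking apparatus according to an embodiment of the present disclosure.
FIG. 4 is a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
FIG. 5 is a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.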
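Detailed Description
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean A alone, both A and B, or B alone. In addition, the term "at least one" herein means any one of multiple items or any combination of at least two of multiple items; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description to better explain the present disclosure. Those skilled in the art should understand that the present disclosure can also be implemented without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.
In the related art, target tracking methods usually track and locate the target in subsequent frames based on a template image from the first frame. Such methods distinguish poorly between similar objects during tracking and tend to fail when similar objects cause interference.
The embodiments of the present disclosure provide a target tracking method and apparatus, an electronic device, and a storage medium. A first tracking parameter is obtained from the template image of the target object, and the target object is tracked in the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image, which yields a first predicted tracking result of relatively high accuracy. A second tracking parameter is determined based on the template image and the history images of the target object, and the target object is tracked in the current image based on the second tracking parameter to obtain a second predicted tracking result, so that information from the history images yields a second predicted tracking result of relatively high robustness. The tracking result of the target object in the current image is then obtained from the first and second predicted tracking results, so that it is both accurate and robust. The method therefore improves the ability to distinguish similar objects during tracking and raises the tracking success rate when similar objects cause interference.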
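The target tracking method provided by the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings. FIG. 1 is a flowchart of the target tracking method according to an embodiment of the present disclosure. In a possible implementation, the target tracking method may be performed by a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the target tracking method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1, the target tracking method includes steps S11 to S15.
In step S11, a first tracking parameter is obtained from a template image of a target object.
In step S12, the target object is tracked in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image.
In step S13, a second tracking parameter is determined based on the template image and a history image of the target object, where the history image is an image that precedes the current image and contains the target object.
In step S14, the target object is tracked in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image.
In step S15, a tracking result of the target object in the current image is obtained based on the first predicted tracking result and the second predicted tracking result.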
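The data flow of steps S11 to S15 can be sketched as follows. This is a minimal illustration, not the disclosure's implementation. Every callable name (`get_first_params`, `predict_first`, and so on) is a hypothetical placeholder for the corresponding step, and the toy usage plugs in arithmetic stand-ins only to show how the outputs of one step feed the next.

```python
from typing import Any, Callable, Sequence


def track_frame(
    template: Any,
    current: Any,
    history: Sequence[Any],
    get_first_params: Callable[[Any], Any],                  # S11 (hypothetical)
    predict_first: Callable[[Any, Any], Any],                # S12 (hypothetical)
    get_second_params: Callable[[Any, Sequence[Any]], Any],  # S13 (hypothetical)
    predict_second: Callable[[Any, Any], Any],               # S14 (hypothetical)
    fuse: Callable[[Any, Any], Any],                         # S15 (hypothetical)
) -> Any:
    """Run one frame of the two-branch tracker described in steps S11-S15."""
    p1 = get_first_params(template)            # S11: parameter from the template only
    y1 = predict_first(p1, current)            # S12: first predicted tracking result
    p2 = get_second_params(template, history)  # S13: template plus history images
    y2 = predict_second(p2, current)           # S14: second predicted tracking result
    return fuse(y1, y2)                        # S15: final tracking result


# Toy stand-ins that only demonstrate the data flow.
result = track_frame(
    template=1.0,
    current=3.0,
    history=[1.0, 2.0],
    get_first_params=lambda t: 2 * t,
    predict_first=lambda p, x: p + x,
    get_second_params=lambda t, h: t + sum(h),
    predict_second=lambda p, x: p * x,
    fuse=lambda a, b: 0.5 * a + 0.5 * b,
)
print(result)  # 8.5
```

The two branches share no state until step S15. In practice, S11/S12 and S13/S14 could therefore run independently.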
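In the embodiments of the present disclosure, the target object denotes the object to be tracked. When there are multiple target objects, the target tracking method provided by the embodiments may be performed for each target object separately. The target object may be a person, an object, an animal, or the like. The template image of the target object may be an image containing the target object. It may be the image of a specified region in a frame (for example, the first frame) of a target video, or it may come from outside the target video. For example, the image in a region that the user selects with a box in the first frame of the target video may be used as the template image of the target object. As another example, the image in a region that the user selects with a box in another video may be used as the template image. As yet another example, an image uploaded or selected by the user may be used as the template image.
In the embodiments of the present disclosure, the first tracking parameter denotes a tracking parameter obtained from the template image. Information may be extracted from the template image to obtain the first tracking parameter; that is, the first tracking parameter may contain information of the template image. For example, it may contain at least one of feature information, color information, and texture information of the template image.
In a possible implementation, obtaining the first tracking parameter from the template image of the target object includes: extracting a first image feature of the template image as the first tracking parameter. In this implementation, the first image feature denotes an image feature of the template image and may have one level or at least two levels; accordingly, the first tracking parameter may include one level or at least two levels of first image features. Extracting the first image feature of the template image as the first tracking parameter, and tracking the current image based on that feature, improves the accuracy of the determined first predicted tracking result.
In the embodiments of the present disclosure, the first predicted tracking result denotes the tracking result predicted in the current image according to the first tracking parameter. In the first predicted tracking result, the probability that each pixel of the current image belongs to the target object may be represented by a probability value, a heat value, or the like.
In a possible implementation, tracking the target object in the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image includes: extracting a second image feature of the current image; and determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature. In this implementation, the second image feature denotes an image feature of the current image and may have one level or at least two levels. Determining the first predicted tracking result from the first tracking parameter and the second image feature of the current image improves the accuracy of the determined first predicted tracking result.
As an example of this implementation, extracting the first image feature of the template image as the first tracking parameter includes: performing feature extraction on the template image through at least two layers of different depths in a first preset network to obtain at least two levels of first image features of the template image, and using the at least two levels of first image features as the first tracking parameter. Extracting the second image feature of the current image includes: performing feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image. Determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature includes: for any level of the at least two levels of first image features and the at least two levels of second image features, determining an intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fusing the at least two intermediate prediction results corresponding to the at least two levels to obtain the first predicted tracking result of the current image.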
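In this example, the first preset network may be a Siamese network, for example SiamRPN++. SiamRPN++ performs classification and localization based on an RPN (Region Proposal Network), which helps obtain more accurate localization coordinates. For example, the first image features may include three levels, namely the image features of the template image output by block 2 (block2), block 3 (block3), and block 4 (block4) of SiamRPN++; the second image features may likewise include three levels, namely the image features of the current image output by blocks 2, 3, and 4 of SiamRPN++. Suppose the at least two levels of first image features are a first-level, a second-level, and a third-level first image feature, and the at least two levels of second image features are a first-level, a second-level, and a third-level second image feature. Then the first-level first image feature and the first-level second image feature may be convolved through a depthwise correlation layer to obtain the intermediate prediction result of the first level. The second-level and third-level features may be convolved in the same way to obtain the intermediate prediction results of the second and third levels. Fusing the intermediate prediction results of the three levels gives the first predicted tracking result of the current image. In one instance, before the second-level first and second image features are convolved, they may be interpolated to the same size as the first-level first and second image features; likewise, before the third-level features are convolved, they may be interpolated to the size of the first-level features. For example, the outputs of blocks 3 and 4 of SiamRPN++ may be interpolated so that the resulting feature maps have the same size as the feature map output by block 2. This enlarges the receptive field of the first preset network and can further improve its tracking accuracy.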
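The interpolation step can be illustrated with a plain bilinear resize. The disclosure does not say which interpolation is used; bilinear interpolation with aligned corners is an assumption here. `resize_feature` is a hypothetical helper that operates on a C x H x W feature stored as nested lists, and it resizes each channel independently.

```python
from typing import List

Map2D = List[List[float]]


def resize_bilinear(fmap: Map2D, out_h: int, out_w: int) -> Map2D:
    """Bilinearly resize one H x W channel; corner pixels of input and output are aligned."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Map output row i to a (fractional) input row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
            bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
            out[i][j] = top * (1 - dy) + bottom * dy
    return out


def resize_feature(feat: List[Map2D], out_h: int, out_w: int) -> List[Map2D]:
    """Resize every channel of a C x H x W feature to out_h x out_w."""
    return [resize_bilinear(ch, out_h, out_w) for ch in feat]


# A 2 x 2 "deeper block" map brought up to a 3 x 3 "block 2" resolution.
print(resize_feature([[[0.0, 1.0], [2.0, 3.0]]], 3, 3))
# [[[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]]
```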
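In this example, the first predicted tracking result is determined from at least two levels of first image features of the template image and at least two levels of second image features of the current image. For each level, an intermediate prediction result is determined from the first and second image features of that level, and the intermediate prediction results of all levels are then fused into the first predicted tracking result of the current image. This uses richer image information from both the template image and the current image. As a result, potential regions of the target object can be extracted from the current image quickly and efficiently, while interference is filtered out at an early stage and redundant computation is reduced. Comparing first and second image features of the same level also improves the accuracy of the first predicted tracking result.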
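In one instance, the first predicted tracking result $\hat{y}^{(1)}_i$ of the current image may be determined using Formula 1:
$$\hat{y}^{(1)}_i=\sum_{l=3}^{5}\alpha_l\,\big(\varphi_l(z)\star\varphi_l(x_i)\big)\tag{1}$$
where $z$ denotes the template image and $x_i$ denotes the current image. $\varphi_l(\cdot)$ denotes the output of the $l$-th block of the first preset network. $\varphi_l(z)$ denotes the first image feature of the template image $z$ output by the $l$-th block after $z$ is input into the first preset network, and $\varphi_l(x_i)$ denotes the second image feature of the current image $x_i$ output by the $l$-th block after $x_i$ is input into the first preset network. For example, $l=3$ may correspond to block 2 of SiamRPN++, $l=4$ to block 3, and $l=5$ to block 4. $\varphi_l(z)\star\varphi_l(x_i)$ denotes the correlation between $\varphi_l(z)$ and $\varphi_l(x_i)$; in this instance, the correlation between the first and second image features of the same level is used as the intermediate prediction result of that level. $\alpha_l$ denotes the weight corresponding to $\varphi_l(z)\star\varphi_l(x_i)$ and may be trained together with the other parameters of the first preset network.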
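Formula 1 can be sketched in plain Python as a depthwise cross-correlation at each level followed by a sum weighted by α_l. This is an illustrative sketch under simplifying assumptions. Features are nested lists of shape C x H x W, the levels are assumed to have already been resized to a common size, and the classification/regression heads that SiamRPN++ applies after the correlation are omitted. All function names are hypothetical.

```python
from typing import List, Sequence

Map2D = List[List[float]]
Feature = List[Map2D]  # C x H x W


def xcorr2d(search: Map2D, kernel: Map2D) -> Map2D:
    """'Valid' 2D cross-correlation of one channel (stride 1, no padding)."""
    H, W = len(search), len(search[0])
    h, w = len(kernel), len(kernel[0])
    return [
        [
            sum(search[r + u][c + v] * kernel[u][v] for u in range(h) for v in range(w))
            for c in range(W - w + 1)
        ]
        for r in range(H - h + 1)
    ]


def depthwise_xcorr(search: Feature, template: Feature) -> Feature:
    """Channel k of the template is correlated only with channel k of the search feature."""
    if len(search) != len(template):
        raise ValueError("channel counts must match")
    return [xcorr2d(s, t) for s, t in zip(search, template)]


def fuse_levels(responses: Sequence[Feature], alphas: Sequence[float]) -> Feature:
    """Weighted sum over levels: sum_l alphas[l] * responses[l] (shapes must match)."""
    out = [[[0.0] * len(row) for row in ch] for ch in responses[0]]
    for resp, a in zip(responses, alphas):
        for k, ch in enumerate(resp):
            for r, row in enumerate(ch):
                for c, val in enumerate(row):
                    out[k][r][c] += a * val
    return out


search = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]  # 1 x 3 x 3 current-image feature
template = [[[1.0, 0.0], [0.0, 1.0]]]                           # 1 x 2 x 2 template feature
level_resp = depthwise_xcorr(search, template)
print(level_resp)  # [[[6.0, 8.0], [12.0, 14.0]]]
# Two levels with identical responses, fused with alpha = (0.5, 0.25).
print(fuse_levels([level_resp, level_resp], [0.5, 0.25]))  # [[[4.5, 6.0], [9.0, 10.5]]]
```

In a deep-learning framework, the same depthwise correlation is usually expressed as a grouped convolution with one group per channel. The loops above just make the per-channel pairing explicit.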
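As another example of this implementation, the at least two levels of first image features may be fused to obtain a first fused feature, and the at least two levels of second image features may be fused to obtain a second fused feature. The first predicted tracking result of the current image is then obtained from the first fused feature and the second fused feature.
As yet another example of this implementation, the first image feature of the template image and the second image feature of the current image may each have a single level; that is, the first predicted tracking result of the current image may be determined from one level of first image feature of the template image and one level of second image feature of the current image.
In the embodiments of the present disclosure, the second tracking parameter denotes a tracking parameter determined based on the template image and the history images. It may be determined from information of both the template image and the history images, and may therefore contain information of both. In a possible implementation, the second tracking parameter may be determined based on the template image and the history images in a support set. During target tracking, the history images in the support set may be updated, and the second tracking parameter may be updated correspondingly in response. Determining the second tracking parameter from the template image and the history images, and tracking the target object in the current image with it, improves resistance to interference from similar objects and yields a more robust second predicted tracking result. The second predicted tracking result denotes the tracking result predicted in the current image according to the second tracking parameter. In it, the probability that each pixel of the current image belongs to the target object may be represented by a probability value, a heat value, or the like.
In a possible implementation, determining the second tracking parameter based on the template image and the history image of the target object includes: obtaining a third image feature of the template image; determining an initial second tracking parameter based on the third image feature; and obtaining an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the history image.
In this implementation, the third image feature is an image feature of the template image. For example, the at least two levels of first image features of the template image may be fused to obtain its third image feature; alternatively, the third image feature may be the same as the first image feature of the template image. The second tracking parameter may be determined based on the template image and each history image in the support set, and the support set may be updated during target tracking. For example, if the probability that an image region in the current image belongs to the target object is greater than or equal to a first threshold, that region may be added to the support set as a new history image. In one instance, the number of history images in the support set is less than or equal to a second threshold; when the number of history images exceeds the second threshold, the history image that was added earliest may be deleted. In this implementation, the support set does not include the template image. In other words, the second tracking parameter is determined not only from the template image but also from information about the target object in history images other than the template image. The initial value of the second tracking parameter may be the third image feature, and the parameter may be updated as the history images are updated.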
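The support-set bookkeeping described above can be sketched with a bounded deque. A region is admitted only if its probability of belonging to the target reaches the first threshold, and at most the second threshold of history images are kept, with the oldest dropped first. `SupportSet` and its methods are hypothetical names, and the threshold values in the usage are arbitrary illustrations.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class SupportSet:
    """Bounded FIFO store of history crops for the online branch (hypothetical helper)."""

    def __init__(self, first_threshold: float, second_threshold: int) -> None:
        self.first_threshold = first_threshold
        # A deque with maxlen discards its oldest entry when a new one is appended
        # to a full deque, so the set never holds more than second_threshold items.
        self.items: Deque[Tuple[Any, float]] = deque(maxlen=second_threshold)

    def maybe_add(self, region: Any, prob: float) -> bool:
        """Add `region` as a new history image if its target probability is high enough."""
        if prob >= self.first_threshold:
            self.items.append((region, prob))
            return True
        return False

    def regions(self) -> List[Any]:
        return [region for region, _ in self.items]


support = SupportSet(first_threshold=0.8, second_threshold=2)
support.maybe_add("crop_t1", 0.9)   # kept
support.maybe_add("crop_t2", 0.5)   # rejected: below the first threshold
support.maybe_add("crop_t3", 0.85)  # kept
support.maybe_add("crop_t4", 0.95)  # kept; crop_t1 is evicted
print(support.regions())  # ['crop_t3', 'crop_t4']
```

The template image is deliberately not stored here, which matches the description above: the template enters the second tracking parameter through its initial value, not through the support set.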
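In this implementation, the third image feature of the template image is obtained, and the initial second tracking parameter is determined from it. The updated second tracking parameter is then obtained from the initial second tracking parameter and the fourth image feature of the history image. As a result, the second tracking parameter is updated continuously as the history images change during tracking, which strengthens resistance to interference from similar objects.
As an example of this implementation, determining the initial second tracking parameter based on the third image feature includes: initializing an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter. Obtaining the updated second tracking parameter based on the initial second tracking parameter and the fourth image feature of the history image includes: inputting the initial second tracking parameter and the fourth image feature of the history image into the online module, and obtaining the updated second tracking parameter via the online module.
In this example, the second tracking parameter may be updated by the online module of the second preset network. For example, the initial second tracking parameter (that is, the third image feature) and the fourth image feature of the history image may be input into the online module to obtain the updated second tracking parameter. When the history images in the support set are updated, the current second tracking parameter and the fourth image features of the history images currently in the support set may be input into the online module to obtain an updated second tracking parameter. In other words, the second tracking parameter may be updated in real time in response to updates of the history images in the support set. Because the online module is initialized with the third image feature and then fed the fourth image features of the history images, it keeps updating the second tracking parameter as the history images change during tracking. This strengthens resistance to interference from similar objects.
As an example of this implementation, the history image is an image region cropped in advance from a history video frame, and the probability that the history image belongs to the target object is greater than or equal to the first threshold. In this example, a history video frame denotes a frame of the target video that precedes the current image. Determining the second tracking parameter from the template image and at least one history image lets information from regions of earlier frames that very likely belong to the target object assist in tracking the target in the current image. This helps obtain a more robust second predicted tracking result.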
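In one example, the support set may be expressed as
$$\mathcal{S}=\{(x_j,\,y_j)\}_{j=1}^{M}$$
where $M$ denotes the number of history images in the support set, $x_j$ denotes the $j$-th history image in the support set, and $y_j$ denotes the pseudo label of $x_j$. The pseudo label of a history image in the support set may be determined according to a Gaussian distribution of the probability that each position in the history image belongs to the target object. In one instance, the fourth image feature of each history image in the support set and the current second tracking parameter may be input into the online module, which outputs the predicted probability that each history image belongs to the target object. A loss function for the second tracking parameter can be obtained from these predicted probabilities and the pseudo labels of the history images. Based on this loss function, the second tracking parameter can then be updated by gradient descent.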
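One way to build a Gaussian pseudo label y_j is sketched below. It assumes the label is a map over feature-grid positions that peaks at the estimated target center and decays with distance. The disclosure does not give the center estimate, the grid size, or the spread, so `cy`, `cx`, and `sigma` are illustrative assumptions, and `gaussian_pseudo_label` is a hypothetical helper.

```python
import math
from typing import List


def gaussian_pseudo_label(h: int, w: int, cy: float, cx: float, sigma: float) -> List[List[float]]:
    """Label map that equals 1.0 at (cy, cx) and decays as a 2D Gaussian with std `sigma`."""
    return [
        [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2.0 * sigma ** 2)) for c in range(w)]
        for r in range(h)
    ]


label = gaussian_pseudo_label(5, 5, cy=2.0, cx=2.0, sigma=1.0)
print(label[2][2])            # 1.0 at the assumed target center
print(round(label[2][3], 4))  # 0.6065, i.e. exp(-0.5) one cell away
```

A smaller `sigma` makes the label sharper, so the online update penalizes high scores away from the center more strongly.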
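In one example, once the second preset network has been trained, its internal parameters may be kept fixed while it is used to track the target object, which improves computational efficiency.
As an example of this implementation, obtaining the third image feature of the template image includes: obtaining at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with them; and determining the weighted sum of the at least two levels of first image features according to the at least two first weights to obtain the third image feature of the template image. Determining the second tracking parameter from a third image feature obtained in this way can further improve the robustness of tracking the target object in the current image.
As another example of this implementation, the third image feature may be determined from the average of the at least two levels of first image features.
In a possible implementation, tracking the target object in the current image based on the second tracking parameter to obtain the second predicted tracking result of the current image includes: obtaining a fifth image feature of the current image; and determining the second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature. In this implementation, the fifth image feature is an image feature of the current image. For example, the fifth image feature and the second tracking parameter may be convolved through an up-channel correlation layer, which raises the channel dimension, to obtain the second predicted tracking result. Determining the second predicted tracking result from the second tracking parameter and the fifth image feature improves its accuracy.
As an example of this implementation, obtaining the fifth image feature of the current image includes: obtaining at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with them; and determining the weighted sum of the at least two levels of second image features according to the at least two second weights to obtain the fifth image feature of the current image. A fifth image feature determined in this way can further improve the robustness of the second predicted tracking result.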
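The third image feature (of the template) and the fifth image feature (of the current image) are both weighted sums of multi-level features. The following is a minimal sketch. It assumes each level has already been resized to a common shape and flattened to a list of floats. `weighted_level_sum` and `mean_level` are hypothetical helpers; the latter is the averaging variant mentioned above.

```python
from typing import List, Sequence


def weighted_level_sum(levels: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return sum_l weights[l] * levels[l] for same-shaped, flattened level features."""
    if len(levels) != len(weights):
        raise ValueError("need exactly one weight per level")
    n = len(levels[0])
    if any(len(f) != n for f in levels):
        raise ValueError("levels must share one shape (resize them first)")
    return [sum(w * f[k] for f, w in zip(levels, weights)) for k in range(n)]


def mean_level(levels: Sequence[Sequence[float]]) -> List[float]:
    """The averaging variant: equal weights 1/L for L levels."""
    return weighted_level_sum(levels, [1.0 / len(levels)] * len(levels))


# Three toy "levels" (e.g. outputs of three backbone blocks), two values each.
levels = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(weighted_level_sum(levels, [0.5, 0.25, 0.25]))  # [2.5, 3.5]
print(mean_level([[0.0, 3.0], [3.0, 6.0]]))           # [1.5, 4.5]
```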
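In one instance, the second predicted tracking result $\hat{y}^{(2)}_i$ of the current image may be determined using Formula 2:
$$\hat{y}^{(2)}_i=w\ast\sum_{l=3}^{5}\beta_l\,\varphi_l(x_i)\tag{2}$$
where $w$ denotes the second tracking parameter and $\ast$ denotes the up-channel correlation. $\varphi_l(x_i)$ denotes the second image feature of the current image $x_i$ output by the $l$-th block of the first preset network after $x_i$ is input into that network; for example, $l=3$ may correspond to block 2 of SiamRPN++, $l=4$ to block 3, and $l=5$ to block 4. $\beta_l$ denotes the weight corresponding to $\varphi_l(x_i)$, so $\beta_l\,\varphi_l(x_i)$ denotes $\varphi_l(x_i)$ weighted by $\beta_l$. $\sum_{l=3}^{5}\beta_l\,\varphi_l(x_i)$ denotes the weighted sum of the three levels of second image features extracted from $x_i$ by three blocks of the first preset network (three network blocks at different depths), that is, the fifth image feature of $x_i$.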
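Formula 2 correlates the second tracking parameter with the fifth image feature. The sketch below is a simplified stand-in for the up-channel correlation layer. It treats the parameter as a single C x h x w kernel and sums over channels, producing one response map, whereas a real UP-C head would apply several such kernels to produce multiple output channels. The shapes and the name `channel_sum_xcorr` are assumptions for illustration.

```python
from typing import List

Map2D = List[List[float]]


def channel_sum_xcorr(search: List[Map2D], kernel: List[Map2D]) -> Map2D:
    """Correlate a C x h x w kernel over a C x H x W feature and sum over channels.

    This gives ONE response map, i.e. one output channel of an ordinary
    (non-depthwise) correlation. An up-channel head would apply several
    such kernels to obtain several output channels.
    """
    if len(search) != len(kernel):
        raise ValueError("channel counts must match")
    H, W = len(search[0]), len(search[0][0])
    h, w = len(kernel[0]), len(kernel[0][0])
    out = [[0.0] * (W - w + 1) for _ in range(H - h + 1)]
    for s_ch, k_ch in zip(search, kernel):
        for r in range(H - h + 1):
            for c in range(W - w + 1):
                out[r][c] += sum(
                    s_ch[r + u][c + v] * k_ch[u][v] for u in range(h) for v in range(w)
                )
    return out


fifth_feature = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]]]  # 2 x 2 x 2 (toy)
second_param = [[[2.0]], [[10.0]]]                                      # 2 x 1 x 1 (toy)
print(channel_sum_xcorr(fifth_feature, second_param))  # [[12.0, 14.0], [16.0, 18.0]]
```

Unlike the depthwise correlation in the first branch, the channels are summed here. This lets a single learned parameter weigh all feature channels jointly when scoring each position.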
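In one instance, the second tracking parameter $w$ may be determined using Formula 3:
$$w\leftarrow\Lambda_{\rho}\Big(w,\ \big\{\big(\textstyle\sum_{l=3}^{5}\beta_l\,\varphi_l(x_j),\ y_j\big)\big\}_{j=1}^{M}\Big)\tag{3}$$
where $\mathcal{S}=\{(x_j,y_j)\}_{j=1}^{M}$ denotes the support set, which contains $M$ history images; $x_j$ denotes the $j$-th history image in the support set and $y_j$ denotes its pseudo label. $\varphi_l(x_j)$ denotes the sixth image feature of the history image $x_j$ output by the $l$-th block of the first preset network after $x_j$ is input into that network. $\beta_l$ denotes the weight corresponding to $\varphi_l(x_j)$, so $\beta_l\,\varphi_l(x_j)$ denotes $\varphi_l(x_j)$ weighted by $\beta_l$. $\sum_{l=3}^{5}\beta_l\,\varphi_l(x_j)$ denotes the weighted sum of the three levels of sixth image features extracted from $x_j$ by three blocks of the first preset network, that is, the fourth image feature of the history image $x_j$. $\Lambda$ denotes the online module and $\rho$ its internal parameters. As the history images in the support set are updated, the second tracking parameter $w$ is updated. In one instance, the fourth image features of the $M$ history images in the support set and the current second tracking parameter may be input into the online module $\Lambda$, which outputs the predicted probability that each history image belongs to the target object. A loss function for the second tracking parameter can be obtained from these predicted probabilities and the pseudo labels $y_j$ of the history images. Based on this loss function, the second tracking parameter can be updated by gradient descent to obtain the updated second tracking parameter.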
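The online update in Formula 3 can be sketched as a few steps of gradient descent on a loss between predicted scores and pseudo labels. The disclosure does not specify the loss, the step size, or the internals of the online module Λ. This sketch therefore assumes a linear score ⟨w, f_j⟩ per history sample, a mean squared error plus optional L2 regularization, and a fixed learning rate, all of which are illustrative choices. Features and pseudo labels are toy values, and each pseudo label is reduced to a single scalar for brevity.

```python
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def loss(w: Sequence[float], feats: Sequence[Sequence[float]], labels: Sequence[float], reg: float = 0.0) -> float:
    """Mean squared error between scores <w, f_j> and pseudo labels, plus reg * ||w||^2."""
    m = len(feats)
    return sum((dot(w, f) - y) ** 2 for f, y in zip(feats, labels)) / m + reg * dot(w, w)


def online_update(
    w0: Sequence[float],
    feats: Sequence[Sequence[float]],
    labels: Sequence[float],
    lr: float = 0.1,
    steps: int = 50,
    reg: float = 0.0,
) -> Vec:
    """Refine the tracking parameter by plain gradient descent on `loss`, starting from w0."""
    w = list(w0)  # e.g. the third image feature (initial value) or the current parameter
    m = len(feats)
    for _ in range(steps):
        grad = [2.0 * reg * wi for wi in w]  # gradient of reg * ||w||^2
        for f, y in zip(feats, labels):
            err = dot(w, f) - y  # score minus pseudo label
            for k in range(len(w)):
                grad[k] += 2.0 * err * f[k] / m
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


feats = [[1.0, 0.0], [0.0, 1.0]]  # fourth image features of M = 2 history crops (toy)
labels = [1.0, -1.0]              # their pseudo labels (toy)
w = online_update([0.0, 0.0], feats, labels, lr=0.1, steps=200)
print([round(v, 4) for v in w])   # [1.0, -1.0]
```

Only `w` changes here; nothing else is trained. This mirrors the note above that the second preset network's internal parameters can stay fixed at tracking time while the second tracking parameter adapts.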
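In a possible implementation, obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result includes: obtaining a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result; determining the weighted sum of the first and second predicted tracking results according to the third and fourth weights to obtain a third predicted tracking result of the current image; and determining the tracking result of the target object in the current image according to the third predicted tracking result. In this implementation, the third weight and the fourth weight may each be a hyperparameter. Their sum may equal 1, with each weight greater than 0 and less than 1; of course, their sum may also differ from 1. The third predicted tracking result is determined as the weighted sum of the first and second predicted tracking results. A tracking result of the target object determined from it can therefore be both accurate and robust.
In one instance, the tracking result $\hat{y}_i$ of the target object in the current image may be determined using Formula 4:
$$\hat{y}_i=\mu\,\hat{y}^{(2)}_i+(1-\mu)\,\hat{y}^{(1)}_i\tag{4}$$
where $\hat{y}^{(2)}_i$ denotes the second predicted tracking result and $\mu$ denotes its corresponding fourth weight, and $\hat{y}^{(1)}_i$ denotes the first predicted tracking result and $1-\mu$ denotes its corresponding third weight.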
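Formula 4 is an element-wise blend of the two score maps. The following is a minimal sketch. `fuse_predictions` and `argmax2d` are hypothetical helpers, and the μ values and score maps are toy values. The peak of the fused map is where the first bounding box would be sought.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def fuse_predictions(y1: Map2D, y2: Map2D, mu: float) -> Map2D:
    """Formula 4: mu * y2 + (1 - mu) * y1, element-wise; y1 and y2 must share a shape."""
    if len(y1) != len(y2) or any(len(a) != len(b) for a, b in zip(y1, y2)):
        raise ValueError("score maps must have the same shape")
    return [[(1.0 - mu) * a + mu * b for a, b in zip(r1, r2)] for r1, r2 in zip(y1, y2)]


def argmax2d(m: Map2D) -> Tuple[int, int]:
    """(row, col) of the largest entry."""
    _, r, c = max((v, r, c) for r, row in enumerate(m) for c, v in enumerate(row))
    return r, c


y1 = [[0.2, 0.8], [0.1, 0.0]]  # first predicted tracking result (toy)
y2 = [[0.9, 0.1], [0.0, 0.0]]  # second predicted tracking result (toy)
print(argmax2d(fuse_predictions(y1, y2, mu=0.5)))  # (0, 0)
print(argmax2d(fuse_predictions(y1, y2, mu=0.2)))  # (0, 1)
```

The two calls show how μ trades the branches off. A larger μ trusts the robust online branch more, and a smaller μ trusts the accurate Siamese branch more.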
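As an example of this implementation, determining the tracking result of the target object in the current image according to the third predicted tracking result includes: determining, according to the third predicted tracking result, a first bounding box in the current image that has the highest probability of belonging to the target object; determining, according to the third predicted tracking result, a second bounding box in the current image that overlaps the first bounding box; and determining the detection box of the target object in the current image according to the first and second bounding boxes. In this example, bounding-box regression may be performed based on the third predicted tracking result to obtain multiple candidate boxes of the target object in the current image. Among them, the candidate box with the highest probability of belonging to the target object may be taken as the first bounding box, and the candidate boxes that overlap the first bounding box may be taken as second bounding boxes; there may be one or more second bounding boxes. The detection box is thus determined not only from the first bounding box but also from the second bounding boxes that overlap it. Using information from more candidate boxes in this way yields a more accurate detection box.
In one instance, determining the detection box of the target object in the current image according to the first bounding box and the second bounding box includes: determining the IoU of the second bounding box and the first bounding box; determining a fifth weight corresponding to the second bounding box according to the IoU; and determining the weighted sum of the first and second bounding boxes based on the fifth weight to obtain the detection box of the target object in the current image. For example, the weight of the first bounding box may be 1, and the fifth weight of any second bounding box may equal its IoU with the first bounding box. As another example, the weight of the first bounding box may be positively correlated with its probability of belonging to the target object. The fifth weight of any second bounding box may then be positively correlated both with its IoU with the first bounding box and with its own probability of belonging to the target object. For instance, the weight of the first bounding box may be its probability of belonging to the target object, and the fifth weight of any second bounding box may equal the product of its IoU with the first bounding box and its probability of belonging to the target object. The weighted sum of the first bounding box and the second bounding boxes may then be computed, together with the sum of the first box's weight and all the fifth weights. The ratio of the weighted sum to this weight sum is used as the detection box of the target object in the current image. Determining the fifth weights from the IoU and combining the first and second bounding boxes in this way improves the stability of the tracking result.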
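In another instance, all second bounding boxes may have the same fifth weight. For example, the average of the second bounding boxes may be computed, and the average of that result and the first bounding box may be used as the detection box of the target object in the current image.
Of course, in other examples, the first bounding box may also be used directly as the detection box of the target object.
In the embodiments of the present disclosure, a first tracking parameter is obtained from the template image of the target object, and the target object is tracked in the current image based on it to obtain a first predicted tracking result of relatively high accuracy. A second tracking parameter is determined based on the template image and the history images of the target object, and the target object is tracked in the current image based on it, so that information from the history images yields a second predicted tracking result of relatively high robustness. The tracking result of the target object in the current image is then obtained from the first and second predicted tracking results, so that it is both accurate and robust. The method therefore improves the ability to distinguish similar objects during tracking and raises the tracking success rate when similar objects cause interference.
The target tracking method provided by the embodiments of the present disclosure may be applied to tracking tasks such as single-object tracking or multi-object tracking.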
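A specific application scenario is described below to illustrate the target tracking method provided by the embodiments of the present disclosure. FIG. 2 is a schematic diagram of an application scenario according to an embodiment of the present disclosure. As shown in FIG. 2, this scenario provides a target tracker that includes a first preset network and a second preset network, and the second preset network includes an online module. The inputs of the first preset network may be the template image $z$ of the target object and the current image $x_i$, and its output may be the first predicted tracking result $\hat{y}^{(1)}_i$. The inputs of the second preset network may be the third image feature of the template image $z$, the fourth image feature of each history image $x_j$ in the support set, and the fifth image feature of the current image $x_i$, and its output may be the second predicted tracking result $\hat{y}^{(2)}_i$. The weighted sum of the first predicted tracking result $\hat{y}^{(1)}_i$ and the second predicted tracking result $\hat{y}^{(2)}_i$ gives the final tracking result of the target object in the current image $x_i$. The first and second preset networks are described separately below.
The first preset network may be SiamRPN++. When the template image $z$ is input into SiamRPN++, block 2 (block2), block 3 (block3), and block 4 (block4) of SiamRPN++ may output the first-level, second-level, and third-level first image features of $z$, respectively. When the current image $x_i$ is input into SiamRPN++, blocks 2, 3, and 4 may output the first-level, second-level, and third-level second image features of $x_i$, respectively. A depthwise correlation layer (DW-C, depthwise correlation) may compute the correlation between the first-level first and second image features to obtain the intermediate prediction result of the first level. In the same way, it may compute the correlation between the second-level features to obtain the intermediate prediction result of the second level, and between the third-level features to obtain that of the third level. The weighted sum of the three intermediate prediction results gives the first predicted tracking result. As shown in FIG. 2, the outputs of blocks 3 and 4 of SiamRPN++ may also be interpolated to the size of the feature map output by block 2. This enlarges the receptive field of the first preset network and can further improve its tracking accuracy. The first preset network yields a first predicted tracking result of relatively high accuracy; that is, the position of the target object that it regresses is relatively accurate.
The online module of the second preset network may be used to update the second tracking parameter, whose initial value may be the third image feature of the template image $z$. At the first update, the online module takes the third image feature of $z$ and the fourth image features of the history images in the support set as input and outputs the updated second tracking parameter. At subsequent updates, it takes the current second tracking parameter and the fourth image features of the history images in the support set as input and outputs the updated second tracking parameter. An up-channel correlation layer (UP-C, up-channel correlation) computes the correlation between the fifth image feature of the current image $x_i$ and the latest second tracking parameter to obtain the second predicted tracking result. The second preset network yields a second predicted tracking result of relatively high robustness; that is, its classification is relatively robust and strongly resists interference from similar objects.
Obtaining the tracking result of the target object in the current image from both the first and the second predicted tracking result yields a result that is both accurate and robust. For example, when one or more distractors (objects similar to the target object) surround the target object, the target tracking method provided by the embodiments of the present disclosure can accurately distinguish the distractors from the target object, which makes the tracking result more accurate. As another example, in a drone follow-shot system the target object may be occluded by pavilions, bridges, buildings, and the like; when the target object reappears, the method can recover it efficiently and accurately. As yet another example, the method may be applied to automatic annotation to produce more accurate automatically annotated data. In addition, the method achieves high classification and regression accuracy and high stability, adapts well to long-term tracking tasks, and tracks fast enough to run in real time.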
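It can be understood that the above method embodiments mentioned in the present disclosure may be combined with one another to form combined embodiments without departing from their principles and logic; owing to space limitations, details are not repeated in the present disclosure. Those skilled in the art can understand that, in the above methods of the specific implementations, the execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure further provides a target tracking apparatus, an electronic device, a computer-readable storage medium, and a program, each of which can be used to implement any target tracking method provided by the present disclosure. For the corresponding technical solutions and technical effects, refer to the corresponding descriptions in the method section; details are not repeated here.
FIG. 3 is a block diagram of a target tracking apparatus according to an embodiment of the present disclosure. As shown in FIG. 3, the target tracking apparatus includes:
an obtaining module 31, configured to obtain a first tracking parameter from a template image of a target object;
a first target tracking module 32, configured to track the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image;
a determining module 33, configured to determine a second tracking parameter based on the template image and a history image of the target object, where the history image is an image that precedes the current image and contains the target object;
a second target tracking module 34, configured to track the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image;
a fusion module 35, configured to obtain a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.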
在一种可能的实现方式中,所述获得模块31用于:
提取所述模板图像的第一图像特征,作为第一跟踪参数。
在一种可能的实现方式中,所述第一目标跟踪模块32用于:
提取所述当前图像的第二图像特征;
基于所述第一跟踪参数和所述第二图像特征,确定所述当前图像的第一预测跟踪结果。
在一种可能的实现方式中,
所述获得模块31用于:通过第一预设网络的不同深度的至少两层对所述模板图像进行特征提取,得到所述模板图像的至少两级第一图像特征,并将所述至少两级第一图像特征作为第一跟踪参数;
所述第一目标跟踪模块32用于:通过所述不同深度的至少两层对所述当前图像进行特征提取,得到所述当前图像的至少两级第二图像特征;对于所述至少两级第一图像特征和所述至少两级第二图像特征中的任意一级,基于该级的第一图像特征和第二图像特征,确定该级的中间预测结果;根据所述至少两级第一图像特征和所述至少两级第二图像特征对应的至少两个中间预测结果,融合得到所述当前图像的第一预测跟踪结果。
在一种可能的实现方式中,所述确定模块33用于:
基于所述第三图像特征,确定初始的第二跟踪参数;
基于初始的第二跟踪参数和所述历史图像的第四图像特征,得到更新的第二跟踪参数。
在一种可能的实现方式中,所述确定模块33用于:
基于所述第三图像特征,初始化第二预设网络的在线模块,得到所述初始的第二跟踪参数;
将所述初始的第二跟踪参数和所述历史图像的第四图像特征输入所述在线模块,经由所述在线模块得到更新的第二跟踪参数。
在一种可能的实现方式中,所述历史图像是预先从历史视频帧中截取的图像区域,且所述历史图像属于所述目标对象的概率大于或等于第一阈值。
在一种可能的实现方式中,所述确定模块33用于:
获取所述模板图像的至少两级第一图像特征,以及与所述至少两级第一图像特征一一对应的至少两个第一权重;
根据所述至少两个第一权重,确定所述至少两级第一图像特征的加权和,得到所述模板图像的第三图像特征。
在一种可能的实现方式中,所述第二目标跟踪模块34用于:
获得所述当前图像的第五图像特征;
基于所述第二跟踪参数和所述第五图像特征,确定所述当前图像的第二预测跟踪结果。
在一种可能的实现方式中,所述第二目标跟踪模块34用于:
获取所述当前图像的至少两级第二图像特征,以及与所述至少两级第二图像特征一一对应的至少两个第二权重;
根据所述至少两个第二权重,确定所述至少两级第二图像特征的加权和,得到所述当前图像的第五图像特征。
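In a possible implementation, the fusion module 35 is configured to:
obtain a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result;
determine, according to the third weight and the fourth weight, a weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image; and
determine the tracking result of the target object in the current image according to the third predicted tracking result.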
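A minimal sketch of the fusion step, assuming both predicted tracking results are score maps of the same shape and taking the peak of the fused map as the target location. The default weights of 0.5 and the function name are hypothetical; the disclosure does not give values for the third and fourth weights.

```python
from typing import Tuple

import numpy as np


def fuse_predictions(
    first_pred: np.ndarray,
    second_pred: np.ndarray,
    third_weight: float = 0.5,
    fourth_weight: float = 0.5,
) -> Tuple[np.ndarray, Tuple[int, ...]]:
    """Weighted sum of the two score maps plus the peak of the fused map."""
    if first_pred.shape != second_pred.shape:
        raise ValueError("predicted tracking results must have the same shape")
    third_pred = third_weight * first_pred + fourth_weight * second_pred
    peak = np.unravel_index(int(np.argmax(third_pred)), third_pred.shape)
    return third_pred, tuple(int(v) for v in peak)
```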
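In a possible implementation, the fusion module 35 is configured to:
determine, according to the third predicted tracking result, a first bounding box in the current image that has the highest probability of belonging to the target object;
determine, according to the third predicted tracking result, a second bounding box in the current image that has an overlapping region with the first bounding box; and
determine a detection box of the target object in the current image according to the first bounding box and the second bounding box.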
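Selecting the boxes can be sketched as follows, assuming boxes in `(x1, y1, x2, y2)` pixel coordinates with one probability per box taken from the third predicted tracking result, and treating any positive intersection area as an overlapping region. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def _intersection_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def select_boxes(boxes: Sequence[Box], probs: Sequence[float]) -> Tuple[int, List[int]]:
    """Return the index of the first bounding box and the indices of second boxes."""
    if not boxes or len(boxes) != len(probs):
        raise ValueError("need one probability per box")
    best = max(range(len(boxes)), key=lambda i: probs[i])
    overlapping = [i for i, b in enumerate(boxes)
                   if i != best and _intersection_area(boxes[best], b) > 0.0]
    return best, overlapping
```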
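In a possible implementation, the fusion module 35 is configured to:
determine the intersection over union (IoU) of the second bounding box and the first bounding box;
determine a fifth weight corresponding to the second bounding box according to the IoU; and
determine, based on the fifth weight, a weighted sum of the first bounding box and the second bounding box to obtain the detection box of the target object in the current image.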
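The IoU-based refinement can be sketched as below. How the fifth weight is derived from the IoU is not specified; this sketch simply assumes the fifth weight equals the IoU, gives the first bounding box a weight of 1, and normalizes the weighted sum so the result is still a valid box. `iou` and `refine_box` are hypothetical names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def refine_box(first: Box, seconds: Sequence[Box]) -> Box:
    """IoU-weighted average of the first box and its overlapping second boxes."""
    weights = [1.0] + [iou(first, b) for b in seconds]  # fifth weight assumed = IoU
    boxes = [first, *seconds]
    total = sum(weights)
    x1, y1, x2, y2 = (sum(w * b[k] for w, b in zip(weights, boxes)) / total
                      for k in range(4))
    return (x1, y1, x2, y2)
```

Because the weights are normalized, a second box that barely overlaps the first one shifts the detection box only slightly, while a heavily overlapping one pulls it further.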
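In the embodiments of the present disclosure, a first tracking parameter is obtained from the template image of the target object, and the target object is tracked in the current image based on the first tracking parameter to obtain a first predicted tracking result of the current image, which yields a first predicted tracking result with relatively high accuracy. A second tracking parameter is determined based on the template image and the historical images of the target object, and the target object is tracked in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image; by incorporating information from the historical images, this yields a second predicted tracking result with relatively high robustness. The tracking result of the target object in the current image is then obtained from the first and second predicted tracking results, yielding a tracking result that is both accurate and robust. The target tracking method provided by the embodiments of the present disclosure improves the ability to discriminate between similar objects during tracking, thereby increasing the success rate of tracking the target object when it is disturbed by similar objects.
In some embodiments, the functions of, or modules included in, the apparatus provided by the embodiments of the present disclosure can be used to perform the methods described in the method embodiments above. For their specific implementations and technical effects, refer to the descriptions of the method embodiments above; for brevity, details are not repeated here.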
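An embodiment of the present disclosure further provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the above method. The computer-readable storage medium may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.
An embodiment of the present disclosure further provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the above method.
An embodiment of the present disclosure further provides a computer program product including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device performs the above method.
An embodiment of the present disclosure further provides an electronic device, including: one or more processors; and a memory configured to store executable instructions; where the one or more processors are configured to invoke the executable instructions stored in the memory to perform the above method.
The electronic device may be provided as a terminal, a server, or a device in another form.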
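Fig. 4 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to Fig. 4, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the electronic device 800. Examples of such data include instructions for any application or method operated on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 806 supplies power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state assessments of various aspects of the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, for example the display and keypad of the electronic device 800; the sensor component 814 can also detect a change in position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, second-generation mobile communication (2G), third-generation mobile communication (3G), fourth-generation mobile communication (4G)/Long-Term Evolution (LTE), fifth-generation mobile communication (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, for example the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the above method.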
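Fig. 5 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 5, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as an application program. The application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.
The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may run an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system Unix™, the free and open-source Unix-like operating system Linux™, the open-source Unix-like operating system FreeBSD™, or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is further provided, for example the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.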
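The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present disclosure.
A computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random-access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., light pulses through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement various aspects of the present disclosure.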
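Aspects of the present disclosure are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and may cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The computer program product may be implemented specifically by hardware, software, or a combination thereof. In an optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
The embodiments of the present disclosure have been described above. The foregoing description is exemplary rather than exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical applications, or their improvements over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.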

Claims (17)

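  1. A target tracking method, comprising:
    obtaining a first tracking parameter from a template image of a target object;
    tracking the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image;
    determining a second tracking parameter based on the template image and historical images of the target object, wherein a historical image is an image that precedes the current image and contains the target object;
    tracking the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image; and
    obtaining a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.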
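  2. The method according to claim 1, wherein obtaining the first tracking parameter from the template image of the target object comprises:
    extracting a first image feature of the template image as the first tracking parameter.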
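  3. The method according to claim 2, wherein tracking the target object in the current image based on the first tracking parameter to obtain the first predicted tracking result of the current image comprises:
    extracting a second image feature of the current image; and
    determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature.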
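  4. The method according to claim 3, wherein
    extracting the first image feature of the template image as the first tracking parameter comprises: performing feature extraction on the template image through at least two layers of different depths of a first preset network to obtain at least two levels of first image features of the template image, and using the at least two levels of first image features as the first tracking parameter;
    extracting the second image feature of the current image comprises: performing feature extraction on the current image through the at least two layers of different depths to obtain at least two levels of second image features of the current image; and
    determining the first predicted tracking result of the current image based on the first tracking parameter and the second image feature comprises: for any level among the at least two levels of first image features and the at least two levels of second image features, determining an intermediate prediction result of that level based on the first image feature and the second image feature of that level; and fusing the at least two intermediate prediction results corresponding to the at least two levels of first image features and the at least two levels of second image features to obtain the first predicted tracking result of the current image.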
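  5. The method according to any one of claims 1 to 4, wherein determining the second tracking parameter based on the template image and the historical images of the target object comprises:
    obtaining a third image feature of the template image;
    determining an initial second tracking parameter based on the third image feature; and
    obtaining an updated second tracking parameter based on the initial second tracking parameter and a fourth image feature of the historical images.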
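  6. The method according to claim 5, wherein
    determining the initial second tracking parameter based on the third image feature comprises: initializing an online module of a second preset network based on the third image feature to obtain the initial second tracking parameter; and
    obtaining the updated second tracking parameter based on the initial second tracking parameter and the fourth image feature of the historical images comprises: inputting the initial second tracking parameter and the fourth image feature of the historical images into the online module, and obtaining the updated second tracking parameter via the online module.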
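  7. The method according to claim 5 or 6, wherein the historical images are image regions cropped in advance from historical video frames, and the probability that each historical image belongs to the target object is greater than or equal to a first threshold.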
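  8. The method according to any one of claims 5 to 7, wherein obtaining the third image feature of the template image comprises:
    obtaining at least two levels of first image features of the template image and at least two first weights in one-to-one correspondence with the at least two levels of first image features; and
    determining, according to the at least two first weights, a weighted sum of the at least two levels of first image features to obtain the third image feature of the template image.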
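  9. The method according to any one of claims 1 to 8, wherein tracking the target object in the current image based on the second tracking parameter to obtain the second predicted tracking result of the current image comprises:
    obtaining a fifth image feature of the current image; and
    determining the second predicted tracking result of the current image based on the second tracking parameter and the fifth image feature.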
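  10. The method according to claim 9, wherein obtaining the fifth image feature of the current image comprises:
    obtaining at least two levels of second image features of the current image and at least two second weights in one-to-one correspondence with the at least two levels of second image features; and
    determining, according to the at least two second weights, a weighted sum of the at least two levels of second image features to obtain the fifth image feature of the current image.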
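  11. The method according to any one of claims 1 to 10, wherein obtaining the tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result comprises:
    obtaining a third weight corresponding to the first predicted tracking result and a fourth weight corresponding to the second predicted tracking result;
    determining, according to the third weight and the fourth weight, a weighted sum of the first predicted tracking result and the second predicted tracking result to obtain a third predicted tracking result of the current image; and
    determining the tracking result of the target object in the current image according to the third predicted tracking result.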
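  12. The method according to claim 11, wherein determining the tracking result of the target object in the current image according to the third predicted tracking result comprises:
    determining, according to the third predicted tracking result, a first bounding box in the current image that has the highest probability of belonging to the target object;
    determining, according to the third predicted tracking result, a second bounding box in the current image that has an overlapping region with the first bounding box; and
    determining a detection box of the target object in the current image according to the first bounding box and the second bounding box.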
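  13. The method according to claim 12, wherein determining the detection box of the target object in the current image according to the first bounding box and the second bounding box comprises:
    determining an intersection over union (IoU) of the second bounding box and the first bounding box;
    determining a fifth weight corresponding to the second bounding box according to the IoU; and
    determining, based on the fifth weight, a weighted sum of the first bounding box and the second bounding box to obtain the detection box of the target object in the current image.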
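  14. A target tracking apparatus, comprising:
    an obtaining module, configured to obtain a first tracking parameter from a template image of a target object;
    a first target tracking module, configured to track the target object in a current image based on the first tracking parameter to obtain a first predicted tracking result of the current image;
    a determining module, configured to determine a second tracking parameter based on the template image and historical images of the target object, wherein a historical image is an image that precedes the current image and contains the target object;
    a second target tracking module, configured to track the target object in the current image based on the second tracking parameter to obtain a second predicted tracking result of the current image; and
    a fusion module, configured to obtain a tracking result of the target object in the current image based on the first predicted tracking result and the second predicted tracking result.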
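  15. An electronic device, comprising:
    one or more processors; and
    a memory configured to store executable instructions;
    wherein the one or more processors are configured to invoke the executable instructions stored in the memory to perform the method according to any one of claims 1 to 13.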
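  16. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 13.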
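  17. A computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device performs the method according to any one of claims 1 to 13.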
PCT/CN2021/100558 2021-03-18 2021-06-17 Target tracking method and apparatus, electronic device, and storage medium WO2022193456A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/880,592 US20220383517A1 (en) 2021-03-18 2022-08-03 Method and device for target tracking, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110292542.0A CN113052874B (zh) 2021-03-18 2021-03-18 Target tracking method and apparatus, electronic device, and storage medium
CN202110292542.0 2021-03-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/880,592 Continuation US20220383517A1 (en) 2021-03-18 2022-08-03 Method and device for target tracking, and storage medium

Publications (1)

Publication Number Publication Date
WO2022193456A1 true WO2022193456A1 (zh) 2022-09-22

Family

ID=76513695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/100558 WO2022193456A1 (zh) 2021-03-18 2021-06-17 Target tracking method and apparatus, electronic device, and storage medium

Country Status (3)

Country Link
US (1) US20220383517A1 (zh)
CN (1) CN113052874B (zh)
WO (1) WO2022193456A1 (zh)

Also Published As

Publication number Publication date
CN113052874B (zh) 2022-01-25
US20220383517A1 (en) 2022-12-01
CN113052874A (zh) 2021-06-29

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21931041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21931041

Country of ref document: EP

Kind code of ref document: A1