CN116343072A - Target tracking method and device - Google Patents

Target tracking method and device

Info

Publication number
CN116343072A
CN116343072A (application number CN202111555172.1A)
Authority
CN
China
Prior art keywords
scale
target tracking
features
tracking result
roi
Prior art date
Legal status
Pending
Application number
CN202111555172.1A
Other languages
Chinese (zh)
Inventor
陈一伟
俞佳茜
潘思杨
朴昶范
李贤庭
王强
俞炳仁
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN202111555172.1A priority Critical patent/CN116343072A/en
Priority to KR1020220157467A priority patent/KR20230092741A/en
Priority to US18/084,003 priority patent/US20230196589A1/en
Publication of CN116343072A publication Critical patent/CN116343072A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/223: Analysis of motion using block-matching
    • G06T 7/231: Analysis of motion using block-matching using full search
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2210/00: Indexing scheme for image generation or computer graphics
    • G06T 2210/12: Bounding box

Abstract

Provided are a target tracking method and device. The target tracking method comprises: acquiring a first target tracking result based on a search area of a current frame image of a video; predicting a scale of the target in the search area based on scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result. The target tracking method may be performed by an electronic device using an artificial intelligence model.

Description

Target tracking method and device
Technical Field
The present disclosure relates to the field of computer vision technology. More particularly, the present disclosure relates to a target tracking method and apparatus.
Background
Vision-based target tracking (visual object tracking) is an important direction in computer vision. The task is as follows: given the first frame image of a video and the envelope box (bounding box) of a target object in it, continuously predict the envelope box of that target object in subsequent frame images. The core idea is to extract template information (template) from the target annotated in the first frame, compute the degree of matching between the template and different candidate positions in the search region of each subsequent video frame, and select the position with the highest matching degree to determine the target position.
Target tracking techniques are commonly applied to common moving objects such as humans, animals, aircraft, and automobiles. Unlike object detection techniques, however, target tracking algorithms do not detect the class attributes of objects. According to the tracked scene, target tracking algorithms can be further divided into short-term visual object tracking and long-term visual object tracking. Long-term visual object tracking builds on short-term visual object tracking by adding verification of the estimated tracking state and re-detection after a tracking failure.
In the related art, the consumption of computing resources is large, which affects the real-time performance of tracking; moreover, the accumulated scale error produced during target tracking is large, which degrades the tracking effect.
Disclosure of Invention
An exemplary embodiment of the present disclosure provides a target tracking method and apparatus that improve the effect of target tracking while reducing computational consumption.
According to an exemplary embodiment of the present disclosure, there is provided a target tracking method including: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Alternatively, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image.
Alternatively, the search area may be determined based on the target tracking result of the previous frame image.
Alternatively, the first target tracking result may comprise a first tracking envelope box and the second target tracking result may comprise a second tracking envelope box.
Optionally, the target tracking method may further include: whether the target tracking is successful is determined based on the scale features of the first target tracking result.
Optionally, determining whether the target tracking is successful based on the scale features of the first target tracking result may include: acquiring apparent characteristics of a first target tracking result; whether the target tracking is successful is determined based on the apparent features and the scale features of the first target tracking result.
Optionally, before predicting the scale of the target in the search area based on the scale features of the first target tracking result, the target tracking method may further include: and acquiring the scale characteristics of the first target tracking result.
Optionally, acquiring the scale features of the first target tracking result may include: acquiring multi-scale template region-of-interest (ROI) features; acquiring ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
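The multi-scale ROI feature acquisition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it crops a 2-D feature map around a hypothetical (cx, cy, w, h) box at several assumed scale factors and resizes each crop to a common size with nearest-neighbor sampling, so the resulting features can later be compared scale by scale. All function names and parameters are illustrative.

```python
import numpy as np

def resize_nn(patch, size):
    """Nearest-neighbor resize of a 2-D feature patch to (size, size)."""
    h, w = patch.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return patch[np.ix_(rows, cols)]

def multi_scale_roi(feature_map, box, scales=(0.5, 1.0, 2.0), out_size=8):
    """Crop ROI features around box = (cx, cy, w, h) at several scales,
    resized to a common spatial size so they can be correlated later."""
    H, W = feature_map.shape
    cx, cy, w, h = box
    rois = []
    for s in scales:
        sw, sh = max(1, int(round(w * s))), max(1, int(round(h * s)))
        x0 = int(np.clip(cx - sw / 2, 0, W - 1))
        y0 = int(np.clip(cy - sh / 2, 0, H - 1))
        crop = feature_map[y0:y0 + sh, x0:x0 + sw]
        rois.append(resize_nn(crop, out_size))
    return np.stack(rois)  # shape: (num_scales, out_size, out_size)

feature_map = np.arange(400.0).reshape(20, 20)
rois = multi_scale_roi(feature_map, (10, 10, 6, 6))  # shape (3, 8, 8)
```

In a real tracker the crops would be taken from a convolutional feature map and pooled with an operator such as ROI align; the nearest-neighbor resize stands in for that here.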
Optionally, before determining the scale feature of the first target tracking result based on the multi-scale template ROI feature and the ROI feature of the first target tracking result, the target tracking method may further comprise: feature alignment is performed on the ROI features of each scale in the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
Optionally, determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result may include: performing a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
Optionally, performing the correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result may include: performing a correlation calculation between the multi-scale template features and the ROI features of each scale among the ROI features of the first target tracking result, respectively.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of one scale, and the scale features of the first target tracking result may comprise one-dimensional scale features.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of multiple scales, and the scale features of the first target tracking result may comprise two-dimensional scale features.
Optionally, the target tracking method may further include: adjusting the first target tracking result based on the apparent features of the first target tracking result.
According to an exemplary embodiment of the present disclosure, there is provided a target tracking apparatus including: a target determination unit configured to determine a first target tracking result based on a search area of a current frame image of a video; a scale prediction unit configured to predict a scale of the target in the search area based on scale features of the first target tracking result; and a scale adjustment unit configured to adjust the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Alternatively, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image.
Optionally, the search area is determined based on a target tracking result of a previous frame image.
Alternatively, the first target tracking result may comprise a first tracking envelope box and the second target tracking result may comprise a second tracking envelope box.
Optionally, the target tracking apparatus may further include: and a result checking unit configured to determine whether the target tracking is successful based on the scale features of the first target tracking result.
Alternatively, the result checking unit may be configured to: acquiring apparent characteristics of a first target tracking result; whether the target tracking is successful is determined based on the apparent features and the scale features of the first target tracking result.
Optionally, the target tracking apparatus may further include: and the scale feature acquisition unit is configured to acquire the scale feature of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: acquire multi-scale template region-of-interest (ROI) features; acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
Optionally, the target tracking apparatus may further include: and a feature alignment unit configured to perform feature alignment on the ROI feature of each scale among the ROI features of the first target tracking result based on the apparent feature of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template features and the ROI features of each scale among the ROI features of the first target tracking result, respectively.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of one scale, and the scale features of the first target tracking result may comprise one-dimensional scale features.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of multiple scales, and the scale features of the first target tracking result may comprise two-dimensional scale features.
Optionally, the target tracking apparatus may further include: and a feature adjustment unit configured to adjust the first target tracking result based on the apparent feature of the first target tracking result.
According to an exemplary embodiment of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements an object tracking method according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, there is provided a computing device including: at least one processor; at least one memory storing a computer program that, when executed by the at least one processor, implements an object tracking method according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, a computer program product is provided, instructions in which are executable by a processor of a computer device to perform a target tracking method according to an exemplary embodiment of the present disclosure.
According to the target tracking method and apparatus of the exemplary embodiments of the present disclosure, the target object envelope box of the current frame image of the video is first predicted, and candidate region features of at least one scale are obtained for that envelope box. The scale features of the envelope box are then determined based on these candidate region features and the multi-scale template features generated from the initial target object envelope box in the first frame image of the video, and the scale of the envelope box is predicted based on the scale features. In this way, the target tracking effect is improved while computational consumption is reduced.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects and features of exemplary embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate the embodiments by way of example, in which:
FIG. 1 illustrates a flow chart of a target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of target tracking according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates an example of one-dimensional scale features according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates an example of a two-dimensional scale feature according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a scale feature generator according to an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a structural schematic diagram of a one-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a structural schematic diagram of a two-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure;
FIG. 8 illustrates a block diagram of a target tracking device according to an exemplary embodiment of the present disclosure; and
fig. 9 shows a schematic diagram of a computing device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments will be described below in order to explain the present disclosure by referring to the figures.
In the related art, a matching network (e.g., a Siamese network) or an existing tracker is used to relocate a lost target. Since tracking or matching works well only for objects of a certain size scale, the related art matches targets at multiple size scales through a scale search method. The tracker or matching network takes images at several scales as input and, for each scale, outputs candidate envelope boxes and their confidences; the candidate envelope box with the highest confidence score is selected as the re-detection result. The related art thus treats the scale problem as target matching over a limited number of images at different scales, and a single pass can achieve relatively good real-time performance because the computational consumption of the tracker or matching network is low.
However, in the re-detection process of the related art, search images at different scales must be matched and tracked multiple times to predict the confidence scores of targets at different scales. This results in greater computational resource consumption and reduces the real-time performance of the tracker. Furthermore, on computationally constrained devices, the number of preset multi-scale search images must be kept as small as possible, which in turn affects the recall of targets at different scales and thus the performance of the tracker.
The present disclosure provides a state estimation and scale prediction method based on scale features. To address the large computational consumption of the scale search method, the disclosure proposes performing multi-scale matching between a template and the current candidate region to generate scale features containing scale information. Based on these features, the scale of the candidate-region target is predicted during re-detection, so that targets of different sizes can be handled.
Fig. 1 shows a flowchart of a target tracking method according to an exemplary embodiment of the present disclosure. Fig. 2 shows a schematic diagram of target tracking according to an exemplary embodiment of the present disclosure. Fig. 3 illustrates an example of one-dimensional scale features according to an exemplary embodiment of the present disclosure. Fig. 4 illustrates an example of a two-dimensional scale feature according to an exemplary embodiment of the present disclosure. Fig. 5 shows a schematic structural diagram of a scale feature generator according to an exemplary embodiment of the present disclosure. Fig. 6 shows a schematic structural diagram of a one-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure. Fig. 7 shows a schematic structural diagram of a two-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step S101, a first target tracking result is determined based on a search area of a current frame image of a video.
In an exemplary embodiment of the present disclosure, the first target tracking result may include a first tracking envelope box.
In an exemplary embodiment of the present disclosure, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image. For example, when it is determined that target tracking of the current frame image has failed, target tracking must be performed anew on the current frame image, that is, the target tracking result of the current frame image is re-determined to retrieve the target; in this case, the first target tracking result is determined based on the full-view search area or on an area larger than the target tracking result of the previous frame image. In one example, when target tracking of the current frame image is determined to have failed, the full-view search area is used as the search area. In another example, when target tracking of the current frame image is determined to have failed, an area larger than the target tracking result of the previous frame image is used as the search area.
In an exemplary embodiment of the present disclosure, the search area may be determined based on the target tracking result of the previous frame image. For example, when target tracking of the previous frame image is determined to have been successful, the search area is determined based on the target tracking result of the previous frame image.
In exemplary embodiments of the present disclosure, the first target tracking result may also be adjusted based on an apparent characteristic of the first target tracking result.
In step S102, the scale of the target in the search area is predicted based on the scale features of the first target tracking result.
In exemplary embodiments of the present disclosure, the scale features of the first target tracking result may be first acquired before predicting the scale of the target in the search area based on the scale features of the first target tracking result.
In exemplary embodiments of the present disclosure, when acquiring the scale features of the first target tracking result, multi-scale template region-of-interest (ROI) features may be acquired first, the ROI features of the first target tracking result may be acquired, and the scale features of the first target tracking result may then be determined based on the multi-scale template ROI features and the ROI features of the first target tracking result. Here, the ROI features of the first target tracking result comprise ROI features of at least one scale.
In exemplary embodiments of the present disclosure, before determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result, the ROI features of each scale in the ROI features of the first target tracking result may also be feature aligned based on the apparent features of the first target tracking result.
For example, as shown in FIG. 2, a first frame image I_1 of the video is first acquired and cropped according to the given initial target object envelope box b_0 to obtain a target object image Z, and a convolutional neural network is used to extract the depth feature F_Z of the image Z. The t-th frame image I_t of the video is then acquired and cropped according to the target object envelope box predicted from the previous frame (i.e., the envelope box of I_{t-1}) to obtain a search region image X_t, and a convolutional neural network is used to extract the depth feature F_{X_t} of X_t. The envelope box B_t of the target object in the t-th frame is then predicted.
Next, based on the features F_Z and F_{X_t}, the scale feature generator adjusts the position of the envelope box B_t to obtain an envelope box B_t'. Based on B_t', the scale feature generator generates aligned ROI features R_{X,1..k_x} of B_t' at k_x scales, and generates template ROI features R_{Z,1..k_z} at k_z scales based on the initial target object envelope box b_0 in the first frame image of the video. A scale predictor then performs multi-scale matching between the features R_{Z,1..k_z} and R_{X,1..k_x} to obtain the scale feature F_{S_t}.
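The cropping-and-feature-extraction flow described above can be sketched as follows. This is a heavily simplified illustration under stated assumptions: the convolutional backbone is replaced by an identity placeholder, boxes use an assumed (cx, cy, w, h) format, and the search region is simply twice the previous box size; none of these choices are prescribed by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # Placeholder for the convolutional backbone that produces F_Z / F_{X_t}.
    return image.astype(float)

def crop(image, box):
    """Crop a (cx, cy, w, h) box from a 2-D image, clipped to the image."""
    cx, cy, w, h = box
    x0, y0 = max(0, cx - w // 2), max(0, cy - h // 2)
    return image[y0:y0 + h, x0:x0 + w]

# Frame 1: build the template Z from the initial envelope box b_0.
frame1 = rng.random((64, 64))
b0 = (32, 32, 16, 16)
F_Z = extract_features(crop(frame1, b0))

# Frame t: crop the search region around the previous prediction, extract F_{X_t}.
frame_t = rng.random((64, 64))
prev_box = (30, 34, 16, 16)
search_box = (prev_box[0], prev_box[1], 2 * prev_box[2], 2 * prev_box[3])
F_Xt = extract_features(crop(frame_t, search_box))
```

The subsequent steps (envelope box prediction, alignment, and multi-scale matching) would operate on F_Z and F_Xt.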
In an exemplary embodiment of the present disclosure, when determining the scale feature of the first target tracking result based on the multi-scale template ROI feature and the ROI feature of the first target tracking result, the multi-scale template ROI feature and the ROI feature of the first target tracking result may be subjected to a correlation calculation to obtain the scale feature of the first target tracking result.
In an exemplary embodiment of the present disclosure, when performing the correlation calculation of the multi-scale template ROI feature and the ROI feature of the first target tracking result, the correlation calculation may be performed with each scale of the ROI feature of the first target tracking result, respectively, with the multi-scale template feature.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of one scale, and the scale features of the first target tracking result may include one-dimensional scale features.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of multiple scales, and the scale features of the first target tracking result may include two-dimensional scale features.
In an exemplary embodiment of the present disclosure, the second target tracking result includes a second tracking envelope box.
In image tasks, performing a correlation operation yields a response map Y that represents the similarity of two images: the larger the value, the higher the similarity between the corresponding position in the search region image Z and the target object image X. The correlation calculation is shown as follows:

Y(i, j) = \sum_{u=1}^{h} \sum_{v=1}^{w} X(u, v) \cdot Z(i+u, j+v)

Here, Y(i, j) represents the similarity of the two images X and Z at position (i, j), h and w denote the size of the image X, and i, j, u, v are coordinates in the images.
A scale feature is obtained by correlation operations between features of different scales and captures the correlation between them. It is calculated as follows.
The one-dimensional scale feature computes the scale correlation between the multi-scale template features and a single-scale candidate region (the predicted envelope box). The multi-scale template features need to be generated only once, at the initialization of the tracking system, so the computational cost is low. The ROI feature of the predicted envelope box can be correlated with the ROI features of k_z template envelope boxes at different scales, for example using the following formula, to obtain the one-dimensional scale feature:

S(s_z, i, j) = \sum_{u} \sum_{v} f_x(u, v) \cdot f_z^{s_z}(i+u, j+v)

Here, S(s_z, i, j) represents the one-dimensional scale feature, f_x and f_z are the ROI features of the predicted envelope box and the template envelope box, respectively, and s_z is the scale of the template envelope box's ROI feature.
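As a sketch of the same-size case, where each per-scale response collapses to a single inner product (yielding one response per template scale), the following assumes the candidate ROI feature and the k_z template ROI features have already been pooled to a common size; the toy feature patterns are illustrative.

```python
import numpy as np

def scale_feature_1d(f_x, f_z_scales):
    """S1(s_z) = sum_{i,j} f_x(i,j) * f_z^{s_z}(i,j): one response per template scale."""
    return np.array([np.sum(f_x * f_z) for f_z in f_z_scales])

# k_z = 3 template ROI features with distinct patterns; the candidate matches index 1.
f_z_scales = [np.eye(4), np.fliplr(np.eye(4)), np.ones((4, 4)) / 4]
f_x = np.fliplr(np.eye(4))
s1 = scale_feature_1d(f_x, f_z_scales)
best_scale = int(np.argmax(s1))   # index of the best-matching template scale
```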
The two-dimensional scale feature extends the multi-scale correlation calculation of the one-dimensional scale feature to candidate regions (i.e., predicted envelope boxes) of different scales. Compared with the one-dimensional scale feature, the two-dimensional scale feature contains more scale information, which helps improve the performance of the modules guided by the scale feature. The ROI features of the predicted envelope boxes at k_x different scales can be correlated with the ROI features of the template envelope boxes at k_z different scales, for example using the following formula, to obtain the two-dimensional scale feature:

S(s_x, s_z, i, j) = \sum_{u} \sum_{v} f_x^{s_x}(u, v) \cdot f_z^{s_z}(i+u, j+v)

Here, S(s_x, s_z, i, j) represents the two-dimensional scale feature, and s_x is the scale of the predicted envelope box's ROI feature.
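Extending the one-dimensional sketch to k_x candidate scales gives a response matrix rather than a vector; here it is laid out as k_x × k_z (the orientation is a presentation choice), and the feature patterns are again illustrative.

```python
import numpy as np

def scale_feature_2d(f_x_scales, f_z_scales):
    """S2(s_x, s_z) = sum_{i,j} f_x^{s_x}(i,j) * f_z^{s_z}(i,j),
    laid out as a k_x x k_z matrix of scale responses."""
    return np.array([[np.sum(f_x * f_z) for f_z in f_z_scales]
                     for f_x in f_x_scales])

f_z_scales = [np.eye(4), np.fliplr(np.eye(4)), np.ones((4, 4)) / 4]
f_x_scales = [np.fliplr(np.eye(4)), np.eye(4)]
s2 = scale_feature_2d(f_x_scales, f_z_scales)
s_x, s_z = np.unravel_index(np.argmax(s2), s2.shape)   # jointly best scale pair
```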
The scale features may be generated by a scale feature generator as shown in FIG. 5. As shown in FIG. 5, the scale feature describes the scale correlation between the objects within the two envelope boxes (the template envelope box and the predicted envelope box), which requires the object within each box to be as centered as possible. Feature alignment uses a convolutional neural network to mine apparent information from the original tracking result and adjusts the center offset of the target object within the envelope box. After feature alignment, multi-scale ROI features are generated. A convolutional neural network then mines the scale information in the features, and finally multi-scale correlation is performed between the features to generate the scale features.
When the ROI features of the template envelope box and of the predicted envelope box have the same size, the one-dimensional scale feature can be calculated by the following formula:

S1(s_z) = \sum_{i} \sum_{j} f_x(i, j) \cdot f_z^{s_z}(i, j)

Here, S1(s_z) represents the one-dimensional scale response for template scale s_z; the dimension of the scale feature S1 is 1 × k_z.
Likewise, the two-dimensional scale feature can be calculated from same-size ROI features of the template envelope boxes and the predicted envelope boxes by the following formula:

S2(s_x, s_z) = \sum_{i} \sum_{j} f_x^{s_x}(i, j) \cdot f_z^{s_z}(i, j)

Here, S2(s_x, s_z) represents the two-dimensional scale response; the dimension of the scale feature S2 is k_z × k_x.
In an exemplary embodiment of the present disclosure, when predicting the scale of the target in the search area based on the scale features of the first target tracking result, a maximum scale response value among the scale response values included in the scale features of the first target tracking result may be first selected, and then a scale corresponding to the maximum scale response value may be predicted as the scale of the target in the search area.
In an exemplary embodiment of the present disclosure, when predicting the scale of the target in the search area based on the scale feature of the first target tracking result, the scale feature may be input into a preset convolutional neural network, to obtain the scale of the target in the search area.
In step S103, the first target tracking result is adjusted based on the scale prediction result to obtain the second target tracking result. During target tracking, adjusting the first target tracking result based on the scale prediction result reduces large scale drift in the tracking process and thus improves the tracking effect. In the re-detection process (also referred to as the re-tracking process) after a tracking failure, adjusting the first target tracking result based on the scale prediction result reduces the computational cost of re-detection.
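Step S103's adjustment can be sketched as resizing the first tracking envelope box by the factor whose scale response is largest. The (cx, cy, w, h) box format and the candidate factors are assumptions for illustration, not prescribed by the disclosure.

```python
import numpy as np

SCALE_FACTORS = (0.5, 1.0, 2.0)   # candidate scales, mirroring the multi-scale templates

def adjust_box(box, scale_responses, factors=SCALE_FACTORS):
    """Resize the first tracking result's envelope box (cx, cy, w, h)
    by the scale factor whose response value is largest."""
    f = factors[int(np.argmax(scale_responses))]
    cx, cy, w, h = box
    return (cx, cy, w * f, h * f)

first_box = (50.0, 40.0, 20.0, 10.0)
responses = np.array([0.1, 0.2, 0.7])           # peak at the 2.0x scale
second_box = adjust_box(first_box, responses)   # -> (50.0, 40.0, 40.0, 20.0)
```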
In exemplary embodiments of the present disclosure, after the second target tracking result is obtained, it may also be determined whether the target tracking was successful based on the scale features of the first target tracking result. Here, in estimating the state of target tracking, the state of target tracking can be accurately estimated by using the scale features.
In exemplary embodiments of the present disclosure, in determining whether the target tracking is successful based on the scale features of the first target tracking result, the apparent features of the first target tracking result may be first obtained, and then whether the target tracking is successful may be determined based on the apparent features and the scale features of the first target tracking result.
For example, as shown in FIG. 2, the scale-feature-guided estimation module first predicts the envelope-box scale and confidence based on the scale feature F_St. The scale predictor predicts the current scale based on the scale feature F_St; the scale-feature-guided verifier estimates the confidence of the envelope box based on the distribution pattern of the scale feature F_St and the apparent features F_Xt and F_Z.
For example, as shown in fig. 6, based on the one-dimensional scale feature, the scale predictor determines the current scale according to the magnitude of the scale response, and the state verifier calculates an apparent-information-based confidence and a scale-feature-based confidence according to the distribution patterns of the scale features and the apparent features, then fuses them to output the final confidence. Target tracking based on one-dimensional scale features is suitable for scenarios with low computation requirements.
For example, as shown in FIG. 7, based on two-dimensional scale features, the scale predictor employs a convolutional neural network to mine scale information from the scale features, and the state verifier calculates an apparent-information-based confidence and a scale-feature-based confidence according to the distribution patterns of the scale features and the apparent features, then fuses them to output the final confidence. Target tracking based on two-dimensional scale features is suitable for scenarios with higher performance requirements.
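The fusion of the two confidences might look like the following sketch, where the weighted average and the success threshold are illustrative choices, not values from the patent:

```python
def fuse_confidence(apparent_conf, scale_conf, alpha=0.5):
    """Weighted fusion of the apparent-information-based confidence and the
    scale-feature-based confidence; alpha is a hypothetical mixing weight."""
    return alpha * apparent_conf + (1.0 - alpha) * scale_conf

def tracking_succeeded(apparent_conf, scale_conf, threshold=0.5, alpha=0.5):
    """Declare tracking successful when the fused confidence clears a threshold."""
    return fuse_confidence(apparent_conf, scale_conf, alpha) >= threshold
```

A learned fusion (e.g., a small network over both confidences) would also fit the description; the linear blend is simply the smallest concrete instance.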
After target-object scale prediction is performed on the current frame image of the video (e.g., I_t) in steps S101 to S104 to track the target, target-object scale prediction may be performed on the next frame image of the video (e.g., I_{t+1}) to continue tracking the target.
With the target tracking method according to the exemplary embodiment of the present disclosure, a large scale drift occurring in the target tracking process may be reduced, thereby improving the effect of target tracking. Further, using the target tracking method according to an exemplary embodiment of the present disclosure, the calculation cost of the re-detection process may be reduced. In addition, using the target tracking method according to the exemplary embodiments of the present disclosure, the state of target tracking may be accurately estimated, improving the accuracy of target tracking.
Further, according to an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed, implements the object tracking method according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, the computer-readable storage medium may carry one or more programs, which when executed, may implement the steps of: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
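The three steps recited above can be sketched as a single per-frame pipeline, with every stage left as a pluggable callable, since the patent does not fix any of these interfaces (all names here are stand-ins):

```python
def track_frame(frame, track, extract_scale_features, predict_scale, adjust):
    """Per-frame pipeline of the claimed method; each stage is a hypothetical callable."""
    first_result = track(frame)                        # step 1: first target tracking result
    scale_feat = extract_scale_features(first_result)  # scale features of the first result
    scale = predict_scale(scale_feat)                  # step 2: predict the target's scale
    return adjust(first_result, scale)                 # step 3: second target tracking result
```

For example, plugging in toy stages (a fixed box, a two-element scale response, argmax-style scale choice, and center-preserving rescaling) yields an adjusted box.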
The computer-readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber-optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing. The computer-readable storage medium may be embodied in a device, or may exist alone without being assembled into a device.
Further, according to an exemplary embodiment of the present disclosure, a computer program product is provided, instructions in which are executable by a processor of a computer device to perform a method of object tracking according to an exemplary embodiment of the present disclosure.
The object tracking method according to the exemplary embodiment of the present disclosure has been described above in connection with fig. 1 to 7. Hereinafter, a target tracking apparatus and units thereof according to an exemplary embodiment of the present disclosure will be described with reference to fig. 8.
Fig. 8 shows a block diagram of a target tracking apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 8, the object tracking apparatus includes an object determining unit 81, a scale predicting unit 82, and a scale adjusting unit 83.
The target determination unit 81 is configured to determine a first target tracking result based on a search area of a current frame image of the video.
In an exemplary embodiment of the present disclosure, the first target tracking result may include a first tracking envelope box.
In an exemplary embodiment of the present disclosure, the search area may be a full-view search area or an area larger than a target tracking result of a previous frame image.
In an exemplary embodiment of the present disclosure, the search area may be determined based on a target tracking result of a previous frame image.
In an exemplary embodiment of the present disclosure, the target tracking apparatus may further include a feature adjustment unit (not shown) configured to adjust the first target tracking result based on an apparent feature of the first target tracking result.
The scale prediction unit 82 is configured to predict the scale of the target in the search area based on the scale features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the object tracking device may further include a scale feature acquisition unit (not shown) configured to acquire scale features of the first object tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: acquire region-of-interest (ROI) features of the multi-scale template; acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the object tracking apparatus may further include: a feature alignment unit (not shown) configured to perform feature alignment on each scale of the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: perform a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
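A minimal sketch of such a per-scale correlation, using a dot product of flattened ROI features as the correlation operation (an illustrative choice; the patent does not specify the correlation form):

```python
import numpy as np

def scale_correlation(template_rois, result_roi):
    """Correlate the result's ROI feature with each scale's template ROI feature.

    template_rois: list of template ROI feature maps, one per template scale.
    result_roi: ROI feature map of the first target tracking result.
    The vector of per-scale responses forms a one-dimensional scale feature.
    """
    return np.array([float(np.dot(t.ravel(), result_roi.ravel()))
                     for t in template_rois])
```

With multi-scale result ROI features, applying this per result scale would stack such vectors into a two-dimensional scale feature.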
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of one scale, and the scale features of the first target tracking result may include one-dimensional scale features.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of multiple scales, and the scale features of the first target tracking result may include two-dimensional scale features.
The scale adjustment unit 83 is configured to adjust the first target tracking result based on the scale prediction result, resulting in a second target tracking result.
In an exemplary embodiment of the present disclosure, the second target tracking result may include a second tracking envelope box.
In an exemplary embodiment of the present disclosure, the object tracking device may further include a result checking unit (not shown) configured to determine whether the object tracking is successful based on the scale features of the first object tracking result.
In an exemplary embodiment of the present disclosure, the result checking unit may be configured to: acquire the apparent features of the first target tracking result; and determine whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
An object tracking device according to an exemplary embodiment of the present disclosure has been described above in connection with fig. 8. Next, a computing device according to an exemplary embodiment of the present disclosure is described in connection with fig. 9.
Fig. 9 shows a schematic diagram of a computing device according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, a computing device 9 according to an exemplary embodiment of the present disclosure includes a memory 91 and a processor 92, the memory 91 having stored thereon a computer program which, when executed by the processor 92, implements a target tracking method according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, the following steps may be implemented when the computer program is executed by the processor 92: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Computing devices in embodiments of the present disclosure may include, but are not limited to, devices such as mobile phones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), desktop computers, and the like. The computing device illustrated in fig. 9 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
The object tracking method and apparatus according to the exemplary embodiments of the present disclosure have been described above with reference to fig. 1 to 9. However, it should be understood that the object tracking device and its units shown in fig. 8 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function, and that the computing device shown in fig. 9 is not limited to the components shown above: components may be added, deleted, or combined as needed.
According to the target tracking method and device of the exemplary embodiments of the present disclosure, a first target tracking result is obtained based on a search area of a current frame image of a video, the scale of the target in the search area is predicted based on the scale features of the first target tracking result, and the first target tracking result is then adjusted based on the scale prediction result to obtain a second target tracking result, thereby improving the target tracking effect while reducing computational cost.
In an exemplary embodiment of the present disclosure, the target tracking method may take the initialized target tracking result of the first frame of the video as input data of an artificial intelligence model, and output the target tracking result of each subsequent frame of the video.
The artificial intelligence model may be obtained through training. Herein, "obtained through training" means that a basic artificial intelligence model is trained on a plurality of pieces of training data by a training algorithm to obtain predefined operation rules or an artificial intelligence model configured to perform a desired feature (or purpose).
As an example, the artificial intelligence model may include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values, and each layer's neural network computation is performed between the computation result of the previous layer and that layer's weight values.
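That per-layer computation can be sketched as a single weighted combination of the previous layer's output followed by a nonlinearity (the ReLU here is an illustrative choice):

```python
import numpy as np

def neural_layer(prev_output, weights, bias):
    """One layer's computation: combine the previous layer's result with this
    layer's weight values, then apply a ReLU nonlinearity."""
    return np.maximum(weights @ prev_output + bias, 0.0)
```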
Visual understanding is a technique for recognizing and processing things as human vision does, and includes, for example, object recognition, object tracking, image retrieval, human recognition, scene recognition, three-dimensional reconstruction/localization, and image enhancement.
While the present disclosure has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the following claims.

Claims (30)

1. A target tracking method, comprising:
determining a first target tracking result based on a search area of a current frame image of the video;
predicting a scale of the target in the search area based on the scale features of the first target tracking result;
and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
2. The object tracking method according to claim 1, wherein the search area is a full-view search area or an area larger than an object tracking result of a previous frame image.
3. The object tracking method according to claim 1, wherein the search area is determined based on an object tracking result of a previous frame image.
4. The target tracking method of claim 1, wherein the first target tracking result comprises a first tracking envelope box and the second target tracking result comprises a second tracking envelope box.
5. The target tracking method according to claim 1, further comprising:
determining whether the target tracking is successful based on the scale features of the first target tracking result.
6. The method of claim 5, wherein determining whether the target tracking was successful based on the scale characteristics of the first target tracking result comprises:
acquiring apparent features of the first target tracking result;
determining whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
7. The target tracking method of claim 6, wherein prior to predicting the scale of the target in the search area based on the scale features of the first target tracking result, the target tracking method further comprises:
and acquiring the scale characteristics of the first target tracking result.
8. The method of claim 7, wherein obtaining the scale feature of the first target tracking result comprises:
acquiring region-of-interest (ROI) features of a multi-scale template;
acquiring ROI features of a first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale;
and determining scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
9. The target tracking method of claim 8, wherein prior to determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result, the target tracking method further comprises:
performing feature alignment on the ROI features of each scale among the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
10. The target tracking method of claim 8, wherein determining scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result comprises:
and performing a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
11. The object tracking method of claim 10, wherein performing a correlation calculation of the multi-scale template ROI feature and the ROI feature of the first object tracking result comprises:
and performing a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
12. The method of claim 8, wherein the ROI features of the first target tracking result comprise ROI features of one scale and the scale features of the first target tracking result comprise one-dimensional scale features.
13. The object tracking method of claim 8 wherein the ROI features of the first object tracking result comprise ROI features of a plurality of scales and the scale features of the first object tracking result comprise two-dimensional scale features.
14. The target tracking method of claim 6, further comprising:
adjusting the first target tracking result based on the apparent features of the first target tracking result.
15. An object tracking device, characterized in that the object tracking device comprises:
a target determination unit configured to determine a first target tracking result based on a search area of a current frame image of the video;
a scale prediction unit configured to predict a scale of a target in the search area based on a scale feature of the first target tracking result;
and the scale adjustment unit is configured to adjust the first target tracking result based on the scale prediction result to obtain a second target tracking result.
16. The object tracking device of claim 15, wherein the search area is a full-view search area or an area larger than an object tracking result of a previous frame image.
17. The object tracking device of claim 15 wherein the search area is determined based on an object tracking result of a previous frame image.
18. The object tracking device of claim 15 wherein the first object tracking result comprises a first tracking envelope box and the second object tracking result comprises a second tracking envelope box.
19. The object tracking device of claim 15, further comprising:
and a result checking unit configured to determine whether the target tracking is successful based on the scale features of the first target tracking result.
20. The object tracking device of claim 19, wherein the result checking unit is configured to:
acquire apparent features of the first target tracking result; and
determine whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
21. The object tracking device of claim 20, further comprising:
and the scale feature acquisition unit is configured to acquire the scale feature of the first target tracking result.
22. The object tracking device of claim 21, wherein the scale feature acquisition unit is configured to:
acquire multi-scale template ROI features;
acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale;
and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
23. The object tracking device of claim 22, further comprising:
and a feature alignment unit configured to perform feature alignment on the ROI feature of each scale among the ROI features of the first target tracking result based on the apparent feature of the first target tracking result.
24. The object tracking device of claim 22, wherein the scale feature acquisition unit is configured to:
and perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
25. The object tracking device of claim 24, wherein the scale feature acquisition unit is configured to:
and perform a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
26. The object tracking device of claim 22 wherein the ROI features of the first object tracking result comprise ROI features of one scale and the scale features of the first object tracking result comprise one-dimensional scale features.
27. The object tracking device of claim 22 wherein the ROI features of the first object tracking result comprise ROI features of a plurality of scales and the scale features of the first object tracking result comprise two-dimensional scale features.
28. The object tracking device of claim 22, further comprising:
and a feature adjustment unit configured to adjust the first target tracking result based on the apparent feature of the first target tracking result.
29. A computer readable storage medium storing a computer program, characterized in that the object tracking method according to any one of claims 1-14 is implemented when the computer program is executed by a processor.
30. A computing device, the computing device comprising:
at least one processor;
at least one memory storing a computer program which, when executed by the at least one processor, implements the object tracking method of any one of claims 1-14.
CN202111555172.1A 2021-12-17 2021-12-17 Target tracking method and device Pending CN116343072A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111555172.1A CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device
KR1020220157467A KR20230092741A (en) 2021-12-17 2022-11-22 Apparatus and method for tracking target
US18/084,003 US20230196589A1 (en) 2021-12-17 2022-12-19 Method and apparatus with target tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111555172.1A CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device

Publications (1)

Publication Number Publication Date
CN116343072A true CN116343072A (en) 2023-06-27

Family

ID=86880906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555172.1A Pending CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device

Country Status (2)

Country Link
KR (1) KR20230092741A (en)
CN (1) CN116343072A (en)

Also Published As

Publication number Publication date
KR20230092741A (en) 2023-06-26


Legal Events

Date Code Title Description
PB01 Publication