CN116343072A - Target tracking method and device - Google Patents

Target tracking method and device

Info

Publication number
CN116343072A
CN116343072A (application number CN202111555172.1A)
Authority
CN
China
Prior art keywords
scale
target tracking
features
tracking result
roi
Prior art date
Legal status
Pending
Application number
CN202111555172.1A
Other languages
Chinese (zh)
Inventor
陈一伟
俞佳茜
潘思杨
朴昶范
李贤庭
王强
俞炳仁
Current Assignee
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to CN202111555172.1A priority Critical patent/CN116343072A/en
Priority to KR1020220157467A priority patent/KR20230092741A/en
Priority to US18/084,003 priority patent/US20230196589A1/en
Publication of CN116343072A publication Critical patent/CN116343072A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/223: Analysis of motion using block-matching
    • G06T 7/231: Analysis of motion using block-matching using full search
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2210/00: Indexing scheme for image generation or computer graphics
    • G06T 2210/12: Bounding box

Abstract

Provided are a target tracking method and device. The target tracking method comprises: acquiring a first target tracking result based on a search area of a current frame image of a video; predicting a scale of the target in the search area based on scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result. The target tracking method may be performed by an electronic device using an artificial intelligence model.

Description

Target tracking method and device
Technical Field
The present disclosure relates to the field of computer vision technology. More particularly, the present disclosure relates to a target tracking method and apparatus.
Background
Vision-based target tracking (visual object tracking) is an important direction in computer vision. The task is as follows: given the first frame image of a video and the envelope box (bounding box) of a target object in it, continuously predict the envelope box of that target object in subsequent frame images. The core idea is to extract template information (template) from the target annotated in the first frame, compute the degree of matching between the template and different candidate positions in the search region of each subsequent video frame, and select the position with the highest matching degree to determine the target position.
Target tracking techniques are commonly applied to common moving objects such as humans, animals, aircraft, and automobiles. Unlike object detection techniques, however, target tracking algorithms do not detect the class attributes of objects. According to the tracked scene, target tracking algorithms can be further divided into short-term visual object tracking and long-term visual object tracking. Long-term visual object tracking builds on short-term visual object tracking by adding verification of the estimated tracking state and re-detection after a tracking failure.
In the related art, the consumption of computing resources is large, which affects the real-time performance of tracking; moreover, the accumulated scale error produced during target tracking is large, which degrades the tracking effect.
Disclosure of Invention
An exemplary embodiment of the present disclosure provides a target tracking method and apparatus that improve the effect of target tracking while reducing computational consumption.
According to an exemplary embodiment of the present disclosure, there is provided a target tracking method including: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Alternatively, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image.
Alternatively, the search area may be determined based on the target tracking result of the previous frame image.
Alternatively, the first target tracking result may comprise a first tracking envelope box and the second target tracking result may comprise a second tracking envelope box.
Optionally, the target tracking method may further include: whether the target tracking is successful is determined based on the scale features of the first target tracking result.
Optionally, determining whether the target tracking is successful based on the scale features of the first target tracking result may include: acquiring apparent characteristics of a first target tracking result; whether the target tracking is successful is determined based on the apparent features and the scale features of the first target tracking result.
Optionally, before predicting the scale of the target in the search area based on the scale features of the first target tracking result, the target tracking method may further include: and acquiring the scale characteristics of the first target tracking result.
Optionally, acquiring the scale features of the first target tracking result may include: acquiring multi-scale template region-of-interest (ROI) features; acquiring ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
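The multi-scale ROI feature acquisition described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it crops a 2-D feature map around a hypothetical (cx, cy, w, h) box at several assumed scale factors and resizes each crop to a common size with nearest-neighbor sampling, so the resulting features can later be compared scale by scale. All function names and parameters are illustrative.

```python
import numpy as np

def resize_nn(patch, size):
    """Nearest-neighbor resize of a 2-D feature patch to (size, size)."""
    h, w = patch.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return patch[np.ix_(rows, cols)]

def multi_scale_roi(feature_map, box, scales=(0.5, 1.0, 2.0), out_size=8):
    """Crop ROI features around box = (cx, cy, w, h) at several scales,
    resized to a common spatial size so they can be correlated later."""
    H, W = feature_map.shape
    cx, cy, w, h = box
    rois = []
    for s in scales:
        sw, sh = max(1, int(round(w * s))), max(1, int(round(h * s)))
        x0 = int(np.clip(cx - sw / 2, 0, W - 1))
        y0 = int(np.clip(cy - sh / 2, 0, H - 1))
        crop = feature_map[y0:y0 + sh, x0:x0 + sw]
        rois.append(resize_nn(crop, out_size))
    return np.stack(rois)  # shape: (num_scales, out_size, out_size)

feature_map = np.arange(400.0).reshape(20, 20)
rois = multi_scale_roi(feature_map, (10, 10, 6, 6))  # shape (3, 8, 8)
```

In a real tracker the crops would be taken from a convolutional feature map and pooled with an operator such as ROI align; the nearest-neighbor resize stands in for that here.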
Optionally, before determining the scale feature of the first target tracking result based on the multi-scale template ROI feature and the ROI feature of the first target tracking result, the target tracking method may further comprise: feature alignment is performed on the ROI features of each scale in the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
Optionally, determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result may include: performing a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
Optionally, performing the correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result may include: performing a correlation calculation between the multi-scale template features and the ROI features of each scale among the ROI features of the first target tracking result, respectively.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of one scale, and the scale features of the first target tracking result may comprise one-dimensional scale features.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of multiple scales, and the scale features of the first target tracking result may comprise two-dimensional scale features.
Optionally, the target tracking method may further include: adjusting the first target tracking result based on the apparent features of the first target tracking result.
According to an exemplary embodiment of the present disclosure, there is provided a target tracking apparatus including: a target determination unit configured to determine a first target tracking result based on a search area of a current frame image of a video; a scale prediction unit configured to predict a scale of the target in the search area based on scale features of the first target tracking result; and a scale adjustment unit configured to adjust the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Alternatively, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image.
Optionally, the search area is determined based on a target tracking result of a previous frame image.
Alternatively, the first target tracking result may comprise a first tracking envelope box and the second target tracking result may comprise a second tracking envelope box.
Optionally, the target tracking apparatus may further include: and a result checking unit configured to determine whether the target tracking is successful based on the scale features of the first target tracking result.
Alternatively, the result checking unit may be configured to: acquiring apparent characteristics of a first target tracking result; whether the target tracking is successful is determined based on the apparent features and the scale features of the first target tracking result.
Optionally, the target tracking apparatus may further include: and the scale feature acquisition unit is configured to acquire the scale feature of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: acquire multi-scale template region-of-interest (ROI) features; acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
Optionally, the target tracking apparatus may further include: and a feature alignment unit configured to perform feature alignment on the ROI feature of each scale among the ROI features of the first target tracking result based on the apparent feature of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
Alternatively, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template features and the ROI features of each scale among the ROI features of the first target tracking result, respectively.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of one scale, and the scale features of the first target tracking result may comprise one-dimensional scale features.
Alternatively, the ROI features of the first target tracking result may comprise ROI features of multiple scales, and the scale features of the first target tracking result may comprise two-dimensional scale features.
Optionally, the target tracking apparatus may further include: and a feature adjustment unit configured to adjust the first target tracking result based on the apparent feature of the first target tracking result.
According to an exemplary embodiment of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, implements an object tracking method according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, there is provided a computing device including: at least one processor; at least one memory storing a computer program that, when executed by the at least one processor, implements an object tracking method according to an exemplary embodiment of the present disclosure.
According to an exemplary embodiment of the present disclosure, a computer program product is provided, instructions in which are executable by a processor of a computer device to perform a target tracking method according to an exemplary embodiment of the present disclosure.
According to the target tracking method and apparatus of the exemplary embodiments of the present disclosure, the target object envelope box of the current frame image of the video is first predicted, and candidate region features of at least one scale are obtained for that envelope box. The scale features of the envelope box are then determined based on these candidate region features and the multi-scale template features generated from the initial target object envelope box in the first frame image of the video, and the scale of the envelope box is predicted based on the scale features. In this way, the target tracking effect is improved while computational consumption is reduced.
Additional aspects and/or advantages of the present general inventive concept will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the general inventive concept.
Drawings
The foregoing and other objects and features of exemplary embodiments of the present disclosure will become more apparent from the following description taken in conjunction with the accompanying drawings which illustrate the embodiments by way of example, in which:
FIG. 1 illustrates a flow chart of a target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 2 shows a schematic diagram of target tracking according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates an example of one-dimensional scale features according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates an example of a two-dimensional scale feature according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a schematic diagram of a scale feature generator according to an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a structural schematic diagram of a one-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a structural schematic diagram of a two-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure;
FIG. 8 illustrates a block diagram of a target tracking device according to an exemplary embodiment of the present disclosure; and
fig. 9 shows a schematic diagram of a computing device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments will be described below in order to explain the present disclosure by referring to the figures.
In the related art, a matching network (e.g., a Siamese network) or an existing tracker is used to relocate a lost target. Since tracking or matching works well only for objects of a certain size scale, the related art matches targets at multiple size scales through a scale search method. The tracker or matching network takes images at several scales as input and, for each scale, outputs candidate envelope boxes and their confidences; the candidate envelope box with the highest confidence score is selected as the re-detection result. The related art thus treats the scale problem as target matching over a limited number of images at different scales, and a single pass can achieve relatively good real-time performance because the computational consumption of the tracker or matching network is low.
However, in the re-detection process of the related art, search images at different scales must be matched and tracked multiple times to predict the confidence scores of targets at different scales. This results in greater computational resource consumption and reduces the real-time performance of the tracker. Furthermore, on computationally constrained devices, the number of preset multi-scale search images must be kept as small as possible, which in turn affects the recall of targets at different scales and thus the performance of the tracker.
The present disclosure provides a state estimation and scale prediction method based on scale features. To address the large computational consumption of the scale search method, the disclosure proposes performing multi-scale matching between a template and the current candidate region to generate scale features containing scale information. Based on these features, the scale of the candidate-region target is predicted during re-detection, so that targets of different sizes can be handled.
Fig. 1 shows a flowchart of a target tracking method according to an exemplary embodiment of the present disclosure. Fig. 2 shows a schematic diagram of target tracking according to an exemplary embodiment of the present disclosure. Fig. 3 illustrates an example of one-dimensional scale features according to an exemplary embodiment of the present disclosure. Fig. 4 illustrates an example of a two-dimensional scale feature according to an exemplary embodiment of the present disclosure. Fig. 5 shows a schematic structural diagram of a scale feature generator according to an exemplary embodiment of the present disclosure. Fig. 6 shows a schematic structural diagram of a one-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure. Fig. 7 shows a schematic structural diagram of a two-dimensional scale feature-based scale prediction and verification network according to an exemplary embodiment of the present disclosure.
Referring to fig. 1, in step S101, a first target tracking result is determined based on a search area of a current frame image of a video.
In an exemplary embodiment of the present disclosure, the first target tracking result may include a first tracking envelope box.
In an exemplary embodiment of the present disclosure, the search area may be a full-view search area or an area larger than the target tracking result of the previous frame image. For example, when it is determined that target tracking of the current frame image has failed, target tracking must be performed anew on the current frame image, that is, the target tracking result of the current frame image is re-determined to retrieve the target; in this case, the first target tracking result is determined based on the full-view search area or on an area larger than the target tracking result of the previous frame image. In one example, when target tracking of the current frame image is determined to have failed, the full-view search area is used as the search area. In another example, when target tracking of the current frame image is determined to have failed, an area larger than the target tracking result of the previous frame image is used as the search area.
In an exemplary embodiment of the present disclosure, the search area may be determined based on the target tracking result of the previous frame image. For example, when target tracking of the previous frame image is determined to have been successful, the search area is determined based on the target tracking result of the previous frame image.
In exemplary embodiments of the present disclosure, the first target tracking result may also be adjusted based on an apparent characteristic of the first target tracking result.
In step S102, the scale of the target in the search area is predicted based on the scale features of the first target tracking result.
In exemplary embodiments of the present disclosure, the scale features of the first target tracking result may be first acquired before predicting the scale of the target in the search area based on the scale features of the first target tracking result.
In exemplary embodiments of the present disclosure, when acquiring the scale features of the first target tracking result, multi-scale template region-of-interest (ROI) features may be acquired first, the ROI features of the first target tracking result may be acquired, and the scale features of the first target tracking result may then be determined based on the multi-scale template ROI features and the ROI features of the first target tracking result. Here, the ROI features of the first target tracking result comprise ROI features of at least one scale.
In exemplary embodiments of the present disclosure, before determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result, the ROI features of each scale in the ROI features of the first target tracking result may also be feature aligned based on the apparent features of the first target tracking result.
For example, as shown in FIG. 2, a first frame image I_1 of the video is first acquired and cropped according to the given initial target object envelope box b_0 to obtain a target object image Z, and a convolutional neural network is used to extract the depth feature F_Z of the image Z. The t-th frame image I_t of the video is then acquired and cropped according to the target object envelope box predicted from the previous frame (i.e., the envelope box of I_{t-1}) to obtain a search region image X_t, and a convolutional neural network is used to extract the depth feature F_{X_t} of X_t. The envelope box B_t of the target object in the t-th frame is then predicted.
Next, based on the features F_Z and F_{X_t}, the scale feature generator adjusts the position of the envelope box B_t to obtain an envelope box B_t'. Based on B_t', the scale feature generator generates aligned ROI features R_{X,1..k_x} of B_t' at k_x scales, and generates template ROI features R_{Z,1..k_z} at k_z scales based on the initial target object envelope box b_0 in the first frame image of the video. A scale predictor then performs multi-scale matching between the features R_{Z,1..k_z} and R_{X,1..k_x} to obtain the scale feature F_{S_t}.
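The cropping-and-feature-extraction flow described above can be sketched as follows. This is a heavily simplified illustration under stated assumptions: the convolutional backbone is replaced by an identity placeholder, boxes use an assumed (cx, cy, w, h) format, and the search region is simply twice the previous box size; none of these choices are prescribed by the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(image):
    # Placeholder for the convolutional backbone that produces F_Z / F_{X_t}.
    return image.astype(float)

def crop(image, box):
    """Crop a (cx, cy, w, h) box from a 2-D image, clipped to the image."""
    cx, cy, w, h = box
    x0, y0 = max(0, cx - w // 2), max(0, cy - h // 2)
    return image[y0:y0 + h, x0:x0 + w]

# Frame 1: build the template Z from the initial envelope box b_0.
frame1 = rng.random((64, 64))
b0 = (32, 32, 16, 16)
F_Z = extract_features(crop(frame1, b0))

# Frame t: crop the search region around the previous prediction, extract F_{X_t}.
frame_t = rng.random((64, 64))
prev_box = (30, 34, 16, 16)
search_box = (prev_box[0], prev_box[1], 2 * prev_box[2], 2 * prev_box[3])
F_Xt = extract_features(crop(frame_t, search_box))
```

The subsequent steps (envelope box prediction, alignment, and multi-scale matching) would operate on F_Z and F_Xt.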
In an exemplary embodiment of the present disclosure, when determining the scale feature of the first target tracking result based on the multi-scale template ROI feature and the ROI feature of the first target tracking result, the multi-scale template ROI feature and the ROI feature of the first target tracking result may be subjected to a correlation calculation to obtain the scale feature of the first target tracking result.
In an exemplary embodiment of the present disclosure, when performing the correlation calculation of the multi-scale template ROI feature and the ROI feature of the first target tracking result, the correlation calculation may be performed with each scale of the ROI feature of the first target tracking result, respectively, with the multi-scale template feature.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of one scale, and the scale features of the first target tracking result may include one-dimensional scale features.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of multiple scales, and the scale features of the first target tracking result may include two-dimensional scale features.
In an exemplary embodiment of the present disclosure, the second target tracking result includes a second tracking envelope box.
In image tasks, performing a correlation operation yields a response map Y that represents the similarity of two images: the larger the value, the higher the similarity between the corresponding position in the search region image Z and the target object image X. The correlation calculation is shown as follows:

Y(i, j) = \sum_{u=1}^{h} \sum_{v=1}^{w} X(u, v) \cdot Z(i+u, j+v)

Here, Y(i, j) represents the similarity of the two images X and Z at position (i, j), h and w denote the size of the image X, and i, j, u, v are coordinates in the images.
A scale feature is obtained by correlation operations between features of different scales and captures the correlation between them. It is calculated as follows.
The one-dimensional scale feature computes the scale correlation between the multi-scale template features and a single-scale candidate region (the predicted envelope box). The multi-scale template features need to be generated only once, at the initialization of the tracking system, so the computational cost is low. The ROI feature of the predicted envelope box can be correlated with the ROI features of k_z template envelope boxes at different scales, for example using the following formula, to obtain the one-dimensional scale feature:

S(s_z, i, j) = \sum_{u} \sum_{v} f_x(u, v) \cdot f_z^{s_z}(i+u, j+v)

Here, S(s_z, i, j) represents the one-dimensional scale feature, f_x and f_z are the ROI features of the predicted envelope box and the template envelope box, respectively, and s_z is the scale of the template envelope box's ROI feature.
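As a sketch of the same-size case, where each per-scale response collapses to a single inner product (yielding one response per template scale), the following assumes the candidate ROI feature and the k_z template ROI features have already been pooled to a common size; the toy feature patterns are illustrative.

```python
import numpy as np

def scale_feature_1d(f_x, f_z_scales):
    """S1(s_z) = sum_{i,j} f_x(i,j) * f_z^{s_z}(i,j): one response per template scale."""
    return np.array([np.sum(f_x * f_z) for f_z in f_z_scales])

# k_z = 3 template ROI features with distinct patterns; the candidate matches index 1.
f_z_scales = [np.eye(4), np.fliplr(np.eye(4)), np.ones((4, 4)) / 4]
f_x = np.fliplr(np.eye(4))
s1 = scale_feature_1d(f_x, f_z_scales)
best_scale = int(np.argmax(s1))   # index of the best-matching template scale
```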
The two-dimensional scale feature extends the multi-scale correlation calculation of the one-dimensional scale feature to candidate regions (i.e., predicted envelope boxes) of different scales. Compared with the one-dimensional scale feature, the two-dimensional scale feature contains more scale information, which helps improve the performance of the modules guided by the scale feature. The ROI features of the predicted envelope boxes at k_x different scales can be correlated with the ROI features of the template envelope boxes at k_z different scales, for example using the following formula, to obtain the two-dimensional scale feature:

S(s_x, s_z, i, j) = \sum_{u} \sum_{v} f_x^{s_x}(u, v) \cdot f_z^{s_z}(i+u, j+v)

Here, S(s_x, s_z, i, j) represents the two-dimensional scale feature, and s_x is the scale of the predicted envelope box's ROI feature.
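Extending the one-dimensional sketch to k_x candidate scales gives a response matrix rather than a vector; here it is laid out as k_x × k_z (the orientation is a presentation choice), and the feature patterns are again illustrative.

```python
import numpy as np

def scale_feature_2d(f_x_scales, f_z_scales):
    """S2(s_x, s_z) = sum_{i,j} f_x^{s_x}(i,j) * f_z^{s_z}(i,j),
    laid out as a k_x x k_z matrix of scale responses."""
    return np.array([[np.sum(f_x * f_z) for f_z in f_z_scales]
                     for f_x in f_x_scales])

f_z_scales = [np.eye(4), np.fliplr(np.eye(4)), np.ones((4, 4)) / 4]
f_x_scales = [np.fliplr(np.eye(4)), np.eye(4)]
s2 = scale_feature_2d(f_x_scales, f_z_scales)
s_x, s_z = np.unravel_index(np.argmax(s2), s2.shape)   # jointly best scale pair
```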
The scale features may be generated by a scale feature generator as shown in FIG. 5. As shown in FIG. 5, the scale feature describes the scale correlation between the objects within the two envelope boxes (the template envelope box and the predicted envelope box), which requires the object within each box to be as centered as possible. Feature alignment uses a convolutional neural network to mine apparent information from the original tracking result and adjusts the center offset of the target object within the envelope box. After feature alignment, multi-scale ROI features are generated. A convolutional neural network then mines the scale information in the features, and finally multi-scale correlation is performed between the features to generate the scale features.
When the ROI features of the template envelope box and of the predicted envelope box have the same size, the one-dimensional scale feature can be calculated by the following formula:

S1(s_z) = \sum_{i} \sum_{j} f_x(i, j) \cdot f_z^{s_z}(i, j)

Here, S1(s_z) represents the one-dimensional scale response for template scale s_z; the dimension of the scale feature S1 is 1 × k_z.
Likewise, the two-dimensional scale feature can be calculated from same-size ROI features of the template envelope boxes and the predicted envelope boxes by the following formula:

S2(s_x, s_z) = \sum_{i} \sum_{j} f_x^{s_x}(i, j) \cdot f_z^{s_z}(i, j)

Here, S2(s_x, s_z) represents the two-dimensional scale response; the dimension of the scale feature S2 is k_z × k_x.
In an exemplary embodiment of the present disclosure, when predicting the scale of the target in the search area based on the scale features of the first target tracking result, a maximum scale response value among the scale response values included in the scale features of the first target tracking result may be first selected, and then a scale corresponding to the maximum scale response value may be predicted as the scale of the target in the search area.
In an exemplary embodiment of the present disclosure, when predicting the scale of the target in the search area based on the scale feature of the first target tracking result, the scale feature may be input into a preset convolutional neural network, to obtain the scale of the target in the search area.
In step S103, the first target tracking result is adjusted based on the scale prediction result to obtain the second target tracking result. During target tracking, adjusting the first target tracking result based on the scale prediction result reduces large scale drift in the tracking process and thus improves the tracking effect. In the re-detection process (also referred to as the re-tracking process) after a tracking failure, adjusting the first target tracking result based on the scale prediction result reduces the computational cost of re-detection.
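Step S103's adjustment can be sketched as resizing the first tracking envelope box by the factor whose scale response is largest. The (cx, cy, w, h) box format and the candidate factors are assumptions for illustration, not prescribed by the disclosure.

```python
import numpy as np

SCALE_FACTORS = (0.5, 1.0, 2.0)   # candidate scales, mirroring the multi-scale templates

def adjust_box(box, scale_responses, factors=SCALE_FACTORS):
    """Resize the first tracking result's envelope box (cx, cy, w, h)
    by the scale factor whose response value is largest."""
    f = factors[int(np.argmax(scale_responses))]
    cx, cy, w, h = box
    return (cx, cy, w * f, h * f)

first_box = (50.0, 40.0, 20.0, 10.0)
responses = np.array([0.1, 0.2, 0.7])           # peak at the 2.0x scale
second_box = adjust_box(first_box, responses)   # -> (50.0, 40.0, 40.0, 20.0)
```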
In exemplary embodiments of the present disclosure, after the second target tracking result is obtained, it may also be determined whether the target tracking was successful based on the scale features of the first target tracking result. Here, in estimating the state of target tracking, the state of target tracking can be accurately estimated by using the scale features.
In exemplary embodiments of the present disclosure, in determining whether the target tracking is successful based on the scale features of the first target tracking result, the apparent features of the first target tracking result may be first obtained, and then whether the target tracking is successful may be determined based on the apparent features and the scale features of the first target tracking result.
For example, as shown in FIG. 2, the scale-feature-guided estimation module first predicts the envelope-box scale and confidence based on the scale feature F_St. The scale predictor predicts the current scale based on the scale feature F_St; the scale-feature-guided verifier estimates the confidence of the envelope box based on the distribution pattern of the scale feature F_St and the apparent features F_Xt and F_Z.
For example, as shown in fig. 6, based on the one-dimensional scale feature, the scale predictor determines the current scale according to the magnitude of the scale response, and the state verifier calculates an apparent-information-based confidence and a scale-feature-based confidence according to the distribution patterns of the scale features and the apparent features, then fuses them to output the final confidence. Target tracking based on one-dimensional scale features is suitable for scenarios with low computation requirements.
For example, as shown in FIG. 7, based on two-dimensional scale features, the scale predictor employs a convolutional neural network to mine scale information from the scale features, and the state verifier calculates an apparent-information-based confidence and a scale-feature-based confidence according to the distribution patterns of the scale features and the apparent features, then fuses them to output the final confidence. Target tracking based on two-dimensional scale features is suitable for scenarios with higher performance requirements.
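The fusion of the two confidences might look like the following sketch, where the weighted average and the success threshold are illustrative choices, not values from the patent:

```python
def fuse_confidence(apparent_conf, scale_conf, alpha=0.5):
    """Weighted fusion of the apparent-information-based confidence and the
    scale-feature-based confidence; alpha is a hypothetical mixing weight."""
    return alpha * apparent_conf + (1.0 - alpha) * scale_conf

def tracking_succeeded(apparent_conf, scale_conf, threshold=0.5, alpha=0.5):
    """Declare tracking successful when the fused confidence clears a threshold."""
    return fuse_confidence(apparent_conf, scale_conf, alpha) >= threshold
```

A learned fusion (e.g., a small network over both confidences) would also fit the description; the linear blend is simply the smallest concrete instance.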
After target-object scale prediction is performed on the current frame image of the video (e.g., I_t) in steps S101 to S104 to track the target, target-object scale prediction may be performed on the next frame image of the video (e.g., I_{t+1}) to continue tracking the target.
With the target tracking method according to the exemplary embodiment of the present disclosure, a large scale drift occurring in the target tracking process may be reduced, thereby improving the effect of target tracking. Further, using the target tracking method according to an exemplary embodiment of the present disclosure, the calculation cost of the re-detection process may be reduced. In addition, using the target tracking method according to the exemplary embodiments of the present disclosure, the state of target tracking may be accurately estimated, improving the accuracy of target tracking.
Further, according to an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed, implements the object tracking method according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, the computer-readable storage medium may carry one or more programs, which when executed, may implement the steps of: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
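The three steps recited above can be sketched as a single per-frame pipeline, with every stage left as a pluggable callable, since the patent does not fix any of these interfaces (all names here are stand-ins):

```python
def track_frame(frame, track, extract_scale_features, predict_scale, adjust):
    """Per-frame pipeline of the claimed method; each stage is a hypothetical callable."""
    first_result = track(frame)                        # step 1: first target tracking result
    scale_feat = extract_scale_features(first_result)  # scale features of the first result
    scale = predict_scale(scale_feat)                  # step 2: predict the target's scale
    return adjust(first_result, scale)                 # step 3: second target tracking result
```

For example, plugging in toy stages (a fixed box, a two-element scale response, argmax-style scale choice, and center-preserving rescaling) yields an adjusted box.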
The computer-readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer program embodied on a computer-readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber-optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing. The computer-readable storage medium may be embodied in a device, or may exist alone without being assembled into a device.
Further, according to an exemplary embodiment of the present disclosure, a computer program product is provided, instructions in which are executable by a processor of a computer device to perform a method of object tracking according to an exemplary embodiment of the present disclosure.
The object tracking method according to the exemplary embodiment of the present disclosure has been described above in connection with fig. 1 to 7. Hereinafter, a target tracking apparatus and units thereof according to an exemplary embodiment of the present disclosure will be described with reference to fig. 8.
Fig. 8 shows a block diagram of a target tracking apparatus according to an exemplary embodiment of the present disclosure.
Referring to fig. 8, the object tracking apparatus includes an object determining unit 81, a scale predicting unit 82, and a scale adjusting unit 83.
The target determination unit 81 is configured to determine a first target tracking result based on a search area of a current frame image of the video.
In an exemplary embodiment of the present disclosure, the first target tracking result may include a first tracking envelope box.
In an exemplary embodiment of the present disclosure, the search area may be a full-view search area or an area larger than a target tracking result of a previous frame image.
In an exemplary embodiment of the present disclosure, the search area may be determined based on a target tracking result of a previous frame image.
In an exemplary embodiment of the present disclosure, the target tracking apparatus may further include a feature adjustment unit (not shown) configured to adjust the first target tracking result based on an apparent feature of the first target tracking result.
The scale prediction unit 82 is configured to predict the scale of the target in the search area based on the scale features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the object tracking device may further include a scale feature acquisition unit (not shown) configured to acquire scale features of the first object tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: acquire region-of-interest (ROI) features of the multi-scale template; acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale; and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the object tracking apparatus may further include: a feature alignment unit (not shown) configured to perform feature alignment on each scale of the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
In an exemplary embodiment of the present disclosure, the scale feature acquisition unit may be configured to: perform a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
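A minimal sketch of such a per-scale correlation, using a dot product of flattened ROI features as the correlation operation (an illustrative choice; the patent does not specify the correlation form):

```python
import numpy as np

def scale_correlation(template_rois, result_roi):
    """Correlate the result's ROI feature with each scale's template ROI feature.

    template_rois: list of template ROI feature maps, one per template scale.
    result_roi: ROI feature map of the first target tracking result.
    The vector of per-scale responses forms a one-dimensional scale feature.
    """
    return np.array([float(np.dot(t.ravel(), result_roi.ravel()))
                     for t in template_rois])
```

With multi-scale result ROI features, applying this per result scale would stack such vectors into a two-dimensional scale feature.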
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of one scale, and the scale features of the first target tracking result may include one-dimensional scale features.
In an exemplary embodiment of the present disclosure, the ROI features of the first target tracking result may include ROI features of multiple scales, and the scale features of the first target tracking result may include two-dimensional scale features.
The scale adjustment unit 83 is configured to adjust the first target tracking result based on the scale prediction result, resulting in a second target tracking result.
In an exemplary embodiment of the present disclosure, the second target tracking result may include a second tracking envelope box.
In an exemplary embodiment of the present disclosure, the object tracking device may further include a result checking unit (not shown) configured to determine whether the object tracking is successful based on the scale features of the first object tracking result.
In an exemplary embodiment of the present disclosure, the result checking unit may be configured to: acquire the apparent features of the first target tracking result; and determine whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
An object tracking device according to an exemplary embodiment of the present disclosure has been described above in connection with fig. 8. Next, a computing device according to an exemplary embodiment of the present disclosure is described in connection with fig. 9.
Fig. 9 shows a schematic diagram of a computing device according to an exemplary embodiment of the present disclosure.
Referring to fig. 9, a computing device 9 according to an exemplary embodiment of the present disclosure includes a memory 91 and a processor 92, the memory 91 having stored thereon a computer program which, when executed by the processor 92, implements a target tracking method according to an exemplary embodiment of the present disclosure.
In an exemplary embodiment of the present disclosure, the following steps may be implemented when the computer program is executed by the processor 92: acquiring a first target tracking result based on a search area of a current frame image of the video; predicting a scale of the target in the search area based on the scale features of the first target tracking result; and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
Computing devices in embodiments of the present disclosure may include, but are not limited to, devices such as mobile phones, notebook computers, PDAs (personal digital assistants), PADs (tablet computers), desktop computers, and the like. The computing device illustrated in fig. 9 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
The object tracking method and apparatus according to the exemplary embodiments of the present disclosure have been described above with reference to fig. 1 to 9. However, it should be understood that the object tracking device and its units shown in fig. 8 may each be configured as software, hardware, firmware, or any combination thereof that performs a particular function, and that the computing device shown in fig. 9 is not limited to the components shown above: components may be added, deleted, or combined as needed.
According to the target tracking method and device of the exemplary embodiments of the present disclosure, a first target tracking result is obtained based on a search area of a current frame image of a video, the scale of the target in the search area is predicted based on the scale features of the first target tracking result, and the first target tracking result is then adjusted based on the scale prediction result to obtain a second target tracking result, thereby improving the target tracking effect while reducing computational cost.
In an exemplary embodiment of the present disclosure, the target tracking method may take the initialized target tracking result of the first frame of the video as input data of an artificial intelligence model, and output the target tracking result of each subsequent frame of the video.
The artificial intelligence model may be obtained through training. Herein, "obtained through training" means that a basic artificial intelligence model is trained on a plurality of pieces of training data by a training algorithm to obtain predefined operation rules or an artificial intelligence model configured to perform a desired feature (or purpose).
As an example, the artificial intelligence model may include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values, and each layer's neural network computation is performed between the computation result of the previous layer and that layer's weight values.
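That per-layer computation can be sketched as a single weighted combination of the previous layer's output followed by a nonlinearity (the ReLU here is an illustrative choice):

```python
import numpy as np

def neural_layer(prev_output, weights, bias):
    """One layer's computation: combine the previous layer's result with this
    layer's weight values, then apply a ReLU nonlinearity."""
    return np.maximum(weights @ prev_output + bias, 0.0)
```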
Visual understanding is a technique for recognizing and processing things as human vision does, and includes, for example, object recognition, object tracking, image retrieval, human recognition, scene recognition, three-dimensional reconstruction/localization, and image enhancement.
While the present disclosure has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the following claims.

Claims (30)

1. A target tracking method, comprising:
determining a first target tracking result based on a search area of a current frame image of the video;
predicting a scale of the target in the search area based on the scale features of the first target tracking result;
and adjusting the first target tracking result based on the scale prediction result to obtain a second target tracking result.
2. The object tracking method according to claim 1, wherein the search area is a full-view search area or an area larger than an object tracking result of a previous frame image.
3. The object tracking method according to claim 1, wherein the search area is determined based on an object tracking result of a previous frame image.
4. The target tracking method of claim 1, wherein the first target tracking result comprises a first tracking envelope box and the second target tracking result comprises a second tracking envelope box.
5. The target tracking method according to claim 1, further comprising:
determining whether the target tracking is successful based on the scale features of the first target tracking result.
6. The method of claim 5, wherein determining whether the target tracking was successful based on the scale characteristics of the first target tracking result comprises:
acquiring apparent features of the first target tracking result;
determining whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
7. The target tracking method of claim 6, wherein prior to predicting the scale of the target in the search area based on the scale features of the first target tracking result, the target tracking method further comprises:
and acquiring the scale characteristics of the first target tracking result.
8. The method of claim 7, wherein obtaining the scale feature of the first target tracking result comprises:
acquiring region-of-interest (ROI) features of a multi-scale template;
acquiring ROI features of a first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale;
and determining scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
9. The target tracking method of claim 8, wherein prior to determining the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result, the target tracking method further comprises:
performing feature alignment on the ROI features of each scale among the ROI features of the first target tracking result based on the apparent features of the first target tracking result.
10. The target tracking method of claim 8, wherein determining scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result comprises:
and performing a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
11. The object tracking method of claim 10, wherein performing a correlation calculation of the multi-scale template ROI feature and the ROI feature of the first object tracking result comprises:
and performing a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
12. The method of claim 8, wherein the ROI features of the first target tracking result comprise ROI features of one scale and the scale features of the first target tracking result comprise one-dimensional scale features.
13. The object tracking method of claim 8 wherein the ROI features of the first object tracking result comprise ROI features of a plurality of scales and the scale features of the first object tracking result comprise two-dimensional scale features.
14. The target tracking method of claim 6, further comprising:
adjusting the first target tracking result based on the apparent features of the first target tracking result.
15. An object tracking device, characterized in that the object tracking device comprises:
a target determination unit configured to determine a first target tracking result based on a search area of a current frame image of the video;
a scale prediction unit configured to predict a scale of a target in the search area based on a scale feature of the first target tracking result;
and the scale adjustment unit is configured to adjust the first target tracking result based on the scale prediction result to obtain a second target tracking result.
16. The object tracking device of claim 15, wherein the search area is a full-view search area or an area larger than an object tracking result of a previous frame image.
17. The object tracking device of claim 15 wherein the search area is determined based on an object tracking result of a previous frame image.
18. The object tracking device of claim 15 wherein the first object tracking result comprises a first tracking envelope box and the second object tracking result comprises a second tracking envelope box.
19. The object tracking device of claim 15, further comprising:
and a result checking unit configured to determine whether the target tracking is successful based on the scale features of the first target tracking result.
20. The object tracking device of claim 19, wherein the result checking unit is configured to:
acquire apparent features of the first target tracking result; and
determine whether the target tracking is successful based on the apparent features and the scale features of the first target tracking result.
21. The object tracking device of claim 20, further comprising:
and the scale feature acquisition unit is configured to acquire the scale feature of the first target tracking result.
22. The object tracking device of claim 21, wherein the scale feature acquisition unit is configured to:
acquire multi-scale template ROI features;
acquire ROI features of the first target tracking result, wherein the ROI features of the first target tracking result comprise ROI features of at least one scale;
and determine the scale features of the first target tracking result based on the multi-scale template ROI features and the ROI features of the first target tracking result.
23. The object tracking device of claim 22, further comprising:
and a feature alignment unit configured to perform feature alignment on the ROI feature of each scale among the ROI features of the first target tracking result based on the apparent feature of the first target tracking result.
24. The object tracking device of claim 22, wherein the scale feature acquisition unit is configured to:
and perform a correlation calculation between the multi-scale template ROI features and the ROI features of the first target tracking result to obtain the scale features of the first target tracking result.
25. The object tracking device of claim 24, wherein the scale feature acquisition unit is configured to:
and perform a correlation calculation between the ROI features of each scale among the ROI features of the first target tracking result and the multi-scale template ROI features, respectively.
26. The object tracking device of claim 22 wherein the ROI features of the first object tracking result comprise ROI features of one scale and the scale features of the first object tracking result comprise one-dimensional scale features.
27. The object tracking device of claim 22 wherein the ROI features of the first object tracking result comprise ROI features of a plurality of scales and the scale features of the first object tracking result comprise two-dimensional scale features.
28. The object tracking device of claim 22, further comprising:
and a feature adjustment unit configured to adjust the first target tracking result based on the apparent feature of the first target tracking result.
29. A computer readable storage medium storing a computer program, characterized in that the object tracking method according to any one of claims 1-14 is implemented when the computer program is executed by a processor.
30. A computing device, the computing device comprising:
at least one processor;
at least one memory storing a computer program which, when executed by the at least one processor, implements the object tracking method of any one of claims 1-14.
CN202111555172.1A 2021-12-17 2021-12-17 Target tracking method and device Pending CN116343072A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202111555172.1A CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device
KR1020220157467A KR20230092741A (en) 2021-12-17 2022-11-22 Apparatus and method for tracking target
US18/084,003 US20230196589A1 (en) 2021-12-17 2022-12-19 Method and apparatus with target tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111555172.1A CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device

Publications (1)

Publication Number Publication Date
CN116343072A true CN116343072A (en) 2023-06-27

Family

ID=86880906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111555172.1A Pending CN116343072A (en) 2021-12-17 2021-12-17 Target tracking method and device

Country Status (2)

Country Link
KR (1) KR20230092741A (en)
CN (1) CN116343072A (en)

Also Published As

Publication number Publication date
KR20230092741A (en) 2023-06-26


Legal Events

Date Code Title Description
PB01 Publication