CN112085763A - Visual tracking method and device based on target self-adaptive initialization - Google Patents

Visual tracking method and device based on target self-adaptive initialization

Info

Publication number
CN112085763A
Authority
CN
China
Prior art keywords
target object
position information
video sequence
information
initial frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010838903.2A
Other languages
Chinese (zh)
Inventor
宋旭博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moviebook Technology Corp ltd
Original Assignee
Beijing Moviebook Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moviebook Technology Corp ltd filed Critical Beijing Moviebook Technology Corp ltd
Priority to CN202010838903.2A
Publication of CN112085763A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

The application discloses a visual tracking method and device based on target-adaptive initialization. The method comprises the following steps: acquiring position information of a target object in an initial frame of a video sequence; acquiring feature information of the target object in the video sequence according to the position information of the initial frame; and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information. The device comprises an initialization module, a feature extraction module and a target detection module. By learning features that can be tracked stably and, when the tracking object is selected in the initial frame, choosing a region that satisfies these stable features, the method and device achieve a stable tracking result.

Description

Visual tracking method and device based on target self-adaptive initialization
Technical Field
The present application relates to image recognition technologies, and in particular, to a visual tracking method and apparatus based on target adaptive initialization.
Background
Video tracking refers to the problem of determining the location, path and characteristics of a particular object in a sequence of images from a video. It is an active research topic in the field of computer vision with many practical applications, such as surveillance, security and human-computer interaction. Traditional visual tracking methods fall into two categories: generative visual tracking and discriminative visual tracking. Generative visual tracking aims to build a representative appearance model of the target and use it to find the target region in the upcoming frame. Discriminative visual tracking, by contrast, mainly distinguishes the foreground from the background in each frame.
In recent years, visual trackers based on correlation filtering have gained considerable popularity; they have been highly successful in real-time tracking owing to their high computational efficiency while also yielding accurate tracking results. Bibi et al. proposed an advanced correlation filter using multiple templates, kernels and multidimensional features. Suimail et al. enhanced the correlation filter with three sparse correlation loss functions, alleviating the over-fitting problem of conventional correlation filters. Zhang et al. proposed a new visual tracking method based on online learning, which labels unlabeled samples during the tracking process; its performance far exceeds that of other advanced trackers. Deep-learning-based visual trackers built on convolutional neural networks (CNNs) have also shown excellent tracking performance. Held et al. proposed a Siamese network for visual tracking that can quickly extract rich target features.
However, generative visual tracking methods are prone to losing the target when illumination changes suddenly or the target object deforms. Filter-based approaches do not take full advantage of the information in the initial frame to establish a good initial configuration. Neural-network-based approaches achieve more accurate tracking only at the cost of speed.
Disclosure of Invention
It is an object of the present application to overcome, or at least partially solve or mitigate, the above problems.
According to an aspect of the present application, there is provided a visual tracking method based on target adaptive initialization, including:
acquiring position information of a target object in an initial frame of a video sequence;
acquiring feature information of the target object in the video sequence according to the position information of the initial frame;
and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information.
Preferably, the location information includes: center position information, width information, and height information of the target object.
Preferably, the acquiring of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
Preferably, the acquiring of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
Preferably, the method further comprises:
performing graying processing on the video sequence.
Preferably, the position of the target object under perturbation follows a normal distribution.
In another aspect, the present invention further provides a visual tracking apparatus based on target adaptive initialization, including:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
Preferably, the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
Preferably, the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
Preferably, the device further comprises:
a graying module configured to perform graying processing on the video sequence.
According to the visual tracking method and device based on target-adaptive initialization, features that can be tracked stably are learned, and when the tracking object is selected in the initial frame, a region satisfying these stable features is chosen for tracking, thereby achieving a stable tracking result.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart of a visual tracking method based on target adaptive initialization according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a visual tracking apparatus based on target adaptive initialization according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a computing device according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
FIG. 1 is a schematic flow chart of a visual tracking method based on target adaptive initialization according to an embodiment of the present application. The method generally comprises the following steps:
s1, acquiring the position information of the target object in the initial frame of the video sequence;
s2, collecting the characteristic information of the target object in the video sequence according to the position information of the initial frame;
and S3, visually tracking the target object by using a full convolution network tracker according to the characteristic information.
For advanced computer vision systems such as autonomous driving, social robots, surveillance systems and virtual reality systems, visual tracking is a necessary processing step. However, existing visual tracking methods have many limitations; one key limitation is that tracking accuracy depends largely on the initial configuration (e.g., position and size) of the target. The embodiment of the invention provides a tracking method with adaptive target initialization: by learning features that can be tracked stably, a region satisfying these stable features is selected when the tracked object is chosen in the initial frame, thereby achieving a stable tracking result.
In the embodiment of the present invention, the location information includes: center position information, width information, and height information of the target object.
In this embodiment of the present invention, the acquiring, in step S1, of the position information of the target object in the initial frame of the video sequence includes:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
In this embodiment of the present invention, the acquiring, in step S2, of the feature information of the target object in the video sequence includes:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
In the embodiment of the invention, the goal of target initialization is to find the target position and size with which the visual tracker can produce the most stable and accurate tracking result. A stable visual tracking result requires that the tracking method produce similar results even when the initial configuration of the target object is spatially perturbed to some extent. Therefore, to find stable position information for the target object in the initial frame, the embodiment of the present invention uses the likelihood values of image blocks; that is, a structurally stable image region can be described by a distribution. The following criterion is applied: stable position information in the initial frame should correspond to a distribution that is unimodal, has a steep shape, and has a high height.
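By way of illustration and not limitation, the following sketch shows one possible way to evaluate these three criteria on the likelihood values of a candidate image block. The histogram-based scoring, the bin count and the specific peak, steepness and height measures are assumptions introduced here for clarity and are not prescribed by the present application.

```python
import numpy as np
from scipy.signal import find_peaks


def distribution_scores(likelihood_values):
    """Score the likelihood values of a candidate image block against the
    three criteria above: unimodality, steepness, and height of the
    distribution. The concrete measures below are illustrative stand-ins;
    the patent text does not specify them."""
    values = np.asarray(likelihood_values, dtype=float).ravel()
    hist, _ = np.histogram(values, bins=32, density=True)

    # Unimodality: fewer local maxima in the histogram gives a higher score.
    n_peaks = len(find_peaks(hist)[0])
    peak_score = 1.0 / max(n_peaks, 1)

    # Steepness: a tightly concentrated distribution has a small spread.
    steepness_score = 1.0 / (values.std() + 1e-6)

    # Height: the height of the distribution's main peak.
    height_score = hist.max()

    return peak_score, steepness_score, height_score
```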
In the embodiment of the present invention, the method further includes:
performing graying processing on the video sequence.
In the embodiment of the invention, the position of the target object under perturbation follows a normal distribution.
The following describes in detail a visual tracking method based on target adaptive initialization according to an embodiment of the present invention:
First, the configuration of the target object must be initialized in the initial frame, the configuration including the position information of the target object. The embodiment of the invention focuses on the influence that different initial configurations of the target object have on subsequent tracking stability. For video tracking problems, it is usually necessary to select an area in the initial frame as the target area to be tracked. In the embodiment of the invention, after the target area is selected, an adaptive selection is carried out according to preset criteria, so that a target that can be tracked more stably is obtained.
Initially, a tracking target is selected to obtain the center position, width and height of the tracking box:
X^(i) = {X_c^(i), X_w^(i), X_h^(i)}
Then, Gaussian noise is added to the coordinates of this initially set position information, creating several different configurations around it:
X^(i) = {X_c^(i) + Z_c, X_w^(i) + Z_w, X_h^(i) + Z_h}
where Z_c, Z_w and Z_h obey normal distributions with mean 0 and variances σ_c^2, σ_w^2 and σ_h^2, respectively.
This yields a large number of pieces of initially set position information at similar positions. By comparing these candidates, a more stable configuration is obtained as the position information of the target object in the initial frame.
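By way of illustration and not limitation, the candidate-generation step may be sketched as follows; the number of candidates and the noise scales are assumed values, since the present application does not fix them.

```python
import numpy as np


def perturb_configurations(cx, cy, w, h, n=100,
                           sigma_c=2.0, sigma_w=1.0, sigma_h=1.0,
                           rng=None):
    """Generate n perturbed copies of the initial box by adding zero-mean
    Gaussian noise Z_c, Z_w, Z_h to its center, width and height, as in
    the formulas above. The candidate count n and the standard deviations
    sigma_* are assumptions made here for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    centers_x = cx + rng.normal(0.0, sigma_c, n)
    centers_y = cy + rng.normal(0.0, sigma_c, n)
    widths = np.clip(w + rng.normal(0.0, sigma_w, n), 1.0, None)
    heights = np.clip(h + rng.normal(0.0, sigma_h, n), 1.0, None)
    # Each row is one candidate configuration (cx, cy, w, h).
    return np.stack([centers_x, centers_y, widths, heights], axis=1)
```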
From the center position, width and height of each perturbed configuration, functions are established that respectively express the peak condition (unimodality), the steepness and the height of the corresponding distribution. The combined score of each group is then computed, and the group with the highest sum of the three values is selected via an argmax as the newly generated target configuration. The Fully Convolutional Network Tracker (FCNT) proposed by Wang et al. is applied as the visual tracking method in the subsequent process; it selects appropriate feature maps to remove noise and uncorrelated features. For visual tracking, adaptive initial position information of the target object is first obtained by the above method, and FCNT is then applied with this initialization to obtain better tracking results.
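By way of illustration and not limitation, the selection step may be sketched as follows; the scoring callback and the tracker interface shown in the trailing comment are assumptions and are not defined by the present application.

```python
import numpy as np


def select_stable_configuration(candidates, score_fn):
    """Choose the candidate whose three stability scores sum highest,
    mirroring the argmax over (peak + steepness + height) described
    above. `score_fn(candidate)` is assumed to return the three scores
    for that candidate, e.g. by cropping the image block it covers and
    evaluating it as in the distribution_scores sketch shown earlier."""
    totals = np.array([sum(score_fn(c)) for c in candidates])
    best = int(np.argmax(totals))
    return candidates[best], totals[best]


# Hypothetical hand-off to the tracker (the FCNT interface below is assumed,
# not defined in the patent text):
#   best_box, _ = select_stable_configuration(candidates, score_fn)
#   tracker = FCNT(first_frame, best_box)
#   for frame in rest_of_video:
#       box = tracker.track(frame)
```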
FIG. 2 is a schematic block diagram of a visual tracking apparatus based on target adaptive initialization according to an embodiment of the present application. The apparatus comprises:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
In the embodiment of the present invention, the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence includes:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
In an embodiment of the present invention, the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence includes:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
In the embodiment of the present invention, the apparatus further includes:
a graying module configured to perform graying processing on the video sequence.
An embodiment of the present application also provides a computing device. Referring to FIG. 3, the computing device comprises a memory 1120, a processor 1110 and a computer program stored in the memory 1120 and executable by the processor 1110. The computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements method steps 1131 for performing any of the methods according to the present application.
An embodiment of the present application also provides a computer-readable storage medium. Referring to FIG. 4, the computer-readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 1131' for performing the steps of the method according to the present application, which program is executed by a processor.
An embodiment of the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed by a computer, the computer instructions cause the computer to perform, in whole or in part, the procedures or functions described in accordance with the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A visual tracking method based on target adaptive initialization comprises the following steps:
acquiring position information of a target object in an initial frame of a video sequence;
acquiring feature information of the target object in the video sequence according to the position information of the initial frame;
and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information.
2. The method of claim 1, wherein the location information comprises: center position information, width information, and height information of the target object.
3. The method of claim 1 or 2, wherein acquiring the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
4. The method of claim 3, wherein acquiring the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
performing graying processing on the video sequence.
6. The method of claim 3, wherein the position of the target object under perturbation follows a normal distribution.
7. A visual tracking apparatus based on target adaptive initialization, comprising:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
8. The apparatus of claim 7, wherein the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
9. The apparatus of claim 8, wherein the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
10. The apparatus of claim 7, further comprising:
a graying module configured to perform graying processing on the video sequence.
CN202010838903.2A 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization Pending CN112085763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010838903.2A CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010838903.2A CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Publications (1)

Publication Number Publication Date
CN112085763A true CN112085763A (en) 2020-12-15

Family

ID=73728562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010838903.2A Pending CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Country Status (1)

Country Link
CN (1) CN112085763A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991280A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Video tracking method and device based on template matching and SURF

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991280A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Video tracking method and device based on template matching and SURF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUISIK KIM et al.: "Robust visual tracking with adaptive initial configuration and likelihood landscape analysis", IET Computer Vision, vol. 13, no. 1, pages 2-3 *

Similar Documents

Publication Publication Date Title
JP6694829B2 (en) Rule-based video importance analysis
CN109446889B (en) Object tracking method and device based on twin matching network
Sabuhi et al. Applications of generative adversarial networks in anomaly detection: a systematic literature review
JP5604256B2 (en) Human motion detection device and program thereof
CN110853033A (en) Video detection method and device based on inter-frame similarity
Tian et al. Scene Text Detection in Video by Learning Locally and Globally.
CN113140005A (en) Target object positioning method, device, equipment and storage medium
CN106033613B (en) Method for tracking target and device
JP2010257267A (en) Device, method and program for detecting object area
Günay et al. Real-time dynamic texture recognition using random sampling and dimension reduction
CN112085763A (en) Visual tracking method and device based on target self-adaptive initialization
US11647294B2 (en) Panoramic video data process
CN110263196B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
Bakhat et al. Human activity recognition based on an amalgamation of CEV & SGM features
Sarcar et al. Detecting violent arm movements using cnn-lstm
CN113673550A (en) Clustering method, clustering device, electronic equipment and computer-readable storage medium
Kaddar et al. Deepfake Detection Using Spatiotemporal Transformer
CN113129332A (en) Method and apparatus for performing target object tracking
CN113658217A (en) Adaptive target tracking method, device and storage medium
CN111275692B (en) Infrared small target detection method based on generation countermeasure network
Wu A Study on the Intelligent Application of the Code and the Contour of the Image Feature
CN117333926B (en) Picture aggregation method and device, electronic equipment and readable storage medium
CN113920159B (en) Infrared air small and medium target tracking method based on full convolution twin network
CN113052853B (en) Video target tracking method and device in complex environment
WO2021193352A1 (en) Image tracking device, image tracking method, and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination