CN112085763A - Visual tracking method and device based on target self-adaptive initialization - Google Patents

Visual tracking method and device based on target self-adaptive initialization

Info

Publication number
CN112085763A
Authority
CN
China
Prior art keywords
target object
position information
video sequence
information
initial frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010838903.2A
Other languages
Chinese (zh)
Inventor
宋旭博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moviebook Technology Corp ltd
Original Assignee
Beijing Moviebook Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moviebook Technology Corp ltd filed Critical Beijing Moviebook Technology Corp ltd
Priority to CN202010838903.2A
Publication of CN112085763A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

The application discloses a visual tracking method and device based on target-adaptive initialization. The method comprises the following steps: acquiring position information of a target object in an initial frame of a video sequence; acquiring feature information of the target object in the video sequence according to the position information of the initial frame; and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information. The device comprises an initialization module, a feature extraction module and a target detection module. By learning features that can be tracked stably and, when the tracking object is selected in the initial frame, choosing a region that satisfies these stable features, the method and device achieve a stable tracking result.

Description

Visual tracking method and device based on target self-adaptive initialization
Technical Field
The present application relates to image recognition technologies, and in particular, to a visual tracking method and apparatus based on target adaptive initialization.
Background
Video tracking refers to the problem of determining the location, path and characteristics of a particular object in a sequence of images from a video. It is an active research topic in the field of computer vision with many practical applications, such as surveillance, security and human-computer interaction. Traditional visual tracking methods fall into two categories: generative visual tracking and discriminative visual tracking. Generative visual tracking aims to build a representative appearance model of the target and use it to find the target region in the upcoming frame. Discriminative visual tracking, by contrast, mainly distinguishes the foreground from the background in each frame.
In recent years, visual trackers based on correlation filtering have gained considerable popularity; they have been highly successful in real-time tracking owing to their high computational efficiency while also yielding accurate tracking results. Bibi et al. proposed an advanced correlation filter using multiple templates, kernels and multidimensional features. Suimail et al. enhanced the correlation filter with three sparse correlation loss functions, alleviating the over-fitting problem of conventional correlation filters. Zhang et al. proposed a new visual tracking method based on online learning, which labels unlabeled samples during the tracking process; its performance far exceeds that of other advanced trackers. Deep-learning-based visual trackers built on convolutional neural networks (CNNs) have also shown excellent tracking performance. Held et al. proposed a Siamese network for visual tracking that can quickly extract rich target features.
However, generative visual tracking methods are prone to losing the target when illumination changes suddenly or the target object deforms. Filter-based approaches do not take full advantage of the information in the initial frame to establish a good initial configuration. Neural-network-based approaches achieve more accurate tracking only at the cost of speed.
Disclosure of Invention
It is an object of the present application to overcome, or at least partially solve or mitigate, the above problems.
According to an aspect of the present application, there is provided a visual tracking method based on target adaptive initialization, including:
acquiring position information of a target object in an initial frame of a video sequence;
acquiring feature information of the target object in the video sequence according to the position information of the initial frame;
and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information.
Preferably, the location information includes: center position information, width information, and height information of the target object.
Preferably, the acquiring of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
Preferably, the acquiring of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
Preferably, the method further comprises:
performing graying processing on the video sequence.
Preferably, the position of the target object under perturbation follows a normal distribution.
In another aspect, the present invention further provides a visual tracking apparatus based on target adaptive initialization, including:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
Preferably, the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
Preferably, the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
Preferably, the device further comprises:
a graying module configured to perform graying processing on the video sequence.
According to the visual tracking method and device based on target-adaptive initialization, features that can be tracked stably are learned, and when the tracking object is selected in the initial frame, a region satisfying these stable features is chosen for tracking, thereby achieving a stable tracking result.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart of a visual tracking method based on target adaptive initialization according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of a visual tracking apparatus based on target adaptive initialization according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of a computing device according to an embodiment of the present application;
FIG. 4 is a schematic block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
FIG. 1 is a schematic flow chart of a visual tracking method based on target adaptive initialization according to an embodiment of the present application. The method generally comprises the following steps:
s1, acquiring the position information of the target object in the initial frame of the video sequence;
s2, collecting the characteristic information of the target object in the video sequence according to the position information of the initial frame;
and S3, visually tracking the target object by using a full convolution network tracker according to the characteristic information.
For advanced computer vision systems such as autonomous driving, social robots, surveillance systems and virtual reality systems, visual tracking is a necessary processing step. However, existing visual tracking methods have many limitations; one key limitation is that tracking accuracy depends largely on the initial configuration (e.g., position and size) of the target. The embodiment of the invention provides a tracking method with adaptive target initialization: by learning features that can be tracked stably, a region satisfying these stable features is selected when the tracked object is chosen in the initial frame, thereby achieving a stable tracking result.
In the embodiment of the present invention, the location information includes: center position information, width information, and height information of the target object.
In this embodiment of the present invention, the acquiring, in step S1, of the position information of the target object in the initial frame of the video sequence includes:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
In this embodiment of the present invention, the acquiring, in step S2, of the feature information of the target object in the video sequence includes:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
In the embodiment of the invention, the goal of target initialization is to find the target position and size with which the visual tracker can produce the most stable and accurate tracking result. A stable visual tracking result requires that the tracking method produce similar results even when the initial configuration of the target object is spatially perturbed to some extent. Therefore, to find stable position information for the target object in the initial frame, the embodiment of the present invention uses the likelihood values of image blocks; that is, a structurally stable image region can be described by a distribution. The following criterion is applied: stable position information in the initial frame should correspond to a distribution that is unimodal, has a steep shape, and has a high height.
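By way of illustration and not limitation, the following sketch shows one possible way to evaluate these three criteria on the likelihood values of a candidate image block. The histogram-based scoring, the bin count and the specific peak, steepness and height measures are assumptions introduced here for clarity and are not prescribed by the present application.

```python
import numpy as np
from scipy.signal import find_peaks


def distribution_scores(likelihood_values):
    """Score the likelihood values of a candidate image block against the
    three criteria above: unimodality, steepness, and height of the
    distribution. The concrete measures below are illustrative stand-ins;
    the patent text does not specify them."""
    values = np.asarray(likelihood_values, dtype=float).ravel()
    hist, _ = np.histogram(values, bins=32, density=True)

    # Unimodality: fewer local maxima in the histogram gives a higher score.
    n_peaks = len(find_peaks(hist)[0])
    peak_score = 1.0 / max(n_peaks, 1)

    # Steepness: a tightly concentrated distribution has a small spread.
    steepness_score = 1.0 / (values.std() + 1e-6)

    # Height: the height of the distribution's main peak.
    height_score = hist.max()

    return peak_score, steepness_score, height_score
```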
In the embodiment of the present invention, the method further includes:
performing graying processing on the video sequence.
In the embodiment of the invention, the position of the target object under perturbation follows a normal distribution.
The following describes in detail a visual tracking method based on target adaptive initialization according to an embodiment of the present invention:
First, the configuration of the target object must be initialized in the initial frame, the configuration including the position information of the target object. The embodiment of the invention focuses on the influence that different initial configurations of the target object have on subsequent tracking stability. For video tracking problems, it is usually necessary to select an area in the initial frame as the target area to be tracked. In the embodiment of the invention, after the target area is selected, an adaptive selection is carried out according to preset criteria, so that a target that can be tracked more stably is obtained.
Initially, a tracking target is selected to obtain the center position, width and height of the tracking box:
X^(i) = {X_c^(i), X_w^(i), X_h^(i)}
Then, Gaussian noise is added to the coordinates of this initially set position information, creating several different configurations around it:
X^(i) = {X_c^(i) + Z_c, X_w^(i) + Z_w, X_h^(i) + Z_h}
where Z_c, Z_w and Z_h obey normal distributions with mean 0 and variances σ_c^2, σ_w^2 and σ_h^2, respectively.
This yields a large number of pieces of initially set position information at similar positions. By comparing these candidates, a more stable configuration is obtained as the position information of the target object in the initial frame.
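By way of illustration and not limitation, the candidate-generation step may be sketched as follows; the number of candidates and the noise scales are assumed values, since the present application does not fix them.

```python
import numpy as np


def perturb_configurations(cx, cy, w, h, n=100,
                           sigma_c=2.0, sigma_w=1.0, sigma_h=1.0,
                           rng=None):
    """Generate n perturbed copies of the initial box by adding zero-mean
    Gaussian noise Z_c, Z_w, Z_h to its center, width and height, as in
    the formulas above. The candidate count n and the standard deviations
    sigma_* are assumptions made here for illustration."""
    rng = np.random.default_rng() if rng is None else rng
    centers_x = cx + rng.normal(0.0, sigma_c, n)
    centers_y = cy + rng.normal(0.0, sigma_c, n)
    widths = np.clip(w + rng.normal(0.0, sigma_w, n), 1.0, None)
    heights = np.clip(h + rng.normal(0.0, sigma_h, n), 1.0, None)
    # Each row is one candidate configuration (cx, cy, w, h).
    return np.stack([centers_x, centers_y, widths, heights], axis=1)
```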
From the center position, width and height of each perturbed configuration, functions are established that respectively express the peak condition (unimodality), the steepness and the height of the corresponding distribution. The combined score of each group is then computed, and the group with the highest sum of the three values is selected via an argmax as the newly generated target configuration. The Fully Convolutional Network Tracker (FCNT) proposed by Wang et al. is applied as the visual tracking method in the subsequent process; it selects appropriate feature maps to remove noise and uncorrelated features. For visual tracking, adaptive initial position information of the target object is first obtained by the above method, and FCNT is then applied with this initialization to obtain better tracking results.
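By way of illustration and not limitation, the selection step may be sketched as follows; the scoring callback and the tracker interface shown in the trailing comment are assumptions and are not defined by the present application.

```python
import numpy as np


def select_stable_configuration(candidates, score_fn):
    """Choose the candidate whose three stability scores sum highest,
    mirroring the argmax over (peak + steepness + height) described
    above. `score_fn(candidate)` is assumed to return the three scores
    for that candidate, e.g. by cropping the image block it covers and
    evaluating it as in the distribution_scores sketch shown earlier."""
    totals = np.array([sum(score_fn(c)) for c in candidates])
    best = int(np.argmax(totals))
    return candidates[best], totals[best]


# Hypothetical hand-off to the tracker (the FCNT interface below is assumed,
# not defined in the patent text):
#   best_box, _ = select_stable_configuration(candidates, score_fn)
#   tracker = FCNT(first_frame, best_box)
#   for frame in rest_of_video:
#       box = tracker.track(frame)
```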
FIG. 2 is a schematic block diagram of a visual tracking apparatus based on target adaptive initialization according to an embodiment of the present application. The apparatus comprises:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
In the embodiment of the present invention, the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence includes:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
In an embodiment of the present invention, the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence includes:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
In the embodiment of the present invention, the apparatus further includes:
a graying module configured to perform graying processing on the video sequence.
An embodiment of the present application also provides a computing device. Referring to FIG. 3, the computing device comprises a memory 1120, a processor 1110 and a computer program stored in the memory 1120 and executable by the processor 1110. The computer program is stored in a space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements method steps 1131 for performing any of the methods according to the present application.
An embodiment of the present application also provides a computer-readable storage medium. Referring to FIG. 4, the computer-readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 1131' for performing the steps of the method according to the present application, which program is executed by a processor.
An embodiment of the present application also provides a computer program product containing instructions which, when run on a computer, cause the computer to carry out the steps of the method according to the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed by a computer, the computer instructions cause the computer to perform, in whole or in part, the procedures or functions described in accordance with the embodiments of the application. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)), among others.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read-only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape, a floppy disk, an optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A visual tracking method based on target adaptive initialization comprises the following steps:
acquiring position information of a target object in an initial frame of a video sequence;
acquiring feature information of the target object in the video sequence according to the position information of the initial frame;
and performing visual tracking of the target object with a fully convolutional network tracker according to the feature information.
2. The method of claim 1, wherein the location information comprises: center position information, width information, and height information of the target object.
3. The method of claim 1 or 2, wherein acquiring the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
4. The method of claim 3, wherein acquiring the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
5. The method according to claim 1 or 2, characterized in that the method further comprises:
performing graying processing on the video sequence.
6. The method of claim 3, wherein the position of the target object under perturbation follows a normal distribution.
7. A visual tracking apparatus based on target adaptive initialization, comprising:
an initialization module configured to acquire position information of a target object in an initial frame of a video sequence;
a feature extraction module configured to acquire feature information of the target object in the video sequence according to the position information of the initial frame;
and a target detection module configured to perform visual tracking of the target object with a fully convolutional network tracker according to the feature information.
8. The apparatus of claim 7, wherein the acquiring, by the initialization module, of the position information of the target object in the initial frame of the video sequence comprises:
determining a plurality of pieces of initially set position information of the target object under perturbation in the initial frame, and determining the center-position variance, width variance and height variance corresponding to the plurality of pieces of initially set position information;
and selecting, by comparing the plurality of pieces of initially set position information, the most stable one as the position information of the target object in the initial frame.
9. The apparatus of claim 8, wherein the acquiring, by the feature extraction module, of the feature information of the target object in the video sequence comprises:
obtaining the position information of the target object under perturbation, and establishing functions that respectively express the peak condition (unimodality), the steepness and the height of the resulting distribution;
and summing the peak value, steepness value and height value for the target object in the video sequence to serve as the generated feature information.
10. The apparatus of claim 7, further comprising:
a graying module configured to perform graying processing on the video sequence.
CN202010838903.2A 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization Pending CN112085763A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010838903.2A CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010838903.2A CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Publications (1)

Publication Number Publication Date
CN112085763A true CN112085763A (en) 2020-12-15

Family

ID=73728562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010838903.2A Pending CN112085763A (en) 2020-08-19 2020-08-19 Visual tracking method and device based on target self-adaptive initialization

Country Status (1)

Country Link
CN (1) CN112085763A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991280A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Video tracking method and device based on template matching and SURF

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991280A (en) * 2019-11-20 2020-04-10 北京影谱科技股份有限公司 Video tracking method and device based on template matching and SURF

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUISIK KIM et al.: "Robust visual tracking with adaptive initial configuration and likelihood landscape analysis", IET Computer Vision, vol. 13, no. 1, pages 2-3 *

Similar Documents

Publication Publication Date Title
JP6694829B2 (en) Rule-based video importance analysis
CN109446889B (en) Object tracking method and device based on twin matching network
Sabuhi et al. Applications of generative adversarial networks in anomaly detection: a systematic literature review
JP5604256B2 (en) Human motion detection device and program thereof
CN110853033A (en) Video detection method and device based on inter-frame similarity
Tian et al. Scene Text Detection in Video by Learning Locally and Globally.
CN113140005A (en) Target object positioning method, device, equipment and storage medium
CN106033613B (en) Method for tracking target and device
JP2010257267A (en) Device, method and program for detecting object area
Günay et al. Real-time dynamic texture recognition using random sampling and dimension reduction
CN112085763A (en) Visual tracking method and device based on target self-adaptive initialization
US11647294B2 (en) Panoramic video data process
CN110263196B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
Bakhat et al. Human activity recognition based on an amalgamation of CEV & SGM features
Sarcar et al. Detecting violent arm movements using cnn-lstm
CN113673550A (en) Clustering method, clustering device, electronic equipment and computer-readable storage medium
Kaddar et al. Deepfake Detection Using Spatiotemporal Transformer
CN113129332A (en) Method and apparatus for performing target object tracking
CN113658217A (en) Adaptive target tracking method, device and storage medium
CN111275692B (en) Infrared small target detection method based on generation countermeasure network
Wu A Study on the Intelligent Application of the Code and the Contour of the Image Feature
CN117333926B (en) Picture aggregation method and device, electronic equipment and readable storage medium
CN113920159B (en) Infrared air small and medium target tracking method based on full convolution twin network
CN113052853B (en) Video target tracking method and device in complex environment
WO2021193352A1 (en) Image tracking device, image tracking method, and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination