CN117788524A - Plane target tracking method, device, equipment and medium based on multitask learning - Google Patents

Plane target tracking method, device, equipment and medium based on multitask learning

Publication number
CN117788524A
CN117788524A (application CN202410003226.0A)
Authority
CN
China
Prior art keywords
target
reference frame
frame
parameters
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410003226.0A
Other languages
Chinese (zh)
Inventor
杨英仪
刘钊
蔡笋
赵兵
梅鹏
李文胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Original Assignee
Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of Guangdong Power Grid Co Ltd filed Critical Electric Power Research Institute of Guangdong Power Grid Co Ltd
Priority to CN202410003226.0A
Publication of CN117788524A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a planar target tracking method based on multi-task learning. A feature extraction module extracts features from the reference frame and the target frame respectively; a correlation filtering operation on these features produces a correlation volume, which serves as the input to an optical flow estimation module that estimates dense optical flow. The optical flow result is then used to align the reference frame features to the target frame space through a grid sampling operation, the aligned features are used by a mask estimation module to estimate a plane mask, the mask filters the flow to yield a dense and reliable optical flow result, and finally a homography matrix is robustly estimated with the DLT method. The method thus jointly estimates the dense optical flow, the plane mask, and the homography matrix: the dense optical flow guarantees a sufficient number of matching points, the plane mask filters out interference such as background and occlusion, and the dense, reliable optical flow enables robust tracking of the planar target. This improves both the model's robustness to occlusion and blur and the planar target tracking accuracy.

Description

Plane target tracking method, device, equipment and medium based on multitask learning
Technical Field
The present disclosure relates to the field of computer vision, and in particular to a planar target tracking method, apparatus, computer device, and computer-readable storage medium based on multi-task learning.
Background
Planar target tracking is a fundamental computer vision technique for detecting and tracking the motion of planar targets in video sequences. A planar target is generally an object that can be approximated as a plane, such as a poster or a computer screen. Related research began in the early 20th century, and planar target tracking is now widely applied in augmented reality, visual servoing, industrial automation, and other fields.
Planar target tracking algorithms based on deep learning typically comprise four major parts: 1) image feature extraction; 2) feature correlation volume construction; 3) optical flow estimation; 4) homography matrix estimation. Image feature extraction aims to learn robust features from the input source image and reference image. The feature correlation volume is built as a pre-matching step; correlation can be understood as the degree of similarity between two features and is commonly used to estimate the correlation between different locations in two images. Optical flow estimation is a crucial step that directly influences the accuracy of homography matrix estimation; it regresses the optical flow of each pixel or plane vertex. Homography matrix estimation uses the optical flow result to estimate the homography from the reference frame to the target frame with a direct linear transformation algorithm.
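As a minimal illustration of the last step, the direct linear transformation can be sketched as follows. This is a generic DLT from point correspondences, not the patent's implementation; the function name and the normalization by H[2,2] are our choices:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 homography H (dst ~ H @ src) from >= 4 point pairs via
    the direct linear transformation: stack the 2n x 9 system A h = 0 and take
    the right singular vector of the smallest singular value."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity
```

With four or more exact correspondences the true homography is recovered up to scale; in the tracking pipeline described here, the correspondences would come from the mask-filtered optical flow.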
In recent years, thanks to the rapid development of deep learning, research on planar target tracking algorithms has achieved encouraging results: their performance is far superior to that of traditional methods, and they can track robustly under some complex conditions. Nevertheless, the network models of existing approaches still leave room for improvement.
Existing methods generally rely on feature consistency for matching and tracking, but strong appearance changes of a planar target caused by factors such as blur, illumination changes, and occlusion often violate the consistency precondition, causing tracking to fail. The underlying reason is that most existing models use single-task learning, which makes it difficult to learn robust feature representations under complex conditions.
Therefore, how to track robustly under occlusion and blur and improve planar target tracking accuracy is a technical problem to be solved urgently.
Disclosure of Invention
The application provides a planar target tracking method, apparatus, computer device, and computer-readable storage medium based on multi-task learning, so as to solve the technical problem of low planar target tracking accuracy under occlusion and blur.
In a first aspect, an embodiment of the present application provides a planar target tracking method based on multi-task learning, where the planar target tracking method includes:
extracting reference frame features of a reference frame and target frame features of a target frame by a feature extraction module in the planar target tracking model;
constructing a correlation volume of the reference frame and the target frame based on the reference frame features and the target frame features through an optical flow estimation module in the planar target tracking model, and performing optical flow estimation from the reference frame to the target frame based on the correlation volume to obtain a current optical flow result;
performing grid sampling on the reference region features based on the current optical flow result through a mask segmentation module in the planar target tracking model to obtain aligned reference frame features; concatenating the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
and determining a homography matrix for planar target tracking based on the target mask and the current optical flow result through a linear transformation module in the planar target tracking model.
Further, the extracting, by the feature extraction module in the planar target tracking model, of the reference frame features of the reference frame and the target frame features of the target frame includes:
determining the region of interest of the reference frame and the region of interest of the target frame through a plane sampling module in the planar target tracking model;
and extracting, by the feature extraction module, a reference region feature of the region of interest of the reference frame as the reference frame feature, and a target region feature of the region of interest of the target frame as the target frame feature.
Further, before the extracting of the reference frame features of the reference frame and the target frame features of the target frame by the feature extraction module in the planar target tracking model, the method further includes:
constructing a simulation scene based on at least one of preset environmental parameters, preset plane parameters and preset camera parameters;
constructing a training data set based on a moving image set of a target in the simulation scene;
and training to generate the planar target tracking model based on the training data set.
Further, before constructing the simulation scene, the method includes:
setting the physical environment of the simulation scene through a panoramic image in a preset image library to obtain the preset environment parameters, where the preset environment parameters include illumination conditions and physical backgrounds;
decoupling a plane into a set of an initial plane, a material, and a map, and obtaining the preset plane parameters based on the length and width parameters and orientation parameters of the initial plane, the type information of the material, and the type information of the map;
and obtaining the preset camera parameters by setting camera characteristic parameters and camera motion parameters, where the camera characteristic parameters include resolution, field-of-view, focus, depth-of-field, and aperture parameters, and the camera motion parameters include camera translation and camera rotation parameters.
Further, the constructing of a training data set based on the moving image set of the target in the simulation scene includes:
generating a random mask through a random occlusion algorithm, where the area of the random mask is not larger than a preset mask area;
and fusing the random mask with the moving image set to obtain the training data set simulating occlusion of the planar target.
Further, the constructing of the correlation volume of the reference frame and the target frame based on the reference frame features and the target frame features includes:
obtaining the correlation volume based on the dot product of the channel values of the reference frame features and the channel values of the target frame features.
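As a sketch of this step, the all-pairs correlation volume can be computed as a channel-wise dot product between every position pair of the two feature maps. The 1/√C scaling is our assumption (common in optical-flow networks) and not stated in the application:

```python
import numpy as np

def correlation_volume(f_ref, f_tgt):
    """4D correlation volume from two (C, H, W) feature maps:
    vol[i, j, k, l] = <f_ref[:, i, j], f_tgt[:, k, l]>, the dot product
    over the channel dimension, scaled by 1/sqrt(C)."""
    c = f_ref.shape[0]
    return np.einsum('cij,ckl->ijkl', f_ref, f_tgt) / np.sqrt(c)

rng = np.random.default_rng(0)
f_ref = rng.standard_normal((256, 8, 8))  # reference-frame features (C, H, W)
f_tgt = rng.standard_normal((256, 8, 8))  # target-frame features
vol = correlation_volume(f_ref, f_tgt)    # shape (8, 8, 8, 8)
```

Each slice `vol[i, j]` is a similarity map of reference position (i, j) against every target position, which is what the optical flow estimator consumes.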
Further, after the determining, by the linear transformation module in the planar target tracking model, of the homography matrix for planar target tracking based on the target mask and the current optical flow result, the method further includes:
determining, in the target frame, the planar target of the reference frame based on the linear transformation module and the homography matrix.
In a second aspect, the present application further provides a planar target tracking apparatus based on multi-task learning, including:
the feature extraction module is used for extracting the reference frame features of the reference frame and the target frame features of the target frame;
the optical flow estimation module is used for constructing a correlation volume of the reference frame and the target frame based on the reference frame features and the target frame features, and performing optical flow estimation from the reference frame to the target frame based on the correlation volume to obtain a current optical flow result;
the mask segmentation module is used for performing grid sampling on the reference region features based on the current optical flow result to obtain aligned reference frame features, and concatenating the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
And the linear transformation module is used for determining a homography matrix for planar target tracking based on the target mask and the current optical flow result.
Further, the planar target tracking apparatus further includes:
the scene simulation module is used for constructing a simulation scene based on at least one of preset environment parameters, preset plane parameters and preset camera parameters;
the data construction module is used for constructing a training data set based on a moving image set of the target in the simulation scene;
and the model training module is used for training and generating the planar target tracking model based on the training data set.
Further, the data construction module is further configured to:
generating a random mask through a random occlusion algorithm, where the area of the random mask is not larger than a preset mask area;
and fusing the random mask with the moving image set to obtain the training data set simulating occlusion of the planar target.
Further, the planar target tracking apparatus further includes:
the environment parameter setting module is used for setting the physical environment of the simulation scene through panoramic images in the preset image library to obtain the preset environment parameters, where the preset environment parameters include illumination conditions and physical backgrounds;
the plane parameter setting module is used for decoupling a plane into a set of an initial plane, a material, and a map, and obtaining the preset plane parameters based on the length and width parameters and orientation parameters of the initial plane, the type information of the material, and the type information of the map;
the camera parameter setting module is used for obtaining the preset camera parameters by setting camera characteristic parameters and camera motion parameters, where the camera characteristic parameters include resolution, field-of-view, focus, depth-of-field, and aperture parameters, and the camera motion parameters include camera translation and camera rotation parameters.
Further, the feature extraction module includes:
a region of interest determining unit configured to determine a region of interest of the reference frame and a region of interest of the target frame;
and the region feature extraction unit is used for extracting the reference region feature of the region of interest of the reference frame as the reference frame feature and extracting the target region feature of the region of interest of the target frame as the target frame feature.
Further, the optical flow estimation module is further configured to:
and obtaining the correlation volume based on the dot product of the channel values of the reference frame features and the channel values of the target frame features.
Further, the planar target tracking apparatus further includes:
and the target tracking module is used for determining, in the target frame, the planar target of the reference frame based on the linear transformation module and the homography matrix.
In a third aspect, the present application also provides a computer device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and implement the planar target tracking method based on multi-task learning as described above when the computer program is executed.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement a planar target tracking method based on multi-task learning as described above.
Compared with the prior art, the planar target tracking method based on multi-task learning provided by the embodiments of the present application extracts reference frame features of a reference frame and target frame features of a target frame through a feature extraction module in the planar target tracking model; constructs a correlation volume of the reference frame and the target frame based on these features through an optical flow estimation module in the model, and performs optical flow estimation from the reference frame to the target frame based on the correlation volume to obtain a current optical flow result; performs grid sampling on the reference region features based on the current optical flow result through a mask segmentation module in the model to obtain aligned reference frame features, and concatenates the aligned reference frame features with the target region features in the target frame space to obtain a target mask; and determines a homography matrix for planar target tracking based on the target mask and the current optical flow result through a linear transformation module in the model. In this way, the invention provides a robust planar target tracking network that jointly estimates the dense optical flow, the plane mask, and the homography matrix: the dense optical flow guarantees a sufficient number of matching points, the plane mask filters out interference such as background and occlusion, and the dense, reliable optical flow enables robust tracking of the planar target, improving both the model's robustness to occlusion and blur and the planar target tracking accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a planar target tracking method based on multi-task learning according to an embodiment of the present application;
fig. 2 is a schematic diagram of a network structure of a planar target tracking model based on multi-task learning according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a planar sampling process according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a simulation data image set according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a feature extraction process according to an embodiment of the present disclosure;
FIG. 6 is a schematic block diagram of a planar target tracking apparatus based on multi-task learning provided by an embodiment of the present application;
fig. 7 is a schematic diagram of a planar target tracking apparatus based on multi-task learning according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the described embodiments are some, but not all, examples of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
It is to be understood that the terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that, in order to clearly describe the technical solutions of the embodiments of the present application, in the examples of the present application, the words "first", "second", and the like are used to distinguish the same item or similar items having substantially the same function and effect. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
The inventors of the present application found that:
First, in terms of training data, existing methods require a large amount of supervised data, which is very difficult and time-consuming to acquire and label. Existing methods are therefore often trained on simulation data constructed through homography transformations, but such data can only simulate in-plane deformation; it cannot reproduce details of real scenes such as illumination, occlusion, and materials, and struggles to generate realistic images, which limits the generalization ability of the model.
Secondly, in terms of models, existing methods usually rely on feature consistency for matching and tracking, but strong appearance changes of a planar target caused by factors such as blur, illumination changes, and occlusion often violate the consistency precondition, causing tracking failure. The underlying reason is that most existing models use single-task learning, which makes it difficult to learn robust feature representations under complex conditions.
Aiming at these problems, the invention proposes a simulation data generation approach that realistically reproduces real-world conditions, constructs a training data set with it, and trains a robust planar target tracking algorithm based on multi-task learning on this data set, thereby improving the model's robustness to occlusion and blur.
Therefore, how to track robustly under occlusion and blur and improve planar target tracking accuracy is a technical problem to be solved urgently.
In order to solve the above problems, the present application provides a planar target tracking method based on multi-task learning.
Referring to fig. 1, fig. 1 is a schematic flow chart of a planar target tracking method based on multi-task learning according to an embodiment of the present application, including steps S101 to S104.
In order to track robustly under occlusion and blur and improve planar target tracking accuracy, this embodiment adopts an occlusion- and blur-resistant robust planar target tracking technique: a planar target tracking model is trained with multi-task learning, encouraging the model to learn features that are robust to occlusion and blur and thereby improving tracking accuracy.
As shown in fig. 2, the planar target tracking model proposed in this embodiment mainly includes a plane sampling module, an optical flow estimation module, a mask segmentation module, and a direct linear transformation (DLT) module. Taking the reference frame and the target frame as input, the plane sampling module crops the regions of interest from the two input frames, the feature extraction module extracts features from the two regions of interest, and a correlation filtering operation on these features produces a four-dimensional correlation volume. The correlation volume records the correlation between any pixel in the reference frame and any pixel in the target frame, and serves as the input to the optical flow estimation module for estimating dense optical flow. The optical flow result is used for feature alignment: the reference frame features are aligned to the target frame space through a grid sampling operation, the aligned features are used by the mask estimation module to estimate a plane mask, the mask then filters the flow to obtain a dense and reliable optical flow result, and finally a homography matrix is robustly estimated with the DLT method. Specifically:
Step S101, extracting reference frame features of a reference frame and target frame features of a target frame through a feature extraction module in the planar target tracking model;
In this embodiment, the reference frame is decomposed into an RGB three-channel image. A 3×3 convolution layer encodes shallow texture details into a 64-dimensional feature space to obtain first-level features; a stack of six residual blocks then raises the feature dimension to 128 to obtain second-level features. Finally, a 1×1 convolution layer maps the 128-dimensional second-level features to 256 dimensions to obtain third-level features, from which a high-dimensional feature map is constructed. The target frame and the reference frame are processed by a feature extraction module with shared weights to produce the 256-dimensional high-level features used in the subsequent steps.
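The layer sizes above can be sketched as a PyTorch module. Only the dimensions (3→64→128→256, six residual blocks, 3×3 then 1×1 convolutions) come from the description; the internals of each residual block are our assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Minimal residual block; a 1x1 projection handles channel changes (assumed design).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class FeatureExtractor(nn.Module):
    """3x3 conv encodes RGB into 64 dims; six stacked residual blocks lift the
    features to 128 dims; a 1x1 conv maps them to 256 dims."""
    def __init__(self):
        super().__init__()
        blocks = [ResidualBlock(64, 128)] + [ResidualBlock(128, 128) for _ in range(5)]
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            *blocks,
            nn.Conv2d(128, 256, 1))

    def forward(self, x):
        return self.net(x)

extractor = FeatureExtractor()                   # one instance = shared weights
ref_feat = extractor(torch.randn(1, 3, 32, 32))  # 256-d feature map
```

Weight sharing between the reference and target branches falls out of using the same `extractor` instance for both frames.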
Further, the step S101 includes:
determining the region of interest of the reference frame and the region of interest of the target frame through the plane sampling module in the planar target tracking model;
and extracting, by the feature extraction module, a reference region feature of the region of interest of the reference frame as the reference frame feature, and a target region feature of the region of interest of the target frame as the target frame feature.
In this embodiment, the plane sampling module crops the region of interest containing the plane from the reference frame and the target frame. Given an initial plane, let p_i denote a point on the plane in the image; a homography matrix H samples the plane onto a W×H template space, so that p̃_i = H·p_i, where p̃_i is the point corresponding to p_i in the template space. By transforming the region of interest into the template space, the plane sampling module reduces both the computation and the background interference compared with processing the whole image. Denoting the homography from the frame-i template space to the frame-j template space as H_{i→j}, it can be expressed as H_{i→j} = H_j·G_{i→j}·H_i⁻¹, where H_i and H_j are the sampling homographies of the two frames and G_{i→j} is the image-space homography from frame i to frame j.
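The template-space relation can be checked numerically. Below, H_i and H_j stand for the sampling homographies of frames i and j, and G is an assumed image-space homography from frame i to frame j (all matrices here are illustrative values, not from the application); the composition H_j · G · H_i⁻¹ maps frame-i template coordinates directly to frame-j template coordinates:

```python
import numpy as np

def warp_points(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of 2D points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    q = ph @ H.T
    return q[:, :2] / q[:, 2:3]                    # back to inhomogeneous

H_i = np.array([[0.5, 0.0, 10.0], [0.0, 0.5, 5.0], [0.0, 0.0, 1.0]])  # image i -> template
H_j = np.array([[0.8, 0.1, -2.0], [0.0, 1.1, 3.0], [0.0, 0.0, 1.0]])  # image j -> template
G = np.array([[1.0, 0.2, 4.0], [0.1, 1.0, -1.0], [1e-3, 0.0, 1.0]])   # image i -> image j

H_ij = H_j @ G @ np.linalg.inv(H_i)  # template i -> template j in one matrix

# Sanity check: the one-shot warp equals template -> image_i -> image_j -> template_j
t = np.array([[3.0, 4.0]])
step = warp_points(H_j, warp_points(G, warp_points(np.linalg.inv(H_i), t)))
direct = warp_points(H_ij, t)
```

Because projective maps compose by matrix multiplication, the single matrix H_ij reproduces the three-step warp exactly.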
further, before step S101, the method further includes:
constructing a simulation scene based on at least one of preset environmental parameters, preset plane parameters and preset camera parameters;
constructing a training data set based on a moving image set of a target in the simulation scene;
and training to generate the planar target tracking model based on the training data set.
In this embodiment, the real scene is simulated in terms of environment, plane, camera, and so on, so as to render realistic data and construct a training data set, which is then occluded with masks generated by a random occlusion algorithm to obtain a simulated training data set. The planar target tracking model is then trained on this simulated training data set.
Further, before constructing the simulation scene, the method further comprises the steps of:
setting the physical environment of the simulation scene through a panoramic image in a preset image library to obtain the preset environment parameters, where the preset environment parameters include illumination conditions and physical backgrounds;
decoupling a plane into a set of an initial plane, a material, and a map, and obtaining the preset plane parameters based on the length and width parameters and orientation parameters of the initial plane, the type information of the material, and the type information of the map;
and obtaining the preset camera parameters by setting camera characteristic parameters and camera motion parameters, where the camera characteristic parameters include resolution, field-of-view, focus, depth-of-field, and aperture parameters, and the camera motion parameters include camera translation and camera rotation parameters.
Simulation data based on pixel transformations cannot reproduce complex situations in real environments such as camera shake, defocus, and lighting changes, which limits the performance of planar target tracking algorithms. This embodiment proposes a new rendering-based data set construction method: by simulating effects such as the motion blur and defocus blur produced during camera motion, a new data set is constructed that better reproduces hard samples such as shake, defocus, and lighting changes in real scenes. The method mainly covers world environment settings, plane settings, camera settings, and data set construction.
An HDRI (high-dynamic-range image) library from Poly Haven (a public 3D asset library) is used as the world background, covering indoor, outdoor, city, field, daytime, nighttime, and other scenes. Because an HDRI file contains illumination information for every pixel, the illumination conditions change when the camera switches viewpoint, which simulates the illumination changes of real scenes and better reproduces the viewpoint-induced lighting variation of different environments. The physical environment of the simulation scene is defined through real HDRI panoramas to obtain the preset environment parameters, including illumination conditions and physical backgrounds; these are very close to actual application scenes, so the data rendered by the method is highly realistic.
To simulate planes in different environments, each plane is deconstructed into a set of an initial plane, a material, and a map, as follows:
1) Generate an initial plane with random length and width and place it at the origin;
2) Randomly select a resource from AmbientCG (a material website offering hundreds of materials such as wood, stone, brick, glass, and paper) as the plane's material;
3) Randomly select a picture from a 4K image library crawled from the Internet as the plane's map.
The plane is thus decoupled into three parts: a two-dimensional plane, a material, and a map. By setting the two-dimensional plane, parameters such as length, width, and orientation can be modified to simulate planes of different sizes; by setting different materials and modifying metallicity, roughness, bump mapping, and the like, different real-life targets such as mirror or metal planes can be simulated; and by setting different maps, planes with different appearances can be simulated. The preset plane parameters are obtained by setting the length and width parameters and orientation parameters of the two-dimensional plane, the type information of the material, and the type information of the map, achieving a fine-grained, multi-faceted simulation of real planar tracking targets and improving the realism of the simulation results.
Both motion blur and defocus blur are related to the camera, so camera settings are crucial. The camera related settings mainly include:
1) Camera movement
Given a plane model, the camera supports basic motion modes such as zooming in and out, rotation, shaking and viewpoint-change motion, and these modes can also be combined.
2) Defocus simulation
By simulating camera characteristics such as motion blur and defocus blur, camera behaviour common in real object tracking is reproduced at the same time. The camera settings are divided into camera characteristic settings and camera motion settings to obtain the preset camera parameters. The camera characteristic settings include parameters such as resolution, view frustum, focus, depth of field and aperture, so that the characteristics of a real camera are simulated. The camera motion settings simulate the motion found in real object tracking by translating and rotating the camera. The change of focus is simulated with a Gaussian distribution, as follows:
P_f ~ N(P_c, σ²),
where P_f is the camera focus point, P_c is the center point of the plane, and σ² is the variance.
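The Gaussian focus simulation above can be sketched in Python with NumPy. The function name and the use of independent per-axis noise with a common σ are assumptions:

```python
import numpy as np

def sample_focus(plane_center, sigma, rng=None):
    """Draw a camera focus point P_f ~ N(P_c, sigma^2) around the
    plane centre P_c, with independent noise on each coordinate."""
    rng = np.random.default_rng() if rng is None else rng
    plane_center = np.asarray(plane_center, dtype=float)
    return rng.normal(loc=plane_center, scale=sigma)
```

Averaged over many draws the sampled focus points cluster around the plane centre, while individual draws jitter the focus to produce defocus variation.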
Further, the constructing a training data set based on the moving image set of the target in the simulation scene includes:
generating a random mask through a random occlusion algorithm, wherein the area of the random mask is not larger than a preset mask area;
And fusing the random mask with the moving image set to obtain the training data set simulating occlusion of the planar target.
In this embodiment, occlusion of the planar target is simulated by a random occlusion algorithm, which consists of two steps: randomly generating a mask, and fusing it with the image.
A random mask is first generated by the random occlusion algorithm. Let the mask picture have height h and width w, let th be the threshold on the fraction of the picture the mask may occupy, and let n be the number of polygon vertices. n coordinate points are generated at random positions inside the h × w picture (each coordinate is a random number scaled to the picture size). The points are connected to form a polygon, and the fraction of the picture covered by the polygon is computed. If that fraction is greater than th, the mask is output; otherwise new coordinate points are generated.
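A minimal NumPy sketch of this rejection loop follows; the vertex ordering by angle around the centroid is an added simplification so the polygon is simple and the shoelace area formula is valid, and all names are illustrative:

```python
import numpy as np

def polygon_area(pts):
    """Shoelace formula: area of a simple polygon given as (n, 2) vertices."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def random_mask_vertices(h, w, th, n, rng, max_tries=1000):
    """Rejection-sample n random vertices in an h x w canvas until the
    polygon they span covers at least a fraction `th` of the canvas."""
    for _ in range(max_tries):
        pts = rng.random((n, 2)) * np.array([h, w], dtype=float)
        # Sort vertices by angle around the centroid so the polygon is simple.
        c = pts.mean(axis=0)
        order = np.argsort(np.arctan2(pts[:, 1] - c[1], pts[:, 0] - c[0]))
        pts = pts[order]
        if polygon_area(pts) / (h * w) >= th:
            return pts
    raise RuntimeError("no polygon met the area threshold")
```

The returned vertex list can then be rasterized into a binary mask for the fusion step.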
The polygon mask generated in the previous step is then fused with an image. Let the mask have height h and width w, let p_i denote the plane vertex coordinates, and let [th_low, th_high] be the interval for the fraction of the planar target the mask may occlude. A threshold for the occlusion ratio is first drawn at random as th = ξ_th, with th_low ≤ ξ_th ≤ th_high, where ξ_th is a random number. A rectangle with height h_rect and width w_rect is then generated so that the mask occupies the chosen fraction of it. The mask is placed at a random position inside the rectangle, with new top-left coordinates (ξ_x, ξ_y), where 0 ≤ ξ_x ≤ h_rect − h and 0 ≤ ξ_y ≤ w_rect − w.
Based on the random mask and the initial simulated image set, a direct linear transformation (DLT) algorithm is used to solve for the preset homography matrix H from the rectangle to the plane. The mask is transformed onto the plane with this homography to obtain the plane mask information, and the initial simulated image set is occluded with the plane mask to produce the image fusion result, which serves as the simulated data image set, i.e. the training data set.
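The direct linear transformation step can be sketched in NumPy: given four or more point correspondences (for example the rectangle corners and the plane corners), the homography is recovered from the null space of the DLT system via SVD. The function name is an assumption:

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src (homogeneous
    coordinates) from >= 4 point correspondences via the DLT + SVD."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear constraints per correspondence on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity
```

With exact correspondences (such as a pure translation of the four rectangle corners) the recovered H matches the ground-truth transform up to numerical precision.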
Step S102, constructing a correlation body of the reference frame and the target frame based on the reference frame characteristics and the target frame characteristics through an optical flow estimation module in the plane target tracking model, and carrying out optical flow estimation from the reference frame to the target frame based on the correlation body to obtain a current optical flow result;
In this embodiment, RAFT is used for optical flow estimation. The method computes the correlation between all pairs of pixels: given image features f_1, f_2 ∈ R^(H×W×C), the four-dimensional correlation body (correlation volume) is C(f_1, f_2) ∈ R^(H×W×H×W). RAFT pools the last two dimensions with different kernel sizes and performs look-up operations at the different scales, so that optical flow motions of different magnitudes can be sensed, and a dense, reliable optical flow result is obtained through an iterative update module.
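The all-pairs correlation body and its pooled pyramid, as used in RAFT, can be sketched in NumPy as follows (a simplified stand-in for the original implementation; 2×2 average pooling is assumed for the pyramid):

```python
import numpy as np

def correlation_volume(f1, f2):
    """All-pairs correlation C[i,j,k,l] = <f1[i,j,:], f2[k,l,:]>
    for features of shape (H, W, C)."""
    return np.einsum('ijc,klc->ijkl', f1, f2)

def correlation_pyramid(corr, levels=4):
    """Average-pool the last two dimensions with 2x2 kernels to build a
    pyramid of volumes sensing increasingly large displacements."""
    pyramid = [corr]
    for _ in range(levels - 1):
        c = pyramid[-1]
        H, W, h2, w2 = c.shape
        c = c.reshape(H, W, h2 // 2, 2, w2 // 2, 2).mean(axis=(3, 5))
        pyramid.append(c)
    return pyramid
```

Only the reference-frame dimensions are pooled, so every target pixel keeps a full-resolution index into coarser and coarser matching maps.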
Further, the constructing the relatives of the reference frame and the target frame based on the reference frame features and the target frame features includes:
and obtaining the correlation body based on the dot product of the channel value of the reference frame characteristic and the channel value of the target frame characteristic.
In this embodiment, let the correlation body be C, the target frame feature be F_t and the reference frame feature be F_r. The correlation body is built as

C_ijkl = Σ_h F_t(i, j, h) · F_r(k, l, h),

where (i, j) and (k, l) are the pixel coordinates of the target frame and the reference frame respectively, F_t(i, j, h) is the value of the h-th channel of the target frame feature at row i, column j, and F_r(k, l, h) is the value of the h-th channel of the reference frame feature at row k, column l.
Step S103, performing grid sampling on the reference region features based on the current optical flow result by a mask segmentation module in the planar target tracking model to obtain aligned reference frame features; splicing the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
In this embodiment, the mask segmentation module is used to segment out interfering parts such as occlusions and to filter out the reliable optical flow results, so that tracking can be performed robustly. The module takes the aligned feature pairs as input.
Specifically, grid sampling is performed on the reference frame features using the optical flow result to obtain the aligned features. The module consists of three 3×3 convolution layers with ReLU activation followed by one 1×1 convolution layer, and finally outputs the segmentation result through a Sigmoid activation function.
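The grid-sampling step can be sketched in NumPy as a bilinear warp of the reference features by the flow (a stand-in for a `grid_sample` call; boundary handling by clipping and the (row, column) flow layout are assumptions):

```python
import numpy as np

def warp_by_flow(ref_feat, flow):
    """Align reference-frame features to the target frame: each target
    pixel (i, j) samples ref_feat at (i + flow[i,j,0], j + flow[i,j,1])
    with bilinear interpolation, clipping at the image border."""
    H, W, _ = ref_feat.shape
    ii, jj = np.meshgrid(np.arange(H), np.arange(W), indexing='ij')
    y = np.clip(ii + flow[..., 0], 0, H - 1)
    x = np.clip(jj + flow[..., 1], 0, W - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (y - y0)[..., None]
    wx = (x - x0)[..., None]
    top = ref_feat[y0, x0] * (1 - wx) + ref_feat[y0, x1] * wx
    bot = ref_feat[y1, x0] * (1 - wx) + ref_feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

A zero flow leaves the features unchanged, and an integer flow shifts them by whole pixels, which makes the alignment easy to sanity-check.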
Step S104, determining a homography matrix for planar target tracking based on the target mask and the current optical flow result through a linear transformation module in the planar target tracking model.
In this embodiment, the high-dimensional features of the initial image set are extracted by two convolution layers and six residual modules in the trained planar target tracking model to obtain high-dimensional feature maps; the extraction process is shown in fig. 5. The feature extraction module, with shared weights, extracts the high-dimensional features of the target frame and of the reference frame in the initial image set, yielding the reference frame features and the target frame features.
A correlation body feature is constructed from the reference frame features and the target frame features, and the optical flow is estimated iteratively from the correlation body feature together with the frame features to obtain the target optical flow result. Specifically, a first optical flow result is obtained from the reference frame features and the target frame features; matching correlation features at different scales are indexed on the correlation pyramid using this flow; a recurrent GRU network outputs an optical flow residual, which is added to the current flow to obtain a new optical flow result. This step is iterated several times, and finally an accurate target optical flow result is output, i.e. the motion vector of each pixel from the reference frame to the target frame is estimated.
First, the plane mask is used to filter out non-planar region features from the reference frame features, leaving reference frame plane features that contain only the planar region. These are mapped to the target frame space through the optical flow, spliced with the target frame features, fed into the target mask estimation module, and the target mask is finally output.
Combining the mask with the optical flow result, the flow over the planar target region (the planar-region optical flow result) is obtained by filtering, and is fed into the direct linear transformation algorithm to compute the planar target homography matrix.
Further, after the determining, by the linear transformation module in the planar target tracking model, a homography matrix for planar target tracking based on the target mask and the current optical flow result, the method further includes:
a planar target in the reference frame is determined in the target frame based on the linear transformation model and the homography matrix.
In the manner described above, this embodiment provides a multi-task learning network that combines optical flow estimation with mask estimation. Compared with models that directly regress the offsets of the four plane corner points, optical flow estimation produces dense matching point pairs, and combining it with the planar target mask filters out erroneous optical flow results caused by background, occlusion and the like, improving the tracking robustness of the model under occlusion, out-of-view conditions and the like. This embodiment also provides a simulation data generation approach that realistically reproduces real conditions: the real scene is simulated from the environment, the plane, the camera and so on in a 3D-to-2D rendering process that effectively models motion blur, defocus blur and illumination-induced surface changes. In addition, the simulation results are randomly marked with polygonal masks to construct the training data set, so the model can learn more robust feature representations and its resistance to blur and occlusion is improved.
Referring to fig. 6, fig. 6 is a schematic block diagram of a planar target tracking apparatus based on multi-task learning according to an embodiment of the present application.
As shown in the figure, the planar target tracking apparatus based on multi-task learning includes:
a feature extraction module 10, configured to extract a reference frame feature of a reference frame and a target frame feature of a target frame;
the optical flow estimation module 20 is configured to construct a correlation body of the reference frame and the target frame based on the reference frame feature and the target frame feature, and perform optical flow estimation from the reference frame to the target frame based on the correlation body, so as to obtain a current optical flow result;
a mask segmentation module 30, configured to perform grid sampling on the reference region features based on the current optical flow result to obtain aligned reference frame features, and to splice the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
a linear transformation module 40 for determining a homography matrix for planar target tracking based on the target mask and the current optical flow result.
Further, the planar target tracking apparatus further includes:
The scene simulation module is used for constructing a simulation scene based on at least one of preset environment parameters, preset plane parameters and preset camera parameters;
the data construction module is used for constructing a training data set based on a moving image set of the target in the simulation scene;
and the model training module is used for training and generating the planar target tracking model based on the training data set.
Further, the data construction module is further configured to:
generating a random mask through a random occlusion algorithm, wherein the area of the random mask is not larger than a preset mask area;
and fusing the random mask with the moving image set to obtain the training data set simulating occlusion of the planar target.
Further, the planar target tracking apparatus further includes:
the environment parameter setting module is used for setting the physical environment of the simulation scene to obtain the preset environment parameters through the panoramic pictures in the preset picture library, wherein the preset environment parameters comprise illumination conditions and physical backgrounds;
the plane parameter setting module is used for decoupling a plane into an initial plane, a material and a set of maps, and obtaining the preset plane parameters based on the length and width parameters and the orientation parameters of the initial plane, the type information of the material and the type information of the maps;
The camera parameter setting module is used for obtaining the preset camera parameters by setting camera characteristic parameters and camera motion parameters, wherein the camera characteristic parameters comprise resolution parameters, view frustum parameters, focus parameters, depth-of-field parameters and aperture parameters, and the camera motion parameters comprise camera translation parameters and camera rotation parameters.
Further, the feature extraction module 10 includes:
a region of interest determining unit configured to determine a region of interest of the reference frame and a region of interest of the target frame;
and the region feature extraction unit is used for extracting the reference region feature of the region of interest of the reference frame as the reference frame feature and extracting the target region feature of the region of interest of the target frame as the target frame feature.
Further, the optical flow estimation module 20 is further configured to:
and obtaining the correlation body based on the dot product of the channel value of the reference frame characteristic and the channel value of the target frame characteristic.
Further, the planar target tracking apparatus further includes:
and the target tracking module is used for determining a plane target in the reference frame in the target frame based on the linear transformation model and the homography matrix.
Referring to fig. 7, fig. 7 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device may be a server.
Referring to fig. 7, the computer device includes a processor, a memory and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program includes program instructions that, when executed, cause the processor to perform any of the planar target tracking methods based on multi-task learning.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when the computer program is executed by the processor, the processor performs any of the planar target tracking methods based on multi-task learning.
The network interface is used for network communication such as transmitting assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
It should be appreciated that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program. The computer program includes program instructions which, when executed by a processor, implement any of the multi-task learning-based planar target tracking methods provided in the embodiments of the present application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. The planar target tracking method based on the multi-task learning is characterized by comprising the following steps of:
extracting reference frame characteristics of a reference frame and target frame characteristics of a target frame by a characteristic extraction module in the planar target tracking model;
constructing a correlation body of the reference frame and the target frame based on the reference frame characteristics and the target frame characteristics through an optical flow estimation module in the plane target tracking model, and carrying out optical flow estimation from the reference frame to the target frame based on the correlation body to obtain a current optical flow result;
performing grid sampling on the reference region features based on the current optical flow result through a mask segmentation module in the planar target tracking model to obtain aligned reference frame features; splicing the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
And determining a homography matrix for planar target tracking based on the target mask and the current optical flow result through a linear transformation module in the planar target tracking model.
2. The planar target tracking method according to claim 1, wherein before extracting the reference frame features of the reference frame and the target frame features of the target frame by the feature extraction module in the planar target tracking model, the planar target tracking method further comprises:
constructing a simulation scene based on at least one of preset environmental parameters, preset plane parameters and preset camera parameters;
constructing a training data set based on a moving image set of a target in the simulation scene;
and training to generate the planar target tracking model based on the training data set.
3. The planar target tracking method according to claim 2, wherein before constructing the simulation scene, the planar target tracking method includes:
setting a physical environment of the simulation scene to obtain the preset environment parameters through a panoramic image in a preset image library, wherein the preset environment parameters comprise illumination conditions and physical backgrounds;
Decoupling a plane into an initial plane, a material and a set of maps, and obtaining the preset plane parameters based on length and width parameters and orientation parameters of the initial plane, type information of the material and type information of the maps;
the preset camera parameters are obtained by setting camera characteristic parameters and camera motion parameters, wherein the camera characteristic parameters comprise resolution parameters, view frustum parameters, focus parameters, depth-of-field parameters and aperture parameters, and the camera motion parameters comprise camera translation parameters and camera rotation parameters.
4. The planar target tracking method according to claim 2, wherein the constructing a training data set based on a set of moving images of a target in the simulation scene includes:
generating a random mask through a random occlusion algorithm, wherein the area of the random mask is not larger than a preset mask area;
and fusing the random mask with the moving image set to obtain the training data set simulating occlusion of the planar target.
5. The planar target tracking method according to claim 1, wherein the extracting, by the feature extraction module in the planar target tracking model, the reference frame features of the reference frame and the target frame features of the target frame includes:
Determining the interest area of the reference frame and the interest area of the target frame through a plane sampling module in the plane target tracking model;
and extracting a reference region characteristic of the region of interest of the reference frame as the reference frame characteristic and extracting a target region characteristic of the region of interest of the target frame as the target frame characteristic through the characteristic extraction module.
6. The planar target tracking method according to claim 1, wherein the constructing a correlation of the reference frame and the target frame based on the reference frame feature and the target frame feature comprises:
and obtaining the correlation body based on the dot product of the channel value of the reference frame characteristic and the channel value of the target frame characteristic.
7. The planar target tracking method according to any one of claims 1 to 6, wherein the determining, by a linear transformation module in the planar target tracking model, a homography matrix for planar target tracking based on the target mask and the current optical flow result, further comprises:
a planar target in the reference frame is determined in the target frame based on the linear transformation model and the homography matrix.
8. A planar target tracking apparatus based on multi-task learning, the planar target tracking apparatus based on multi-task learning comprising:
the feature extraction module is used for extracting the reference frame features of the reference frame and the target frame features of the target frame;
the optical flow estimation module is used for constructing a correlation body of the reference frame and the target frame based on the reference frame characteristics and the target frame characteristics, and carrying out optical flow estimation from the reference frame to the target frame based on the correlation body to obtain a current optical flow result;
the mask segmentation module is used for carrying out grid sampling on the reference region features based on the current optical flow result to obtain aligned reference frame features, and splicing the aligned reference frame features with the target region features in the target frame space to obtain a target mask;
and the linear transformation module is used for determining a homography matrix for planar target tracking based on the target mask and the current optical flow result.
9. A computer device, the computer device comprising a memory and a processor;
the memory is used for storing a computer program;
The processor is configured to execute the computer program and implement the planar target tracking method based on multitasking learning according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which when executed by a processor causes the processor to implement the planar target tracking method based on multitasking learning according to any one of claims 1 to 7.
CN202410003226.0A 2024-01-02 2024-01-02 Plane target tracking method, device, equipment and medium based on multitask learning Pending CN117788524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410003226.0A CN117788524A (en) 2024-01-02 2024-01-02 Plane target tracking method, device, equipment and medium based on multitask learning


Publications (1)

Publication Number Publication Date
CN117788524A 2024-03-29

Family

ID=90389165



Similar Documents

Publication Publication Date Title
US11170569B2 (en) System and method for virtual modeling of indoor scenes from imagery
CN110135455B (en) Image matching method, device and computer readable storage medium
US11003956B2 (en) System and method for training a neural network for visual localization based upon learning objects-of-interest dense match regression
US10334168B2 (en) Threshold determination in a RANSAC algorithm
US20180012411A1 (en) Augmented Reality Methods and Devices
CN111401266B (en) Method, equipment, computer equipment and readable storage medium for positioning picture corner points
CN113793382A (en) Video image splicing seam searching method and video image splicing method and device
Chen et al. A particle filtering framework for joint video tracking and pose estimation
CN113686314A (en) Monocular water surface target segmentation and monocular distance measurement method of shipborne camera
CN112055192B (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN115564639A (en) Background blurring method and device, computer equipment and storage medium
Kang et al. Facial depth and normal estimation using single dual-pixel camera
CN117934308A (en) Lightweight self-supervision monocular depth estimation method based on graph convolution network
Xian et al. Neural lens modeling
CN117788524A (en) Plane target tracking method, device, equipment and medium based on multitask learning
CN116433848B (en) Screen model generation method, device, electronic equipment and storage medium
Connal Methodology for Volumetric Estimation of Condensed Water Vapor Plumes from Remotely Sensed Imagery
JPWO2019244200A1 (en) Learning device, image generator, learning method, image generation method and program
RU2538319C1 (en) Device of searching image duplicates
CN116310408B (en) Method and device for establishing data association between event camera and frame camera
CN118097566B (en) Scene change detection method, device, medium and equipment based on deep learning
Hong et al. A novel Gravity-FREAK feature extraction and Gravity-KLT tracking registration algorithm based on iPhone MEMS mobile sensor in mobile environment
Agrawal et al. Robust ego-motion estimation and 3-D model refinement using surface parallax
Xie et al. Dynamic Intrinsic Parameter Rectification Network for Cameras with Optical Image Stabilization in Desktop Augmented Reality Applications
CN117911613A (en) Dense reconstruction system, method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination