CN110688895B - Underground cross-view target detection and tracking method based on multi-template learning - Google Patents
Underground cross-view target detection and tracking method based on multi-template learning
- Publication number: CN110688895B (application CN201910782295.5A)
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS; G06—COMPUTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items, of sport video content
- G06V2201/07—Indexing scheme relating to image or video recognition or understanding; Target detection
Abstract
The invention discloses an underground cross-view target detection and tracking method based on multi-template learning, comprising the following steps: S10, performing correlation filter tracking on the current frame image captured in the current field of view using an initial template to obtain a first response map, the initial template being determined from a target sample set; S20, performing correlation filter tracking on the current frame image using a process template to obtain a second response map, the process template being determined from the tracked objects in the respective fields of view; S30, linearly weighting the first response map and the second response map to obtain a final response map; and S40, determining the position with the maximum response value in the final response map as the target position of the tracked target. The method enables long-term tracking of different target objects with high tracking accuracy.
Description
Technical Field
The invention relates to the technical field of signal processing, in particular to an underground cross-view target detection and tracking method based on multi-template learning.
Background
Coal is a basic resource and energy source in China and an important guarantee for national economic development; coal mining is one of the country's key industries. The fully mechanized coal mining face of an underground coal mine is a high-risk production environment characterized by limited space, high temperature, high humidity, heavy coal dust, low visibility, numerous high-power devices and complex working conditions. At present, most safety management of personnel on the underground working face relies on manual monitoring, which suffers from short duration, narrow coverage and high cost. With the development of intelligent mines, the coal mining process is gradually becoming less manned or unmanned. Real-time remote visual monitoring of the working face therefore helps ensure safe underground production, improves the automation and information management level of coal production, and lays a foundation for intelligent mining. The high temperature, high humidity, heavy coal dust and low visibility of the fully mechanized face place high demands on personnel video tracking; meanwhile, because the working-face space is limited, high-power devices are numerous, working conditions are complex, and the field of view of a single camera is limited, automatic cross-view tracking among multiple cameras is needed to achieve long-term, stable target detection and tracking.
In target detection, traditional methods based on Haar features or the Histogram of Oriented Gradients (HOG) struggle to extract effective features, so their detection accuracy is low in complex scenes. Convolutional neural network (CNN) methods, represented by Faster R-CNN, can mine features that better distinguish different targets and have achieved excellent performance in target detection challenges. However, the features extracted by most methods are limited to inter-class differences, which lowers intra-class matching accuracy: different targets of the same class cannot be distinguished and identified, so the identification capability is limited and identification accuracy is low.
Disclosure of Invention
In view of these problems, the invention provides an underground cross-view target detection and tracking method based on multi-template learning.
To achieve the aim of the invention, the provided multi-template-learning-based underground cross-view target detection and tracking method comprises the following steps:
S10, performing correlation filter tracking on the current frame image captured in the current field of view using an initial template to obtain a first response map; the initial template is determined from the target sample set;
S20, performing correlation filter tracking on the current frame image using a process template to obtain a second response map; the process template is determined from the tracked objects in the respective fields of view;
S30, linearly weighting the first response map and the second response map to obtain a final response map;
and S40, determining the position with the maximum response value in the final response map as the target position of the tracked target.
In one embodiment, before performing correlation filter tracking on the current frame image captured in the current field of view with the initial template to obtain the first response map, the method further includes:
detecting the probability that each first tracked object in a first candidate box of a first field of view belongs to each sample in the target sample set, and determining the first object label of each first tracked object from the sample label corresponding to the maximum probability;
when a first tracked object moves out of the current field of view, acquiring the feature vector of each candidate target in a second candidate box of a second field of view, selecting a second tracked object among the candidate targets according to the feature vectors and the target templates, and identifying the second object label of the second tracked object; the similarity between the second tracked object and a target template exceeds a similarity threshold; the target templates are the tracked objects in the previous fields of view; the second field of view is the field of view in which the first tracked object appears after moving out of the current field of view;
determining the initial template from the target sample set, and generating the process template from the first tracked objects, the first object labels corresponding to the first tracked objects, the second tracked objects, and the second object labels corresponding to the second tracked objects.
In one embodiment, the target sample set includes a labeled target sample set and an unlabeled target sample set;
the determining an initial template from the set of target samples comprises:
moving samples in the labeled target sample set that do not match a first tracked object to the unlabeled target sample set;
and determining an initial template according to the labeled target sample set and the unlabeled target sample set.
As an embodiment, the probability that a first tracked object in the first candidate box x of the first field of view belongs to the i-th labeled sample of the target sample set is calculated as:

p_i = exp(cos(l_i, x)/λ) / ( Σ_{j=1..N_l} exp(cos(l_j, x)/λ) + Σ_{k=1..N_U} exp(cos(u_k, x)/λ) ),

where cos(·, x) denotes the cosine similarity between the first candidate box x and a target sample, λ controls the flatness of the probability distribution, l_i denotes the feature vector of the i-th sample of the labeled target sample set, i = 1, …, N_l, N_l is the total number of samples in the labeled target sample set, u_k denotes the feature vector of the k-th sample of the unlabeled target sample set, k = 1, …, N_U, and N_U is the total number of samples in the unlabeled target sample set.
In one embodiment, the selecting of a second tracked object among the candidate targets according to the feature vectors and the target templates comprises:
substituting the vector set comprising the feature vectors of the candidate targets into a similarity calculation formula to calculate the similarity between each candidate target and each target template, and determining a candidate target whose similarity exceeds the similarity threshold as a second tracked object.
As one embodiment, the similarity calculation formula is:

likelihood(t, T) = max_i cos(t_i, T),

where t = {t_i} denotes the set of templates of the first tracked object from different view angles, t_i is the feature vector of the target template at the i-th view angle, T is the feature vector of a candidate target, likelihood(t, T) represents the similarity between t and T, and cos(t_i, T) denotes the cosine similarity between t_i and T.
In one embodiment, the linear weighting formula is:

f' = γ₁·f₁ + γ₂·f₂,

where f' denotes the final response map, f₁ the first response map, f₂ the second response map, γ₁ the first weight, and γ₂ the second weight.
In one embodiment, the underground cross-view target detection and tracking method based on multi-template learning further includes:
acquiring the target position of the tracked target in the current frame image;
acquiring the position of the tracked target in the previous frame image as a reference position;
calculating the offset between the target position and the reference position;
and if the offset is greater than an offset threshold, judging that the current target tracking has failed.
In one embodiment, the underground cross-view target detection and tracking method based on multi-template learning further includes:
acquiring the current filter parameters of the correlation filter, and updating the filter parameters of the correlation filter used for target tracking of the next frame image according to the current filter parameters.
According to the underground cross-view target detection and tracking method based on multi-template learning, correlation filter tracking is performed on the current frame image captured in the current field of view using the initial template to obtain a first response map, correlation filter tracking is performed on the current frame image using the process template to obtain a second response map, the first and second response maps are linearly weighted to obtain a final response map, and the position with the maximum response value in the final response map is determined as the target position of the tracked target. Tracking is thus achieved from the target position, long-term tracking of different target objects is possible, and tracking accuracy is high.
Drawings
FIG. 1 is a flowchart of an underground cross-view target detection and tracking method based on multi-template learning according to one embodiment;
FIG. 2 is a framework diagram of correlation filter tracking used to obtain the corresponding response maps in one embodiment;
FIG. 3 is a comparative analysis plot of center position error for downhole video experimental results according to one embodiment;
FIG. 4 is a comparative analysis plot of tracking success rate for downhole video experimental results according to one embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the application and do not limit it.
Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
Referring to FIG. 1, FIG. 1 is a flowchart of an underground cross-view target detection and tracking method based on multi-template learning according to an embodiment, including the following steps:
S10, performing correlation filter tracking on the current frame image captured in the current field of view using an initial template to obtain a first response map; the initial template is a template determined from the target sample set.
Before this step, an initial template may be pre-constructed from the target sample set, providing an initial detection template for locating the target position in the current frame image. The target sample set comprises a plurality of samples corresponding to the tracked target, each sample representing one state of the tracked target.
Specifically, the correlation filter tracking of the current frame image captured in the current field of view using the initial template may be computed as:

f₁ = F⁻¹( k̂^{x₁z} ⊙ α̂₁ ),

where f₁ denotes the response map (first response map) obtained by correlation filter tracking, α̂₁ denotes the filter parameters of the correlation filter, k̂^{x₁z} denotes the kernel correlation vector between the training sample of the correlation filter (the initial template x₁) and the candidate sample z in the current frame image, ⊙ denotes element-wise multiplication, and F⁻¹ denotes the inverse Fourier transform.
S20, performing correlation filter tracking on the current frame image using a process template to obtain a second response map; the process template is determined from the tracked objects in the respective fields of view, so tracking can follow the tracked objects determined in each field of view.
Before this step, a process template comprising the tracked objects determined in the respective fields of view may be pre-constructed, providing a process detection template for locating the target position in the current frame image.
Specifically, the correlation filter tracking of the current frame image using the process template is:

f₂ = F⁻¹( k̂^{x₂z} ⊙ α̂₂ ),

where f₂ denotes the response map (second response map) obtained by correlation filter tracking, α̂₂ denotes the filter parameters of the correlation filter, k̂^{x₂z} denotes the kernel correlation vector between the training sample of the correlation filter (the process template x₂) and the candidate sample z in the current frame image, ⊙ denotes element-wise multiplication, and F⁻¹ denotes the inverse Fourier transform.
S30, linearly weighting the first response map and the second response map to obtain the final response map.
In this step, the first response map and the second response map are superposed and fused to obtain the final response map, and the center-point coordinates of the tracking box are determined from the peak of the final response map, thereby determining the corresponding target position.
Specifically, the linear weighting in this step may be:

f' = γ₁·f₁ + γ₂·f₂,

where f' denotes the final response map, f₁ the first response map, f₂ the second response map, γ₁ the first weight, and γ₂ the second weight. γ₁ and γ₂ can be set according to the respective characteristics of the initial template and the process template, e.g., γ₁ = 0.5 and γ₂ = 0.5.
S40, determining the position with the maximum response value in the final response map as the target position of the tracked target.
The gray values of the final response map can be examined, with the gray value representing the response value; the position whose gray value indicates the maximum response is determined as the target position of the tracked target.
According to the above underground cross-view target tracking method based on multi-template learning, correlation filter tracking is performed on the current frame image captured in the current field of view using the initial template to obtain a first response map, correlation filter tracking is performed on the current frame image using the process template to obtain a second response map, the two response maps are linearly weighted to obtain a final response map, and the position with the maximum response value in the final response map is determined as the target position of the tracked target; tracking is thus achieved from the target position, long-term tracking of different target objects is possible, and tracking accuracy is high.
In one embodiment, before performing correlation filter tracking on the current frame image captured in the current field of view with the initial template to obtain the first response map, the method further includes:
detecting the probability that each first tracked object in a first candidate box of a first field of view belongs to each sample in the target sample set, and determining the first object label of each first tracked object from the sample label corresponding to the maximum probability;
when a first tracked object moves out of the current field of view (e.g., the first field of view), acquiring the feature vector of each candidate target in a second candidate box of the second field of view, selecting a second tracked object among the candidate targets according to the feature vectors and the first tracked objects, and identifying the second object label of the second tracked object; the similarity between the second tracked object and a target template exceeds a similarity threshold; the target templates are the tracked objects in the previous fields of view; the second field of view is the field of view in which the first tracked object appears after moving out of the current field of view;
determining the initial template from the target sample set, and generating the process template from the first tracked objects, the first object labels corresponding to the first tracked objects, the second tracked objects, and the second object labels corresponding to the second tracked objects.
The first field of view is the field of view of a first camera, and the second field of view is the field of view of a camera other than the first camera.
Optionally, whenever the currently tracked object (e.g., a second tracked object) moves out of the current field of view (e.g., the second field of view), the next field of view the object enters can again be taken as the new second field of view; the feature vectors of the candidate targets in the second candidate box of that field of view are then acquired, a second tracked object is selected among them according to the feature vectors and the first tracked objects, and its second object label is identified. Repeating this process whenever the tracked object leaves the current field of view determines the second tracked object of every field of view, so that the process template is generated from the first tracked objects, their first object labels, all second tracked objects of every second field of view, and their corresponding second object labels, ensuring the completeness of the generated process template.
In one embodiment, the target sample set includes a labeled target sample set and an unlabeled target sample set;
the determining of the initial template from the target sample set comprises:
moving samples in the labeled target sample set that do not match a first tracked object to the unlabeled target sample set;
and determining the initial template from the labeled target sample set and the unlabeled target sample set.
The labeled target sample set, denoted SL, is a sample set in which every sample carries a sample label; the unlabeled target sample set, denoted SU, is a sample set in which no sample carries a label.
In one example, a cross-view template pool can be constructed from the initial template and the process templates; collecting the initial template and the process templates of each field of view prevents severe cross-view changes in target appearance from degrading identification and tracking robustness. The cross-view template pool contains two types of templates: the initial template I and the process template set {C_i}, i = 1, 2, …. The initial template I is determined by extracting an image block at the position where the target is first confirmed by the current camera; each process template C_i is determined by selecting, in each field of view, the sample with the maximum similarity to the initial template, so as to improve detection, identification and tracking robustness while the target moves across fields of view.
Starting from the target position detected and identified in the initial field of view, the correlation filtering algorithm is applied to the initial template and the process templates respectively to obtain their respective response maps. The response maps are superposed and fused, the peak of the fused response map is searched, and the peak is determined as the target position. A target region is then extracted at the target's last-frame position and its similarity to the initial template is calculated. If this similarity is smaller than a threshold, the tracking result is stored in the process template set, replacing an old template; otherwise, the process template set is not updated. Finally, the updated process template set is used to predict tracking for the next camera.
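The template-pool maintenance described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the 0.8 threshold and the oldest-first replacement policy are not fixed by the text and are assumed here.

```python
def update_template_pool(pool, tracked_patch, similarity_fn, threshold=0.8):
    """Update the cross-view template pool after tracking in one camera view.

    pool: {'initial': template I, 'process': list of process templates C_i}.
    tracked_patch: target region extracted at the last tracked position.
    similarity_fn: similarity between a patch and the initial template.
    """
    sim = similarity_fn(tracked_patch, pool['initial'])
    if sim < threshold:
        # Appearance has drifted away from the initial template: record the
        # new appearance as a process template, replacing an old one.
        if pool['process']:
            pool['process'].pop(0)                # assumed oldest-first policy
        pool['process'].append(tracked_patch)
    return pool                                   # used to predict the next camera
```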
As an embodiment, the probability that a first tracked object in the first candidate box x of the first field of view belongs to the i-th labeled sample of the target sample set is calculated as:

p_i = exp(cos(l_i, x)/λ) / ( Σ_{j=1..N_l} exp(cos(l_j, x)/λ) + Σ_{k=1..N_U} exp(cos(u_k, x)/λ) ),

where cos(·, x) denotes the cosine similarity between the first candidate box x and a target sample, λ controls the flatness of the probability distribution, l_i denotes the feature vector of the i-th sample of the labeled target sample set, i = 1, …, N_l, N_l is the total number of samples in the labeled target sample set, u_k denotes the feature vector of the k-th sample of the unlabeled target sample set, k = 1, …, N_U, and N_U is the total number of samples in the unlabeled target sample set.
In one example, to achieve robust discrimination of different targets, the network is trained to extract intra-class difference features. The first candidate box x is compared with all labeled samples in SL and all unlabeled samples in SU, and the maximum log-likelihood of sample x is computed:

L = E_x[ log p_o ],

where p_o is the maximum of the probabilities that the first tracked object in the first candidate box x belongs to the samples of the target sample set, o is the label of the target sample corresponding to the maximum likelihood of x, i.e., the identification result of the target label, and E_x[·] denotes the expectation over x. Unmatched feature vectors are inserted into the unlabeled target sample set queue, and outdated sample features are removed.
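A minimal sketch of the label-probability computation and label decision is given below, with features stored as NumPy row matrices; the temperature λ = 0.07 is an illustrative assumption, not a value from the patent.

```python
import numpy as np

def label_probabilities(x, L, U, lam=0.07):
    """Probability that candidate feature x belongs to each labeled sample.

    x: feature vector of the first candidate box.
    L: (N_l, d) labeled sample features; U: (N_U, d) unlabeled sample features.
    lam: temperature controlling the flatness of the distribution.
    """
    def cos(M, v):
        return M @ v / (np.linalg.norm(M, axis=1) * np.linalg.norm(v) + 1e-12)

    logits = np.concatenate([cos(L, x), cos(U, x)]) / lam
    e = np.exp(logits - logits.max())             # numerically stable softmax
    p = e / e.sum()
    return p[: len(L)]                            # p_i over the labeled samples

# Label decision o = argmax_i p_i, i.e. the sample label of maximum probability:
# p_L = label_probabilities(x, L, U); o = int(np.argmax(p_L))
```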
In one embodiment, the selecting of a second tracked object among the candidate targets according to the feature vectors and the target templates comprises:
substituting the vector set comprising the feature vectors of the candidate targets into a similarity calculation formula to calculate the similarity between each candidate target and each target template, and determining a candidate target whose similarity exceeds the similarity threshold as a second tracked object.
As one embodiment, the similarity calculation formula is:

likelihood(t, T) = max_i cos(t_i, T),

where t = {t_i} denotes the set of templates of the first tracked object from different view angles, t_i is the (256-dimensional) feature vector of the target template at the i-th view angle, T is the feature vector of a candidate target, likelihood(t, T) represents the similarity between t and T, and cos(t_i, T) denotes the cosine similarity between t_i and T.
The target templates are the tracked objects in the previous fields of view; specifically, the target templates may correspond one-to-one to the tracked objects in the previous fields of view, each target template representing one tracked object. In this embodiment, the similarity between each candidate target and each target template is calculated, and every candidate target whose similarity exceeds the similarity threshold is determined as a second tracked object, so that several second tracked objects may be determined.
Specifically, the cosine similarity may be calculated as:

cos(X, Y) = Σ_g x_g·y_g / ( √(Σ_g x_g²) · √(Σ_g y_g²) ),

where X denotes the feature vector of a candidate box (candidate target), Y denotes a target template, x_g is an element of X, and y_g is an element of Y.
In one embodiment, during tracking in a complex underground scene, factors such as occlusion or the target leaving the field of view easily cause tracking drift: the tracking box deviates from the actual target position onto the background, making tracking inaccurate or even failing. Traditional long-term tracking algorithms re-detect in every frame, but running a detector per frame makes them complex and time-consuming.
To solve this problem, the underground cross-view target detection and tracking method based on multi-template learning further includes:
acquiring the target position of the tracked target in the current frame image;
acquiring the position of the tracked target in the previous frame image as a reference position;
calculating the offset between the target position and the reference position;
and if the offset is greater than an offset threshold, judging that the current target tracking has failed.
This embodiment re-detects the current target, implementing a re-detection mechanism for underground cross-view target tracking. Specifically, the re-detection mechanism does not run in every frame but only when the target drifts. The mechanism judges tracking drift from the offset of the target center position between adjacent frames: if the offset is large and entirely inconsistent with the target's motion, the tracking box has deviated far from the tracked target; conversely, if the offset between adjacent frames is small and consistent with target motion in the monitored scene, the tracking box is tracking the target accurately. A threshold-activation strategy is adopted for re-detection: when the target displacement (the offset between the target position and the reference position) exceeds a threshold, the tracked target is deemed to have drifted, and the re-detector is started to re-detect the target in the current frame and locate the target position. This simplifies the tracking model while achieving robust long-term tracking.
In one example, the target position (x_i, y_i, w_i, h_i) of the current frame and the target position (x_{i−1}, y_{i−1}, w_{i−1}, h_{i−1}) of the previous frame are recorded during tracking, and the offset of the target center position between the two frames may be calculated as:

d = (x_i − x_{i−1})² + (y_i − y_{i−1})².

If d > Γ, where Γ is the offset threshold, tracking is judged to have failed; the detector then re-detects the target over the whole image using a sliding-window strategy.
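The threshold-activated re-detection trigger reduces to a few lines; a sketch under the definitions above (the function name is illustrative, and Γ is left as a scene-dependent parameter):

```python
def tracking_drifted(curr, prev, gamma_threshold):
    """True when the squared center offset between adjacent frames exceeds Γ.

    curr, prev: (x, y, w, h) target boxes of the current and previous frames.
    """
    d = (curr[0] - prev[0]) ** 2 + (curr[1] - prev[1]) ** 2
    return d > gamma_threshold  # if True, rerun the sliding-window detector
```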
In one embodiment, the above underground cross-view target detection and tracking method based on multi-template learning further includes:
and acquiring the current filter parameter of the relevant filter, and updating the filter parameter of the relevant filter in the process of carrying out target tracking on the next frame of image according to the current filter parameter.
As an embodiment, in the multi-template learning process, a correlation filter extracts the response map of the current frame image for each template (the initial template or a process template). The framework for correlation filter tracking of the current frame image with the initial template and the process template to obtain the corresponding response maps is shown in FIG. 2; the tracking process of FIG. 2 comprises two stages, training and detection. In the training stage, the initial template and the process templates in the template pool are used to train the respective correlation filters, the two filtering paths being drawn as a solid line and a dotted line. In the detection stage, a detection sample is extracted at the previous frame's target position and correlated with the trained filters, yielding an initial-template response map and a process-template response map respectively. To combine the strengths of the two filters, the two response maps are superposed and fused into a final response map, and the center-point coordinates of the tracking box are determined from the peak of the final response map. The training and detection stages of FIG. 2 are described below.
Training stage:
The position of a target detected by the nonparametric-optimization intra-class feature detection method is taken as the initial target position, and an image block x₁ of size M × N is sampled around it. A process template x₂ is acquired according to target similarity during cross-view movement. The templates {x_i}, i = 1, 2, are regarded as base samples, and cyclic shifts are applied to them to construct a large number of negative samples x_i(k), where x_i(k) = P^k x_i, k ∈ {0, 1, …, m−1} × {0, 1, …, n−1}, and P is a permutation matrix. The purpose of training the filter is to find, for each template, a w such that the function f(x) = wᵀx satisfies:

min_w Σ_k ( wᵀx_i(k) − y_i(k) )² + μ‖w‖²,    (5)

where μ is a regularization term that effectively prevents over-fitting, w is the filter used for subsequent detection, and y_i(k) is label data following a Gaussian distribution, in one-to-one correspondence with the samples x_i(k).
However, equation (5) is difficult to solve analytically, so it is transformed into the Fourier domain:

ŵ = ( x̂* ⊙ ŷ ) / ( x̂* ⊙ x̂ + μ ),

where * denotes the complex conjugate, ^ denotes the Fourier transform, and ⊙ denotes element-wise multiplication of vectors. The resulting ŵ solves the linear regression model; in practice, however, many problems are linearly inseparable, in which case equation (5) has no closed-form solution and the ridge-regression filter cannot be trained effectively this way. To address this, the kernel trick is introduced when training the filter: a kernel function maps the low-dimensional sample features into a high-dimensional space, solving the linear-inseparability problem while avoiding the "curse of dimensionality".
After introducing the kernel trick, the ridge-regression solution in the Fourier domain is:

α̂ = ŷ / ( k̂^{xx} + μ ),

where k̂^{xx} denotes the Fourier transform of the kernel autocorrelation of sample x_i, which can be calculated with the following Gaussian kernel formula:

k^{xx'} = exp( −( ‖x‖² + ‖x'‖² − 2F⁻¹( x̂* ⊙ x̂' ) ) / σ² ).
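As a concrete reference, the following NumPy sketch trains one kernelized ridge-regression filter in the Fourier domain along the lines of the formulas above; the σ and μ values are illustrative assumptions, and the per-pixel normalization inside the Gaussian kernel follows common correlation-filter implementations rather than the patent text.

```python
import numpy as np

def gaussian_kernel_autocorr(x, sigma=0.5):
    """Gaussian kernel autocorrelation k^{xx} of patch x, computed via the FFT."""
    xf = np.fft.fft2(x)
    corr = np.real(np.fft.ifft2(xf * np.conj(xf)))    # circular autocorrelation
    d = np.maximum(2.0 * np.sum(x ** 2) - 2.0 * corr, 0.0)
    return np.exp(-d / (sigma ** 2 * x.size))

def train_filter(x, y, mu=1e-4, sigma=0.5):
    """Fourier-domain ridge regression: alpha_hat = y_hat / (k_hat^{xx} + mu).

    x: training image patch (initial or process template).
    y: Gaussian-shaped label map of the same size.
    mu: regularization term preventing over-fitting.
    """
    k = gaussian_kernel_autocorr(x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + mu)
```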
Detection stage:
when a new frame of image comes, the sample image block z is extracted centering on the target position determined by the previous frame. At the same time, the filters corresponding to the initial template and the final template, respectively, are obtained in the training stageAnd &>The regression function for the sample to be tested can thus be calculated as:
wherein,representing a training sample x i And a kernel correlation vector with the candidate sample z. F -1 Representing an inverse Fourier transform。
Let f₁ and f₂ denote the filter response maps corresponding to the initial template and the process template, respectively; the final response map is obtained by linearly weighting the response maps:

f' = γ₁·f₁ + γ₂·f₂.
Then the position with the maximum response value in the fused response map is found and determined as the final target position, and the filter model and the target appearance model are updated. The filter-parameter update formula is:

α̂_t = (1 − η)·α̂_{t−1} + η·α̂,   x_t = (1 − η)·x_{t−1} + η·x,

where α̂_t denotes the updated filter parameters, α̂_{t−1} the filter parameters before the update, x the target of the current frame, x_{t−1} the target of the previous frame, and η the control parameter.
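A sketch of this interpolation update is given below; η = 0.02 is a typical illustrative value, not one fixed by the patent.

```python
def update_model(alpha_prev, alpha_new, x_prev, x_new, eta=0.02):
    """Linearly interpolate filter parameters and the appearance model.

    alpha_prev / alpha_new: Fourier-domain filter parameters before the update
    and those estimated from the current frame; x_prev / x_new: the target
    appearance of the previous and current frames; eta: control parameter.
    """
    alpha = (1.0 - eta) * alpha_prev + eta * alpha_new
    x_model = (1.0 - eta) * x_prev + eta * x_new
    return alpha, x_model
```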
The underground cross-view target tracking method based on multi-template learning is a novel pipeline of target detection, cross-view re-identification and long-term tracking; it forms a complete underground cross-view target detection and tracking system and achieves long-term, stable cross-view detection and tracking of the target. First, this embodiment provides a target detection and identification method based on nonparametric-optimization intra-class features: a CNN model trained by nonparametric optimization learns intra-class discriminative features of the target space; multi-view matching is studied to exploit more complete and diverse target information from multiple views and cope with dramatic appearance changes; and a target matching network is trained on the intra-class features of targets with different labels, learning discriminative features to identify target labels effectively. Second, a cross-view target re-identification method based on multi-view fusion is provided, which improves re-identification precision by fusing sample features from multiple views; a cross-view template pool is constructed and updated, and multi-view templates are combined to obtain a diverse appearance model of the target, enhancing robustness to complex scenes. Finally, correlation filter tracking based on multi-template learning is provided: the filters are trained by multi-template learning over the constructed cross-view template pool, and a re-detection mechanism automatically starts detection when the target drifts, relocating the target and enabling long-term tracking.
In one example, the proposed underground cross-view target detection and tracking method based on multi-template learning is compared with five classical tracking algorithms: KCF (Kernelized Correlation Filters), CSK (Circulant Structure of Tracking-by-Detection with Kernels), TLD (Tracking-Learning-Detection), fDSST (fast Discriminative Scale Space Tracking) and ROT (Occlusion-aware Real-time Object Tracking). FIG. 3 shows a comparative analysis of center position error on downhole video experiments: each plot compares the center position error (in pixels, ordinate) of the proposed algorithm against the five classical trackers over the frame number (abscissa) in six downhole monitoring-camera videos, lower values indicating better performance. The five baselines are single-view trackers, with the target's starting position framed manually at the initial frame and at each cross-camera re-identification position. FIG. 3 shows that the tracking accuracy of the method of the invention is better. FIG. 4 shows a comparative analysis of tracking success rate on the same six downhole videos: each plot compares the tracking success rate (ordinate) of the proposed algorithm against the five classical trackers over the frame number (abscissa), higher values indicating better performance. FIG. 4 shows that the tracking success rate of the method of the invention is high.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any combination that contains no contradiction should be considered within the scope of this specification.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific ordering; where permitted, "first", "second" and "third" may exchange their specific order or sequence, so that the embodiments described herein can be implemented in orders other than those illustrated or described.
The terms "comprising" and "having" and any variations thereof in the embodiments of the present application are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules but may alternatively include other steps or modules not listed or inherent to such process, method, product, or device.
The above-described embodiments express only several implementations of the present application; their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the application. Therefore, the protection scope of this patent shall be subject to the appended claims.
Claims (8)
1. An underground cross-view target detection and tracking method based on multi-template learning, characterized by comprising the following steps:
S10, performing correlation filter tracking on the current frame image captured in the current field of view using an initial template to obtain a first response map; the initial template is determined from the target sample set;
S20, performing correlation filter tracking on the current frame image using a process template to obtain a second response map; the process template is determined from the tracked objects in the respective fields of view;
S30, linearly weighting the first response map and the second response map to obtain a final response map;
S40, determining the position with the maximum response value in the final response map as the target position of the tracked target;
before the performing of correlation filter tracking on the current frame image captured in the current field of view with the initial template to obtain the first response map, the method further comprises:
detecting the probability that each first tracked object in a first candidate box of a first field of view belongs to each sample in the target sample set, and determining the first object label of each first tracked object from the sample label corresponding to the maximum probability;
when a first tracked object moves out of the current field of view, acquiring the feature vector of each candidate target in a second candidate box of a second field of view, selecting a second tracked object among the candidate targets according to the feature vectors and the target templates, and identifying the second object label of the second tracked object; the similarity between the second tracked object and a target template exceeds a similarity threshold; the target templates are the tracked objects in the previous fields of view; the second field of view is the field of view in which the first tracked object appears after moving out of the current field of view;
determining the initial template from the target sample set, and generating the process template from the first tracked objects, the first object labels corresponding to the first tracked objects, the second tracked objects, and the second object labels corresponding to the second tracked objects.
2. The underground cross-view target detection and tracking method based on multi-template learning according to claim 1, wherein the target sample set comprises a labeled target sample set and an unlabeled target sample set;
the determining of the initial template from the target sample set comprises:
moving samples in the labeled target sample set that do not match a first tracked object to the unlabeled target sample set;
and determining the initial template from the labeled target sample set and the unlabeled target sample set.
3. The underground cross-view target detection and tracking method based on multi-template learning according to claim 2, wherein the probability that a first tracked object in the first candidate box x of the first field of view belongs to the i-th labeled sample of the target sample set is calculated as:

p_i = exp(cos(l_i, x)/λ) / ( Σ_{j=1..N_l} exp(cos(l_j, x)/λ) + Σ_{k=1..N_U} exp(cos(u_k, x)/λ) ),

where cos(·, x) denotes the cosine similarity between the first candidate box x and a target sample, λ controls the flatness of the probability distribution, l_i denotes the feature vector of the i-th sample of the labeled target sample set, i = 1, …, N_l, N_l is the total number of samples in the labeled target sample set, u_k denotes the feature vector of the k-th sample of the unlabeled target sample set, k = 1, …, N_U, and N_U is the total number of samples in the unlabeled target sample set.
4. The underground cross-view target detection and tracking method based on multi-template learning according to claim 1, wherein the selecting of a second tracked object among the candidate targets according to the feature vectors and the target templates comprises:
substituting the vector set comprising the feature vectors of the candidate targets into a similarity calculation formula to calculate the similarity between each candidate target and each target template, and determining a candidate target whose similarity exceeds the similarity threshold as a second tracked object.
5. The underground cross-view target detection and tracking method based on multi-template learning according to claim 4, wherein the similarity calculation formula is:

likelihood(t, T) = max_i cos(t_i, T),

where t = {t_i} denotes the set of templates of the first tracked object from different view angles, t_i is the feature vector of the target template at the i-th view angle, T is the feature vector of a candidate target, likelihood(t, T) represents the similarity between t and T, and cos(t_i, T) denotes the cosine similarity between t_i and T.
6. The underground cross-view target detection and tracking method based on multi-template learning according to any one of claims 1 to 5, wherein the linear weighting formula is:

f' = γ₁·f₁ + γ₂·f₂,

where f' denotes the final response map, f₁ the first response map, f₂ the second response map, γ₁ the first weight, and γ₂ the second weight.
7. The underground cross-view target detection and tracking method based on multi-template learning according to any one of claims 1 to 5, further comprising:
acquiring the target position of the tracked target in the current frame image;
acquiring the position of the tracked target in the previous frame image as a reference position;
calculating the offset between the target position and the reference position;
and if the offset is greater than an offset threshold, judging that the current target tracking has failed.
8. The underground cross-view target detection and tracking method based on multi-template learning according to any one of claims 1 to 5, further comprising:
acquiring the current filter parameters of the correlation filter, and updating the filter parameters of the correlation filter used for target tracking of the next frame image according to the current filter parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910782295.5A CN110688895B (en) | 2019-08-23 | 2019-08-23 | Underground cross-view target detection and tracking method based on multi-template learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN110688895A CN110688895A (en) | 2020-01-14 |
CN110688895B true CN110688895B (en) | 2023-04-07 |
Family
ID=69108457
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910782295.5A Active CN110688895B (en) | 2019-08-23 | 2019-08-23 | Underground cross-vision field target detection tracking method based on multi-template learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110688895B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112561845A (en) * | 2020-12-21 | 2021-03-26 | 阜阳强松航空科技有限公司 | Long-term tracking method based on infrared and visible light fusion |
CN113628244B (en) * | 2021-07-05 | 2023-11-28 | 上海交通大学 | Target tracking method, system, terminal and medium based on label-free video training |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921881A (en) * | 2018-06-28 | 2018-11-30 | 重庆邮电大学 | A kind of across camera method for tracking target based on homography constraint |
CN109410247A (en) * | 2018-10-16 | 2019-03-01 | 中国石油大学(华东) | A kind of video tracking algorithm of multi-template and adaptive features select |
CN110033012A (en) * | 2018-12-28 | 2019-07-19 | 华中科技大学 | A kind of production method for tracking target based on channel characteristics weighted convolution neural network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |