CN116385498A - Target tracking method and system based on artificial intelligence - Google Patents
Target tracking method and system based on artificial intelligence
- Publication number
- CN116385498A (application number CN202310653926.XA)
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- representing
- detection
- tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
- G06T7/70—Determining position or orientation of objects or cameras
- G06T2207/10016—Video; Image sequence
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- Y02T10/40—Engine management systems
Abstract
The invention relates to the technical field of target tracking and discloses a target tracking method and system based on artificial intelligence. The method performs fusion target tracking based on the YOLOv4 target detection algorithm and the KCF rapid tracking algorithm: the YOLOv4 algorithm performs target detection; the KCF algorithm obtains the predicted target position, while the current frame is simultaneously used as the input image of the YOLOv4 target detection model for target retrieval, yielding accurate target detection position and scale information and thereby realizing target tracking. The invention solves the prior-art problems of low tracking success rate and low efficiency.
Description
Technical Field
The invention relates to the technical field of target tracking, in particular to a target tracking method and system based on artificial intelligence.
Background
Currently, common target tracking algorithms include the YOLOv4 algorithm (a deep-learning regression detection algorithm) and the KCF algorithm (a kernelized correlation filter tracking algorithm). The YOLOv4 algorithm detects and tracks complete targets well in simple scenes, is robust to scale change and deformation, and can cope with target occlusion and high-speed maneuvering, but it can only detect and track known target classes, and its detection and tracking performance degrades at long range, for small targets, and when target features are not distinct. The KCF algorithm, by contrast, does not need to know the target class, but when the target undergoes scale change, occlusion or fast motion, a large amount of background information is introduced during sampling and errors accumulate as the model is updated, so the tracking frame drifts and the target is lost. To address these shortcomings of the prior art, the invention fuses the YOLOv4 and KCF algorithms so that each compensates for the other's weaknesses while contributing its own strengths, improving the success rate of the tracking algorithm and showing stronger robustness in complex scenes.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target tracking method and a target tracking system based on artificial intelligence, which solve the problems of low tracking success rate, low efficiency and the like in the prior art.
The invention solves the problems by adopting the following technical scheme:
an artificial intelligence-based target tracking method performs fusion target tracking based on a YOLOv4 target detection algorithm and a KCF rapid tracking algorithm: target detection is performed by using the YOLOv4 algorithm; the KCF algorithm is adopted to acquire the predicted target position, and simultaneously the current frame is used as the input image of the YOLOv4 target detection model for target retrieval, so that accurate target detection position and scale information are obtained and target tracking is realized.
As a preferred technical scheme, the method comprises the following steps:
s1, generating a target detection model: collecting pictures online and manually labeling targets to form a training data set, and training a YOLOv4 deep learning model with the training data set to generate a YOLOv4 target detection model;
s2, KCF target tracker training: detecting a target by using a trained YOLOv4 target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker;
s3, target tracking: tracking the target by adopting a KCF target tracker to obtain a predicted position; meanwhile, taking the current frame as an input image of the YOLOv4 target detection model, and carrying out target retrieval to obtain target detection position and scale information;
s4, confirming the target position: calculating the intersection ratio of the detected position and the predicted position of the target in the current frame image; if the intersection ratio is smaller than a preset threshold, the target detection position is used as the target position of the current frame; if the intersection ratio is larger than the preset threshold or no target is detected, the predicted position is used as the target position of the current frame, and the target position and scale information of the current frame are used to update the KCF target tracker;
s5, video target tracking: and repeating the steps S3 to S4 for the next frame of image to realize the tracking of the target.
As a preferred technical scheme, in step S1, the YOLOv4 deep learning model includes a backbone CSP network structure, a spatial pyramid pooling layer, a path aggregation network, and a YOLO head; the backbone CSP network structure is used for extracting features from the original image and outputting feature maps at 3 scales; the spatial pyramid pooling layer and the path aggregation network are used for performing feature fusion on the feature maps of the 3 scales; the YOLO head is used for performing prediction on the fused feature maps.
As a preferred technical solution, step S3 includes the following steps:
s31, establishing a KCF tracker model: establishing an objective function and an objective of a KCF tracker model;
s32, online matching: obtaining a frequency domain representation of the response value by using the sampling sample and the training sample;
s33, updating a KCF tracker template: the KCF tracker model parameters are updated.
As a preferred technical solution, in step S31, the objective function is established by ridge regression, the goal being to minimize the distance between the sampled data and the true target position of the next frame, with the expression:

$$\min_{w}\sum_{i=1}^{n}\left(w^{H}x_{i}-y_{i}\right)^{2}+\lambda\left\|w\right\|^{2}$$

where $x$ denotes the sample variable, $n$ the number of sample data, $x_{i}$ the feature of sample $i$, $(\cdot)^{H}$ the conjugate transpose, $f(w)$ the objective function, $\lambda$ the regularization parameter, $w$ the column vector of weight coefficients, $\lambda\|w\|^{2}$ the regularization term, and $y_{i}$ the label value of sample $x_{i}$;
differentiating with respect to $w$ and setting the derivative to zero, the minimum of the loss function written in complex-domain form is:

$$w=\left(X^{H}X+\lambda I\right)^{-1}X^{H}y$$

where $X$ denotes the data matrix whose rows are the samples $x_{i}$, $y$ the column vector of label values, and $X^{H}$ the conjugate transpose of $X$; using the diagonalization property of the circulant matrix, the frequency-domain representation of $w$ is:

$$\hat{w}=\frac{\hat{x}^{*}\odot\hat{y}}{\hat{x}^{*}\odot\hat{x}+\lambda}$$

where $\hat{x}$ denotes the frequency-domain representation of $x$, $\hat{y}$ the frequency-domain representation of $y$, and $\odot$ element-wise multiplication.
As a preferred embodiment, in step S31, a Gaussian kernel function $\kappa$ is introduced to transform the solution for $w$ into a solution for the high-dimensional weights $\alpha$:

$$\hat{\alpha}=\frac{\hat{y}}{\hat{k}^{xx}+\lambda}$$

where $\hat{\alpha}$ denotes the frequency-domain representation of $\alpha$ and $\hat{k}^{xx}$ is the Fourier transform of the first row of the kernel matrix $K$.
As a preferred embodiment, in step S32, the frequency-domain representation of the response value is:

$$\hat{f}(z)=\hat{k}^{xz}\odot\hat{\alpha}$$

where $\hat{k}^{xz}$ denotes the first row of the kernel matrix $K^{xz}$, and $K^{xz}$ is the kernel matrix representing the similarity between the sampled sample and the training sample; $\hat{f}(z)$ is converted from the frequency domain to the time domain $f(z)$ by the inverse Fourier transform, and the position corresponding to the maximum of $f(z)$ is the target position.
As a preferred technical solution, in step S33, the model parameters of past frames are sampled and merged, and are incorporated into the parameter update by the bilinear interpolation method, with the update formulas:

$$\hat{\alpha}_{new}=(1-\eta)\,\hat{\alpha}_{old}+\eta\,\hat{\alpha},\qquad x_{new}=(1-\eta)\,x_{old}+\eta\,x$$

where $x_{new}$ denotes the new training sample set, $\eta$ the update step size, $x_{old}$ the old training sample set, $\alpha$ the filter parameters, $x$ the training sample set of the current frame, $\hat{\alpha}_{old}$ the old filter parameters, and $\hat{\alpha}$ the frequency-domain representation of $\alpha$.
As a preferable technical solution, in step S4, the intersection ratio of the detected position and the predicted position of the target is calculated as:

$$IoU=\frac{S_{A\cap B}}{S_{A}+S_{B}-S_{A\cap B}},\qquad S_{A}=w_{A}h_{A},\quad S_{B}=w_{B}h_{B}$$

where $IoU$ denotes the intersection ratio of the detected position and the predicted position of the target, $S_{A}$ the area of the target frame and $S_{B}$ the area of the detection frame; $(x_{A1},y_{A1})$ and $(x_{A2},y_{A2})$ denote the diagonal vertices of the target frame lying on the rectangle where the target frame intersects the detection frame, $(x_{B1},y_{B1})$ and $(x_{B2},y_{B2})$ the corresponding diagonal vertices of the detection frame, and $(x_{I1},y_{I1})$ and $(x_{I2},y_{I2})$ the diagonal vertices of the intersection rectangle; $w_{A}$ and $h_{A}$ denote the width and length of the target frame, and $w_{B}$ and $h_{B}$ the width and length of the detection frame.
An artificial intelligence-based target tracking system for realizing the artificial intelligence-based target tracking method comprises the following modules connected in sequence:
a target detection model generation module: used for collecting pictures offline and manually labeling targets to form a training data set, and training a YOLOv4 deep learning model with the training data set to generate a YOLOv4 target detection model;
a KCF target tracker training module: used for detecting the target with the trained YOLOv4 target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker;
a target tracking module: used for tracking the target with the KCF target tracker to obtain a predicted position, and meanwhile taking the current frame as the input image of the YOLOv4 target detection model for target retrieval to obtain the target detection position and scale information;
a target position confirmation module: used for calculating the intersection ratio of the detected position and the predicted position of the target in the current frame image; if the intersection ratio is smaller than a preset threshold, the target detection position is taken as the target position of the current frame; if the intersection ratio is larger than the preset threshold or no target is detected, the predicted position is taken as the target position of the current frame, and the target position and scale information of the current frame are used to update the KCF target tracker;
a video target tracking module: used for repeating the target tracking and the target position confirmation for the next frame of image, thereby realizing tracking of the target.
Compared with the prior art, the invention has the following beneficial effects:
(1) The YOLOv4 adopted by the invention is an end-to-end real-time target detection algorithm based on deep learning; tested on the MS COCO data set with a Tesla V100 graphics card, it reaches 43.5% AP (65.7% AP50) at a speed of 65 FPS, and compared with EfficientDet and SpineNet its detection precision (AP) and speed (FPS) are improved by 10% and 12% respectively, a remarkable improvement. YOLOv4 extracts target features through a deep convolutional network, so weak and small targets in photoelectric images can be effectively detected. In addition, YOLOv4 performs multi-scale target detection, which overcomes the influence of target scale change during detection and improves the accuracy and robustness of target detection;
(2) The KCF algorithm adopted by the invention also learns from a large amount of background information, so the classifier distinguishes the background from the target with high accuracy, is robust in complex environments, and performs excellently on current public data sets. Meanwhile, the KCF algorithm adopts an online training strategy, so a large number of target samples do not need to be prepared in advance to train the model.
Drawings
FIG. 1 is a schematic diagram of steps of an artificial intelligence based target tracking method according to the present invention;
FIG. 2 is a schematic diagram showing the intersection ratio of the detected position and the predicted position of the target.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Example 1
As shown in fig. 1 to 2, the invention is mainly applied to an anti-unmanned-aerial-vehicle system, in which a radar and a radio monitoring system are responsible for searching for and finding the target, and a photoelectric system controls the pan-tilt lens according to the target angle and distance data provided by the radar to complete the tasks of target detection, target locking and target tracking; the invention mainly provides the algorithm implementation for rapid target tracking and target locking.
In order to solve the problems in the prior art, the invention provides an artificial-intelligence-based unmanned aerial vehicle target tracking method, namely a fusion target tracking algorithm based on the YOLOv4 target detection algorithm and the KCF rapid tracking algorithm, which can effectively improve the success rate and efficiency of the tracking algorithm.
For target detection, the invention adopts YOLOv4 to assist target searching, locking and automatic tracking. YOLOv4 is an end-to-end deep-learning regression detection algorithm and is so far the target detection algorithm with the best balance between speed and precision. By integrating many advanced methods, it remedies the shortcomings of the earlier YOLO series (such as weak detection of small objects) and achieves impressive accuracy at outstanding speed. Compared with the mainstream detectors EfficientDet and SpineNet, YOLOv4, an end-to-end real-time target detection algorithm based on deep learning, reaches 43.5% AP (65.7% AP50) at 65 FPS when tested on the MS COCO data set with a Tesla V100 graphics card, improving detection precision (AP) and speed (FPS) by 10% and 12% respectively, a remarkable improvement. YOLOv4 extracts target features through a deep convolutional network, so weak and small targets in photoelectric images can be effectively detected; in addition, YOLOv4 performs multi-scale target detection, which overcomes the influence of target scale change during detection and improves the accuracy and robustness of target detection.
The object tracking of the present invention predicts the size and position of an object in a subsequent frame given the object size and position of an initial frame of a video sequence. Detecting a target by using a trained YOLOv4 target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker. The KCF algorithm adopts an on-line training strategy, and a large number of target samples do not need to be prepared in advance for training the model. In the tracking process, a target tracker is trained based on the current frame of the video, the target position of the next frame is determined by using the tracker, and then the tracker is updated with the new target position, so that continuous tracking of the target is realized through iteration. The fusion target tracking algorithm based on the YOLOv4 and KCF algorithms comprises the following steps, see fig. 1 for details:
Step S1: pictures are collected online and targets are manually labeled to form a training data set, and the YOLOv4 deep learning model is trained with the training data set to obtain the target detection model. The YOLOv4 deep learning model mainly consists of three parts, namely CSPDarknet53 (CSP network structure), SPP (spatial pyramid pooling layer) + PANet (path aggregation network), and the YOLO Head: CSPDarknet53 serves as the backbone network of the YOLOv4 algorithm and is responsible for extracting features from the original image and outputting feature maps at 3 scales; SPP + PANet is responsible for fusing the feature maps of the 3 scales extracted by the backbone network; and the YOLO Head performs prediction on the fused feature maps. This end-to-end, deep-learning-based real-time detection model provides both high target detection precision and high speed.
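Once the YOLOv4 model of step S1 has been trained, it can be loaded and queried for detections, for example with OpenCV's DNN module. The following is only an illustrative sketch: the file names "yolov4.cfg"/"yolov4.weights", the 608x608 input size and the thresholds are assumptions for the example, not values fixed by this description.

```python
# Illustrative sketch: load a pretrained YOLOv4 (darknet) model with OpenCV's
# DNN module and run one detection pass on a frame. File names, input size
# and thresholds are assumptions, not values specified by the invention.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1.0 / 255, swapRB=True)

def detect_targets(frame, conf_thr=0.5, nms_thr=0.4):
    """Return a list of (class_id, score, box); box is (x, y, w, h) in pixels."""
    class_ids, scores, boxes = model.detect(frame, confThreshold=conf_thr,
                                            nmsThreshold=nms_thr)
    return list(zip(class_ids, scores, boxes))
```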
step S2: detecting a specific target by using a trained target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker;
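A minimal sketch of the tracker initialisation in step S2, assuming the opencv-contrib KCF implementation is available; the detection box (x, y, w, h) returned by the YOLOv4 detector for the first frame is used to initialise the tracker.

```python
# Minimal sketch of step S2, assuming opencv-contrib-python is installed:
# the KCF tracker is initialised with the (x, y, w, h) box from the detector.
import cv2

def create_kcf_tracker(frame, box):
    # OpenCV >= 4.5 exposes the contrib KCF tracker under cv2.legacy;
    # older builds expose cv2.TrackerKCF_create directly.
    try:
        tracker = cv2.legacy.TrackerKCF_create()
    except AttributeError:
        tracker = cv2.TrackerKCF_create()
    tracker.init(frame, tuple(int(v) for v in box))
    return tracker
```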
Step S3: a specific target is tracked with the KCF algorithm to obtain the predicted position; meanwhile, the current frame is used as the input image of the YOLOv4 target detection model for target retrieval, and accurate target detection position and scale information are obtained. The KCF algorithm consists of 3 links: model establishment, online matching and template updating.
The first link is model establishment: the objective function is established by ridge regression, the goal being to minimize the distance between the sampled data and the true target position of the next frame:

$$\min_{w}\sum_{i=1}^{n}\left(w^{H}x_{i}-y_{i}\right)^{2}+\lambda\left\|w\right\|^{2}$$

where $x$ denotes the sample variable, $n$ the number of sample data, $x_{i}$ the feature of sample $i$, $(\cdot)^{H}$ the conjugate transpose, $f(w)$ the objective function, $\lambda$ the regularization parameter, $w$ the column vector of weight coefficients, $\lambda\|w\|^{2}$ the regularization term, and $y_{i}$ the label value of sample $x_{i}$.
The regularization term $\lambda\|w\|^{2}$ prevents the model from over-fitting. Differentiating with respect to $w$ and setting the derivative to zero, the minimum of the loss function is:

$$w=\left(X^{H}X+\lambda I\right)^{-1}X^{H}y$$

where $X$ denotes the data matrix whose rows are the samples $x_{i}$, $y$ the column vector of label values, and $X^{H}$ the conjugate transpose of $X$.
The diagonalization property of the circulant matrix is used to derive the representation of $w$ in the Fourier domain:

$$\hat{w}=\frac{\hat{x}^{*}\odot\hat{y}}{\hat{x}^{*}\odot\hat{x}+\lambda}$$

where $\hat{x}$ denotes the frequency-domain representation of $x$, $\hat{y}$ the frequency-domain representation of $y$, and $\odot$ element-wise multiplication.
In most cases the solution of $w$ is a nonlinear problem; by introducing a Gaussian kernel function $\kappa$, the solution for $w$ is converted into a solution for the high-dimensional weights $\alpha$ in the high-dimensional space:

$$\hat{\alpha}=\frac{\hat{y}}{\hat{k}^{xx}+\lambda}$$

where $\hat{\alpha}$ is the frequency-domain representation of $\alpha$ and $\hat{k}^{xx}$ is the Fourier transform of the first row of the kernel matrix $K$.
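The following numpy sketch illustrates this first link for a single-channel feature patch x and a Gaussian-shaped label map y; the kernel width sigma and the regularization lam are illustrative hyper-parameters, not values prescribed by the invention.

```python
# Numpy sketch of the model-establishment link: Gaussian kernel correlation
# computed in the Fourier domain, then alpha_hat = y_hat / (k_hat + lambda).
import numpy as np

def gaussian_correlation(x1, x2, sigma=0.5):
    """Kernel correlation of x1 with all cyclic shifts of x2."""
    c = np.fft.ifft2(np.conj(np.fft.fft2(x1)) * np.fft.fft2(x2)).real
    d = (np.sum(x1 ** 2) + np.sum(x2 ** 2) - 2.0 * c) / x1.size
    return np.exp(-np.maximum(d, 0.0) / (sigma ** 2))

def train_kcf(x, y, lam=1e-4):
    """Dual weights in the frequency domain: alpha_hat = y_hat / (k^xx_hat + lam)."""
    k = gaussian_correlation(x, x)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)
```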
The second link is online matching. Let $K^{xz}$ be the kernel matrix representing the similarity between the sampled sample and the training sample in kernel space; correlating the sampled sample with the training sample gives the frequency-domain representation of the response value:

$$\hat{f}(z)=\hat{k}^{xz}\odot\hat{\alpha}$$

where $\hat{k}^{xz}$ is the first row of the kernel matrix $K^{xz}$; $\hat{f}(z)$ is converted from the frequency domain to the time domain $f(z)$ by the inverse Fourier transform, and the position corresponding to the maximum of $f(z)$ is the sought target position.
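A companion sketch of the online-matching link, using the gaussian_correlation helper sketched above: the response map is the inverse Fourier transform of k̂^{xz} ⊙ α̂ and its peak gives the predicted target position.

```python
# Sketch of the online-matching link: response = IFFT(k^xz_hat * alpha_hat);
# the location of the response maximum is the predicted target position.
# x is the learned template patch, z the patch sampled from the current frame.
import numpy as np

def kcf_detect(alpha_hat, x, z, sigma=0.5):
    k = gaussian_correlation(z, x, sigma)                # similarity of sample and template
    response = np.fft.ifft2(np.fft.fft2(k) * alpha_hat).real
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (dx, dy)                            # peak location = predicted shift
```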
The third link is template updating. The KCF tracker template update mainly updates the filter parameters $\alpha$ and the training sample set $x$: after the algorithm is executed, a new predicted target position and hence a new base sample are obtained, a circulant matrix is generated to obtain a new sample set $x$, and new parameters $\hat{\alpha}$ are obtained by training; finally, an update step size $\eta$ is set and the tracker is updated from the model parameters of the previous frame by linear interpolation, i.e. the model parameters of past frames are sampled, merged and incorporated into the parameter update by the bilinear interpolation method:

$$\hat{\alpha}_{new}=(1-\eta)\,\hat{\alpha}_{old}+\eta\,\hat{\alpha},\qquad x_{new}=(1-\eta)\,x_{old}+\eta\,x$$

where $x_{new}$ denotes the new training sample set, $\eta$ the update step size, $x_{old}$ the old training sample set, $\alpha$ the filter parameters, $x$ the training sample set of the current frame, $\hat{\alpha}_{old}$ the old filter parameters, and $\hat{\alpha}$ the frequency-domain representation of $\alpha$.
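A one-function sketch of the template-update link under the same assumptions: both the dual weights and the base sample are blended with the previous frame's values using the update step size eta (0.02 here is an illustrative choice, not a value fixed by the invention).

```python
# Sketch of the template-update link: linear interpolation of the filter
# parameters and of the base sample with update step size eta (illustrative).
def update_kcf(alpha_hat_old, alpha_hat_new, x_old, x_new, eta=0.02):
    alpha_hat = (1.0 - eta) * alpha_hat_old + eta * alpha_hat_new
    x = (1.0 - eta) * x_old + eta * x_new
    return alpha_hat, x
```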
Step S4: calculating the intersection ratio of the detection position and the prediction position of the target in the current frame imageThe intersection ratio refers to the ratio of the intersection of the target frame and the detection frame to the area of the union; defining the diagonal coordinates of rectangle A and rectangle B as、/>At the same time, the diagonal coordinates of the intersection rectangle are defined as +.>Then the method of calculating the diagonal coordinates of the intersection rectangle is as follows:
the intersection and union are then calculated as follows:
the schematic diagram is shown in fig. 2.
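The intersection-ratio computation of step S4 can be sketched as follows for two boxes given by their diagonal corner coordinates (x1, y1, x2, y2); this is a plain restatement of the formulas above, not an additional method step.

```python
# Sketch of the step-S4 intersection ratio (IoU) for two rectangles given by
# their diagonal corners (x1, y1, x2, y2), following the formulas above.
def iou(box_a, box_b):
    xa1, ya1, xa2, ya2 = box_a
    xb1, yb1, xb2, yb2 = box_b
    # diagonal corners of the intersection rectangle
    xi1, yi1 = max(xa1, xb1), max(ya1, yb1)
    xi2, yi2 = min(xa2, xb2), min(ya2, yb2)
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    union = (xa2 - xa1) * (ya2 - ya1) + (xb2 - xb1) * (yb2 - yb1) - inter
    return inter / union if union > 0 else 0.0
```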
The above calculation gives the intersection ratio of the detected position and the predicted position of the target. If the intersection ratio is smaller than the preset threshold, the target detection position is used as the target position of the current frame; if the intersection ratio is larger than the preset threshold or no target is detected, the predicted position is used as the target position of the current frame; the KCF target tracker is then updated with the target position and scale information of the current frame.
step S5: and repeating the steps S3 to S4 for the next frame of image to realize the tracking of the video target.
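Tying the pieces together, the following sketch shows how steps S3 to S5 could be orchestrated per frame using the helper functions sketched above (detect_targets, create_kcf_tracker, iou); the 0.5 threshold, the "first detection" policy and re-initialising the tracker as the update mechanism are illustrative assumptions rather than requirements of the invention.

```python
# Illustrative per-frame loop for steps S3-S5, built on the sketches above.
import cv2

def xywh_to_xyxy(b):
    x, y, w, h = b
    return (x, y, x + w, y + h)

def track_video(path, iou_threshold=0.5):
    cap = cv2.VideoCapture(path)
    ok, frame = cap.read()
    detections = detect_targets(frame) if ok else []
    if not detections:
        return
    box = detections[0][2]                               # S2: initialise from a detection
    tracker = create_kcf_tracker(frame, box)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, pred = tracker.update(frame)              # S3: KCF predicted position
        dets = detect_targets(frame)                     # S3: YOLOv4 re-detection
        det = dets[0][2] if dets else None
        # S4: position confirmation by intersection ratio
        if det is not None and found and \
                iou(xywh_to_xyxy(pred), xywh_to_xyxy(det)) < iou_threshold:
            target = det                                 # low overlap: trust the detector
        else:
            target = pred                                # otherwise keep the KCF prediction
        tracker = create_kcf_tracker(frame, target)      # update the tracker template
        yield target                                     # S5: continue with the next frame
```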
By adopting the fusion target tracking algorithm based on the YOLOv4 target detection algorithm and the KCF rapid tracking algorithm, the invention obtains optimal results in detection precision and tracking stability and has strong robustness and real-time performance, so the target can be tracked and locked quickly, efficiently and reliably, effectively improving the overall performance of the anti-unmanned-aerial-vehicle system. The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.
As described above, the present invention can be preferably implemented.
All of the features disclosed in all of the embodiments of this specification, or all of the steps in any method or process disclosed implicitly, except for the mutually exclusive features and/or steps, may be combined and/or expanded and substituted in any way.
The foregoing description of the preferred embodiment of the invention is not intended to limit the invention in any way, but rather to cover all modifications, equivalents, improvements and alternatives falling within the spirit and principles of the invention.
Claims (10)
1. The target tracking method based on artificial intelligence is characterized in that fusion target tracking is performed based on a YOLOv4 target detection algorithm and a KCF rapid tracking algorithm: target detection is performed by using the YOLOv4 algorithm; the KCF algorithm is adopted to acquire the predicted target position, and simultaneously the current frame is used as the input image of the YOLOv4 target detection model for target retrieval, so that accurate target detection position and scale information are obtained and target tracking is realized.
2. The artificial intelligence based object tracking method of claim 1, comprising the steps of:
s1, generating a target detection model: collecting pictures online and manually labeling targets to form a training data set, and training a YOLOv4 deep learning model with the training data set to generate a YOLOv4 target detection model;
s2, KCF target tracker training: detecting a target by using a trained YOLOv4 target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker;
s3, target tracking: tracking the target by adopting a KCF target tracker to obtain a predicted position; meanwhile, taking the current frame as an input image of the YOLOv4 target detection model, and carrying out target retrieval to obtain target detection position and scale information;
s4, confirming the target position: calculating the intersection ratio of the detected position and the predicted position of the target in the current frame image, and if the intersection ratio is smaller than a preset threshold, using the target detection position as the target position of the current frame; if the intersection ratio is larger than the preset threshold or no target is detected, using the predicted position as the target position of the current frame, and using the target position and scale information of the current frame to update the KCF target tracker;
s5, video target tracking: and repeating the steps S3 to S4 for the next frame of image to realize the tracking of the target.
3. The method of claim 2, wherein in step S1 the YOLOv4 deep learning model includes a backbone CSP network structure, a spatial pyramid pooling layer and path aggregation network, and a YOLO head; the backbone CSP network structure is used for extracting features from the original image and outputting feature maps at 3 scales; the spatial pyramid pooling layer and the path aggregation network are used for performing feature fusion on the feature maps of the 3 scales; the YOLO head is used for performing prediction on the fused feature maps.
4. An artificial intelligence based object tracking method according to claim 3, characterised in that step S3 comprises the steps of:
s31, establishing a KCF tracker model: establishing an objective function and an objective of a KCF tracker model;
s32, online matching: obtaining a frequency domain representation of the response value by using the sampling sample and the training sample;
s33, updating a KCF tracker template: the KCF tracker model parameters are updated.
5. The artificial intelligence based object tracking method according to claim 4, wherein in step S31 the objective function is established by ridge regression, the goal being to minimize the distance between the sampled data and the true target position of the next frame, with the expression:

$$\min_{w}\sum_{i=1}^{n}\left(w^{H}x_{i}-y_{i}\right)^{2}+\lambda\left\|w\right\|^{2}$$

where $x$ denotes the sample variable, $n$ the number of sample data, $x_{i}$ the feature of sample $i$, $(\cdot)^{H}$ the conjugate transpose, $f(w)$ the objective function, $\lambda$ the regularization parameter, $w$ the column vector of weight coefficients, $\lambda\|w\|^{2}$ the regularization term, and $y_{i}$ the label value of sample $x_{i}$;
differentiating with respect to $w$ and setting the derivative to zero, the minimum of the loss function written in complex-domain form is:

$$w=\left(X^{H}X+\lambda I\right)^{-1}X^{H}y$$

where $X$ denotes the data matrix whose rows are the samples $x_{i}$, $y$ the column vector of label values, and $X^{H}$ the conjugate transpose of $X$.
6. The artificial intelligence based object tracking method according to claim 5, wherein in step S31 a Gaussian kernel function $\kappa$ is introduced to transform the solution for $w$ into a solution for the high-dimensional weights $\alpha$:

$$\hat{\alpha}=\frac{\hat{y}}{\hat{k}^{xx}+\lambda}$$

where $\hat{\alpha}$ denotes the frequency-domain representation of $\alpha$ and $\hat{k}^{xx}$ is the Fourier transform of the first row of the kernel matrix $K$.
7. The artificial intelligence based object tracking method according to claim 6, wherein in step S32 the frequency-domain representation of the response value is:

$$\hat{f}(z)=\hat{k}^{xz}\odot\hat{\alpha}$$

where $\hat{k}^{xz}$ denotes the first row of the kernel matrix $K^{xz}$, and $K^{xz}$ is the kernel matrix representing the similarity between the sampled sample and the training sample; $\hat{f}(z)$ is converted from the frequency domain to the time domain $f(z)$ by the inverse Fourier transform, and the position corresponding to the maximum of $f(z)$ is the target position.
8. The method according to claim 7, wherein in step S33 the model parameters of past frames are sampled and merged, and are incorporated into the parameter update by the bilinear interpolation method, with the update formulas:

$$\hat{\alpha}_{new}=(1-\eta)\,\hat{\alpha}_{old}+\eta\,\hat{\alpha},\qquad x_{new}=(1-\eta)\,x_{old}+\eta\,x$$

where $x_{new}$ denotes the new training sample set, $\eta$ the update step size, $x_{old}$ the old training sample set, $\alpha$ the filter parameters, $x$ the training sample set of the current frame, $\hat{\alpha}_{old}$ the old filter parameters, and $\hat{\alpha}$ the frequency-domain representation of $\alpha$.
9. The method according to any one of claims 2 to 8, wherein in step S4 the intersection ratio of the detected position and the predicted position of the target is calculated as:

$$IoU=\frac{S_{A\cap B}}{S_{A}+S_{B}-S_{A\cap B}},\qquad S_{A}=w_{A}h_{A},\quad S_{B}=w_{B}h_{B}$$

where $IoU$ denotes the intersection ratio of the detected position and the predicted position of the target, $S_{A}$ the area of the target frame and $S_{B}$ the area of the detection frame; $(x_{A1},y_{A1})$ and $(x_{A2},y_{A2})$ denote the diagonal vertices of the target frame lying on the rectangle where the target frame intersects the detection frame, $(x_{B1},y_{B1})$ and $(x_{B2},y_{B2})$ the corresponding diagonal vertices of the detection frame, and $(x_{I1},y_{I1})$ and $(x_{I2},y_{I2})$ the diagonal vertices of the intersection rectangle; $w_{A}$ and $h_{A}$ denote the width and length of the target frame, and $w_{B}$ and $h_{B}$ the width and length of the detection frame.
10. An artificial intelligence based object tracking system for implementing an artificial intelligence based object tracking method according to any one of claims 1 to 9, comprising the following modules connected in sequence:
a target detection model generation module: used for collecting pictures offline and manually labeling targets to form a training data set, and training a YOLOv4 deep learning model with the training data set to generate a YOLOv4 target detection model;
a KCF target tracker training module: used for detecting the target with the trained YOLOv4 target detection model, acquiring target position and scale information, initializing a KCF target tracker, and training the KCF target tracker;
a target tracking module: used for tracking the target with the KCF target tracker to obtain a predicted position, and meanwhile taking the current frame as the input image of the YOLOv4 target detection model for target retrieval to obtain the target detection position and scale information;
a target position confirmation module: used for calculating the intersection ratio of the detected position and the predicted position of the target in the current frame image; if the intersection ratio is smaller than a preset threshold, the target detection position is taken as the target position of the current frame; if the intersection ratio is larger than the preset threshold or no target is detected, the predicted position is taken as the target position of the current frame, and the target position and scale information of the current frame are used to update the KCF target tracker;
a video target tracking module: used for repeating the target tracking and the target position confirmation for the next frame of image, thereby realizing tracking of the target.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310653926.XA CN116385498A (en) | 2023-06-05 | 2023-06-05 | Target tracking method and system based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116385498A true CN116385498A (en) | 2023-07-04 |
Family
ID=86977282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310653926.XA Pending CN116385498A (en) | 2023-06-05 | 2023-06-05 | Target tracking method and system based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385498A (en) |
- 2023-06-05: CN application CN202310653926.XA filed, published as CN116385498A, status Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111292355A (en) * | 2020-02-12 | 2020-06-16 | 江南大学 | Nuclear correlation filtering multi-target tracking method fusing motion information |
CN111582349A (en) * | 2020-04-30 | 2020-08-25 | 陕西师范大学 | Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering |
Non-Patent Citations (2)
Title |
---|
Liu Wei: "Research on vision-based UAV recognition and tracking technology", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 1, pages 031-816 *
Xie Zikun et al.: "Analysis of a target tracking fusion algorithm based on Yolo V4-tiny and KCF", Electronic Technology, vol. 51, no. 10, pages 309-311 *
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20230704