CN115272409A - Single-target long-time tracking method based on deep neural network - Google Patents

Single-target long-time tracking method based on deep neural network

Info

Publication number
CN115272409A
CN115272409A (Application No. CN202210831023.1A)
Authority
CN
China
Prior art keywords
target
tracking
network
score
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210831023.1A
Other languages
Chinese (zh)
Inventor
黄飞 (Huang Fei)
彭蓉华 (Peng Ronghua)
宁平 (Ning Ping)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Chaochuang Electronic Technology Co ltd
Original Assignee
Changsha Chaochuang Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha Chaochuang Electronic Technology Co ltd filed Critical Changsha Chaochuang Electronic Technology Co ltd
Priority to CN202210831023.1A
Publication of CN115272409A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/54: Extraction of image or video features relating to texture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30241: Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of target tracking and discloses a single-target long-time tracking method based on a deep neural network, comprising the following steps: acquiring a tracking-target image and a tracking search-area image, constructing a feature extraction network based on a twin (Siamese) network, and extracting features; during tracking, selecting target images by their PNR (peak signal-to-noise ratio) scores, adding them to a template library, and feeding them into a model learning network for model learning and updating; convolving the model target feature map with the search feature map to obtain a response map, feeding the response map into a scale regression network to obtain a scale position score map, and deriving the target scale; when the target is occluded or lost, performing target recapture; and during tracking, recording information about the target and suspected targets, and selecting the tracking position that best matches the historical trajectory when a distractor appears. The invention reduces model drift, makes scale estimation more stable and accurate, and, by adopting a re-search mechanism and anti-interference matching, renders the algorithm suitable for long-term target tracking.

Description

Single-target long-time tracking method based on deep neural network
Technical Field
The invention belongs to the technical field of visual target tracking, and particularly relates to a single-target long-time tracking method based on a deep neural network.
Background
Single-target tracking refers to tracking a specified target in a video: the target is designated manually at the start, and the tracking algorithm estimates its position and scale using the context information of the video or image sequence. Single-target tracking has a wide range of applications, including intelligent video surveillance, autonomous driving, robotics, and unmanned aerial vehicles. Its application scenarios are diverse (crowded shopping malls, roads, highways, the sky, and so on), so tracking algorithms face difficult challenges such as illumination change, target deformation, occlusion, and complex backgrounds.
At present, among tracking algorithms based on deep neural networks, those based on twin (Siamese) networks perform well in both speed and accuracy and are widely studied; attention-based methods, methods estimating scale with an RPN network, spatio-temporal modeling methods, and others have emerged. However, during real long-term tracking the appearance of the target usually changes dynamically, sometimes rapidly, and the target may become occluded.
Disclosure of Invention
To address these problems, the invention provides template-library updating and scale-regression-network scale estimation based on RMAD (namely, the PNR value), together with an anti-interference long-time tracking algorithm based on the Hungarian matching algorithm, so as to improve the accuracy and robustness of the tracking algorithm under target deformation, target occlusion, and similar conditions.
The invention discloses a single-target long-time tracking method based on a deep neural network, which comprises the following steps:
acquiring a tracking-target image and a tracking search-area image, and constructing a feature extraction network based on a twin network for feature extraction;
online model learning: during tracking, selecting target images by PNR score, adding them to a template library, and feeding them into a model learning network for model learning and updating;
target scale estimation: convolving the model target feature map with the search feature map to obtain a response map, feeding the response map into a scale regression network to obtain a scale position score map, and deriving the target scale;
target recapture: during tracking, judging the state of the target, i.e. whether it is occluded or lost; when the target is lost, enlarging the target search range and performing target recapture;
anti-interference matching and tracking: during tracking, recording the position, scale, confidence, and texture-feature information of the target and of suspected targets; when a distractor appears, using the Hungarian matching algorithm to match the currently estimated tracking position against the historical tracking trajectory and selecting the tracking position that best matches the historical track, thereby eliminating the influence of the distractor and guaranteeing long-term tracking of the target.
Further, the twin network employs ResNet50 as the backbone network.
Further, the loss function of the model learning network is:
(The loss-function formulas are given as images in the original publication.)
where v_j is the feature map of training image j and c_j denotes the target centre coordinates; the objective is to find the model f. The target image is represented by A_c, with A_c = 0 when a pixel in the image is a background pixel and A_c = 1 when it belongs to the target. s = v ∗ f denotes the target confidence score, and y_c denotes the score of each coordinate position, initialised with a Gaussian distribution. The parameters y_c and A_c are learned by online training.
Further, the calculation formula of the PNR is as follows:
(The PNR formulas are given as images in the original publication.)
where S′_max and S′_min denote the maximum and minimum values in the classification score response map S′, M_{S′} denotes the mean of S′ after removing the maximum response value, T_PNR is the mean of all historical PNR values, and T_{S′} is the mean of all historical S′_max values.
Further, the scale regression network is trained according to the following algorithm:
the loss function is:
(The full loss-function formulas are given as images in the original publication; the CIOU term satisfies)
L_CIOU = −ln(CIOU)
where L_CIOU is the CIOU loss, (x_l, y_l) denotes the top-left corner of the ground-truth target box, (x_r, y_r) its bottom-right corner, and (x_c, y_c) the centre-point coordinates; the regression targets then follow as the distances from the centre to the four sides of the box:
l = x_c − x_l, t = y_c − y_l, r = x_r − x_c, b = y_r − y_c
The highest-scoring target position is obtained from the classification score map, and the mean of the top-k target boxes around that position is taken as the final target box.
Further, the target state is judged from the classification score of the target: when the score is less than 0.25, the target is judged to be occluded or not currently found, and a counter is started; when the counter exceeds 5, the target is considered lost, the search area is doubled every 2 frames relative to the target size, and the classification network is used to search for the target; when the target score exceeds 0.8, the target is considered found. When the search exceeds a certain number of frames, the target is considered lost.
Further, when multiple suspected targets appear during tracking, their information is saved, including centre-point position, scale, classification confidence, and SILTP features.
Further, for a given location (x_c, y_c), the SILTP feature is encoded as
SILTP_{N,R}(x_c, y_c) = ⊕_{k=0}^{N−1} s(g_c, g_k)
where g_c is the gray value of the centre pixel, g_k (k = 0, …, N−1) are the other pixels on a circle of radius R in the neighbourhood of the centre pixel, ⊕ denotes bit-wise concatenation, and s is the piecewise function with scale factor e = 0.3:
s(g_c, g_k) = 01 if g_k > (1 + e)·g_c; 10 if g_k < (1 − e)·g_c; 00 otherwise
Further, the cost in the Hungarian matching algorithm is computed as the cosine similarity between vectors:
cos(obj_i, tr_T) = (obj_i · tr_T) / (‖obj_i‖ ‖tr_T‖)
where the vector obj_i is composed of the centre-point position, scale, classification confidence, and SILTP feature value of a suspected target, and the vector tr_T is composed of the centre-point position, scale, classification confidence, and SILTP feature value of the nearest point of the tracking trajectory.
Compared with the prior art, the invention has the following beneficial effects:
the template library is updated by adding reliable target templates selected according to the PNR value of the classification score response map, which reduces model drift;
scale estimation uses a scale regression network, making the estimated scale more stable and accurate;
a long-term tracking strategy is designed: after the target is lost, dynamic expansion of the search range and re-identification are started;
an anti-interference matching algorithm is adopted, reducing interference from similar objects and enabling the tracking algorithm to follow the target over long periods.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a classification score map when a similar object (distractor) appears;
FIG. 3 is a block diagram of the deep neural network of the present invention;
FIG. 4 is a graph of the tracking results of the present invention.
Detailed Description
The present invention is further described below with reference to the drawings, but the invention is not limited thereto in any way; any modification or alteration based on the teaching of the present invention falls within the protection scope of the invention.
In order to achieve the purpose, the invention adopts the following technical scheme:
s1, acquiring a tracking target image and a tracking search area image, and constructing a twin network-based feature extraction network for feature extraction.
S2, an online learning model: in the tracking process, a target image is selected through the PNR score and added into a template library, and then the target image is sent into a model learning network for model learning and updating.
S3, target scale estimation: and (4) convolving the model target characteristic diagram and the search characteristic diagram to obtain a response diagram, and sending the response diagram into a scale regression network to obtain a scale position score diagram and obtain a target scale.
S4, target recapture: and in the tracking process, judging the state of the target, judging whether the target is shielded or lost, and enlarging the target searching range to perform target recapture when the target is lost.
S5, anti-interference matching tracking: in the tracking process, the positions, the dimensions, the confidence degrees and the texture feature information of the target and the suspected target are recorded, when an interfering object occurs, the Hungarian matching algorithm is adopted to track and match the currently estimated tracking position and the historical tracking track, and the tracking position which is most matched with the historical track is selected, so that the influence of the interfering object is eliminated, and the long-term tracking of the target is guaranteed.
The invention is further described below with reference to the accompanying drawings.
Example 1
As shown in FIG. 1, the invention constructs a single-target long-time tracking algorithm based on an RMAD deep neural network. The basic idea is to extract deep features of the target and the search area with a twin network, build a template library of target features, obtain the target template features with an online learning model, and convolve the target template features with the search-area features to obtain a classification score map giving the position of the target. A scale regression network then determines the target size. A tracking-state judgment mechanism enlarges the search range and re-searches when the target is lost, and when distractors appear, the Hungarian matching algorithm performs track matching to eliminate their influence.
The method specifically comprises the following steps:
s1, constructing a feature extraction network based on a twin network: the resnet50 is used as a backbone network, the input of the network is composed of a tracking target image and a tracking search area image, and a characteristic diagram of the tracking target and a characteristic diagram of a search area are obtained after the input of the network.
S2, after the result of the first step is obtained, the feature maps are fed into the classification network to obtain a classification score map. The classification network consists of an online-learning model network and a correlation convolution network. The input of the online-learning model network is a template set holding 50 frames of target feature templates, comprising 15 frames of initial target feature templates plus target feature templates gathered during subsequent tracking; online learning is performed every 20 frames. The online-learning model network is trained with the following algorithm:
1) It employs the following loss function (the formulas are given as images in the original publication):
where v_j is the feature map of training image j and c_j denotes the target centre coordinates. The objective is to find the model f. The target image is represented by A_c: A_c = 0 when a pixel in the image is a background pixel and A_c = 1 when it belongs to the target. s = v ∗ f denotes the target confidence score, and y_c denotes the score of each coordinate position, initialised with a Gaussian distribution. The parameters y_c and A_c are learned by online training.
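To make the role of s = v ∗ f concrete, the sketch below correlates a (hypothetical) learned model filter with search-area features to produce the classification score response map; the channel count and filter size are assumed placeholders.

```python
import torch
import torch.nn.functional as F

# The model filter f learned by the online-learning network acts as a
# convolution kernel over the search-area features v, giving s = v * f.
x_feat = torch.randn(1, 2048, 8, 8)  # search-area feature map (assumed shape)
f = torch.randn(1, 2048, 4, 4)       # learned target model filter (assumed shape)

s = F.conv2d(x_feat, f)              # classification score response map S'
print(s.shape)                       # -> torch.Size([1, 1, 5, 5])
```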
Whether the target feature map is then added to the template library is decided by the PNR (peak signal-to-noise ratio) score of the classification score response map S′: when the PNR value is greater than the threshold T_PNR and the classification score is also greater than the threshold T_{S′}, the current target feature map is added to the template library. When the template library is full, the oldest template other than the initial templates is replaced. The PNR is calculated as follows:
(The PNR formulas are given as images in the original publication.)
where S′_max and S′_min denote the maximum and minimum values in the classification score response map S′, M_{S′} denotes the mean of S′ after removing the maximum response value, T_PNR is the mean of all historical PNR values, and T_{S′} is the mean of all historical S′_max values.
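A minimal sketch of this template-library update rule follows. Because the PNR formula itself appears only as an image in the original, the `pnr` function below is an assumed peak-to-noise expression built from the quantities defined above (S′_max, S′_min, and the mean after removing the peak); the capacity of 50 templates and the 15 protected initial templates follow the text.

```python
import numpy as np

class TemplateLibrary:
    """Template library updated by PNR score; a sketch, not the patented formula."""

    def __init__(self, capacity=50, n_initial=15):
        self.capacity = capacity
        self.n_initial = n_initial   # initial templates are never evicted
        self.templates = []
        self.pnr_history = []
        self.smax_history = []

    def pnr(self, S):
        # Assumed expression from the textual definitions; the image formula may differ.
        s_max, s_min = S.max(), S.min()
        mean_wo_peak = (S.sum() - s_max) / (S.size - 1)
        return (s_max - s_min) / max(mean_wo_peak - s_min, 1e-6)

    def maybe_add(self, feat, S):
        p, s_max = self.pnr(S), float(S.max())
        t_pnr = np.mean(self.pnr_history) if self.pnr_history else 0.0   # T_PNR
        t_s = np.mean(self.smax_history) if self.smax_history else 0.0  # T_S'
        self.pnr_history.append(p)
        self.smax_history.append(s_max)
        if p > t_pnr and s_max > t_s:               # reliable frame: keep template
            if len(self.templates) >= self.capacity:
                del self.templates[self.n_initial]  # evict oldest non-initial template
            self.templates.append(feat)
```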
S3, the rectangular box of the tracked target is computed by the scale regression network: the model target feature map obtained in step S2 is convolved with the search feature map to obtain a response map, which is fed into the scale regression network to produce a four-dimensional position score map A^{w×h×4}, where each point holds a four-dimensional vector t(i, j) = (l, t, r, b) representing the distances from the target centre point (i, j) to the four sides of the target rectangle. The scale regression network is trained with the following algorithm.
The loss function is given as an image in the original publication; its CIOU term satisfies
L_CIOU = −ln(CIOU)
where L_CIOU is the CIOU loss, (x_l, y_l) denotes the top-left corner of the ground-truth target box, (x_r, y_r) its bottom-right corner, and (x_c, y_c) the centre-point coordinates; the regression targets then follow as the distances from the centre to the four sides of the box:
l = x_c − x_l, t = y_c − y_l, r = x_r − x_c, b = y_r − y_c
The highest-scoring target position is obtained from the classification score map, and the mean of the top-k target boxes around that position is taken as the final target box.
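The final-box computation described above can be sketched as follows: decode the (l, t, r, b) regression map at the highest-scoring locations and average the top-k boxes. The value k = 5 and the grid-coordinate decoding are illustrative assumptions.

```python
import numpy as np

def final_box(score_map: np.ndarray, ltrb_map: np.ndarray, k: int = 5) -> np.ndarray:
    """score_map: (H, W) classification scores; ltrb_map: (H, W, 4) distances
    from each centre candidate to the four sides of its box."""
    h, w = score_map.shape
    topk = np.argsort(score_map.ravel())[-k:]        # indices of the k best scores
    boxes = []
    for idx in topk:
        i, j = divmod(idx, w)                        # candidate centre (row, col)
        l, t, r, b = ltrb_map[i, j]
        boxes.append([j - l, i - t, j + r, i + b])   # (x1, y1, x2, y2)
    return np.mean(np.asarray(boxes), axis=0)        # mean of top-k boxes
```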
S4, the target state is judged from the classification score of the target: when the score is less than 0.25, the target is judged to be occluded or not currently found, and a counter is started; when the counter exceeds 5, the target is considered lost, the search area is doubled every 2 frames relative to the target size, and the classification network is used to search for the target; when the target score exceeds 0.8, the target is considered found. When the search exceeds a certain number of frames, the target is considered lost.
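The state judgment and re-search logic of S4 can be summarised in a small controller. The thresholds (0.25, a 5-frame counter, doubling the search area every 2 frames, 0.8 for re-acquisition) come from the text; the give-up limit is a named parameter because the patent only says "a certain number of frames".

```python
class RecaptureController:
    """Occlusion/loss judgment and search-range expansion; a sketch of step S4."""

    def __init__(self, max_search_frames=100):   # give-up limit is an assumption
        self.occluded_frames = 0
        self.search_frames = 0
        self.search_scale = 1.0                  # multiple of the normal search area

    def update(self, score: float) -> str:
        if self.occluded_frames > 5:             # already in re-search mode
            self.search_frames += 1
            if self.search_frames % 2 == 0:
                self.search_scale *= 2.0         # enlarge search area every 2 frames
            if score > 0.8:                      # target re-found
                self.occluded_frames = 0
                self.search_frames = 0
                self.search_scale = 1.0
                return "tracking"
            if self.search_frames > 100:
                return "lost"                    # searched too long: target lost
            return "searching"
        if score < 0.25:                         # occluded or target not found
            self.occluded_frames += 1
        else:
            self.occluded_frames = 0
        return "tracking"
```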
S5, anti-interference matching: when several peak points above a certain threshold appear in the classification score map, as shown in FIG. 2, multiple suspected targets are present during tracking. The information obj_i of every suspected target is then stored, including centre-point position, scale, classification confidence, and SILTP features, and the Hungarian matching algorithm matches the suspects against the target's historical trajectory tr_T(p_1, p_2, …, p_T) to obtain the final tracking result. The SILTP (Scale Invariant Local Ternary Pattern) operator is an improved version of the LBP operator; it is more robust to noise and to illumination changes in local image regions and discriminates texture more strongly. For a given position (x_c, y_c), the SILTP encoding is:
SILTP_{N,R}(x_c, y_c) = ⊕_{k=0}^{N−1} s(g_c, g_k)
where g_c is the gray value of the centre pixel, g_k (k = 0, …, N−1) are the other N pixels on a circle of radius R in the neighbourhood of the centre pixel, ⊕ denotes bit-wise concatenation, and s is the piecewise function with scale factor e = 0.3:
s(g_c, g_k) = 01 if g_k > (1 + e)·g_c; 10 if g_k < (1 − e)·g_c; 00 otherwise
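A runnable sketch of the 4-neighbourhood SILTP encoding follows; the counter-clockwise neighbour order and the wrap-around border handling via np.roll are implementation assumptions.

```python
import numpy as np

def siltp(gray: np.ndarray, R: int = 1, e: float = 0.3) -> np.ndarray:
    """4-neighbour SILTP code per pixel; each neighbour contributes 2 bits
    (01 if brighter than (1+e)*g_c, 10 if darker than (1-e)*g_c, else 00)."""
    g = gray.astype(np.float32)
    codes = np.zeros(g.shape, dtype=np.uint8)
    # 4 neighbours at radius R, counter-clockwise: right, up, left, down (assumed order)
    for k, (dy, dx) in enumerate([(0, R), (-R, 0), (0, -R), (R, 0)]):
        gk = np.roll(g, shift=(-dy, -dx), axis=(0, 1))
        bits = np.where(gk > (1 + e) * g, 1, 0) | np.where(gk < (1 - e) * g, 2, 0)
        codes |= (bits.astype(np.uint8) << (2 * k))
    return codes
```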
The target texture features are extracted as follows: the target image is converted to grayscale, and the SILTP value of every pixel in the target image is computed using a 4-neighbourhood traversed counter-clockwise. The cost in Hungarian matching is computed as the cosine similarity between vectors:
cos(obj_i, tr_T) = (obj_i · tr_T) / (‖obj_i‖ ‖tr_T‖)
where the vector obj_i consists of the information of a suspected target (centre-point position, scale, classification confidence, SILTP feature value), and the vector tr_T consists of the information at the nearest point of the tracking trajectory (centre-point position, scale, classification confidence, SILTP feature value).
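Putting the pieces together, the anti-interference matching step can be sketched with SciPy's Hungarian solver. The feature-vector layout (centre position, scale, confidence, and SILTP value concatenated into one vector per object) follows the text; converting similarity to cost as 1 − cosine similarity is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_suspects(suspects: np.ndarray, tracks: np.ndarray):
    """suspects: (N, D) vectors of suspected targets; tracks: (M, D) vectors of
    the nearest points of the historical trajectories. Returns matched index pairs."""
    sus = suspects / np.linalg.norm(suspects, axis=1, keepdims=True)
    trk = tracks / np.linalg.norm(tracks, axis=1, keepdims=True)
    cost = 1.0 - sus @ trk.T                  # cost = 1 - cosine similarity
    rows, cols = linear_sum_assignment(cost)  # Hungarian assignment
    return list(zip(rows.tolist(), cols.tolist()))
```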
Compared with the prior art, the invention has the following beneficial effects:
the template library is updated by adding reliable target templates selected according to the PNR value of the classification score response map, which reduces model drift;
scale estimation uses a scale regression network, making the estimated scale more stable and accurate;
a long-term tracking strategy is designed: after the target is lost, dynamic expansion of the search range and re-identification are started;
an anti-interference matching algorithm is adopted, reducing interference from similar objects and enabling the tracking algorithm to follow the target over long periods.
The word "preferred" is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as "preferred" is not necessarily to be construed as advantageous over other aspects or designs; rather, use of the word "preferred" is intended to present concepts in a concrete fashion. The term "or" as used in this application is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise or clear from context, "X employs A or B" is intended to cover any of the natural inclusive permutations: if X employs A, X employs B, or X employs both A and B, then "X employs A or B" is satisfied in any of the foregoing instances.
Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon reading and understanding this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the appended claims. In particular regard to the various functions performed by the above-described components (e.g., elements, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (i.e., that is functionally equivalent), even if not structurally equivalent to the disclosed structure that performs the function in the exemplary implementations illustrated herein. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more features of the other implementations as may be desired and advantageous for a given or particular application. Furthermore, to the extent that the terms "includes", "has", "contains", or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".
Each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If implemented as a software functional module and sold or used as a stand-alone product, the integrated module may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, etc. Each apparatus or system described above may execute the storage method in the corresponding method embodiment.
In summary, the above embodiment is one implementation of the present invention, but the implementation of the invention is not limited thereto; any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement within the protection scope of the invention.

Claims (9)

1. A single-target long-time tracking method based on a deep neural network is characterized by comprising the following steps:
acquiring a tracking-target image and a tracking search-area image, constructing a feature extraction network based on a twin network, and extracting features;
during tracking, selecting target images by PNR score, adding them to a template library, and feeding them into a model learning network for model learning and updating to obtain a model target feature map and a search feature map;
convolving the model target feature map with the search feature map to obtain a response map, feeding the response map into a scale regression network to obtain a scale position score map, and deriving the target scale;
during tracking, judging the target state, i.e. whether the target is occluded or lost; when the target is occluded or lost, enlarging the target search range and performing target recapture;
during tracking, recording the position, scale, confidence, and texture-feature information of the target and of suspected targets; when a distractor appears, using the Hungarian matching algorithm to match the currently estimated tracking position against the historical tracking trajectory and selecting the tracking position that best matches the historical track, thereby eliminating the influence of the distractor and guaranteeing long-term tracking of the target.
2. The deep neural network-based single-target long-time tracking method according to claim 1, wherein the twin network adopts ResNet50 as a backbone network.
3. The deep neural network-based single-target long-time tracking method according to claim 1, wherein the loss function of the model learning network is as follows:
(The loss-function formulas are given as images in the original publication.)
wherein v_j is the feature map of training image j and c_j denotes the target centre coordinates; the objective is to find the model f; the target image is represented by A_c, with A_c = 0 when a pixel in the image is a background pixel and A_c = 1 when it belongs to the target; s = v ∗ f denotes the target confidence score; y_c denotes the score of each coordinate position and is initialised with a Gaussian distribution; and the parameters y_c and A_c are learned by online training.
4. The deep neural network-based single-target long-time tracking method according to claim 1, wherein the calculation formula of the PNR is as follows:
(The PNR formulas are given as images in the original publication.)
wherein S′_max and S′_min denote the maximum and minimum values in the classification score response map S′, M_{S′} denotes the mean of S′ after removing the maximum response value, T_PNR is the mean of all historical PNR values, and T_{S′} is the mean of all historical S′_max values.
5. The deep neural network-based single-target long-time tracking method according to claim 1, wherein the scale regression network is trained according to the following algorithm:
the loss function is:
(The full loss-function formulas are given as images in the original publication; the CIOU term satisfies)
L_CIOU = −ln(CIOU)
wherein L_CIOU is the CIOU loss, (x_l, y_l) denotes the top-left corner of the ground-truth target box, (x_r, y_r) its bottom-right corner, and (x_c, y_c) the centre-point coordinates; the regression targets then follow as the distances from the centre to the four sides of the box:
l = x_c − x_l, t = y_c − y_l, r = x_r − x_c, b = y_r − y_c
The highest-scoring target position is obtained from the classification score map, and the mean of the top-k target boxes around that position is taken as the final target box.
6. The deep neural network-based single-target long-time tracking method according to claim 1, wherein the target state is judged from the classification score of the target: when the score is less than 0.25, the target is judged to be occluded or not currently found, and a counter is started; when the counter exceeds 5, the target is considered lost, the search area is doubled every 2 frames relative to the target size, and the classification network is used to search for the target; when the target score exceeds 0.8, the target is considered found; when the search exceeds a certain number of frames, the target is considered lost.
7. The single-target long-time tracking method based on the deep neural network as claimed in claim 1, wherein when a plurality of suspected targets appear in the tracking process, information of the suspected targets is saved, wherein the information includes a central point position, a scale, a classification confidence and an SILTP feature.
8. The deep neural network-based single-target long-time tracking method of claim 7, wherein for a given position (x_c, y_c) the SILTP feature is encoded as
SILTP_{N,R}(x_c, y_c) = ⊕_{k=0}^{N−1} s(g_c, g_k)
wherein g_c is the gray value of the centre pixel, g_k are the other pixels on a circle of radius R in the neighbourhood of the centre pixel, ⊕ denotes bit-wise concatenation, and s is the piecewise function with scale factor e = 0.3:
s(g_c, g_k) = 01 if g_k > (1 + e)·g_c; 10 if g_k < (1 − e)·g_c; 00 otherwise
9. The single-target long-term tracking method based on the deep neural network as claimed in claim 8, wherein the cost in the Hungarian matching algorithm is computed as the cosine similarity between vectors:
cos(obj_i, tr_T) = (obj_i · tr_T) / (‖obj_i‖ ‖tr_T‖)
wherein the vector obj_i is composed of the centre-point position, scale, classification confidence, and SILTP feature value of a suspected target, and the vector tr_T is composed of the centre-point position, scale, classification confidence, and SILTP feature value of the nearest point of the tracking trajectory.
CN202210831023.1A 2022-07-14 2022-07-14 Single-target long-time tracking method based on deep neural network Pending CN115272409A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210831023.1A | 2022-07-14 | 2022-07-14 | Single-target long-time tracking method based on deep neural network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210831023.1A | 2022-07-14 | 2022-07-14 | Single-target long-time tracking method based on deep neural network

Publications (1)

Publication Number | Publication Date
CN115272409A | 2022-11-01

Family

ID=83764930

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210831023.1A | Single-target long-time tracking method based on deep neural network | 2022-07-14 | 2022-07-14

Country Status (1)

Country Link
CN (1) CN115272409A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117830883A * | 2024-03-04 | 2024-04-05 | 成都庆龙航空科技有限公司 (Chengdu Qinglong Aviation Technology Co., Ltd.) | Aiming method of unmanned aerial vehicle
CN117830883B * | 2024-03-04 | 2024-05-31 | 成都庆龙航空科技有限公司 (Chengdu Qinglong Aviation Technology Co., Ltd.) | Aiming method of unmanned aerial vehicle



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination