CN113610895A - Target tracking method and device, electronic equipment and readable storage medium


Info

Publication number
CN113610895A
CN113610895A (application CN202110903265.2A)
Authority
CN
China
Prior art keywords: frame, tracking, detection, target, frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110903265.2A
Other languages
Chinese (zh)
Inventor
徐召飞
金荣璐
刘振辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iray Technology Co Ltd
Original Assignee
Iray Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iray Technology Co Ltd filed Critical Iray Technology Co Ltd
Priority to CN202110903265.2A
Publication of CN113610895A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments

Abstract

The application discloses a target tracking method and apparatus, an electronic device, and a readable storage medium. The method acquires detection frames for each target to be tracked in a video to be processed and uses local texture features of the previous frame image as supplementary information for the detection frames of the current frame image. Corresponding tracking frames are generated from the detection frames carrying the supplementary information, and multi-scale local features of the detection frames and tracking frames are computed. Based on these multi-scale local features, an image matching algorithm cyclically matches the detection frames against the tracking frames multiple times, and multi-target tracking is performed on the successfully matched pairs. This effectively improves the accuracy of multi-target tracking and meets the service requirements of practical high-precision target tracking.

Description

Target tracking method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of visual image processing technologies, and in particular, to a target tracking method and apparatus, an electronic device, and a readable storage medium.
Background
It is well known that in good weather the visible-light images collected by ordinary visible-light devices meet the requirements of most image processing applications. Under poor illumination, however, such as rain, fog, or night, visible-light devices image poorly: the resulting images are blurred and ill-suited to back-end processing. Long-wave infrared imaging, which relies on passive infrared radiation, compensates precisely for the situations in which visible light cannot produce high-quality images in severe weather. Because infrared and visible-light images each have their strengths and weaknesses, many fields such as autonomous driving, intelligent security, remote sensing, and industrial monitoring combine the two for image analysis, and dual-light devices that simultaneously collect visible-light and infrared images have come into use. Given the need for all-weather continuous detection, more and more dual-light devices are being deployed, and the demand for visual image processing technology keeps growing. In real-time monitoring applications using dual-light devices, target detection and tracking is one of the important machine vision requirements.
In application scenarios such as security monitoring and autonomous driving, multiple targets must often be tracked simultaneously, i.e., multi-target tracking, which cannot be achieved with a target detection algorithm or a single-target tracking algorithm alone. In these scenarios, monitoring the surroundings and tracking nearby targets with infrared devices is an important means of night-time surveillance. However, infrared images have a low signal-to-noise ratio, spatial discontinuities, and strong sensitivity to ambient temperature, so they carry less information than ordinary visible-light images, which increases the difficulty of multi-target tracking on infrared imagery. Accurately identifying and tracking multiple targets such as people and vehicles in infrared images is therefore a key technical problem in current multi-target tracking tasks.
In the multi-target tracking problem, detections in each frame image are usually matched against existing target tracks; new tracks must be created for newly appearing targets, and tracking must be terminated for targets that have left the field of view of the image acquisition device. In this process, matching a target with a detection can be regarded as re-identification of the target. Multi-target tracking algorithms in the related art depend too heavily on the target detection results: if detection is inaccurate or occlusion occurs, the tracking information becomes inaccurate or lost, and the business requirements of practical application scenarios are difficult to meet.
Disclosure of Invention
The application provides a target tracking method and apparatus, an electronic device, and a readable storage medium, which effectively address inaccurate or lost target tracking caused by occlusion, improve the accuracy of target tracking, and meet the service requirements of practical high-precision target tracking.
In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:
an embodiment of the present invention provides a target tracking method, including:
acquiring detection frames of each target to be tracked in a video to be processed, and using local texture features of the previous frame image as supplementary information for each detection frame of the current frame image;
generating corresponding tracking frames based on the detection frames carrying the supplementary information, and calculating multi-scale local features of the detection frames and the tracking frames of the targets to be tracked;
and cyclically matching each detection frame against each tracking frame multiple times by using an image matching algorithm, to obtain the successfully matched detection frames and tracking frames.
Optionally, the image matching algorithm is the Hungarian algorithm, and after the detection frames and tracking frames have been cyclically matched multiple times using the image matching algorithm, and before the successfully matched detection frames and tracking frames are obtained, the method further includes:
for the detection frames and tracking frames that were not successfully matched, invoking an optimized intersection-over-union (IoU) relation and calculating an optimized IoU value between each detection frame to be matched and each tracking frame to be matched;
based on the optimized IoU values between the detection frames and tracking frames to be matched, performing cyclic matching again multiple times using the Hungarian algorithm;
where the optimized IoU relation is determined jointly by the IoU of the current detection frame to be matched and the current tracking frame to be matched, the distance between their center points, and their frame size information.
Optionally, the cyclically matching each detection frame against each tracking frame multiple times by using an image matching algorithm includes:
cyclically matching each detection frame against each tracking frame in order of increasing track-loss time of the targets to be tracked, until the current iteration count reaches the maximum iteration count.
Optionally, after the multi-scale local features of the detection frame and the tracking frame of each target to be tracked are calculated, and before the detection frames and the tracking frames are cyclically matched multiple times by using an image matching algorithm, the method further includes:
respectively carrying out normalization processing on the multi-scale local features of each detection frame and each tracking frame of each target to be tracked to obtain normalized multi-scale local feature vectors;
and for each detection frame, respectively calculating the inner product of the normalized multi-scale local feature vector of the current detection frame and the normalized multi-scale local feature vector of each tracking frame, and constructing a weight coefficient matrix of the Hungarian algorithm based on the inner product result.
Optionally, the generating a corresponding tracking frame based on each detection frame carrying the supplemental information includes:
calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame;
calculating Kalman filtering gain through a preset observation matrix;
calculating according to the position information of each detection frame and the Kalman filtering gain to obtain the position information of the corresponding tracking frame;
correspondingly, after the detection frame and the tracking frame which are successfully matched are obtained, the method further comprises the following steps:
and updating the parameters of Kalman filtering by using the detection frame and the tracking frame which are successfully matched.
Optionally, the using the local texture feature of the previous frame image as the supplementary information of each detection frame of the current frame image includes:
calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame so as to determine the image area of each detection frame in the current frame image;
for each detection frame, calculating gradient information of each pixel point in a target image area corresponding to the current detection frame; calculating gradient histogram information based on the gradient information of each pixel point; performing block normalization processing based on the target pixel point information used in the gradient histogram calculation to obtain block feature information; and constructing local texture features of the current detection frame based on all block feature information in the target image region;
constructing a plurality of target frames with the same width and height within the range where the distance from the center point of the previous frame's detection frame equals a first preset value, and calculating the local texture features within each target frame;
and performing inner product operation on the local texture features of each target frame and the local texture features of the detection frame corresponding to the current frame image, and taking the target frame with the maximum inner product value as supplementary information.
Optionally, the calculating the position information of each detection frame in the current frame image according to the motion state parameter of each detection frame includes:
representing the motion state information of each target to be tracked by the center point position of its detection frame, the frame size information, and the velocity information of the center point position and frame size information in the image coordinate system;
constructing a state transition matrix describing the target motion relation between two adjacent frames of the video to be processed, based on the motion state information of each target to be tracked and the assumption that each target moves at a constant speed;
and predicting the motion state parameter at the current time from the motion state parameter of the previous time according to the state transition matrix and the motion process noise, so as to determine the position coordinates of each detection frame in the current frame image based on the motion state parameter at the current time.
Another aspect of an embodiment of the present invention provides a target tracking apparatus, including:
the detection information acquisition module is used for acquiring detection frames of all targets to be tracked of the video to be processed and taking the local texture characteristics of the previous frame of image as the supplementary information of all detection frames of the current frame of image;
the tracking frame calculation module is used for generating corresponding tracking frames based on the detection frames carrying the supplementary information;
the multi-scale characteristic calculation module is used for calculating the multi-scale local characteristics of the detection frame and the tracking frame of each target to be tracked;
and the matching module is configured to cyclically match each detection frame against each tracking frame multiple times by using an image matching algorithm, to obtain the successfully matched detection frames and tracking frames.
An embodiment of the present invention further provides an electronic device, which includes a processor, and the processor is configured to implement the steps of the target tracking method according to any one of the foregoing items when executing the computer program stored in the memory.
Finally, an embodiment of the present invention provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the object tracking method according to any of the previous items.
The technical solution provided by the application has the following advantages. The local texture feature of the previous frame image is added as extra supplementary information in the generation stage of the current frame's detection frames, supplementing the upstream target detection task; and since the tracking frames are generated from the detection frames, this mitigates, to a certain extent, both the case where the target detection model fails to detect a target in the current frame and the case where the tracking frame is lost in some frames. Lost tracking caused by missed detections is thus avoided to the greatest extent, effectively improving tracking accuracy. Furthermore, matching between detection frames and tracking frames is based on multi-scale local features, which focus on local image structure, are insensitive to rotation, scale, and brightness changes, and remain stable to a certain degree under viewpoint change, affine transformation, and noise. Occluded targets can be recovered through multiple rounds of matching within a certain time range, reducing the number of ID switches for occluded targets and handling long-term occlusion. This effectively resolves inaccurate or lost target tracking caused by occlusion, improves tracking accuracy, and meets the service requirements of practical high-precision target tracking.
In addition, the embodiment of the invention also provides a corresponding implementation device, electronic equipment and a readable storage medium for the target tracking method, so that the method has higher practicability, and the device, the electronic equipment and the readable storage medium have corresponding advantages.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Gaussian pyramid and a Gaussian difference pyramid according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-scale local feature construction process provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a gradient histogram calculation process according to an embodiment of the present invention;
FIG. 5 is a block acquisition diagram provided by an embodiment of the present invention;
fig. 6 is a schematic diagram of an HOG feature supplementary detection block provided in an embodiment of the present invention;
fig. 7 is a schematic diagram of a tracking result without adding HOG feature supplement according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a tracking result supplemented with an HOG feature according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart of another target tracking method according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating tracking results in an illustrative example provided by embodiments of the invention;
FIG. 11 is a diagram illustrating tracking results in another illustrative example provided by an embodiment of the present invention;
FIG. 12 is a block diagram of an embodiment of a target tracking device according to the present invention;
fig. 13 is a block diagram of a specific implementation of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.
Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.
Referring to fig. 1, fig. 1 is a schematic flowchart of a target tracking method according to an embodiment of the present invention. This embodiment performs multi-target tracking with a feature supplementing and retention method, re-tracking occluded targets such as people or vehicles in video images, e.g. infrared video images, while meeting the real-time and accuracy requirements of practical applications. The embodiment of the present invention may include the following contents:
s101: and acquiring detection frames of all to-be-tracked targets of the to-be-processed video, and taking the local texture features of the previous frame image as the supplementary information of all detection frames of the current frame image.
The video to be processed in this embodiment may be a visible-light video, an infrared video, or a fused visible-infrared video; this does not affect the implementation of the present application. The targets to be tracked are the targets of interest in the current application scenario. There may be one or several of them, of the same or of different types: for example, people and vehicles may be tracked simultaneously, or multiple people, or multiple vehicles. Before this step is executed, a target detection model can be built with any existing target detection method, for example a model based on a convolutional neural network or on yolov3; for the construction process, refer to the related art of the chosen detection algorithm, which is not repeated here. The trained model identifies the targets in each frame image of the video to be processed and outputs the corresponding detection frames. Given the limited precision of the detection model, detections may be missed or targets may go undetected because of occlusion. To improve tracking precision, the local texture features of the previous frame image can therefore be used as supplementary information for the detection frames of the current frame image; that is, the current frame is tracked using the local texture features of both the current and the previous frame image, so that even if the detection model is inaccurate or a target is lost, the lost target can be recovered from the previous frame's local texture features, improving overall tracking precision.
S102: and generating corresponding tracking frames based on the detection frames carrying the supplementary information, and calculating the detection frames of the targets to be tracked and the multi-scale local features of the tracking frames.
After the detection frames of the targets to be tracked in the current frame image have been determined in the previous step, a tracking frame can be computed for each detection frame in any manner, i.e., the tracking frames are generated from the detection frames. The multi-scale local feature calculation for the tracking frames and detection frames is divided into four steps: scale space construction, extreme point detection, feature point orientation assignment, and feature descriptor calculation. The calculation process of each step is introduced below with reference to figs. 2 and 3:
a1: and establishing a scale space. The purpose of constructing the scale space is to find positions with invariance in scale change, and continuous scale change can be used, namely stable characteristic points can be found in all possible scale change in the scale space, and the poles found in this way can ensure invariance in image scaling and rotation change. The scale space L (x, y, σ) of an image may be defined as L (x, y, σ) ═ G (x, y, σ) × I (x, y), where I (x, y) is the original image, x is the x-th row of the image matrix, y is the y-th column of the image matrix, G (x, y, σ) is a gaussian function whose scale space is variable, x represents a convolution operation, σ represents the size of the scale space, and larger σ represents a more blurred image, which may be used to represent the profile of the image; the smaller the sigma, the clearer the representation, and can be used for representing the details of the image.
A2: extreme point detection. In this step, stable extreme points are sought in the scale space using the difference of Gaussians. The difference-of-Gaussian function can be expressed as $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$, where $k\sigma$ and $\sigma$ are the smoothing scales of two consecutive images; the resulting difference images form the Gaussian difference pyramid shown in fig. 2. Each layer of the Gaussian difference pyramid is obtained by subtracting two adjacent layers of the Gaussian pyramid in turn; the smoothing scale of each layer within a group differs, and the first layer of the next group is obtained by downsampling the feature image of the previous layer at every other pixel, so that the Gaussian difference pyramid satisfies scale continuity. The current point is compared with its 24 surrounding point values: if it is the maximum or the minimum, the point is an extreme point, otherwise it is not. This comparison keeps the computational cost small.
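A companion sketch, again illustrative rather than normative; for simplicity the neighborhood test below uses the full 3 x 3 x 3 scale-space cube around an interior point, whereas the embodiment compares against 24 surrounding values:

```python
import numpy as np

def difference_of_gaussians(scale_space):
    """D(x, y, sigma) = L(x, y, k*sigma) - L(x, y, sigma): adjacent levels subtracted."""
    return [scale_space[i + 1] - scale_space[i] for i in range(len(scale_space) - 1)]

def is_extremum(dog, level, y, x):
    """True if dog[level][y, x] is the max or min of its scale-space neighborhood.
    Assumes 1 <= level <= len(dog) - 2 and (y, x) away from the image border."""
    cube = np.stack([d[y - 1:y + 2, x - 1:x + 2] for d in dog[level - 1:level + 2]])
    v = dog[level][y, x]
    return v == cube.max() or v == cube.min()
```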
A3: and assigning direction of the feature points. And for each key point extracted above, selecting a window around the point, forming a direction histogram by the gradient directions of all sampling points in the window, and determining the direction of the key point according to the peak value of the histogram. The dimensions of the keypoints are used to select the gaussian filtered image that participates in the computation, as well as the size of the window. In order to ensure that the directions of the same keypoint at different scales all contain the same amount of information, the size of the windows must be different. The larger the scale of the same original image is, the larger the window is; conversely, if the window size is unchanged, the larger the scale of the image, the less information within the window.
A4: and calculating a feature descriptor. The multi-scale local feature descriptor is a representation of the Gaussian image gradient statistical result in the key point field. By blocking the image region around the key point, the gradient histogram in the block is calculated, and a unique vector is generated, wherein the vector is an abstraction of the image information of the region and has uniqueness. The feature descriptor uses the gradient information of 8 directions calculated in a4 × 4 window within the keypoint scale space, and the total 4 × 4 × 8 is characterized by a 128-dimensional vector. The overall flow of constructing a multi-scale local feature is shown in fig. 3.
S103: and performing repeated matching on each detection frame and each tracking frame for multiple times by using an image matching algorithm to obtain the successfully matched detection frame and tracking frame.
The image matching algorithm in this step may adopt any gray-scale-based or feature-based matching method; this does not affect the implementation of this application. Because the technical problem addressed by the application is low tracking accuracy caused by occlusion, it should be understood that an occluded target is not occluded forever: occlusion of a target to be tracked typically lasts for some time, after which the target reappears in the field of view. A maximum iteration count or a maximum matching duration can therefore be set according to the actual application scenario, i.e., the surroundings of the video to be processed, and the matching process of S103 is made cyclic. For example, with a maximum iteration count of 50, the image matching algorithm cyclically matches the detection frames against the tracking frames 50 times; with a maximum matching duration of 10 min, the cyclic matching runs within 10 min. For subsequent use, the successfully matched detection and tracking frames and the unmatched ones may be stored in separate lists.
To further improve matching precision, an ordering constraint can be imposed during the cyclic matching: the detection frames and tracking frames are matched repeatedly in order of increasing track-loss time of the targets to be tracked, until the current iteration count reaches the maximum. For example, the matching process of S103 may be set as a loop with a maximum of 50 iterations, matching the track tracking frames and detection frames from iteration 0 to 50; tracks that have not been lost are matched first, and long-lost tracks are matched later. Through this processing, occluded targets can be retrieved again, and the number of ID switches of targets that are occluded and reappear is reduced. After the cyclic matching finishes, the successfully matched detection frames and tracking frames are obtained; the motion trajectory of each target to be tracked is derived from them, and multi-target tracking is completed based on the determined trajectories.
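A sketch of this cyclic matching order, under the assumption that each track records how many frames have passed since it was last matched; the attribute name `frames_since_seen` and the callback `match_fn` are hypothetical names introduced here for illustration:

```python
def matching_cascade(tracks, detections, match_fn, max_iterations=50):
    """Match detections to tracks over repeated passes, trying tracks with the
    smallest track-loss time first and long-lost tracks last."""
    matches = []
    unmatched = list(range(len(detections)))
    for age in range(max_iterations + 1):
        if not unmatched:
            break
        candidates = [i for i, t in enumerate(tracks) if t.frames_since_seen == age]
        if not candidates:
            continue
        # match_fn pairs candidate tracks with still-unmatched detections
        new_matches, unmatched = match_fn(candidates, unmatched)
        matches.extend(new_matches)
    return matches, unmatched
```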
In the technical solution provided by the embodiment of the invention, the local texture feature of the previous frame image is added as extra supplementary information in the generation stage of the current frame's detection frames, supplementing the upstream target detection task; and since the tracking frames are generated from the detection frames, this mitigates, to a certain extent, both the case where the target detection model fails to detect a target in the current frame and the case where the tracking frame is lost in some frames. Lost tracking caused by missed detections is thus avoided to the greatest extent, effectively improving tracking accuracy. Furthermore, matching between detection frames and tracking frames is based on multi-scale local features, which focus on local image structure, are insensitive to rotation, scale, and brightness changes, and remain stable to a certain degree under viewpoint change, affine transformation, and noise. Occluded targets can be recovered through multiple rounds of matching within a certain time range, reducing the number of ID switches for occluded targets and handling long-term occlusion. This effectively resolves inaccurate or lost target tracking caused by occlusion, improves tracking accuracy, and meets the service requirements of practical high-precision target tracking.
In the foregoing embodiment, how to execute step S103 is not limited, and an implementation manner of matching between each detection frame and each tracking frame in this embodiment may include the following steps:
In this embodiment, the Hungarian algorithm can be adopted as the image matching algorithm. Hungarian matching rests on a theorem: if the same number is added to or subtracted from every element of a row or column of the weight coefficient matrix $C = (c_{ij})$, yielding a new matrix $B = (b_{ij})$, then the assignment problems with coefficient matrix $C$ or $B$ have the same optimal assignment. The weight coefficient matrix can be determined from the multi-scale local features of the detection and tracking frames. As an optional implementation, after the multi-scale local features of each detection frame and tracking frame of each target to be tracked have been computed, they can be normalized to obtain normalized multi-scale local feature vectors; for each detection frame, the inner product between its normalized feature vector and that of each tracking frame is computed, producing inner-product values distributed in $[0, 1]$, from which the weight coefficient matrix of the Hungarian algorithm is constructed. For example, if the targets to be tracked yield three detection frames A, B, C and three tracking frames a, b, c, then detection frame A takes inner products with tracking frames a, b, and c, as do detection frames B and C, and the weight coefficient matrix is finally built from the 9 inner-product values in $[0, 1]$. The Hungarian matching algorithm consists of the following four steps:
b1: each row of the weight coefficient matrix is subtracted by the smallest number in that row, followed by each column of the weight coefficient matrix by the smallest number in that column.
B2: the matrix obtained in step B1 must have 0 elements and then the minimum horizontal and vertical lines are used to cover all 0 elements.
B3: and C, judging whether the total number of the lines in the step B2 is smaller than the dimension of the matrix, and if so, executing the step B4.
B4: find the minimum value in the number not covered, subtract the minimum value for each row not covered by the line, add the minimum value for each column covered, and then jump to execute B2. Until the least line segment number covering all 0 is equal to the dimension of the matrix, 0 element value of each row is taken, the number corresponding to the original matrix is found, which is the optimal distribution, and the optimal distribution is a group found firstly.
The detection frames and tracking frames correspond one to one. Hungarian matching originally finds $n$ entries of the weight matrix formed by the inner products whose sum is the smallest, with no two in the same row or column; the four steps above find such a solution. Target tracking instead seeks the $n$ entries with the largest inner products, i.e., the strongest association, so the inner-product values are negated before entering Hungarian matching. That is, because target tracking seeks the maximum assignment, all computed inner-product values are sent into the Hungarian matching algorithm with their signs flipped.
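A hedged sketch of the weight-matrix construction and assignment: `scipy.optimize.linear_sum_assignment` solves the same assignment problem the four steps above describe, and negating the inner-product matrix turns its minimization into the maximum-inner-product matching the text calls for.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(det_features, trk_features):
    """det_features: (n, d) array, trk_features: (m, d) array of local features."""
    det = det_features / np.linalg.norm(det_features, axis=1, keepdims=True)
    trk = trk_features / np.linalg.norm(trk_features, axis=1, keepdims=True)
    weights = det @ trk.T                           # inner products, weight matrix
    rows, cols = linear_sum_assignment(-weights)    # maximize total inner product
    return list(zip(rows.tolist(), cols.tolist())), weights
```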
It should be noted that when the Hungarian algorithm is used as the image matching algorithm, its matching can admit multiple optimal solutions, which causes ID switches of the tracking frames. To address this problem, after the detection frames and tracking frames have been cyclically matched multiple times with the image matching algorithm, and before the successfully matched detection frames and tracking frames are obtained, the method may further include:
for the detection frames and tracking frames that were not successfully matched, invoking the optimized intersection-over-union (IoU) relation and calculating the optimized IoU value between each detection frame to be matched and each tracking frame to be matched; and, based on these optimized IoU values, performing cyclic matching again multiple times using the Hungarian algorithm. The optimized IoU relation is determined jointly by the IoU of the current detection frame to be matched and the current tracking frame to be matched, the distance between their center points, and their frame size information. Frame size information includes, but is not limited to, the width and height of the detection and tracking frames and of their maximum enclosing rectangle.
After the image matching of S103 in the above embodiment, the successfully matched detection frames and tracking frames are obtained. In this embodiment, the unmatched detection frames and tracking frames can then be handled with the optimized relation, after which the image matching algorithm, e.g. the Hungarian algorithm, is applied again. As an alternative embodiment, so that nearby tracking and detection frames retain their IoU affinity while distant pairs have it reduced, the optimized IoU relation can be expressed as:
$$\mathrm{EIOU} = \mathrm{IOU} - \frac{\rho^2(a, b)}{c^2} - \frac{\rho^2(w_a, w_b)}{w_c^2} - \frac{\rho^2(h_a, h_b)}{h_c^2}$$

where $\mathrm{EIOU}$ is the optimized intersection-over-union value of the current detection frame to be matched and the current tracking frame to be matched; $\mathrm{IOU}$ is their plain intersection-over-union; $a$ is the current tracking frame to be matched and $b$ the current detection frame to be matched; $c$ is the diagonal length of their maximum enclosing rectangle; $w_a$ and $h_a$ are the width and height of the current tracking frame to be matched, $w_b$ and $h_b$ those of the current detection frame to be matched, and $w_c$ and $h_c$ those of the maximum enclosing rectangle; $\rho(a, b)$ is the distance between the center points of the two frames, and $\rho(w_a, w_b)$ and $\rho(h_a, h_b)$ are their width and height differences.
Thus, when computing the IoU between a detection frame and a tracking frame, center-point distance and width/height difference terms are introduced; this can be called EIOU matching for short. Similar detection and tracking frames are matched better, distant pairs are screened out, the correspondence of the data in the weight matrix increases, and the tracking-frame ID switches caused by the multiple-solution problem of Hungarian matching are resolved.
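A minimal numeric sketch of the relation above, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the small epsilon guards against division by zero and is an implementation choice, not part of the relation:

```python
def eiou(a, b, eps=1e-9):
    """EIOU between tracking frame a and detection frame b, each (x1, y1, x2, y2)."""
    # plain IoU
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter + eps)
    # smallest enclosing rectangle: width, height, squared diagonal
    wc = max(a[2], b[2]) - min(a[0], b[0])
    hc = max(a[3], b[3]) - min(a[1], b[1])
    c2 = wc ** 2 + hc ** 2 + eps
    # squared center-point distance and width/height differences
    rho2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
         + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    dw2 = ((a[2] - a[0]) - (b[2] - b[0])) ** 2
    dh2 = ((a[3] - a[1]) - (b[3] - b[1])) ** 2
    return iou - rho2 / c2 - dw2 / (wc ** 2 + eps) - dh2 / (hc ** 2 + eps)
```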
The above embodiment does not limit how the supplementary information of each detection frame is generated; this embodiment provides one generation approach, which may include the following.
As an alternative implementation of this embodiment, the motion state information of each target to be tracked can be represented by the center point position of its detection frame, the frame size information, and the corresponding velocity information in the image coordinate system; frame size information includes, but is not limited to, the width and height of the frame. The motion state of a target can be expressed by the target motion state parameter $m = [x, y, \alpha, h, v_x, v_y, v_\alpha, v_h]^T$, where $x$ and $y$ are the position coordinates of the center point of the detection frame, $\alpha$ is the aspect ratio of the detection frame, $h$ is its height, and $v_x, v_y, v_\alpha, v_h$ are the velocities of $x, y, \alpha, h$ in the image coordinate system. Because the time interval between two frames of the video to be processed is very short and the target's displacement during motion is small, the motion can be approximated as uniform, with the frame interval replaced by normalized time: $x_t = x_{t-1} + v_x$, $y_t = y_{t-1} + v_y$, $\alpha_t = \alpha_{t-1} + v_\alpha$, $h_t = h_{t-1} + v_h$. Adding the motion noise $w_t$, the state transition matrix $F$ describing the target motion relation between two adjacent frames of the video to be processed is constructed, and the relation between the two frames is written in matrix form:

$$m_t = F m_{t-1} + w_t, \qquad F = \begin{bmatrix} I_4 & I_4 \\ 0_4 & I_4 \end{bmatrix},$$

where $I_4$ and $0_4$ are the $4 \times 4$ identity and zero matrices. The motion state parameter at the current time is then predicted from that of the previous time according to the state transition matrix and the motion process noise, so as to determine the position coordinates of each detection frame in the current frame image. That is, with the state transition matrix set to $F$, the state at the current time $t$ is predicted from the state at the previous time $t-1$:

$$\hat{m}_t^- = F \hat{m}_{t-1},$$

where $\hat{m}_t^-$ is the prior estimate of the target motion state vector $m$ at time $t$. The covariance of the motion state parameters and of the motion process noise then gives the prediction formula of the error covariance matrix, $P_t^- = F P_{t-1} F^T + Q$, where $P_{t-1}$ is the error covariance at time $t-1$ and $Q = \mathrm{cov}(w_t, w_t)$ is the process noise covariance. In this way, a new error estimate is predicted from the error covariance and process noise at the previous time.
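A sketch of this constant-velocity prediction step under the state layout above; the process noise covariance Q is left as a caller-supplied parameter since the embodiment does not fix its values:

```python
import numpy as np

def constant_velocity_F(dim=4):
    """State transition F for m = [x, y, alpha, h, vx, vy, v_alpha, vh]:
    the position block advances by one velocity step per frame."""
    F = np.eye(2 * dim)
    F[:dim, dim:] = np.eye(dim)
    return F

def kalman_predict(m, P, F, Q):
    """Prior state and error covariance: m~ = F m, P~ = F P F^T + Q."""
    return F @ m, F @ P @ F.T + Q
```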
The position information of each detection frame in the current frame image is calculated from its motion state parameters as shown in the above embodiment, so as to determine the image area where each detection frame lies in the current frame image.
For each detection frame, calculating gradient information of each pixel point in a target image area corresponding to the current detection frame; calculating gradient histogram information based on the gradient information of each pixel point; performing block normalization processing based on the target pixel point information used in the gradient histogram calculation to obtain block feature information; constructing local texture features of the current detection frame based on all block feature information in the target image region;
constructing a plurality of target frames with the same width and height in the range of the distance between the central points of the detection frames of the previous frame as a first preset value, and respectively calculating local texture characteristics in the range of each target frame;
and performing inner product operation on the local texture features of each target frame and the local texture features of the detection frame corresponding to the current frame image, and taking the target frame with the maximum inner product value as supplementary information.
In this embodiment, the whole process is described taking HOG (Histogram of Oriented Gradients) as the local texture feature:
and after obtaining the motion state information, namely the motion state vector m, of the detection frame of the target to be tracked in the previous frame, the position of the detection frame on the image can be found. Calculating the HOG characteristics in the found region, wherein the calculation comprises four steps of gradient calculation, gradient histogram calculation, block normalization and HOG characteristic statistics:
c1: and (4) gradient calculation. Horizontal gradient g of single pixel pointxIs the difference of pixel values of the left and right adjacent pixel points, and the gradient g in the vertical directionyIs the difference value of the pixel values of the upper and lower adjacent pixels, and then the total gradient strength of the pixel can be obtained
Figure BDA0003200579040000151
And direction of gradient
Figure BDA0003200579040000152
Where the gradient direction needs to be taken as an absolute value.
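A NumPy sketch of step C1, assuming a single-channel image; the central differences of immediate neighbors implement the left/right and up/down pixel-value differences described above:

```python
import numpy as np

def pixel_gradients(img):
    """Per-pixel gradient magnitude and unsigned direction in [0, 180] degrees."""
    f = img.astype(np.float32)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]   # right neighbor minus left neighbor
    gy[1:-1, :] = f[2:, :] - f[:-2, :]   # lower neighbor minus upper neighbor
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    direction = np.abs(np.degrees(np.arctan2(gy, gx)))  # folded into [0, 180]
    return magnitude, direction
```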
C2: and (4) calculating a gradient histogram. In step C1, the gradient strength and gradient direction of each pixel point are obtained, 9 pixel points can be selected, then 0-180 ° is divided into 9 parts equally, the interval is 20 °, the gradient strength of the corresponding position is placed according to the distance between the gradient direction and each angle point, the corresponding gradient strength is placed at the corresponding angle position if the distance is small, and if the distance is the same, the gradient strength is divided into two parts equally, and the two parts are placed at the corresponding angle position, as shown in fig. 4. By means of the gradient histogram, 18 pieces of feature information in the original 9 pixel points are put into an array with the size of 9, the feature information is reduced by half, and complexity of subsequent calculation is greatly reduced.
C3: and (4) block normalization. The above 9 pixels are used as a cell, and 4 cells are used as a block, so that one block includes 36 characteristic information amounts, and then the 36 characteristic information amounts need to be normalized, which is the normalization of the L2 norm in this embodiment, that is, the normalization is used
Figure BDA0003200579040000153
Wherein
Figure BDA0003200579040000154
Is the ith normalized feature information quantity, xiIs the ith characteristic information amount. This has the advantage of normalizing each feature information quantity to [0, 1]]Within the range, the subsequent comparison is convenient.
C4: and (5) carrying out HOG characteristic statistics. And C3, calculating feature information of a block, and constructing the HOG features in the target region by only collecting all block feature information in the target range and putting the block feature information together, wherein the block collection needs to be moved by one cell distance each time, as shown in fig. 5.
Within the range where the center-point distance from the previous frame's detection frame equals 1, 8 frames with the same width and height are constructed, i.e., in the 8 directions up, down, left, right, and the four diagonals. The HOG features within each of the 8 frames are then computed, and an inner product is taken with the HOG of the detection frame; the frame with the largest value is used as the supplement to the detection frame of the next frame, as shown in fig. 6.
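A sketch of this neighbor-box search, where `hog_feature` is a hypothetical helper returning the (normalized) HOG vector of a box region; the eight offsets are the up, down, left, right, and diagonal positions at center-point distance 1:

```python
import numpy as np

def best_supplement_box(prev_image, det_box, hog_feature, offset=1):
    """det_box = (x, y, w, h) with (x, y) the top-left corner. Returns the
    same-sized neighbor box whose HOG has the largest inner product with the
    detection's HOG."""
    x, y, w, h = det_box
    ref = hog_feature(prev_image, det_box)
    best_box, best_score = None, -np.inf
    for dx in (-offset, 0, offset):
        for dy in (-offset, 0, offset):
            if dx == 0 and dy == 0:
                continue                      # skip the detection box itself
            cand = (x + dx, y + dy, w, h)
            score = float(np.dot(ref, hog_feature(prev_image, cand)))
            if score > best_score:
                best_box, best_score = cand, score
    return best_box
```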
Because the tracking frames are generated from the detection frames, supplementing the detection frames with HOG features alleviates the tracking loss caused by missed detections of the detection model; experimental results are shown in figs. 7 and 8.
The above embodiment does not limit the generation manner of the tracking frame, and this embodiment further provides a generation manner of the tracking frame, which may include:
calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame; calculating Kalman filtering gain through a preset observation matrix; and calculating according to the position information of each detection frame and the Kalman filtering gain to obtain the position information of the corresponding tracking frame.
In this embodiment, the tracking frame of each detection frame is generated by Kalman filtering. First the Kalman filter gain is calculated through the preset observation matrix as

$$K_t = P_t^- H^T \left( H P_t^- H^T + R \right)^{-1},$$

where $K_t$ is the Kalman filter gain at time $t$, $P_t^-$ is the prior estimate covariance at time $t$, $H^T$ is the transpose of the observation matrix, and $R$ is the measurement noise covariance. Multiplying the observation matrix by the motion state vector extracts its first four dimensions, i.e. the observation matrix is

$$H = \begin{bmatrix} I_4 & 0_4 \end{bmatrix}.$$

After the Kalman filter gain is calculated, the specific coordinates of each detection frame are computed as in the above embodiment, and the relation

$$\hat{m}_t = \hat{m}_t^- + K_t \left( z - H \hat{m}_t^- \right)$$

yields a more accurate tracking frame, where $\hat{m}_t$ is the posterior state estimate at time $t$, $\hat{m}_t^-$ is the prior state estimate at time $t$, $z = [x, y, \alpha, h]$ is taken from the target motion state parameter $m$, and $H$ is the observation matrix. The specific position coordinates of each tracking frame are thus obtained.
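A sketch of the gain and correction step, continuing the predict-step sketch above; $H = [\,I_4 \ 0_4\,]$ picks the observed $(x, y, \alpha, h)$ block out of the 8-dimensional state, $R$ is the caller-supplied measurement noise covariance, and the covariance-correction line is the standard Kalman update, included for completeness though the passage above stops at the state correction:

```python
import numpy as np

def kalman_update(m_prior, P_prior, z, R, dim=4):
    """Posterior state from the prior and the observation z = [x, y, alpha, h]."""
    H = np.hstack([np.eye(dim), np.zeros((dim, dim))])   # observation matrix
    S = H @ P_prior @ H.T + R                            # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)                 # Kalman gain K_t
    m_post = m_prior + K @ (z - H @ m_prior)             # corrected state
    P_post = (np.eye(2 * dim) - K @ H) @ P_prior         # standard covariance update
    return m_post, P_post
```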
Correspondingly, after the successfully matched detection frames and tracking frames are obtained, the method may further include: updating the Kalman filtering parameters using the successfully matched detection frames and tracking frames, as shown in fig. 9. If EIOU matching is performed after S103, then after EIOU matching the successfully matched detection and tracking frames are used to update the Kalman filtering parameters, and the updated Kalman filter is applied again to the motion parameters of the new detection frames, i.e., the target motion state parameter m of the above embodiment, closing the tracking loop of the whole method.
To verify the effectiveness of this embodiment, the application further ran a multi-target tracking test on infrared video with the following multi-target tracking method: first, the HOG feature information of the targets detected in the previous frame and positions adjacent to them are used to generate position coordinates that supplement the detections of the next frame; Kalman filtering then predicts the motion trajectory of each object; next, the predicted tracking frames and the detection frames in the current frame are matched by the Hungarian algorithm using the designed multi-scale local features, and the features of the result are retained; the tracking frames and detection frames are then matched again by the Hungarian algorithm using the improved IOU (Intersection over Union) information, i.e., the EIOU matching method of the above embodiment; finally, the matched result updates the Kalman filter, completing the tracking of the targets. As shown in figs. 10 and 11, by constructing a tracking algorithm with feature supplementation and retention and using the improved IOU strategy, the application resolves the tracking loss, target occlusion, and frequent ID exchange problems of current multi-target tracking algorithms. The method has high tracking precision and running speed and can achieve real-time tracking.
It should be noted that, in the present application, there is no strict sequential execution order among the steps, and as long as a logical order is met, the steps may be executed simultaneously or according to a certain preset order, and fig. 1 to 9 are only schematic manners, and do not represent only such an execution order.
The embodiment of the invention also provides a corresponding device for the target tracking method, so that the method has higher practicability. Wherein the means can be described separately from the functional module point of view and the hardware point of view. In the following, the target tracking apparatus provided by the embodiment of the present invention is introduced, and the target tracking apparatus described below and the target tracking method described above may be referred to correspondingly.
Based on the angle of the functional module, referring to fig. 12, fig. 12 is a structural diagram of an object tracking apparatus according to an embodiment of the present invention, in a specific implementation manner, the apparatus may include:
the detection information obtaining module 121 is configured to obtain detection frames of each target to be tracked of the video to be processed, and use local texture features of the previous frame image as supplementary information of each detection frame of the current frame image;
a tracking frame calculation module 122, configured to generate a corresponding tracking frame based on each detection frame carrying the supplemental information;
the multi-scale feature calculation module 123 is configured to calculate multi-scale local features of the detection frame and the tracking frame of each target to be tracked;
and the matching module 124 is configured to cyclically match each detection frame against each tracking frame multiple times using an image matching algorithm, to obtain the successfully matched detection frames and tracking frames.
Optionally, in some implementations of this embodiment, the apparatus may further include an EIOU matching module configured, when the image matching algorithm is the Hungarian algorithm, to invoke the optimized intersection-over-union (IoU) relation for the detection frames and tracking frames that were not successfully matched and to calculate the optimized IoU value between each detection frame to be matched and each tracking frame to be matched; and, based on these optimized IoU values, to perform cyclic matching again multiple times using the Hungarian algorithm. The optimized IoU relation is determined jointly by the IoU of the current detection frame to be matched and the current tracking frame to be matched, the distance between their center points, and their frame size information.
As an optional implementation of this embodiment, the EIOU matching module may be further configured to perform the multiple rounds of cyclic matching between the detection frames and the tracking frames in ascending order of the track-loss time of the targets to be tracked, until the current iteration count reaches the maximum iteration count.
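Matching in ascending order of track-loss time behaves like a matching cascade: tracks that were updated most recently get the first chance to claim detections. A minimal sketch under that reading, assuming a score matrix where higher values indicate better matches (the function, parameter names, and thresholds are illustrative, not the specification's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cascade_match(score, track_age, max_age=30, min_score=0.1):
    """Match tracks to detections level by level, smallest loss time first.

    score: (num_tracks, num_dets) matrix, higher = better match.
    track_age: frames elapsed since each track was last updated.
    Returns a list of (track_index, detection_index) pairs.
    """
    matches, free_dets = [], set(range(score.shape[1]))
    for age in range(1, max_age + 1):
        rows = [t for t in range(score.shape[0]) if track_age[t] == age]
        cols = sorted(free_dets)
        if not rows or not cols:
            continue
        sub = score[np.ix_(rows, cols)]
        # linear_sum_assignment minimizes cost, so negate the scores
        r_idx, c_idx = linear_sum_assignment(-sub)
        for r, c in zip(r_idx, c_idx):
            if sub[r, c] >= min_score:          # reject weak assignments
                matches.append((rows[r], cols[c]))
                free_dets.discard(cols[c])
    return matches
```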
As another optional implementation of this embodiment, the apparatus may further include a normalization processing module configured to normalize the multi-scale local features of each detection frame and each tracking frame of each target to be tracked, obtaining normalized multi-scale local feature vectors; and, for each detection frame, to calculate the inner product of the normalized multi-scale local feature vector of the current detection frame with the normalized multi-scale local feature vector of each tracking frame, constructing the weight coefficient matrix of the Hungarian algorithm from the inner-product results.
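Because the feature vectors are normalized, each inner product is a cosine similarity, and collecting the inner products over all detection/tracking pairs directly yields the weight coefficient matrix consumed by the Hungarian algorithm. A minimal sketch (the feature dimension and random inputs are illustrative only):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def build_weight_matrix(det_feats, trk_feats):
    """Normalize feature rows and return the matrix of pairwise inner products.

    det_feats: (num_dets, dim); trk_feats: (num_trks, dim).
    Entry (i, j) is the cosine similarity of detection i and track j.
    """
    det_n = det_feats / (np.linalg.norm(det_feats, axis=1, keepdims=True) + 1e-9)
    trk_n = trk_feats / (np.linalg.norm(trk_feats, axis=1, keepdims=True) + 1e-9)
    return det_n @ trk_n.T

# Usage: the Hungarian solver minimizes cost, so negate the similarities.
w = build_weight_matrix(np.random.rand(4, 128), np.random.rand(3, 128))
det_idx, trk_idx = linear_sum_assignment(-w)
```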
Optionally, in other implementations of this embodiment, the tracking frame calculation module 122 may be further configured to: calculate the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame; calculate the Kalman filter gain through a preset observation matrix; and obtain the position information of the corresponding tracking frame from the position information of each detection frame and the Kalman filter gain. Correspondingly, the apparatus may further include a parameter updating module configured to update the parameters of the Kalman filter using the successfully matched detection frames and tracking frames.
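The calculations above, together with the parameter update, are the standard Kalman filter predict and update equations. The sketch below shows them for a generic linear model; the specification's concrete F, H, Q and R matrices are not reproduced here, so this is an illustrative assumption rather than the embodiment's exact filter:

```python
import numpy as np

class KalmanTracker:
    """Generic linear Kalman filter; an illustrative sketch only."""

    def __init__(self, x0, F, H, Q, R):
        self.x, self.F, self.H, self.Q, self.R = x0, F, H, Q, R
        self.P = np.eye(len(x0))                     # state covariance

    def predict(self):
        # Tracking-frame position predicted from the motion state parameters
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.H @ self.x                       # predicted observation

    def update(self, z):
        # Kalman gain computed through the preset observation matrix H
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        # Parameters updated with the successfully matched detection frame z
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
```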
Optionally, in other implementations of this embodiment, the detection information obtaining module 121 may include a detection information generating unit configured to: calculate the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame, so as to determine the image area of each detection frame in the current frame image; for each detection frame, calculate the gradient information of each pixel point in the target image area corresponding to the current detection frame; obtain gradient histogram information from the gradient information of the pixel points; perform block normalization processing on the target pixel point information used in computing the gradient histogram to obtain block feature information; construct the local texture feature of the current detection frame from all block feature information in the target image area; construct a plurality of target frames with the same width and height within a range in which the distance from the center point of the previous frame's detection frame does not exceed a first preset value, and calculate the local texture feature within each target frame; and perform an inner-product operation between the local texture feature of each target frame and the local texture feature of the corresponding detection frame of the current frame image, taking the target frame with the largest inner-product value as the supplementary information.
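Concretely, the search for supplementary information can be pictured as: compute a gradient-histogram (HOG-style) texture feature for the detection frame, scan equally sized candidate frames whose centers lie within a preset distance of the previous frame's center point, and keep the candidate whose feature has the largest inner product. The sketch below uses scikit-image's HOG descriptor as a stand-in for the gradient and block-normalization steps described above; the search radius, step and HOG parameters are illustrative assumptions rather than the specification's values:

```python
import numpy as np
from skimage.feature import hog

def texture_feature(patch):
    """Gradient histogram plus block normalization, via scikit-image's HOG."""
    f = hog(patch, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2), block_norm='L2-Hys', feature_vector=True)
    n = np.linalg.norm(f)
    return f / n if n > 0 else f

def best_supplement(frame, det_feat, prev_cx, prev_cy, w, h, radius=8, step=4):
    """Pick the same-size candidate frame, centered within `radius` of the
    previous center point, whose texture feature best matches det_feat.

    frame is a 2-D (e.g. single-channel infrared) image; det_feat is the
    feature of the detection frame, computed on a patch of the same (h, w).
    """
    best, best_score = None, -np.inf
    for dy in range(-radius, radius + 1, step):
        for dx in range(-radius, radius + 1, step):
            x1 = int(prev_cx + dx - w / 2)
            y1 = int(prev_cy + dy - h / 2)
            patch = frame[y1:y1 + h, x1:x1 + w]
            if patch.shape != (h, w):
                continue                    # candidate leaves the image
            score = float(texture_feature(patch) @ det_feat)
            if score > best_score:
                best, best_score = (x1, y1, w, h), score
    return best
```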
As an optional implementation of this embodiment, the apparatus may further include a state calculation module configured to represent the motion state information of each target to be tracked by the center-point position and frame size information of its detection frame in the image coordinate system, together with the velocity information of the center-point position and the frame size; to construct, based on the motion state information of each target to be tracked and the assumption that each target moves at a constant velocity, a state transition matrix describing the target motion relation between two adjacent frames of the video to be processed; and to predict the motion state parameters at the current moment from the motion state parameters at the previous moment according to the state transition matrix and the motion process noise, so as to determine the position coordinates of each detection frame in the current frame image based on the motion state parameters at the current moment.
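Under the constant-velocity assumption, with the motion state holding the center-point position, the frame size and their respective velocities, the state transition matrix takes the familiar block form below. The 8-dimensional state layout and the unit time step are assumptions made for illustration:

```python
import numpy as np

# Motion state: [cx, cy, w, h, vcx, vcy, vw, vh]
# (center point, frame size, and their velocities in image coordinates).
dt = 1.0                                  # one video frame per step
F = np.eye(8)
F[:4, 4:] = dt * np.eye(4)                # position/size += velocity * dt

# Observation matrix: only the box itself (cx, cy, w, h) is measured.
H = np.hstack([np.eye(4), np.zeros((4, 4))])

x_prev = np.array([100.0, 50.0, 40.0, 80.0, 2.0, 0.0, 0.0, 0.0])
x_pred = F @ x_prev                       # motion state predicted for the current moment
print(x_pred[:4])                         # predicted detection-frame position: [102. 50. 40. 80.]
```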
The functions of each functional module of the target tracking device in the embodiment of the present invention may be specifically implemented according to the method in the above method embodiment, and the specific implementation process may refer to the related description of the above method embodiment, which is not described herein again.
Therefore, the embodiment of the invention effectively addresses the situation in which occlusion makes the target tracking result inaccurate or causes the target to be lost, improves the accuracy of target tracking, and can meet the service requirements of practical high-precision target tracking.
The target tracking apparatus above is described from the perspective of functional modules; further, the present application also provides an electronic device, described from the perspective of hardware. Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 13, the electronic device includes a memory 130 for storing a computer program, and a processor 131 configured to implement the steps of the target tracking method mentioned in any of the above embodiments when executing the computer program.
Among other things, the processor 131 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 131 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 131 may also include a main processor and a coprocessor: the main processor, also called the Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 131 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 131 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 130 may include one or more computer-readable storage media, which may be non-transitory. Memory 130 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 130 is at least used for storing a computer program 1301, wherein after being loaded and executed by the processor 131, the computer program can implement the relevant steps of the target tracking method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 130 may also include an operating system 1302, data 1303, and the like, and the storage manner may be a transient storage manner or a permanent storage manner. Operating system 1302 may include Windows, Unix, Linux, etc., among others. Data 1303 may include, but is not limited to, data corresponding to target tracking results, and the like.
In some embodiments, the electronic device may further include a display 132, an input/output interface 133, a communication interface 134 (alternatively referred to as a network interface), a power source 135 and a communication bus 136. The display 132 and the input/output interface 133 (for example, a keyboard) belong to the user interface, which may optionally further include a standard wired interface, a wireless interface and the like. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device and for presenting a visualized user interface. The communication interface 134 may optionally include a wired interface and/or a wireless interface, such as a WI-FI interface or a Bluetooth interface, typically used to establish a communication connection between the electronic device and other electronic devices. The communication bus 136 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 13, but this does not mean that there is only one bus or one type of bus.
Those skilled in the art will appreciate that the configuration shown in fig. 13 is not intended to be limiting of the electronic device and may include more or fewer components than those shown, such as a sensor 137 that performs various functions.
The functions of the functional modules of the electronic device according to the embodiments of the present invention may be specifically implemented according to the method in the above method embodiments, and the specific implementation process may refer to the description related to the above method embodiments, which is not described herein again.
Therefore, the embodiment of the invention effectively addresses the situation in which occlusion makes the target tracking result inaccurate or causes the target to be lost, improves the accuracy of target tracking, and can meet the service requirements of practical high-precision target tracking.
It is to be understood that, if the target tracking method of the above embodiments is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may, in whole or in part, be embodied in the form of a software product that is stored in a storage medium and executes all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable ROM, a register, a hard disk, a removable magnetic disk, a CD-ROM, a magnetic or optical disk, and other media capable of storing program code.
Based on this, the embodiment of the present invention further provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the target tracking method of any of the above embodiments.
The functions of the readable storage medium according to the embodiment of the present invention may be specifically implemented according to the method in the foregoing method embodiments; for the specific implementation process, reference may be made to the related description of the method embodiments, which is not repeated here.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. For the apparatus and the electronic device disclosed in the embodiments, the description is relatively brief because they correspond to the methods disclosed in the embodiments; for relevant details, refer to the description of the methods.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, the various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The foregoing details a target tracking method, an apparatus, an electronic device, and a readable storage medium provided by the present application. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.

Claims (10)

1. A target tracking method, comprising:
acquiring detection frames of all targets to be tracked of a video to be processed, and taking local texture characteristics of a previous frame of image as supplementary information of all detection frames of a current frame of image;
generating corresponding tracking frames based on the detection frames carrying the supplementary information, and calculating the detection frames of the targets to be tracked and the multi-scale local features of the tracking frames;
and performing multiple rounds of cyclic matching between the detection frames and the tracking frames by using an image matching algorithm to obtain the successfully matched detection frames and tracking frames.
2. The target tracking method according to claim 1, wherein the image matching algorithm is a Hungarian algorithm, and after the detection frames and the tracking frames are repeatedly matched for a plurality of times by using the image matching algorithm, and before the detection frames and the tracking frames which are successfully matched are obtained, the method further comprises:
calling an optimized intersection-over-union (IoU) relational expression for the detection frames to be matched and the tracking frames to be matched that were not successfully matched, and respectively calculating an IoU tuning value between each detection frame to be matched and each tracking frame to be matched;
based on the IoU tuning value between each detection frame to be matched and each tracking frame to be matched, performing multiple rounds of cyclic matching again by using the Hungarian algorithm;
wherein the optimized IoU relational expression is determined jointly by the IoU of the current detection frame to be matched and the current tracking frame to be matched, the distance between the center points of the current tracking frame to be matched and the current detection frame to be matched, and the frame size information of the current detection frame to be matched and the current tracking frame to be matched.
3. The target tracking method of claim 2, wherein the performing a plurality of repeated matches of the detection frames and the tracking frames using an image matching algorithm comprises:
and matching each detection frame and each tracking frame cyclically and repeatedly in ascending order of the track-loss time of the targets to be tracked, until the current iteration count reaches the maximum iteration count.
4. The target tracking method according to claim 2, wherein after the multi-scale local features of the detection frame and the tracking frame of each target to be tracked are calculated, before the detection frame and the tracking frame are repeatedly matched for a plurality of times by using an image matching algorithm, the method further comprises:
respectively carrying out normalization processing on the multi-scale local features of each detection frame and each tracking frame of each target to be tracked to obtain normalized multi-scale local feature vectors;
and for each detection frame, respectively calculating the inner product of the normalized multi-scale local feature vector of the current detection frame and the normalized multi-scale local feature vector of each tracking frame, and constructing a weight coefficient matrix of the Hungarian algorithm based on the inner product result.
5. The target tracking method according to claim 1, wherein the generating of the corresponding tracking frame based on each detection frame carrying the supplementary information comprises:
calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame;
calculating Kalman filtering gain through a preset observation matrix;
calculating according to the position information of each detection frame and the Kalman filtering gain to obtain the position information of the corresponding tracking frame;
correspondingly, after the detection frame and the tracking frame which are successfully matched are obtained, the method further comprises the following steps:
and updating the parameters of Kalman filtering by using the detection frame and the tracking frame which are successfully matched.
6. The target tracking method according to any one of claims 1 to 5, wherein the using the local texture feature of the previous frame image as the supplementary information of each detection frame of the current frame image comprises:
calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame so as to determine the image area of each detection frame in the current frame image;
for each detection frame, calculating gradient information of each pixel point in a target image area corresponding to the current detection frame; calculating gradient information based on the gradient information of each pixel point to obtain gradient histogram information; performing block normalization processing based on target pixel point information used in the gradient histogram information calculation process to obtain block feature information; constructing local texture features of the current detection frame based on all block feature information in the target image region;
constructing a plurality of target frames with the same width and height within a range in which the distance from the center point of the detection frame of the previous frame does not exceed a first preset value, and respectively calculating the local texture features within each target frame;
and performing inner product operation on the local texture features of each target frame and the local texture features of the detection frame corresponding to the current frame image, and taking the target frame with the maximum inner product value as supplementary information.
7. The method for tracking the target of claim 6, wherein the calculating the position information of each detection frame in the current frame image according to the motion state parameters of each detection frame comprises:
representing the motion state information of each target to be tracked by the center-point position and the frame size information of the detection frame in an image coordinate system, together with the velocity information of the center-point position and the frame size information;
constructing a state transition matrix for describing the target motion relation between two adjacent frames of the video to be processed, based on the motion state information of each target to be tracked and the assumption that each target to be tracked moves at a constant velocity;
and predicting the motion state parameters of the current moment from the motion state parameters of the previous moment according to the state transition matrix and the motion process noise, so as to determine the position coordinates of each detection frame in the current frame image based on the motion state parameters of the current moment.
8. An object tracking device, comprising:
the detection information acquisition module is used for acquiring detection frames of all targets to be tracked of the video to be processed and taking the local texture characteristics of the previous frame of image as the supplementary information of all detection frames of the current frame of image;
the tracking frame calculation module is used for generating corresponding tracking frames based on the detection frames carrying the supplementary information;
the multi-scale characteristic calculation module is used for calculating the multi-scale local characteristics of the detection frame and the tracking frame of each target to be tracked;
and the matching module is configured to perform multiple rounds of cyclic matching between the detection frames and the tracking frames by using an image matching algorithm to obtain the successfully matched detection frames and tracking frames.
9. An electronic device comprising a processor and a memory, the processor being configured to implement the steps of the object tracking method of any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A readable storage medium, having stored thereon an object tracking computer program which, when executed by a processor, carries out the steps of the object tracking method according to any one of claims 1 to 7.
CN202110903265.2A 2021-08-06 2021-08-06 Target tracking method and device, electronic equipment and readable storage medium Pending CN113610895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110903265.2A CN113610895A (en) 2021-08-06 2021-08-06 Target tracking method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110903265.2A CN113610895A (en) 2021-08-06 2021-08-06 Target tracking method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113610895A true CN113610895A (en) 2021-11-05

Family

ID=78339774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110903265.2A Pending CN113610895A (en) 2021-08-06 2021-08-06 Target tracking method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113610895A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018068718A1 (en) * 2016-10-13 2018-04-19 夏普株式会社 Target tracking method and target tracking device
CN109086648A (en) * 2018-05-24 2018-12-25 同济大学 A kind of method for tracking target merging target detection and characteristic matching
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110930425A (en) * 2019-10-21 2020-03-27 中国科学院西安光学精密机械研究所 Damaged target detection method based on neighborhood vector inner product local contrast image enhancement
WO2021114702A1 (en) * 2019-12-10 2021-06-17 中国银联股份有限公司 Target tracking method, apparatus and system, and computer-readable storage medium
CN112419368A (en) * 2020-12-03 2021-02-26 腾讯科技(深圳)有限公司 Method, device and equipment for tracking track of moving target and storage medium
CN112883819A (en) * 2021-01-26 2021-06-01 恒睿(重庆)人工智能技术研究院有限公司 Multi-target tracking method, device, system and computer readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Liu Haiyan: "Research on Aircraft Target Detection and Tracking Technology in Complex Scenes", Master's Electronic Journals, 15 April 2021 (2021-04-15) *
Zhang Qinyi: "Research on Pedestrian and Vehicle Detection and Tracking Algorithms Based on Deep Convolutional Networks", Master's Electronic Journals, 15 August 2019 (2019-08-15) *
Zhu Qiguang; Zhang Pengzhen; Li Haoli; Zhan Xianjiao; Chen Ying: "Research on Image Matching Algorithm Based on Global and Local Feature Fusion", Chinese Journal of Scientific Instrument, no. 01, 15 January 2016 (2016-01-15) *
Zhu Xianwei; Yu Qifeng: "Ground Target Recognition and Tracking Lock-on Based on Biological Vision", Infrared and Laser Engineering, no. 06, 25 December 2007 (2007-12-25) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989332A (en) * 2021-11-16 2022-01-28 苏州魔视智能科技有限公司 Target tracking method and device, storage medium and electronic equipment
CN113989332B (en) * 2021-11-16 2022-08-23 苏州魔视智能科技有限公司 Target tracking method and device, storage medium and electronic equipment
CN114119674A (en) * 2022-01-28 2022-03-01 深圳佑驾创新科技有限公司 Static target tracking method and device and storage medium
CN114119674B (en) * 2022-01-28 2022-04-26 深圳佑驾创新科技有限公司 Static target tracking method and device and storage medium
CN114882349A (en) * 2022-03-29 2022-08-09 青岛海尔电冰箱有限公司 Method for judging target identity of articles in refrigerator, refrigerator and computer storage medium
CN114972418A (en) * 2022-03-30 2022-08-30 北京航空航天大学 Maneuvering multi-target tracking method based on combination of nuclear adaptive filtering and YOLOX detection
CN114972418B (en) * 2022-03-30 2023-11-21 北京航空航天大学 Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
CN116309696A (en) * 2022-12-23 2023-06-23 苏州驾驶宝智能科技有限公司 Multi-category multi-target tracking method and device based on improved generalized cross-over ratio
CN116309696B (en) * 2022-12-23 2023-12-01 苏州驾驶宝智能科技有限公司 Multi-category multi-target tracking method and device based on improved generalized cross-over ratio
CN116109981A (en) * 2023-01-31 2023-05-12 北京智芯微电子科技有限公司 Shooting recognition method, basketball recognition device, electronic equipment and storage medium
CN116109981B (en) * 2023-01-31 2024-04-12 北京智芯微电子科技有限公司 Shooting recognition method, basketball recognition device, electronic equipment and storage medium
CN116132798A (en) * 2023-02-02 2023-05-16 深圳市泰迅数码有限公司 Automatic follow-up shooting method of intelligent camera
CN116132798B (en) * 2023-02-02 2023-06-30 深圳市泰迅数码有限公司 Automatic follow-up shooting method of intelligent camera
CN117011335A (en) * 2023-07-26 2023-11-07 山东大学 Multi-target tracking method and system based on self-adaptive double decoders
CN117011335B (en) * 2023-07-26 2024-04-09 山东大学 Multi-target tracking method and system based on self-adaptive double decoders
CN117315531A (en) * 2023-09-20 2023-12-29 山东蒙恩现代农业发展有限公司 Video target identification tracking and abnormal behavior detection method applied to terrestrial wild animals

Similar Documents

Publication Publication Date Title
CN113610895A (en) Target tracking method and device, electronic equipment and readable storage medium
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
KR101640998B1 (en) Image processing apparatus and image processing method
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
WO2021146700A1 (en) Systems for multiclass object detection and alerting and methods therefor
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN107872644A (en) Video frequency monitoring method and device
CN110766676B (en) Target detection method based on multi-source sensor fusion
CN106910204B (en) A kind of method and system to the automatic Tracking Recognition of sea ship
CN101120382A (en) Method for tracking moving object in video acquired of scene with camera
CN112364865B (en) Method for detecting small moving target in complex scene
CN107704797B (en) Real-time detection method, system and equipment based on pedestrians and vehicles in security video
Xiang et al. Lightweight fully convolutional network for license plate detection
CN116486288A (en) Aerial target counting and detecting method based on lightweight density estimation network
EP2447912B1 (en) Method and device for the detection of change in illumination for vision systems
CN111274964A (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN113379789B (en) Moving target tracking method in complex environment
CN111899278B (en) Unmanned aerial vehicle image rapid target tracking method based on mobile terminal
CN113112479A (en) Progressive target detection method and device based on key block extraction
CN108256444B (en) Target detection method for vehicle-mounted vision system
Jadhav et al. FPGA based object tracking system
CN115171011A (en) Multi-class building material video counting method and system and counting equipment
CN113569600A (en) Method and device for identifying weight of object, electronic equipment and storage medium
CN113470001B (en) Target searching method for infrared image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination