CN113808162B - Target tracking method, device, electronic equipment and storage medium


Info

Publication number
CN113808162B
CN113808162B (application CN202110990797.4A)
Authority
CN
China
Prior art keywords
information
image
target
determining
current
Prior art date
Legal status
Active
Application number
CN202110990797.4A
Other languages
Chinese (zh)
Other versions
CN113808162A
Inventor
李椋
王刚
王以政
吴婷
陈明松
李邵港
杨欣
雷煜
李丽亚
王毅
Current Assignee
Academy of Military Medical Sciences AMMS of PLA
CETC 11 Research Institute
Original Assignee
Academy of Military Medical Sciences AMMS of PLA
CETC 11 Research Institute
Priority date
Filing date
Publication date
Application filed by Academy of Military Medical Sciences AMMS of PLA and CETC 11 Research Institute
Priority to CN202110990797.4A
Publication of CN113808162A
Application granted
Publication of CN113808162B
Legal status: Active

Links

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06N 20/00: Machine learning
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06T 7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/20081: Training; learning


Abstract

The disclosure provides a target tracking method, a target tracking device, an electronic device and a storage medium, and relates to the technical fields of image processing and photoelectric tracking. The scheme is as follows: acquiring an image sequence; for a current image in the image sequence, determining original bounding box information and original motion information of a target in an adjacent image that precedes the current image in the sequence; determining candidate box information of the target in the current image according to the original bounding box information and the original motion information; and tracking the target in the current image according to the candidate box information to determine current bounding box information of the target in the current image. Because the candidate box of the target in the current image is derived from both the bounding box and the motion of the target in the preceding adjacent image, the validity of the candidate box is improved, and tracking the target in the image according to that candidate box raises the success rate of target tracking.

Description

Target tracking method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the fields of image processing and photoelectric tracking, in particular to a retinal large cell (magnocellular) pathway model and an optical flow algorithm, and more particularly to a target tracking method, apparatus, electronic device, and storage medium.
Background
A photoelectric tracking system is a passive reconnaissance, surveillance and tracking system that enables visual tracking, precision guidance and the like for a moving target. Military demand for photoelectric tracking systems keeps growing: they are installed not only on mobile platforms such as fighter aircraft, ships and submarines, but also at fixed high points such as border posts and observation towers for remote monitoring and tracking of threat targets.
Among tracking algorithms for photoelectric tracking systems, correlation-filter trackers are easy to deploy, computationally cheap and power-efficient, and are therefore widely used in practical scenes. However, a target tracking algorithm based on kernel correlation filtering depends to some extent on the choice of the target candidate box; because its search range around the target is limited, it is easily disturbed by instantaneous target motion, rapid changes of direction and similar situations, causing target tracking to fail.
Disclosure of Invention
The disclosure provides a target tracking method, a target tracking device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a target tracking method including: acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target; for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence; determining candidate box information of the target in the current image according to the original bounding box information and the original motion information; and tracking the target in the current image according to the candidate box information, and determining current bounding box information of the target in the current image.
According to another aspect of the present disclosure, there is provided a target tracking apparatus including: an acquisition module configured to acquire an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target; a first determining module configured to determine, for a current image in the image sequence, original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence, the first determining module being further configured to determine candidate box information of the target in the current image according to the original bounding box information and the original motion information; and a tracking module configured to track the target in the current image according to the candidate box information and determine current bounding box information of the target in the current image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method according to the embodiments of the first aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to the embodiments of the first aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the method according to the embodiments of the first aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a model of the retinal large cell pathway according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic diagram of determining a candidate box of a target according to an embodiment of the disclosure;
FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a target tracking method according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an image sequence of a drone according to an embodiment of the present disclosure;
FIG. 11 is a schematic illustration of various backgrounds of a drone according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of target tracking under different algorithms according to an embodiment of the disclosure;
FIG. 13 (a) is a plot of the accuracy of the improved method and the KCF method on a Drone dataset according to an embodiment of the present disclosure;
FIG. 13 (b) is a plot of the success rate of the improved method and the KCF method on a Drone dataset according to an embodiment of the present disclosure;
FIG. 14 is a schematic diagram according to a seventh embodiment of the present disclosure;
fig. 15 is a block diagram of an electronic device used to implement object tracking of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a target tracking method, apparatus, electronic device, and storage medium of an embodiment of the present disclosure with reference to the accompanying drawings.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure. It should be noted that the object tracking method according to the embodiment of the present disclosure may be applied to the object tracking apparatus according to the embodiment of the present disclosure, and the apparatus may be configured in an electronic device. The electronic device may be a mobile terminal, such as a mobile phone, a tablet computer, a personal digital assistant, or other hardware devices with various operating systems.
As shown in fig. 1, the target tracking method may include the steps of:
Step 101, acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target.
In the embodiment of the disclosure, the image acquisition device obtains the image sequence by continuously capturing images of the target, and the target tracking apparatus may be connected with the image acquisition device to acquire the image sequence.
Step 102, for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence.
Further, the image sequence may be input into a retinal large cell pathway model. For the current image in the image sequence, when the current image is not the first frame of the sequence, the original bounding box information of the target in the adjacent image that precedes the current image may be determined, wherein the original bounding box information includes original center point coordinate information and original size information. Combining the original bounding box information with the retinal large cell pathway model and an optical flow algorithm then yields the original motion information of the target in that adjacent image. As shown in FIG. 2, FIG. 2 is a schematic diagram of the retinal large cell pathway model according to an embodiment of the present disclosure. The retinal large cell pathway model filters out static objects and tiny jitter, thereby extracting regions with large motion energy. The output of the retinal large cell pathway model is binarized to form a mask M that is 1 where motion is significant and 0 where it is not. The optical flow algorithm may be, for example, the PyrLK algorithm or the FlowNet2.0 algorithm.
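The patent does not give an implementation of the retinal large cell pathway model. As a rough, hypothetical stand-in, the following sketch derives the binary mask M from frame differencing and computes dense optical flow with OpenCV's Farneback method; FlowNet2.0 or the pyramidal Lucas-Kanade (PyrLK) algorithm named above would be drop-in alternatives for the flow step. The function names and the threshold are illustrative assumptions, not the patented model.

```python
import cv2
import numpy as np

def motion_mask(prev_gray, gray, thresh=15):
    # Hypothetical stand-in for the retinal large cell (magnocellular)
    # pathway model: suppress static content and tiny jitter, keep regions
    # with large motion energy, and binarize to a mask M of 0s and 1s.
    diff = cv2.absdiff(prev_gray, gray)
    diff = cv2.GaussianBlur(diff, (5, 5), 0)   # damp sensor noise / micro-shake
    _, mask = cv2.threshold(diff, thresh, 1, cv2.THRESH_BINARY)
    return mask.astype(np.uint8)               # 1 = significant motion, 0 = not

def dense_flow(prev_gray, gray):
    # Dense optical flow between consecutive frames (Farneback here;
    # PyrLK or FlowNet2.0 could be substituted).
    return cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```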
Step 103, determining candidate box information of the target in the current image according to the original bounding box information and the original motion information.
Further, the candidate center point coordinate information of the target in the current image can be determined from the original bounding box information and the original motion information; meanwhile, the candidate size information of the target in the current image is determined from the original bounding box information and a preset multiple; and the candidate center point coordinate information and the candidate size information together serve as the candidate box information.
Step 104, tracking the target in the current image according to the candidate box information, and determining the current bounding box information of the target in the current image.
Further, the target in the current image is tracked according to the candidate box information, so that the current bounding box information of the target in the current image can be determined; for example, the bounding box of the target in the current image can be determined from the degree of correlation between each pixel point in the region image delimited by the candidate box information and the target.
In summary, an image sequence is acquired, obtained by an image acquisition device continuously capturing images of a target; for a current image in the image sequence, original bounding box information and original motion information of the target in an adjacent image that precedes the current image are determined; candidate box information of the target in the current image is determined according to the original bounding box information and the original motion information; and the target in the current image is tracked according to the candidate box information, so that the current bounding box information of the target is determined. Because the candidate box is derived from both the bounding box and the motion of the target in the preceding adjacent image, its validity is improved, and tracking the target according to the candidate box raises the success rate of target tracking.
In order to accurately determine the original motion information of the target in the adjacent image that precedes the current image, reference is made to FIG. 3, which is a schematic diagram according to a second embodiment of the present disclosure. In this embodiment, for the current image in the image sequence, the original bounding box information of the target in the preceding adjacent image, the optical flow information of each pixel point in that adjacent image, and a motion-region mask image of that adjacent image may be determined; the original motion information of the target is then determined from the original bounding box information, the optical flow information of each pixel point, and the mask value of each pixel point. The embodiment shown in FIG. 3 includes the following steps:
step 301, acquiring an image sequence, wherein the image sequence is obtained by continuously acquiring images of a target by using an image acquisition device.
Step 302, for a current image in an image sequence, determining original bounding box information of a target in a neighboring image preceding the current image in the image sequence, and optical flow information of each pixel point in the neighboring image.
In the embodiment of the present disclosure, the image sequence may be input into the retinal large cell pathway model; for the current image in the image sequence, when the current image is not the first frame of the sequence, the original bounding box information of the target in the adjacent image that precedes the current image may be determined. Meanwhile, the optical flow information of each pixel point in that adjacent image can be determined by an optical flow algorithm.
Step 303, determining a motion-region mask image of the adjacent image, wherein the mask value of each pixel point in the motion-region mask image represents the motion intensity of the target at that pixel point. The motion-region mask image of the adjacent image is the output obtained by inputting the adjacent image into the retinal large cell pathway model.
Because the retinal large cell pathway model filters out static objects and tiny jitter, it extracts the regions with large motion energy. The model output is binarized to form a mask M, e.g., 1 where motion is significant and 0 where it is not. In the embodiment of the disclosure, the adjacent image preceding the current image is input into the retinal large cell pathway model, which outputs the motion-region mask image of the adjacent image, where the mask value of each pixel point represents the motion intensity of the target at that pixel point.
Step 304, determining original motion information according to original bounding box information of the target in the adjacent image, optical flow information of each pixel point in the adjacent image and mask values of each pixel point.
In order to accurately determine the original motion information of the target in the adjacent image preceding the current image, the optical flow condition of the adjacent image can first be determined from the optical flow information of each pixel point in that image. Candidate pixel points are then selected according to the optical flow condition and the mask values of the pixel points inside the region delimited by the original bounding box information of the target, and the average of the optical flow information of the candidate pixel points is taken as the original motion information of the target in the adjacent image.
Step 305, determining candidate box information of the target in the current image according to the original bounding box information and the original motion information.
Step 306, tracking the target in the current image according to the candidate box information, and determining the current bounding box information of the target in the current image.
Steps 301 and 305-306 may be implemented in any of the manners described in the embodiments of the present disclosure, which are not limited and not repeated here.
In order to accurately determine the original motion information of the target in the adjacent image preceding the current image, reference is made to FIG. 4, which is a schematic diagram according to a third embodiment of the present disclosure. In this embodiment, the optical flow condition of the adjacent image may be determined from the optical flow information of each pixel point in that image; candidate pixel points are then selected according to the optical flow condition and the mask values of the pixel points inside the region delimited by the original bounding box information of the target; and the average of the optical flow information of the candidate pixel points is taken as the original motion information of the target. The embodiment shown in FIG. 4 may include the following steps:
Step 401, acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of the target.
Step 402, for a current image in the image sequence, determining original bounding box information of the target in the adjacent image that precedes the current image in the image sequence, and optical flow information of each pixel point in the adjacent image.
Step 403, determining the motion-region mask image of the adjacent image, wherein the mask value of each pixel point in the motion-region mask image represents the motion intensity of the target at that pixel point. The motion-region mask image of the adjacent image is the output obtained by inputting the adjacent image into the retinal large cell pathway model.
Step 404, determining the optical flow condition of the adjacent image according to the optical flow information of each pixel point in the adjacent image.
In an embodiment of the disclosure, the optical flow information of each pixel point in the adjacent image preceding the current image (such as the movement speed and movement direction of each pixel point) may be calculated by an optical flow algorithm, and the optical flow condition of the adjacent image is determined accordingly. The optical flow condition may be sparse optical flow or dense optical flow: in the sparse case, the candidate pixel points are those whose mask value equals a preset value and which are corner points; in the dense case, the candidate pixel points are all those whose mask value equals the preset value.
Step 405, determining candidate pixel points according to the optical flow condition of the adjacent image and the mask values of the pixel points in the region delimited by the original bounding box information.
Optionally, the pixel points inside the region whose mask value is the preset value and which are corner points are determined as the candidate pixel points; or the pixel points inside the region whose mask value is the preset value are determined as the candidate pixel points.
That is, a pixel point inside the region delimited by the original bounding box information whose mask value is the preset value and which is a corner point (e.g., a pixel point whose mask value is 1 and which is a corner point) may be determined as a candidate pixel point; alternatively, a pixel point inside that region whose mask value is the preset value (e.g., a pixel point whose mask value is 1) may be determined as a candidate pixel point.
Step 406, taking the average value of the optical flow information of the candidate pixel points as the motion information.
Further, the average of the optical flow information of the candidate pixel points may be calculated and used as the original motion information of the target in the adjacent image preceding the current image. For example, the optical flow information of the candidate pixel points is summed and the sum is divided by the number of candidate pixel points to obtain the average.
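A minimal sketch of steps 404-406, reusing motion_mask() and dense_flow() from the earlier sketch. The sparse branch takes corner points inside the previous bounding box whose mask value is 1 and tracks them with pyramidal Lucas-Kanade; the dense branch averages the dense flow over all masked pixels in the box. The box layout (cx, cy, w, h) and all thresholds are assumptions for illustration.

```python
def mean_flow_in_box(prev_gray, gray, mask, box, sparse=True):
    # box = (cx, cy, w, h): center coordinates and size of the previous
    # frame's bounding box; mask is the binarized motion-region image.
    cx, cy, w, h = box
    x0, y0 = max(int(cx - w / 2), 0), max(int(cy - h / 2), 0)
    region = np.zeros_like(mask)
    region[y0:y0 + int(h), x0:x0 + int(w)] = 1
    if sparse:
        # Sparse case: candidate pixel points are corner points inside the
        # box whose mask value is the preset value (1); track them with
        # pyramidal LK and average their displacements (step 406).
        corners = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                          qualityLevel=0.01, minDistance=3,
                                          mask=(mask & region) * 255)
        if corners is None:
            return 0.0, 0.0
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, corners, None)
        good = status.ravel() == 1
        if not good.any():
            return 0.0, 0.0
        d = (nxt[good] - corners[good]).reshape(-1, 2)
        return float(d[:, 0].mean()), float(d[:, 1].mean())
    # Dense case: average the dense flow over every pixel in the box whose
    # mask value is the preset value (1).
    flow = dense_flow(prev_gray, gray)
    sel = (mask & region).astype(bool)
    if not sel.any():
        return 0.0, 0.0
    return float(flow[sel, 0].mean()), float(flow[sel, 1].mean())
```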
Step 407, determining candidate box information of the target in the current image according to the original bounding box information and the original motion information.
Step 408, tracking the target in the current image according to the candidate box information, and determining the current bounding box information of the target in the current image.
Steps 401 to 403 and steps 407 to 408 may be implemented in any of the manners described in the embodiments of the present disclosure, which are not limited and not repeated here.
In summary, the optical flow condition of the adjacent image is determined from the optical flow information of each pixel point in the adjacent image; candidate pixel points are selected according to the optical flow condition and the mask values of the pixel points inside the region delimited by the original bounding box information; and the average of the optical flow information of the candidate pixel points is taken as the motion information. In this way, the original motion information of the target in the adjacent image preceding the current image can be determined accurately.
In order to improve the validity of the candidate box of the target, reference is made to FIG. 5, which is a schematic diagram according to a fourth embodiment of the present disclosure. In this embodiment, the candidate center point coordinate information of the target in the current image can be determined from the original bounding box information and the original motion information; meanwhile, the candidate size information of the target in the current image is determined from the original size information and a preset multiple; and the candidate center point coordinate information and the candidate size information serve as the candidate box information. The embodiment shown in FIG. 5 may include the following steps:
Step 501, acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of the target.
Step 502, for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence.
Step 503, determining candidate center point coordinate information of the target in the current image according to the original center point coordinate information and the motion information.
In the disclosed embodiment, the original bounding box information includes the original center point coordinate information and the original size information. The candidate center point coordinate information of the target in the current image can be determined from the original center point coordinate information in the original bounding box information of the target in the adjacent image preceding the current image, combined with the original motion information of the target in that adjacent image (e.g., the movement speed and direction of each pixel of the target).
Step 504, determining candidate size information of the target in the current image according to the original size information and the preset multiple.
Then, from the original size information in the original bounding box information of the target in the adjacent image preceding the current image, combined with the preset multiple, the candidate size information of the target in the current image can be determined. For example, enlarging the original size information by 1.5 times yields the candidate size information of the target in the current image.
Step 505, taking the candidate center point coordinate information and the candidate size information as the candidate box information.
Further, the candidate center point coordinate information and the candidate size information may be used together as the candidate box information.
For example, as shown in FIG. 6, the image sequence is input into the retinal large cell pathway model. For the nth frame in the sequence, when the nth frame is not the first frame, the original bounding box information of the target in the (n-1)th frame is determined, such as the white solid box (x, y, w, h) of the (n-1)th frame in FIG. 6, where (x, y) is the position of the center of the rectangular box and w and h are its width and height. The optical flow information of each pixel point of the target in the (n-1)th frame can be determined by an optical flow algorithm, and the motion-region mask image can be obtained through the retinal large cell pathway model. The optical flow condition of the (n-1)th frame is determined from the optical flow information of its pixel points; candidate pixel points are then selected according to the optical flow condition and the mask values of the pixel points inside the region delimited by the original bounding box information, and the average of the optical flow information of the candidate pixel points, denoted (Δx, Δy), is taken as the original motion information. Combining the original center point coordinate information (x, y) with this motion information gives the candidate center point coordinate information of the target in the nth frame, (x', y') = (x + Δx, y + Δy). Enlarging the original size information (the width and height (w, h) of the white solid box) by a preset multiple (e.g., 2.5 times) gives the candidate size information (w', h') = (2.5w, 2.5h). The candidate center point coordinate information and the candidate size information together form the candidate box information, shown as the white dotted box of the nth frame in FIG. 6, where (x', y') is the center position of the candidate box of the target in the current frame and (w', h') are its width and height.
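Steps 503-505 reduce to a few lines. A sketch under the FIG. 6 assumptions, taking the average flow (Δx, Δy) from mean_flow_in_box() above and the preset multiple 2.5:

```python
def candidate_box(prev_box, mean_dx, mean_dy, scale=2.5):
    # Shift the previous center by the average optical flow (step 503)
    # and enlarge the previous size by the preset multiple (step 504);
    # the pair forms the candidate box information (step 505).
    x, y, w, h = prev_box
    return (x + mean_dx, y + mean_dy, scale * w, scale * h)
```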
Step 506, tracking the target in the current image according to the candidate box information, and determining the current bounding box information of the target in the current image.
Steps 501-502 and 506 may be implemented in any of the manners described in the embodiments of the present disclosure, which are not limited and not repeated here.
In summary, the candidate center point coordinate information of the target in the current image is determined from the original center point coordinate information and the motion information; the candidate size information of the target in the current image is determined from the original size information and the preset multiple; and the candidate center point coordinate information and the candidate size information serve as the candidate box information. The validity of the candidate box of the target is thereby improved.
In order to improve the success rate of target tracking, reference is made to FIG. 7, which is a schematic diagram according to a fifth embodiment of the present disclosure. In this embodiment, the bounding box information of the target in the current image can be determined from the degree of correlation between each pixel point in the region image delimited by the candidate box information and the target. The embodiment shown in FIG. 7 may include the following steps:
Step 701, acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of the target.
Step 702, for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence.
Step 703, determining candidate box information of the target in the current image according to the original bounding box information and the original motion information.
Step 704, determining the region image delimited by the candidate box information in the current image.
In the embodiment of the present disclosure, after the candidate box information of the target in the current image is determined, the region image delimited by the candidate box information in the current image can be determined from that information (e.g., the candidate center point coordinate information and the candidate size information of the target).
Step 705, determining the degree of correlation between each pixel point in the region image and the target according to the region image and the kernel correlation filter.
Further, the degree of correlation between each pixel point in the region image and the target can be determined by applying kernel correlation filtering (Kernelized Correlation Filters, abbreviated as KCF) to the region image.
Step 706, determining the current bounding box information according to the degree of correlation between each pixel point in the region image and the target.
In the embodiment of the disclosure, the positions of the pixel points in the region image that correlate most strongly with the target can be obtained, and the current bounding box information is determined from those positions.
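The patent's tracker is KCF proper. As a much-simplified illustration of steps 705-706, the sketch below computes a linear correlation response between the target template and the region image in the Fourier domain and takes the response peak as the new center; real KCF additionally uses a kernel and a ridge-regression-trained filter. Equal template and region sizes are assumed here.

```python
import numpy as np

def correlation_response(template, region):
    # Step 705 (simplified, linear): per-pixel correlation between the
    # region image delimited by the candidate box and the target template,
    # computed via FFT. template and region must share the same shape.
    R = np.fft.fft2(region) * np.conj(np.fft.fft2(template))
    resp = np.real(np.fft.ifft2(R))
    # Step 706: the strongest response gives the target's new center,
    # around which the current bounding box is placed.
    dy, dx = np.unravel_index(np.argmax(resp), resp.shape)
    return resp, (dx, dy)
```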
Steps 701-703 may be implemented in any of the manners described in the embodiments of the present disclosure, which are not limited and not repeated here.
In order to better achieve target tracking, reference is made to FIG. 8, which is a schematic diagram according to a sixth embodiment of the disclosure. In this embodiment, the image acquisition device is rotated under control to achieve target tracking. The embodiment shown in FIG. 8 may include the following steps:
Step 801, acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of the target.
Step 802, for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence.
Step 803, determining candidate box information of the target in the current image according to the original bounding box information and the original motion information.
Step 804, tracking the target in the current image according to the candidate box information, and determining the current bounding box information of the target in the current image.
Step 805, determining current motion information of the target in the current image.
Step 806, determining the predicted center point coordinate information of the target after the delay period according to the current center point coordinate information and the current motion information in the current bounding box information.
It should be understood that, to keep the target in the middle of the image, the image acquisition device needs to be controlled to rotate, and this rotation lags in time. If the tracking result for the current frame were used directly to drive the rotation, the motion of the image acquisition device would lag behind the motion of the target; the device is therefore driven according to the predicted center point coordinate information of the target after the delay period.
Optionally, the current center point coordinate information and the current motion information in the current bounding box information are propagated using Kalman filtering, which can be expressed by the following formula:

x_t = A · x_(t-1) + W_(t-1)

where x_t represents the center point coordinate information and motion information at time t, x_(t-1) is the center point coordinate information and motion information at time t-1, A is a preset state transition matrix, and W_(t-1) is a preset control matrix.
Further, the predicted center point coordinate information of the target after the delay period is obtained by applying Kalman filtering to the current center point coordinate information and the current motion information in the current bounding box information.
Step 807, determining rotation information of the image acquisition device according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information.
Optionally, the movement information of the target over the delay period is determined according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and the rotation information of the image acquisition device is determined according to that movement information and the movement of the image acquisition device per unit angle.
For example, the abscissa and ordinate of the current center point coordinate information are subtracted from those of the predicted center point coordinate information after the delay period, giving the target's movement on the image in pixels, Δx = v_x · Δt and Δy = v_y · Δt. Dividing these by the number n of pixels that the image shifts per unit angle of device movement yields the rotation information; assuming the unit angle is 1 degree, the rotation information of the image acquisition device may include the rotation angles in the x direction and the y direction, namely Δx/n degrees and Δy/n degrees.
Step 808, performing rotation control on the image acquisition device according to the rotation information to achieve tracking of the target.
Furthermore, the rotation of the image acquisition device can be controlled according to these rotation angles in the x direction and the y direction so as to track the target.
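A sketch of steps 805-808 using OpenCV's Kalman filter over the state [x, y, vx, vy]. The constant-velocity transition matrix matches the form of the formula above, while the noise covariances and the pixels-per-degree figure are illustrative assumptions.

```python
import cv2
import numpy as np

def make_kalman(dt):
    # Constant-velocity model over state [x, y, vx, vy]; the measurement
    # is the tracked center (x, y). A is the preset state transition
    # matrix; dt is the delay period, so one predict() step looks ahead
    # by exactly that delay.
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                    [0, 1, 0, dt],
                                    [0, 0, 1,  0],
                                    [0, 0, 0,  1]], np.float32)
    kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2      # assumed
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed
    return kf

def rotation_command(pred_xy, cur_xy, pixels_per_degree):
    # Steps 806-807: the target's movement over the delay period in
    # pixels, divided by how many pixels the image shifts per degree of
    # device rotation (n), gives the pan/tilt angles in x and y.
    dx = pred_xy[0] - cur_xy[0]
    dy = pred_xy[1] - cur_xy[1]
    return dx / pixels_per_degree, dy / pixels_per_degree

# Per frame: correct with the tracked center, then predict one delay
# period ahead; the first two entries of the prediction are (x', y').
# kf.correct(np.array([[cx], [cy]], np.float32)); pred = kf.predict()
```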
In order to better illustrate the above embodiments, an example will now be described.
As shown in FIG. 9, the image sequence is fed into the retinal large cell pathway model and the optical flow algorithm to obtain the motion information of the target; the candidate box of the target is computed from the target position and motion information at the previous moment; the target search area is adjusted according to the candidate box; the center position and bounding box of the target are then obtained with a target tracking algorithm (such as KCF); and finally the position of the target in subsequent frames is predicted from the center position using a Kalman filter, and the servo cradle head is rotated to that position.
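Gluing the sketches above into the FIG. 9 loop, as one hypothetical module: rotate_gimbal() stands in for the hardware-specific servo cradle head interface, and kcf_update() is a naive placeholder for the kernel correlation filtering step (it keeps the candidate center and undoes the preset enlargement; see correlation_response() above for the response computation). Everything here is illustrative, not the patented implementation.

```python
def rotate_gimbal(pan_deg, tilt_deg):
    # Placeholder for the servo cradle head interface (hardware-specific).
    print(f"rotate: pan={pan_deg:+.2f} deg, tilt={tilt_deg:+.2f} deg")

def kcf_update(frame, cand, scale=2.5):
    # Naive placeholder for steps 704-706: keep the candidate center and
    # undo the preset size enlargement instead of running a real KCF.
    cx, cy, w, h = cand
    return (cx, cy, w / scale, h / scale)

def run_pipeline(frames, init_box, dt, pixels_per_degree):
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    box, kf = init_box, make_kalman(dt)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mask = motion_mask(prev, gray)                    # magnocellular stand-in
        dx, dy = mean_flow_in_box(prev, gray, mask, box)  # steps 402-406
        cand = candidate_box(box, dx, dy)                 # steps 503-505
        box = kcf_update(frame, cand)                     # steps 704-706
        kf.correct(np.array([[box[0]], [box[1]]], np.float32))
        pred = kf.predict().ravel()                       # center after delay
        pan, tilt = rotation_command(pred[:2], box[:2], pixels_per_degree)
        rotate_gimbal(pan, tilt)                          # steps 807-808
        prev = gray
    return box
```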
In summary, the current motion information of the target in the current image is determined; the predicted center point coordinate information of the target after the delay period is determined from the current center point coordinate information and the current motion information in the current bounding box information; the rotation information of the image acquisition device is determined from the predicted center point coordinate information and the current center point coordinate information; and the image acquisition device is rotated under control according to the rotation information so as to track the target. Controlling the rotation of the image acquisition device in this way achieves better target tracking.
To improve usability and feasibility of embodiments of the disclosure, examples will now be described.
For example, as shown in FIG. 10, take a drone as the target and a servo cradle head as the image acquisition device. According to the embodiment of the present disclosure, the target in the current image is tracked using the candidate box information, and the position of the target predicted by Kalman filtering for the end of the delay period guides the rotation of the servo cradle head. The captured video result is shown in FIG. 10, in which the drone stays substantially at the center of the image.
It should be noted that the experiments used a self-built drone infrared dataset containing 20 videos in total. The dataset covers three kinds of scenes, as shown in FIG. 11: from left to right, a forest background, a building background and a cloud background.
FIG. 12 shows tracking results of the proposed algorithm and of KCF on selected video frames of the infrared dataset. For the optical flow computation, two algorithms were tested: PyrLK and FlowNet2.0. The third and fourth rows in FIG. 12 show, respectively, the method that determines the candidate box of the target from the average optical flow computed with the PyrLK algorithm and with the FlowNet2.0 algorithm and then obtains the current bounding box of the target with kernel correlation filtering; both track the target through changes of motion and direction better than the KCF method.
FIGS. 13 (a) and 13 (b) plot the accuracy and success rate of the improved method and the KCF method on the Drone dataset, with the threshold on the horizontal axis and the percentage on the vertical axis. The blue solid line (the line labeled kcf_pyrlk_magno in FIGS. 13 (a) and 13 (b)) denotes the method that computes the motion region inside the previous frame's target bounding box with the retinal large cell pathway, computes the average optical flow of the corner points inside that region with the PyrLK algorithm to determine the candidate box, and determines the current bounding box of the target with the kernel correlation filtering algorithm. The orange dotted line (kcf_flownet2.0_magno) denotes the same method with the FlowNet2.0 algorithm in place of PyrLK. The green dotted line and the red dotted line (kcf_flownet2.0 and kcf_pyrlk) denote ablation variants that compute the average optical flow over the whole previous-frame bounding box with the FlowNet2.0 and PyrLK algorithms, respectively, without the retinal large cell pathway, and then determine the bounding box of the target in the current frame with the kernel correlation filtering algorithm.
As can be seen from FIGS. 13 (a) and 13 (b), when the motion region inside the previous frame's bounding box is computed with the retinal large cell pathway and the candidate box is determined from the average optical flow of the corner points computed with the PyrLK algorithm, with the bounding box then determined by the kernel correlation filtering algorithm, the accuracy and success rate improve by 10.3% and 7.1%, respectively, over the KCF baseline method; when the optical flow algorithm is replaced with FlowNet2.0, the accuracy and success rate improve by 5.9% and 4.8%, respectively, over the KCF baseline method.
When the retinal large cell pathway model is not used to compute the motion region, and the PyrLK or FlowNet2.0 algorithm alone computes the average optical flow inside the previous frame's target bounding box to obtain the candidate box in the current frame, with the KCF method then refining the target position, the accuracy improves by 2.8% and 3.1% and the success rate by 3.5% and 2.6%, respectively, over the kernel correlation filtering method before improvement. These gains are smaller than those obtained with the retinal large cell pathway model.
According to the target tracking method of the embodiment of the disclosure, an image sequence is acquired, obtained by an image acquisition device continuously capturing images of a target; for a current image in the image sequence, original bounding box information and original motion information of the target in an adjacent image that precedes the current image are determined; candidate box information of the target in the current image is determined according to the original bounding box information and the original motion information; and the target in the current image is tracked according to the candidate box information, so that the current bounding box information of the target is determined. Because the candidate box is derived from both the bounding box and the motion of the target in the preceding adjacent image, its validity is improved, and tracking the target according to the candidate box raises the success rate of target tracking.
In order to implement the above-described embodiments, the present disclosure proposes a further object tracking device.
Fig. 14 is a schematic diagram according to a seventh embodiment of the present disclosure. As shown in fig. 14, the object tracking device 1400 includes: an acquisition module 1410, a first determination module 1420, a tracking module 1430.
The acquisition module 1410 is configured to acquire an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target; the first determining module 1420 is configured to determine, for a current image in the image sequence, original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence; the first determining module 1420 is further configured to determine candidate box information of the target in the current image according to the original bounding box information and the original motion information; and the tracking module 1430 is configured to track the target in the current image according to the candidate box information and determine the current bounding box information of the target in the current image.
As one possible implementation of the embodiments of the present disclosure, the first determining module 1420 is specifically configured to: for the current image in the image sequence, determine the original bounding box information of the target in the adjacent image that precedes the current image and the optical flow information of each pixel point in the adjacent image; determine the motion-region mask image of the adjacent image, wherein the mask value of each pixel point in the motion-region mask image represents the motion intensity of the target at that pixel point, the motion-region mask image being the output obtained by inputting the adjacent image into the retinal large cell pathway model; and determine the original motion information according to the original bounding box information of the target in the adjacent image, the optical flow information of each pixel point in the adjacent image, and the mask value of each pixel point.
As one possible implementation of an embodiment of the present disclosure, the first determining module 1420 is further configured to: determine the optical flow condition of the adjacent image according to the optical flow information of each pixel point in the adjacent image; determine candidate pixel points according to the optical flow condition of the adjacent image and the mask values of the pixel points inside the region delimited by the original bounding box information; and take the average of the optical flow information of the candidate pixel points as the motion information.
As one possible implementation of an embodiment of the present disclosure, the first determining module 1420 is further configured to: determine, as candidate pixel points, the pixel points inside the region whose mask value is the preset value and which are corner points; or determine, as candidate pixel points, the pixel points inside the region whose mask value is the preset value.
As one possible implementation of the embodiments of the present disclosure, the original bounding box information includes original center point coordinate information and original size information; the first determining module 1420 is further configured to: determine the candidate center point coordinate information of the target in the current image according to the original center point coordinate information and the motion information; determine the candidate size information of the target in the current image according to the original size information and the preset multiple; and take the candidate center point coordinate information and the candidate size information as the candidate box information.
As one possible implementation of the embodiments of the present disclosure, the tracking module 1430 is specifically configured to: determine the region image delimited by the candidate box information in the current image; determine the degree of correlation between each pixel point in the region image and the target according to the region image and the kernel correlation filter; and determine the current bounding box information according to the degree of correlation between each pixel point in the region image and the target.
As one possible implementation of the embodiments of the present disclosure, the target tracking apparatus further includes: the device comprises a second determining module, a third determining module, a fourth determining module and a control module.
The second determining module is configured to determine the current motion information of the target in the current image; the third determining module is configured to determine the predicted center point coordinate information of the target after the delay period according to the current center point coordinate information and the current motion information in the current bounding box information; the fourth determining module is configured to determine the rotation information of the image acquisition device according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and the control module is configured to rotate the image acquisition device under control according to the rotation information so as to track the target.
As one possible implementation of the embodiments of the present disclosure, the fourth determining module is specifically configured to: determine the movement information of the target over the delay period according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and determine the rotation information of the image acquisition device according to that movement information and the movement of the image acquisition device per unit angle.
The target tracking apparatus of the embodiment of the disclosure acquires an image sequence, obtained by an image acquisition device continuously capturing images of a target; determines, for a current image in the image sequence, original bounding box information and original motion information of the target in an adjacent image that precedes the current image; determines candidate box information of the target in the current image according to the original bounding box information and the original motion information; and tracks the target in the current image according to the candidate box information, determining the current bounding box information of the target. Because the candidate box is derived from both the bounding box and the motion of the target in the preceding adjacent image, its validity is improved, and tracking the target according to the candidate box raises the success rate of target tracking.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 15 illustrates a schematic block diagram of an example electronic device 1500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 15, the device 1500 includes a computing unit 1501, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1502 or loaded from a storage unit 1508 into a random access memory (RAM) 1503. The RAM 1503 may also store various programs and data required for the operation of the device 1500. The computing unit 1501, the ROM 1502, and the RAM 1503 are connected to one another through a bus 1504. An input/output (I/O) interface 1505 is also connected to the bus 1504.
Various components in device 1500 are connected to I/O interface 1505, including: an input unit 1506 such as a keyboard, mouse, etc.; an output unit 1507 such as various types of displays, speakers, and the like; a storage unit 1508 such as a magnetic disk, an optical disk, or the like; and a communication unit 1509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1509 allows the device 1500 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, or microcontroller. The computing unit 1501 performs the methods and processes described above, such as the target tracking method. For example, in some embodiments, the target tracking method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1500 via the ROM 1502 and/or the communication unit 1509. When the computer program is loaded into the RAM 1503 and executed by the computing unit 1501, one or more steps of the target tracking method described above may be performed. Alternatively, in other embodiments, the computing unit 1501 may be configured to perform the target tracking method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
The computer system may also include a brain-like chip, which may be a non-von Neumann computing device.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. A target tracking method, comprising:
acquiring an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target;
for a current image in the image sequence, determining original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence, comprising:
determining the original bounding box information of the target in the adjacent image and optical flow information of each pixel point in the adjacent image;
determining a moving-area mask image of the adjacent image, wherein the mask value of each pixel point in the moving-area mask image represents the motion intensity of the target at that pixel point, and the moving-area mask image is the output obtained by inputting the adjacent image into a retinal magnocellular pathway model; and
determining the original motion information according to the original bounding box information of the target in the adjacent image, the optical flow information of each pixel point in the adjacent image, and the mask value of each pixel point;
determining candidate frame information of the target in the current image according to the original bounding box information and the original motion information; and
tracking the target in the current image according to the candidate frame information, and determining current bounding box information of the target in the current image.
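Purely as a hedged illustration of the motion-information step in claim 1, the sketch below combines OpenCV's Farneback dense optical flow with a binary motion mask standing in for the output of the magnocellular pathway model, which the claim does not specify at code level; the function and parameter names are hypothetical.

    import cv2
    import numpy as np

    def motion_info(prev_gray, cur_gray, box, mask):
        """Mean optical flow over the masked (moving) pixels inside the
        previous bounding box; returns a (dx, dy) displacement."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        x, y, w, h = [int(v) for v in box]
        roi_flow = flow[y:y + h, x:x + w]        # flow inside the box
        roi_mask = mask[y:y + h, x:x + w] > 0    # moving pixels only
        if not roi_mask.any():
            return np.zeros(2, dtype=np.float32) # no motion detected
        return roi_flow[roi_mask].mean(axis=0)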
2. The method of claim 1, wherein determining the original motion information according to the original bounding box information of the target in the adjacent image, the optical flow information of each pixel point in the adjacent image, and the mask value of each pixel point comprises:
determining the optical flow condition of the adjacent image according to the optical flow information of each pixel point in the adjacent image;
determining candidate pixel points according to the optical flow condition of the adjacent image and the mask values of the pixel points in the region defined by the original bounding box information; and
taking the average value of the optical flow information of the candidate pixel points as the original motion information.
3. The method of claim 2, wherein determining the candidate pixel points according to the optical flow condition of the adjacent image and the mask values of the pixel points in the region defined by the original bounding box information comprises:
determining, as the candidate pixel points, the pixel points in the region whose mask values are a preset value and which are corner points; or determining, as the candidate pixel points, the pixel points in the region whose mask values are the preset value.
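One possible reading of the corner-point branch of claim 3, sketched with OpenCV's Shi-Tomasi corner detector; the preset mask value of 255 and all detector parameters are assumptions for illustration.

    import cv2
    import numpy as np

    def corner_candidates(prev_gray, box, mask, preset=255):
        """Corner points inside the box whose mask value equals the preset."""
        x, y, w, h = [int(v) for v in box]
        roi = prev_gray[y:y + h, x:x + w]
        corners = cv2.goodFeaturesToTrack(roi, maxCorners=50,
                                          qualityLevel=0.01, minDistance=3)
        if corners is None:
            return np.empty((0, 2), dtype=np.float32)
        pts = corners.reshape(-1, 2) + np.array([x, y], np.float32)
        keep = [p for p in pts if mask[int(p[1]), int(p[0])] == preset]
        return np.asarray(keep, dtype=np.float32)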
4. The method of claim 1, wherein the original bounding box information comprises: original center point coordinate information and original size information;
wherein determining the candidate frame information of the target in the current image according to the original bounding box information and the original motion information comprises:
determining candidate center point coordinate information of the target in the current image according to the original center point coordinate information and the original motion information;
determining candidate size information of the target in the current image according to the original size information and a preset multiple; and
taking the candidate center point coordinate information and the candidate size information as the candidate frame information.
5. The method of claim 1, wherein tracking the target in the current image according to the candidate frame information and determining the current bounding box information of the target in the current image comprises:
determining the region image defined by the candidate frame information in the current image;
determining the degree of correlation between each pixel point in the region image and the target according to the region image and a kernel correlation filter; and
determining the current bounding box information according to the degree of correlation between each pixel point in the region image and the target.
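For reference, claim 5's kernel correlation filter step can be approximated with the stock KCF tracker shipped in opencv-contrib-python; this is a stand-in, not the disclosure's own filter, and the factory name varies by OpenCV version (cv2.TrackerKCF_create in 4.x contrib builds, cv2.TrackerKCF.create in newer ones).

    import cv2

    def init_kcf(frame, box):
        """Create a stock KCF tracker initialized on the given box."""
        tracker = cv2.TrackerKCF_create()
        tracker.init(frame, tuple(int(v) for v in box))
        return tracker

    def track_step(tracker, frame):
        """One tracking update; returns (x, y, w, h) or None on failure."""
        ok, box = tracker.update(frame)
        return box if ok else None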
6. The method of any one of claims 1 to 5, further comprising:
determining current motion information of the target in the current image;
determining predicted center point coordinate information of the target after a delay period according to current center point coordinate information in the current bounding box information and the current motion information;
determining rotation information of the image acquisition device according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and
performing rotation control on the image acquisition device according to the rotation information so as to track the target.
7. The method of claim 6, wherein determining the rotation information of the image acquisition device according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information comprises:
determining movement information of the target within the delay period according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and
determining the rotation information of the image acquisition device according to the movement information and the image movement produced per unit rotation angle of the image acquisition device.
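A minimal sketch of claim 7's angle conversion, assuming the "movement per unit rotation angle" has been calibrated as a pixels-per-degree constant; the constant and the (pan, tilt) sign conventions are assumptions for illustration.

    def rotation_command(pred_center, cur_center, px_per_degree):
        """Convert the predicted pixel displacement over the delay into
        pan/tilt angles using a pixels-per-degree calibration."""
        dx = pred_center[0] - cur_center[0]
        dy = pred_center[1] - cur_center[1]
        return (dx / px_per_degree, dy / px_per_degree)  # degrees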
8. A target tracking device, comprising:
an acquisition module configured to acquire an image sequence, wherein the image sequence is obtained by an image acquisition device continuously capturing images of a target;
a first determining module configured to, for a current image in the image sequence, determine original bounding box information and original motion information of the target in an adjacent image that precedes the current image in the image sequence, by:
determining the original bounding box information of the target in the adjacent image and optical flow information of each pixel point in the adjacent image;
determining a moving-area mask image of the adjacent image, wherein the mask value of each pixel point in the moving-area mask image represents the motion intensity of the target at that pixel point, and the moving-area mask image is the output obtained by inputting the adjacent image into a retinal magnocellular pathway model; and
determining the original motion information according to the original bounding box information of the target in the adjacent image, the optical flow information of each pixel point in the adjacent image, and the mask value of each pixel point;
wherein the first determining module is further configured to determine candidate frame information of the target in the current image according to the original bounding box information and the original motion information; and
a tracking module configured to track the target in the current image according to the candidate frame information and determine current bounding box information of the target in the current image.
9. The apparatus of claim 8, wherein the first determining module is further configured to:
determine the optical flow condition of the adjacent image according to the optical flow information of each pixel point in the adjacent image;
determine candidate pixel points according to the optical flow condition of the adjacent image and the mask values of the pixel points in the region defined by the original bounding box information; and
take the average value of the optical flow information of the candidate pixel points as the original motion information.
10. The apparatus of claim 9, wherein the first determining module is further configured to:
determine, as the candidate pixel points, the pixel points in the region whose mask values are a preset value and which are corner points; or determine, as the candidate pixel points, the pixel points in the region whose mask values are the preset value.
11. The apparatus of claim 8, wherein the original bounding box information comprises: original center point coordinate information and original size information;
the first determining module is further configured to:
determine candidate center point coordinate information of the target in the current image according to the original center point coordinate information and the original motion information;
determine candidate size information of the target in the current image according to the original size information and a preset multiple; and
take the candidate center point coordinate information and the candidate size information as the candidate frame information.
12. The apparatus of claim 8, wherein the tracking module is specifically configured to:
determine the region image defined by the candidate frame information in the current image;
determine the degree of correlation between each pixel point in the region image and the target according to the region image and a kernel correlation filter; and
determine the current bounding box information according to the degree of correlation between each pixel point in the region image and the target.
13. The apparatus of any of claims 8-12, wherein the apparatus further comprises:
a second determining module configured to determine current motion information of the target in the current image;
a third determining module configured to determine predicted center point coordinate information of the target after a delay period according to current center point coordinate information in the current bounding box information and the current motion information;
a fourth determining module configured to determine rotation information of the image acquisition device according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and
a control module configured to perform rotation control on the image acquisition device according to the rotation information so as to track the target.
14. The apparatus of claim 13, wherein the fourth determining module is specifically configured to:
determine movement information of the target within the delay period according to the predicted center point coordinate information of the target after the delay period and the current center point coordinate information; and
determine the rotation information of the image acquisition device according to the movement information and the image movement produced per unit rotation angle of the image acquisition device.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202110990797.4A 2021-08-26 2021-08-26 Target tracking method, device, electronic equipment and storage medium Active CN113808162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110990797.4A CN113808162B (en) 2021-08-26 2021-08-26 Target tracking method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110990797.4A CN113808162B (en) 2021-08-26 2021-08-26 Target tracking method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113808162A CN113808162A (en) 2021-12-17
CN113808162B true CN113808162B (en) 2024-01-23

Family

ID=78941869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110990797.4A Active CN113808162B (en) 2021-08-26 2021-08-26 Target tracking method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113808162B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385485B (en) * 2023-03-13 2023-11-14 腾晖科技建筑智能(深圳)有限公司 Video tracking method and system for long-strip-shaped tower crane object
CN116433709A (en) * 2023-04-14 2023-07-14 北京拙河科技有限公司 Tracking method and device for sports ground monitoring
CN116310742B (en) * 2023-04-17 2023-11-28 中国人民解放军军事科学院军事医学研究院 A class brain intelligent processing system for unmanned aerial vehicle reaction


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130191A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Bounding box smoothing for object tracking in a video analytics system

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9820922D0 (en) * 1997-10-02 1998-11-18 Barr & Stroud Ltd Method and apparatus for target track identification
CN107909602A (en) * 2017-12-08 2018-04-13 长沙全度影像科技有限公司 A kind of moving boundaries method of estimation based on deep learning
CN108280844A (en) * 2018-02-05 2018-07-13 厦门大学 A kind of video object localization method based on the tracking of region candidate frame
CN110363790A (en) * 2018-04-11 2019-10-22 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN109272509A (en) * 2018-09-06 2019-01-25 郑州云海信息技术有限公司 A kind of object detection method of consecutive image, device, equipment and storage medium
WO2020147348A1 (en) * 2019-01-17 2020-07-23 北京市商汤科技开发有限公司 Target tracking method and device, and storage medium
CN109871823A (en) * 2019-03-11 2019-06-11 中国电子科技集团公司第五十四研究所 A kind of satellite image Ship Detection of combination rotating frame and contextual information
CN110084829A (en) * 2019-03-12 2019-08-02 上海阅面网络科技有限公司 Method for tracking target, device, electronic equipment and computer readable storage medium
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110473227A (en) * 2019-08-21 2019-11-19 图谱未来(南京)人工智能研究院有限公司 Method for tracking target, device, equipment and storage medium
CN111798487A (en) * 2019-08-27 2020-10-20 北京京东尚科信息技术有限公司 Target tracking method, device and computer readable storage medium
CN111242940A (en) * 2020-01-19 2020-06-05 复旦大学 Tongue image segmentation method based on weak supervised learning
CN111428566A (en) * 2020-02-26 2020-07-17 沈阳大学 Deformation target tracking system and method
CN111899285A (en) * 2020-07-08 2020-11-06 浙江大华技术股份有限公司 Method and device for determining tracking track of target object and storage medium
CN112001946A (en) * 2020-07-14 2020-11-27 浙江大华技术股份有限公司 Target object tracking method, computer equipment and device
CN112487920A (en) * 2020-11-25 2021-03-12 电子科技大学 Convolution neural network-based crossing behavior identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Space moving target detection and tracking method in complex background; Ping-Yue Lv et al.; Infrared Physics & Technology; full text *
Kernel correlation target tracking algorithm based on candidate region detection; Hao Shaohua et al.; Video Engineering (No. 07); full text *
Design and implementation of an embedded moving target detection and tracking system; Hu Fengzhong et al.; Computer Measurement & Control; full text *
Rotation-invariant face detection with cascaded networks and pyramid optical flow; Sun Rui et al.; Opto-Electronic Engineering (No. 01); full text *

Also Published As

Publication number Publication date
CN113808162A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN113808162B (en) Target tracking method, device, electronic equipment and storage medium
KR102150776B1 (en) Face location tracking method, apparatus and electronic device
US20220383535A1 (en) Object Tracking Method and Device, Electronic Device, and Computer-Readable Storage Medium
US20200290608A1 (en) Forward collision control method and apparatus, electronic device, program, and medium
JP7151488B2 (en) Moving object detection device, moving object detection method and program
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US11900676B2 (en) Method and apparatus for detecting target in video, computing device, and storage medium
US10254854B2 (en) Tracker for cursor navigation
CN112560684B (en) Lane line detection method, lane line detection device, electronic equipment, storage medium and vehicle
CN110807410B (en) Key point positioning method and device, electronic equipment and storage medium
US20180137641A1 (en) Target tracking method and device
CN111199554A (en) Target tracking anti-blocking method and device
CN114037087B (en) Model training method and device, depth prediction method and device, equipment and medium
CN112966654A (en) Lip movement detection method and device, terminal equipment and computer readable storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
Yang et al. A light CNN based method for hand detection and orientation estimation
CN114972415B (en) Robot vision tracking method, system, electronic device and medium
CN113516013B (en) Target detection method, target detection device, electronic equipment, road side equipment and cloud control platform
Truong et al. Single object tracking using particle filter framework and saliency-based weighted color histogram
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN113361519B (en) Target processing method, training method of target processing model and device thereof
Fan et al. An effective anti-interference visual tracking method
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
CN114581746B (en) Object detection method, device, equipment and medium
CN115431968B (en) Vehicle controller, vehicle and vehicle control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant