WO2022127180A1 - 目标跟踪方法、装置、电子设备及存储介质 - Google Patents

目标跟踪方法、装置、电子设备及存储介质 Download PDF

Info

Publication number
WO2022127180A1
WO2022127180A1 · PCT/CN2021/114904
Authority
WO
WIPO (PCT)
Prior art keywords
frame
target
image
information corresponding
frame information
Prior art date
Application number
PCT/CN2021/114904
Other languages
English (en)
French (fr)
Inventor
王智卓
曾卓熙
陈宁
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司 filed Critical 深圳云天励飞技术股份有限公司
Publication of WO2022127180A1 publication Critical patent/WO2022127180A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • the present invention relates to the field of artificial intelligence, and in particular, to a target tracking method, device, electronic device and storage medium.
  • with the ever-increasing population and the rapid expansion of cities, intelligent video surveillance has become critical.
  • intelligent video surveillance means that front-end cameras can perform tasks such as detection and tracking, directly extracting and saving the data that is needed.
  • although multi-target tracking models already exist, most of them rely on inter-frame IOU and Hungarian matching to track target trajectories.
  • in real scenes, when a target is occluded or disturbed, the IOU becomes unreliable and multiple detection frames may have similar IOU values; especially in crowded scenes, the detection error rate is relatively high.
  • in addition, because Hungarian matching is an optimal-assignment algorithm, it will still select a detection frame as the final result in this case; that is, even when the optimal detection frame is an erroneous one, it will still be selected, which introduces larger errors into the subsequent tracking algorithm. Therefore, existing target tracking algorithms suffer from limited detection accuracy.
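
The failure mode described above is easy to reproduce. The following minimal sketch (not part of the patent; it uses SciPy's linear_sum_assignment as a stand-in Hungarian solver, with hypothetical costs) shows how an unguarded optimal assignment still commits every prediction to some detection, even a bad one:

```python
# Illustration only: why pure Hungarian matching always commits to an
# assignment. scipy's linear_sum_assignment returns a full matching even
# when every candidate cost for a row is poor.
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i][j] = 1 - IOU between predicted box i and detected box j (hypothetical)
cost = np.array([
    [0.95, 0.93],   # prediction 0: both detections are bad matches
    [0.10, 0.94],   # prediction 1: detection 0 is a good match
])

rows, cols = linear_sum_assignment(cost)
for r, c in zip(rows, cols):
    # Without a gating threshold, prediction 0 is still forced onto a bad
    # detection (cost 0.93), which is the failure mode described above.
    print(f"prediction {r} -> detection {c}, cost {cost[r, c]:.2f}")
```
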
  • the embodiment of the present invention provides a target tracking method, which can improve the detection accuracy of multi-target tracking.
  • an embodiment of the present invention provides a target tracking method, and the method includes:
  • the matching of the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image includes: calculating the correlation between the two, and judging whether the target detection frame information and the target prediction frame information that satisfy the matching condition correspond one to one;
  • if they correspond one to one, the matching is a continuous trajectory;
  • if they do not, the matching is a disconnected trajectory.
  • the target detection frame information includes a detection frame parameter and a detection frame image
  • the target prediction frame information includes a prediction frame parameter and a prediction frame image
  • the correlation includes a frame correlation and a feature correlation
  • calculating the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image includes:
  • calculating, according to the detection frame parameters and the prediction frame parameters, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • calculating, according to the detection frame image and the prediction frame image, the feature correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • calculating, according to the frame correlation and the feature correlation, the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
  • the detection frame parameters include the coordinates of the center point of the detection frame, the area of the detection frame, and the aspect ratio of the detection frame
  • the prediction frame parameters include the coordinates of the center point of the prediction frame, the area of the prediction frame, and the aspect ratio of the prediction frame.
  • calculating, according to the detection frame center point coordinates and the prediction frame center point coordinates, the distance correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • calculating, according to the detection frame area and the prediction frame area, the area correlation between the two;
  • calculating, according to the detection frame aspect ratio and the prediction frame aspect ratio, the shape correlation between the two;
  • calculating, based on the distance correlation, the area correlation and the shape correlation, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
  • reconnecting the disconnected trajectory includes: extracting first representative detection frame information and second representative detection frame information from a first disconnected trajectory and a second disconnected trajectory respectively;
  • the correlation between the first disconnected trajectory and the second disconnected trajectory is calculated from the first and second representative detection frame information;
  • the method further includes: judging whether the first disconnected trajectory and the second disconnected trajectory contain the same image frame identifier;
  • if the first disconnected trajectory and the second disconnected trajectory do not contain the same image frame identifier, the two trajectories are reconnected.
  • reconnecting the first disconnected trajectory with the second disconnected trajectory includes:
  • if the correlation between the first disconnected trajectory of the current target and the second disconnected trajectory is greater than or equal to a second preset correlation threshold, the two trajectories are reconnected to obtain a first reconnected trajectory;
  • the first reconnected trajectory is then filtered according to a preset filtering rule to obtain a second reconnected trajectory as the reconnection result;
  • the filtering rule includes at least one of: judging whether the length of the first reconnected trajectory reaches a preset length, judging whether the image quality of the first reconnected trajectory reaches a preset image quality, and judging whether the target size in the first reconnected trajectory reaches a preset target size.
  • an embodiment of the present invention further provides a target tracking device, the device comprising:
  • the extraction module is used to extract the target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed;
  • a matching module for matching the target detection frame information corresponding to the n+1 frame image and the target prediction frame information corresponding to the n frame image, and judging whether the matching result includes a disconnected track;
  • a reconnection module configured to reconnect the disconnected trajectory if there is the disconnected trajectory, obtain a reconnection result, and obtain a corresponding target tracking sequence based on the reconnection result.
  • an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the target tracking method provided by the embodiments of the present invention are implemented.
  • embodiments of the present invention provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the target tracking method provided by the embodiments of the present invention are implemented.
  • the target detection frame information and target prediction frame information of each frame image in the image sequence to be processed are extracted; the target detection frame information corresponding to the (n+1)th frame image is matched with the target prediction frame information corresponding to the nth frame image, and it is determined whether the matching result includes a disconnected trajectory; if a disconnected trajectory exists, it is reconnected to obtain a reconnection result, and the corresponding target tracking sequence is obtained based on the reconnection result.
  • by matching the prediction frame information with the detection frame information, the detection frame carries prior information during trajectory tracking, which improves the accuracy of the detection frame; disconnected trajectories can be reconnected, improving the detection accuracy of target tracking.
  • FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for calculating frame correlation provided by an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for reconnecting a trajectory according to an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a matching module provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a first computing submodule provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a first computing unit provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a reconnection module provided by an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another reconnection module provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a first reconnection submodule provided by an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 1, the following steps are included:
  • the above-mentioned image sequence to be processed may be video captured by a camera in real time, for example video of a specific monitored scene captured in real time by a camera installed in that scene; further, the camera may be installed at a certain height in the scene to capture the targets in the scene in real time; the sequence may also be a video uploaded by a user, and the image sequence refers to frame images obtained in time order.
  • the above-mentioned image sequence to be processed includes a target to be tracked, and the above-mentioned target to be tracked may be a moving target, and the above-mentioned moving target may be a target such as a pedestrian, a vehicle, and an animal that can generate a moving trajectory.
  • the above-mentioned target to be tracked may be one or more.
  • the above-mentioned target detection frame information may be obtained by detecting the target to be tracked with a target detection network;
  • the above-mentioned target detection network has already been trained;
  • it may be trained by the user on a sample target data set, or its network structure and parameters may be downloaded and then fine-tuned on a sample target data set.
  • the input of the target detection network is a frame image in the image sequence to be processed;
  • the output is the detection frame information of the target to be tracked in the corresponding frame image;
  • the detection frame information output by the target detection network may include position information and confidence information of the target to be tracked in the corresponding frame image;
  • the position information may be in the format det(x, y, w, h), where x and y denote the coordinates of the center point of the detection frame in the corresponding frame image, and w and h denote the width and height of the detection frame in that image.
  • the above confidence level information is used to indicate the degree of confidence that the image content in the detection frame is the target to be tracked.
  • the higher the confidence, the more credible it is that the image content in the detection frame is the target to be tracked.
  • the above target detection network may be a network constructed based on the CenterNet target detection algorithm.
  • preset processing may be performed on the video captured by the camera or uploaded by the user; the preset processing may be frame extraction: one frame is taken every preset number of frames as a frame image of the sequence to be processed, which reduces the redundancy between adjacent frames and increases the computation speed of target tracking.
  • the above-mentioned preset number of frames may be set according to user needs.
  • the preset number of frames is 4 frames, that is, one frame is taken every 4 frames as a frame image in the image to be processed.
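
As a rough illustration of this frame-extraction step, the following sketch (an assumption: OpenCV video capture, not anything specified by the patent) keeps one frame out of every `step` frames:

```python
# A minimal sketch of the frame-extraction preprocessing described above:
# keep every `step`-th frame to reduce inter-frame redundancy.
import cv2

def sample_frames(video_path: str, step: int = 4):
    """Yield one frame out of every `step` frames as the sequence to process."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame
        index += 1
    cap.release()
```
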
  • the above-mentioned target prediction frame information may be obtained by predicting the target position of the target to be tracked with a target prediction network; the target prediction network has already been trained, either by the user, or by downloading its network structure and parameters and fine-tuning them on a sample target data set;
  • the above target prediction network may be a network constructed based on the Kalman filter algorithm.
  • the input of the target prediction network is a frame image in the image sequence to be processed;
  • the output is the prediction frame information of the target to be tracked in the next frame;
  • the prediction frame information output by the target prediction network may include position information and confidence information of the target to be tracked in the next frame image;
  • the position information may be in the format pre(x, y, w, h), where x and y denote the coordinates of the center point of the predicted frame in the next frame image, and w and h denote its width and height in that image.
  • for the nth frame image as input, the target detection frame information and the target prediction frame information corresponding to the nth frame image are output;
  • for the (n+1)th frame image as input, the target detection frame information and the target prediction frame information corresponding to the (n+1)th frame image are output;
  • the target prediction frame information corresponding to the nth frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)th frame image;
  • the target prediction frame information corresponding to the (n+1)th frame image can be understood as a prediction of the target detection frame information corresponding to the (n+2)th frame image.
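
For intuition, the following sketch shows a constant-velocity prediction step of the kind a Kalman-filter-based predictor performs on a box; the state layout and numbers are hypothetical stand-ins, not the patent's trained network:

```python
# A minimal sketch of a Kalman-style prediction step on the (x, y, w, h)
# box, assuming a constant-velocity motion model as in SORT-like trackers.
import numpy as np

# state: [x, y, w, h, vx, vy], i.e. box center, size, and center velocity
F = np.eye(6)
F[0, 4] = 1.0  # x <- x + vx
F[1, 5] = 1.0  # y <- y + vy

def predict_box(state: np.ndarray) -> np.ndarray:
    """Return pre(x, y, w, h) for the next frame from the current state."""
    nxt = F @ state
    return nxt[:4]  # the predicted box pre(x, y, w, h)

det_n = np.array([100.0, 80.0, 40.0, 90.0, 3.0, -1.0])  # hypothetical state
print(predict_box(det_n))  # -> [103.  79.  40.  90.]
```
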
  • since the target prediction frame information corresponding to the nth frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)th frame image, the purpose of matching the target detection frame information of the (n+1)th frame with the target prediction frame information of the nth frame is to check whether the detection result is the same as or close to the prediction result, and thereby to judge whether a false detection has occurred;
  • the matching results include continuous trajectories and disconnected trajectories;
  • when the target detection frame information corresponding to the (n+1)th frame image matches the target prediction frame information corresponding to the nth frame image, no false detection has occurred and the matching result is a continuous trajectory, i.e., the target detection frame information of the (n+1)th frame is appended to the previous trajectory; when they do not match, a false detection has occurred and the matching result is a disconnected trajectory, i.e., the target detection frame information of the (n+1)th frame is disconnected from the previous trajectory and becomes the starting point of a new trajectory, while the target detection frame information of the nth frame becomes the end point of the previous trajectory.
  • further, the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image can be calculated, and it can be judged whether the pairs that satisfy the matching condition correspond one to one; the matching condition is that the correlation is greater than or equal to a first preset correlation threshold;
  • if the target detection frame information of the (n+1)th frame that satisfies the matching condition corresponds one to one with the target prediction frame information of the nth frame, the matching is a continuous trajectory; otherwise, the matching is a disconnected trajectory;
  • by judging whether the correspondence is one to one, it can be further determined whether a false detection exists: for example, if the correlation between one piece of target detection frame information of the (n+1)th frame and several pieces of target prediction frame information of the nth frame satisfies the matching condition, one detection frame has matched multiple prediction frames;
  • conversely, if the correlation between one piece of target prediction frame information of the nth frame and several pieces of target detection frame information of the (n+1)th frame satisfies the matching condition, one prediction frame has matched multiple detection frames;
  • both cases are false detections, and the matching is a disconnected trajectory; only when the correlation between one prediction frame of the nth frame and exactly one detection frame of the (n+1)th frame satisfies the matching condition (one-to-one correspondence) can it be considered that no false detection exists, and the pair is assigned the same ID.
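
A minimal sketch of this gated one-to-one matching rule, assuming a precomputed correlation matrix (the function and threshold names are illustrative, not from the patent):

```python
# A pair is kept only if its correlation clears the first preset threshold
# AND neither side matches anything else (one-to-one correspondence).
import numpy as np

def match_one_to_one(corr: np.ndarray, thresh: float):
    """corr[i][j]: correlation between prediction i (frame n) and detection j
    (frame n+1). Returns (continuous_pairs, disconnected_detections)."""
    hits = corr >= thresh                       # matching condition
    continuous, disconnected_dets = [], []
    for j in range(corr.shape[1]):              # each detection in frame n+1
        preds = np.flatnonzero(hits[:, j])
        # one-to-one: detection j matches exactly one prediction i, and that
        # prediction matches no other detection
        if len(preds) == 1 and hits[preds[0]].sum() == 1:
            continuous.append((int(preds[0]), j))   # same ID continues
        else:
            disconnected_dets.append(j)             # starts a new trajectory
    return continuous, disconnected_dets
```
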
  • the target detection frame information includes detection frame parameters and detection frame images
  • the target prediction frame information includes prediction frame parameters and prediction frame images
  • the correlation includes frame correlation and feature correlation.
  • the above detection frame parameters are used to represent the position, shape and size of the target detection frame in the corresponding frame image
  • the above detection frame image is used to represent the detection frame content (also referred to as target image) of the target detection frame in the corresponding frame image.
  • the above-mentioned prediction frame parameters represent the position, shape and size of the target prediction frame in the corresponding frame image;
  • the above-mentioned prediction frame image represents the content of the target prediction frame in the corresponding frame image (also referred to as the target prediction image).
  • the frame correlation between the target detection frame information of the (n+1)th frame and the target prediction frame information of the nth frame can be calculated from the detection frame parameters and the prediction frame parameters;
  • the feature correlation between them can be calculated from the detection frame image and the prediction frame image;
  • the overall correlation between them is then calculated from the frame correlation and the feature correlation.
  • the above-mentioned detection frame image can be cropped from the corresponding frame image according to the target detection frame information, specifically according to the position information in det(x, y, w, h) format; for example, the detection frame image of the nth frame is cropped from the nth frame image;
  • similarly, the above-mentioned prediction frame image can be cropped according to the position information in pre(x, y, w, h) format; for example, the prediction frame image predicted from the nth frame is cropped from the (n+1)th frame image.
  • after the detection frame image and the prediction frame image are extracted, feature extraction can be performed on both through a feature extraction network to obtain the detection frame image feature and the prediction frame image feature, and the similarity between the two features is taken as the feature correlation;
  • in addition, the detection frame image and the prediction frame image can be resized to a preset size, for example 256×128;
  • the feature extraction network may be constructed based on a Re-ID network; the feature extraction network of the embodiment of the present invention is obtained by making the Re-ID network lightweight;
  • the detection frame image and the prediction frame image are respectively input into the feature extraction network to extract image features.
  • the feature extraction network can be expressed as f = F(θ_b), where f is the image feature, F is the feature extraction network, and θ_b denotes the parameters of the feature extraction network.
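
A minimal sketch of the feature-correlation step under the stated 256×128 resizing; the `embed` function below is a naive stand-in for the patent's trained lightweight Re-ID network, and cosine similarity is assumed as the similarity measure (the patent does not specify one):

```python
# Stand-in feature extraction and similarity scoring for dis_feat.
import numpy as np
import cv2

def embed(crop: np.ndarray) -> np.ndarray:
    """Stand-in for f = F(theta_b); the patent uses a trained, lightweight
    Re-ID network here. We only resize to 256x128 and L2-normalize."""
    resized = cv2.resize(crop, (128, 256)).astype(np.float32).ravel()
    return resized / (np.linalg.norm(resized) + 1e-12)

def feature_correlation(det_crop: np.ndarray, pre_crop: np.ndarray) -> float:
    # cosine similarity between the two embeddings, used as dis_feat
    return float(embed(det_crop) @ embed(pre_crop))
```
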
  • the above frame correlation represents the degree of correlation between the target prediction frame and the target detection frame in the two dimensions of shape and distance; compared with the traditional IOU (intersection over union) calculation, the frame correlation can adapt to tracking over longer inter-frame distances;
  • the traditional IOU is computed between the detection frame of the nth frame and the detection frame of the (n+1)th frame: the intersection area and the union area of the two frames are calculated, and the ratio of intersection to union measures whether they belong to the same target; when the detection frames of two different targets are close in position and similar in size, they also satisfy the same-target criterion, so the traditional IOU has large errors in multi-target tracking;
  • moreover, when the inter-frame distance grows, the detection frame of the same target may change in position and size between two adjacent frames; the IOU then changes sharply, making false detections more likely.
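
For reference, the traditional IOU being contrasted here can be computed as follows (a sketch, assuming center-based (x, y, w, h) boxes as in the det/pre format above):

```python
# Intersection area over union area of two boxes given as (x, y, w, h),
# with (x, y) the box center.
def iou(a, b) -> float:
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # intersection width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # intersection height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```
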
  • the above-mentioned correlation may be the sum of the frame correlation and the feature correlation; in some possible embodiments, it may also be their weighted sum, with the weighting coefficients determined according to actual needs.
  • the detection frame parameters may include detection frame center point coordinates, detection frame area, and detection frame aspect ratio
  • the prediction frame parameters may include prediction frame center point coordinates, prediction frame area, and prediction frame aspect ratio.
  • the above detection frame parameters can be obtained by converting det(x, y, w, h), and the prediction frame parameters by converting pre(x, y, w, h): det(x, y, w, h) is converted into det(x, y, s, r) and pre(x, y, w, h) into pre(x, y, s, r), where s denotes the area of the frame and r its aspect ratio; the conversion may be s = w × h, r = w / h;
  • the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image can then be calculated from the detection frame parameters det(x, y, s, r) and the prediction frame parameters pre(x, y, s, r).
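
A one-line illustration of this conversion (box values hypothetical):

```python
# (x, y, w, h) -> (x, y, s, r) with s = w*h and r = w/h, as described above.
def to_xysr(box):
    x, y, w, h = box
    return (x, y, w * h, w / h)

det = to_xysr((100.0, 80.0, 40.0, 90.0))   # hypothetical detection box
pre = to_xysr((103.0, 79.0, 42.0, 88.0))   # hypothetical predicted box
```
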
  • FIG. 2 is a flowchart of a frame correlation calculation method provided by an embodiment of the present invention. As shown in FIG. 2, the following steps are included:
  • the above-mentioned distance correlation can be calculated from the Euclidean distance;
  • for example, if the detection frame parameters are det(x_det, y_det, s_det, r_det) and the prediction frame parameters are pre(x_pre, y_pre, s_pre, r_pre), the center point of the detection frame is (x_det, y_det) and the center point of the prediction frame is (x_pre, y_pre);
  • the Euclidean distance can be calculated as dis = (x_det − x_pre)² + (y_det − y_pre)², where dis denotes the distance from the detection frame center point to the prediction frame center point;
  • the distance correlation dis_pos between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image is obtained by normalizing dis with a preset distance threshold max_dis (the exact formula is given only as an image in the source), where max_dis can be set to about 0.2.
  • the area correlation can be calculated from the area ratio: with detection frame area s_det and prediction frame area s_pre, the area ratio is size = s_det / s_pre;
  • the area correlation is then dis_size = (size − 1.0) / (max_size − 1.0), where dis_size denotes the area correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image, and max_size is a preset area-ratio threshold, which can be set to about 1.8.
  • the shape correlation can be calculated from the ratio of the detection frame aspect ratio r_det to the prediction frame aspect ratio r_pre: ratio = r_det / r_pre;
  • the shape correlation is then dis_ratio = (ratio − 1.0) / (max_ratio − 1.0), where dis_ratio denotes the shape correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image, and max_ratio is a preset shape-ratio threshold, which can be set to about 1.8.
  • the frame correlation may be the sum or weighted sum of the distance correlation, the area correlation and the shape correlation, with the weighting coefficients determined according to actual needs;
  • the frame correlation and the feature correlation are then summed to obtain the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image: dis_all = dis_pos + dis_size + dis_ratio + dis_feat, where dis_all denotes that correlation.
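
Putting the pieces together, here is a sketch of dis_all as the formulas above define it; the exact dis_pos normalization is an assumption, since its formula appears only as an image in the source:

```python
# Assembles the correlation terms defined above into dis_all.
# max_dis, max_size and max_ratio follow the suggested ~0.2 / ~1.8 / ~1.8
# settings; dis_pos = dis / max_dis is an assumed normalization.
def correlation(det, pre, dis_feat, max_dis=0.2, max_size=1.8, max_ratio=1.8):
    """det, pre: boxes as (x, y, s, r); dis_feat: precomputed feature correlation."""
    dis = (det[0] - pre[0]) ** 2 + (det[1] - pre[1]) ** 2
    dis_pos = dis / max_dis                              # assumed normalization
    size = det[2] / pre[2]                               # area ratio
    dis_size = (size - 1.0) / (max_size - 1.0)
    ratio = det[3] / pre[3]                              # aspect-ratio ratio
    dis_ratio = (ratio - 1.0) / (max_ratio - 1.0)
    return dis_pos + dis_size + dis_ratio + dis_feat     # dis_all
```
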
  • a trajectory may be formed by multiple pieces of continuous target detection frame information; after the current trajectory ends, all disconnected trajectories that have already ended may be traversed for reconnection;
  • specifically, the target detection frame information in each trajectory may be used for the reconnection.
  • FIG. 3 is a flowchart of a trajectory reconnection method provided by an embodiment of the present invention. As shown in FIG. 3, the following steps are included:
  • the quality of the target detection frame information in each trajectory may be evaluated, and the target detection frame information with the highest quality score is selected as the representative detection frame information;
  • "first" and "second" representative detection frame information merely distinguish whether the representative detection frame information belongs to the first or the second disconnected trajectory;
  • the correlation between the first representative detection frame information and the second representative detection frame information is taken as the correlation between the first disconnected trajectory and the second disconnected trajectory;
  • for the calculation of this correlation, reference may be made to step 102.
  • when the correlation between the first disconnected trajectory and the second disconnected trajectory is greater than or equal to the second preset correlation threshold, the two trajectories belong to the same target, and the first disconnected trajectory can be reconnected with the second disconnected trajectory;
  • before reconnecting, it is judged whether the two trajectories contain the same image frame identifier; the image frame identifiers may be the frame numbers of the image frames; if the same identifier appears in both, the first and second disconnected trajectories overlap in time and do not belong to the same target.
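
A minimal sketch of this reconnection test (the data layout and names are illustrative assumptions): pick each trajectory's highest-quality box as its representative, require no shared frame identifiers, and compare trajectory correlation against the second preset threshold.

```python
# Reconnection test for two ended, disconnected trajectories.
def can_reconnect(track_a, track_b, correlate, thresh2: float) -> bool:
    """track_x: {'frames': set[int], 'boxes': [(quality, box_info), ...]}."""
    if track_a["frames"] & track_b["frames"]:
        return False                    # shared frame ids: different targets
    rep_a = max(track_a["boxes"], key=lambda t: t[0])[1]  # highest quality wins
    rep_b = max(track_b["boxes"], key=lambda t: t[0])[1]
    return correlate(rep_a, rep_b) >= thresh2
```
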
  • if the correlation between the first disconnected trajectory of the current target and the second disconnected trajectory is greater than or equal to the second preset correlation threshold, the two are reconnected to obtain a first reconnected trajectory; the first reconnected trajectory of the current target is then filtered according to a preset filtering rule to obtain a second reconnected trajectory as the reconnection result;
  • the filtering rule includes at least one of: judging whether the length of the first reconnected trajectory reaches a preset length, judging whether the image quality of the first reconnected trajectory reaches a preset image quality, and judging whether the target size in the first reconnected trajectory reaches a preset target size.
  • some shorter first reconnected trajectories can be filtered out by trajectory length, since the targets they correspond to are usually difficult to merge into other trajectories and would strongly degrade the tracking performance;
  • first reconnected trajectories whose overall image quality is below the preset image quality can also be filtered out, which improves the tracking accuracy to a certain extent.
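
A minimal sketch of these filtering rules with hypothetical thresholds:

```python
# Keep a reconnected trajectory only if it is long enough, of sufficient
# image quality, and its target is big enough (thresholds are illustrative).
def keep_track(track, min_len=10, min_quality=0.5, min_size=32.0) -> bool:
    """track: {'boxes': [...], 'qualities': [...], 'sizes': [...]}."""
    if len(track["boxes"]) < min_len:                      # preset length
        return False
    if sum(track["qualities"]) / len(track["qualities"]) < min_quality:
        return False                                       # preset image quality
    if max(track["sizes"]) < min_size:                     # preset target size
        return False
    return True
```
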
  • to sum up, the target detection frame information and target prediction frame information of each frame image in the image sequence to be processed are extracted; the target detection frame information corresponding to the (n+1)th frame image is matched with the target prediction frame information corresponding to the nth frame image, and it is judged whether the matching result includes a disconnected trajectory; if so, the disconnected trajectory is reconnected to obtain a reconnection result, and the corresponding target tracking sequence is obtained based on the reconnection result;
  • by matching prediction frame information with detection frame information, the detection frame carries prior information during trajectory tracking, which improves the accuracy of the detection frame; disconnected trajectories can be reconnected, improving the detection accuracy of target tracking.
  • the target tracking method provided by the embodiments of the present invention can be applied to devices capable of target tracking, such as mobile phones, monitors, computers and servers.
  • FIG. 4 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention. As shown in FIG. 4, the device includes:
  • the extraction module 401 is used to extract the target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed;
  • the matching module 402 is used to match the target detection frame information corresponding to the n+1th frame image with the target prediction frame information corresponding to the nth frame image, and determine whether the matching result includes a disconnected track;
  • the reconnection module 403 is configured to reconnect the disconnected track if there is the disconnected track to obtain a reconnection result, and obtain a corresponding target tracking sequence based on the reconnection result.
  • the matching module 402 includes:
  • the first calculation sub-module 4021 is used to calculate the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image, and to judge whether the target detection frame information and the target prediction frame information that satisfy the matching condition correspond one to one, the matching condition being that the correlation is greater than or equal to a first preset correlation threshold;
  • the first matching sub-module 4022 is configured to determine the matching as a continuous trajectory if the target detection frame information corresponding to the (n+1)th frame image that satisfies the matching condition corresponds one to one with the target prediction frame information corresponding to the nth frame image;
  • the second matching sub-module 4023 is configured to determine the matching as a disconnected trajectory if the target detection frame information corresponding to the (n+1)th frame image that satisfies the matching condition does not correspond one to one with the target prediction frame information corresponding to the nth frame image.
  • the target detection frame information includes detection frame parameters and detection frame images
  • the target prediction frame information includes prediction frame parameters and prediction frame images
  • the correlation includes a frame correlation and a feature correlation.
  • the first calculation sub-module 4021 includes:
  • the first calculation unit 40211 is configured to calculate, according to the detection frame parameters and the prediction frame parameters, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • the second calculation unit 40212 is configured to calculate, according to the detection frame image and the prediction frame image, the feature correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • the third calculation unit 40213 is configured to calculate, according to the frame correlation and the feature correlation, the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
  • the detection frame parameters include the detection frame center point coordinates, the detection frame area and the detection frame aspect ratio, and the prediction frame parameters include the prediction frame center point coordinates, the prediction frame area and the prediction frame aspect ratio; the first calculation unit 40211 includes:
  • the first calculation subunit 402111, used to calculate, according to the detection frame center point coordinates and the prediction frame center point coordinates, the distance correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • the second calculation subunit 402112, configured to calculate, according to the detection frame area and the prediction frame area, the area correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • the third calculation subunit 402113, configured to calculate, according to the detection frame aspect ratio and the prediction frame aspect ratio, the shape correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • the fourth calculation subunit 402114, configured to calculate, based on the distance correlation, the area correlation and the shape correlation, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
  • the reconnection module 403 includes:
  • the extraction sub-module 4031 is used to extract the first representative detection frame information and the second representative detection frame information in the first disconnected track and the second disconnected track respectively;
  • the second calculation sub-module 4032 is configured to calculate the correlation between the first disconnection trajectory and the second disconnection trajectory according to the first representative detection frame information and the second representative detection frame information;
  • the first reconnection sub-module 4033 is configured to reconnect the first disconnected track with the second disconnected track when the correlation between the first disconnected track and the second disconnected track is greater than or equal to a second preset correlation threshold.
  • the reconnection module 403 further includes:
  • the judging sub-module 4034 is used to judge whether the first disconnected track and the second disconnected track contain the same image frame identifier;
  • the second reconnection sub-module 4035 is configured to reconnect the first disconnected track with the second disconnected track if they do not contain the same image frame identifier.
  • the first reconnection submodule 4033 includes:
  • the reconnection unit 40331 is configured to reconnect the first disconnected trajectory of the current target with the second disconnected trajectory to obtain a first reconnected trajectory, if the correlation between the first disconnected trajectory and the second disconnected trajectory of the current target is greater than or equal to the second preset correlation threshold;
  • the filtering unit 40332 is configured to filter the first reconnected trajectory of the current target according to a preset filtering rule to obtain a second reconnected trajectory as the reconnection result.
  • the filtering rule includes at least one of: judging whether the length of the first reconnected trajectory reaches a preset length, judging whether the image quality of the first reconnected trajectory reaches a preset image quality, and judging whether the target size in the first reconnected trajectory reaches a preset target size.
  • target tracking apparatus provided by the embodiment of the present invention can be applied to devices such as mobile phones, monitors, computers, servers, etc., which can perform target tracking.
  • the target tracking device provided in the embodiment of the present invention can implement each process implemented by the target tracking method in the above method embodiments, and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 11, the electronic device includes a memory 1102, a processor 1101, and a computer program stored in the memory 1102 and executable on the processor 1101, where:
  • the processor 1101 is used for calling the computer program stored in the memory 1102, and performs the following steps:
  • the matching, performed by the processor 1101, of the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image includes: calculating the correlation between the two, and judging whether the pairs that satisfy the matching condition correspond one to one;
  • if they correspond one to one, the matching is a continuous trajectory;
  • if they do not, the matching is a disconnected trajectory.
  • the target detection frame information includes a detection frame parameter and a detection frame image
  • the target prediction frame information includes a prediction frame parameter and a prediction frame image
  • the correlation handled by the processor 1101 includes a frame correlation and a feature correlation; calculating the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image includes:
  • calculating, according to the detection frame parameters and the prediction frame parameters, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
  • calculating, according to the detection frame image and the prediction frame image, the feature correlation between the two;
  • calculating, according to the frame correlation and the feature correlation, the correlation between the two.
  • the detection frame parameters include the coordinates of the center point of the detection frame, the area of the detection frame, and the aspect ratio of the detection frame, and the prediction frame parameters include the coordinates of the center point of the prediction frame, the area of the prediction frame, and the aspect ratio of the prediction frame;
  • the processor 1101 calculates the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image by:
  • calculating, according to the detection frame center point coordinates and the prediction frame center point coordinates, the distance correlation between the two;
  • calculating, according to the detection frame area and the prediction frame area, the area correlation between the two;
  • calculating, according to the detection frame aspect ratio and the prediction frame aspect ratio, the shape correlation between the two;
  • calculating, based on the distance correlation, the area correlation and the shape correlation, the frame correlation between the two.
  • the reconnection of the disconnected trajectory performed by the processor 1101 includes: extracting first representative detection frame information and second representative detection frame information from the first disconnected trajectory and the second disconnected trajectory respectively;
  • the correlation between the first disconnected trajectory and the second disconnected trajectory is calculated from the representative detection frame information;
  • the processor 1101 further executes the steps of: judging whether the first disconnected trajectory and the second disconnected trajectory contain the same image frame identifier;
  • if the first disconnected trajectory and the second disconnected trajectory do not contain the same image frame identifier, the two trajectories are reconnected;
  • reconnecting the first disconnected trajectory with the second disconnected trajectory includes:
  • if the correlation between the first disconnected trajectory of the current target and the second disconnected trajectory is greater than or equal to the second preset correlation threshold, the two are reconnected to obtain a first reconnected trajectory, which is then filtered according to a preset filtering rule to obtain a second reconnected trajectory as the reconnection result;
  • the filtering rule includes at least one of: judging whether the length of the first reconnected trajectory reaches a preset length, judging whether its image quality reaches a preset image quality, and judging whether the target size in it reaches a preset target size.
  • embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the target tracking method provided by the embodiments of the present invention are implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present invention provides a target tracking method and apparatus, an electronic device, and a storage medium. The method includes: extracting the target detection frame information and target prediction frame information of each frame image in an image sequence to be processed; matching the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image, and judging whether the matching result includes a disconnected trajectory; and, if the disconnected trajectory exists, reconnecting it to obtain a reconnection result and obtaining the corresponding target tracking sequence based on the reconnection result. By matching prediction frame information with detection frame information, the detection frame carries prior information during trajectory tracking, which improves the accuracy of the detection frame; disconnected trajectories can be reconnected, improving the detection accuracy of target tracking.

Description

Target tracking method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202110630720.6, entitled "Target tracking method and apparatus, electronic device, and storage medium", filed with the China Patent Office on June 7, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
With the ever-increasing population and the rapid expansion of cities, intelligent video surveillance has become critical. Intelligent video surveillance means that front-end cameras can perform tasks such as detection and tracking, directly extracting and saving the data that is needed. Although multi-target tracking models already exist, most of them rely on inter-frame IOU and Hungarian matching to track target trajectories. In real scenes, when a target is occluded or disturbed, the IOU becomes unreliable and multiple detection frames may have similar IOU values; especially in crowded scenes, the detection error rate is relatively high. In addition, because Hungarian matching is an optimal-assignment algorithm, it will still select a detection frame as the final result in this case; that is, even when the optimal detection frame is erroneous, it will still be selected, which introduces larger errors into the subsequent tracking algorithm. Therefore, existing target tracking algorithms suffer from limited detection accuracy.
Summary
An embodiment of the present invention provides a target tracking method that can improve the detection accuracy of multi-target tracking.
In a first aspect, an embodiment of the present invention provides a target tracking method, the method including:
extracting the target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
matching the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image, and judging whether the matching result includes a disconnected trajectory;
if the disconnected trajectory exists, reconnecting the disconnected trajectory to obtain a reconnection result, and obtaining the corresponding target tracking sequence based on the reconnection result.
Optionally, matching the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image includes:
calculating the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image, and judging whether the target detection frame information of the (n+1)th frame image that satisfies a matching condition corresponds one to one with the target prediction frame information of the nth frame image, the matching condition being that the correlation is greater than or equal to a first preset correlation threshold;
if the target detection frame information corresponding to the (n+1)th frame image that satisfies the matching condition corresponds one to one with the target prediction frame information corresponding to the nth frame image, the matching is a continuous trajectory;
if the target detection frame information corresponding to the (n+1)th frame image that satisfies the matching condition does not correspond one to one with the target prediction frame information corresponding to the nth frame image, the matching is a disconnected trajectory.
Optionally, the target detection frame information includes detection frame parameters and a detection frame image, the target prediction frame information includes prediction frame parameters and a prediction frame image, and the correlation includes a frame correlation and a feature correlation; calculating the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image includes:
calculating, according to the detection frame parameters and the prediction frame parameters, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
calculating, according to the detection frame image and the prediction frame image, the feature correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
calculating, according to the frame correlation and the feature correlation, the correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
Optionally, the detection frame parameters include detection frame center point coordinates, a detection frame area and a detection frame aspect ratio, and the prediction frame parameters include prediction frame center point coordinates, a prediction frame area and a prediction frame aspect ratio; calculating, according to the detection frame parameters and the prediction frame parameters, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image includes:
calculating, according to the detection frame center point coordinates and the prediction frame center point coordinates, the distance correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
calculating, according to the detection frame area and the prediction frame area, the area correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
calculating, according to the detection frame aspect ratio and the prediction frame aspect ratio, the shape correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image;
calculating, based on the distance correlation, the area correlation and the shape correlation, the frame correlation between the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the nth frame image.
Optionally, if the disconnected trajectory exists, reconnecting the disconnected trajectory includes:
extracting first representative detection frame information and second representative detection frame information from a first disconnected trajectory and a second disconnected trajectory respectively;
calculating the correlation between the first disconnected trajectory and the second disconnected trajectory according to the first representative detection frame information and the second representative detection frame information;
when the correlation between the first disconnected trajectory and the second disconnected trajectory is greater than or equal to a second preset correlation threshold, reconnecting the first disconnected trajectory with the second disconnected trajectory.
Optionally, the method further includes:
judging whether the first disconnected trajectory and the second disconnected trajectory contain the same image frame identifier;
if the first disconnected trajectory and the second disconnected trajectory do not contain the same image frame identifier, reconnecting the first disconnected trajectory with the second disconnected trajectory.
Optionally, when the correlation between the first disconnected trajectory and the second disconnected trajectory is greater than or equal to the second preset correlation threshold, reconnecting the first disconnected trajectory with the second disconnected trajectory includes:
if the correlation between the first disconnected trajectory of the current target and the second disconnected trajectory is greater than or equal to the second preset correlation threshold, reconnecting the first disconnected trajectory of the current target with the second disconnected trajectory to obtain a first reconnected trajectory;
filtering the first reconnected trajectory of the current target according to a preset filtering rule to obtain a second reconnected trajectory as the reconnection result.
Optionally, the filtering rule includes at least one of: judging whether the length of the first reconnected trajectory reaches a preset length, judging whether the image quality of the first reconnected trajectory reaches a preset image quality, and judging whether the target size in the first reconnected trajectory reaches a preset target size.
In a second aspect, an embodiment of the present invention further provides a target tracking apparatus, the apparatus including:
an extraction module, configured to extract the target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
a matching module, configured to match the target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image, and to judge whether the matching result includes a disconnected trajectory;
a reconnection module, configured to, if the disconnected trajectory exists, reconnect the disconnected trajectory to obtain a reconnection result, and to obtain the corresponding target tracking sequence based on the reconnection result.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the target tracking method provided by the embodiments of the present invention are implemented.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the target tracking method provided by the embodiments of the present invention are implemented.
In the embodiments of the present invention, the target detection frame information and target prediction frame information of each frame image in the image sequence to be processed are extracted; the target detection frame information corresponding to the (n+1)th frame image is matched with the target prediction frame information corresponding to the nth frame image, and it is judged whether the matching result includes a disconnected trajectory; if the disconnected trajectory exists, it is reconnected to obtain a reconnection result, and the corresponding target tracking sequence is obtained based on the reconnection result. By matching prediction frame information with detection frame information, the detection frame carries prior information during trajectory tracking, which improves the accuracy of the detection frame; disconnected trajectories can be reconnected, improving the detection accuracy of target tracking.
Brief Description of the Drawings
FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a frame correlation calculation method provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a trajectory reconnection method provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a matching module provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a first calculation sub-module provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a first calculation unit provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a reconnection module provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of another reconnection module provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a first reconnection sub-module provided by an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
Referring to FIG. 1, FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
101. Extract the target detection frame information and target prediction frame information of each frame image in the image sequence to be processed.
In this embodiment of the present invention, the image sequence to be processed may be video captured by a camera in real time, for example video of a specific monitored scene captured in real time by a camera installed in that scene; further, the camera may be installed at a certain height in the specific monitored scene to capture the targets in the scene in real time. It may also be a video uploaded by a user. The image sequence refers to frame images obtained in time order.
The image sequence to be processed contains a target to be tracked. The target to be tracked may be a moving target, such as a pedestrian, a vehicle or an animal, i.e. any target that can produce a motion trajectory. There may be one or more targets to be tracked.
The target detection frame information may be obtained by detecting the target to be tracked with a target detection network. The target detection network has already been trained; it may be trained by the user on a sample target data set, or its network structure and parameters may be downloaded and then fine-tuned on a sample target data set.
In this embodiment of the present invention, the input of the target detection network is a frame image of the image sequence to be processed, and the output is the detection frame information of the target to be tracked in the corresponding frame image. The detection frame information output by the target detection network may include position information and confidence information of the target to be tracked in the corresponding frame image. The position information may be in the format det(x, y, w, h), where x and y denote the coordinates of the center point of the detection frame in the corresponding frame image, and w and h denote the width and height of the detection frame in that image. The confidence information indicates how credible it is that the image content in the detection frame is the target to be tracked: the higher the confidence, the more credible it is. The target detection network may be a network constructed based on the CenterNet target detection algorithm.
In a possible embodiment, preset processing may be performed on the video captured by the camera or uploaded by the user. The preset processing may be frame extraction: in the video, one frame is taken every preset number of frames as a frame image of the sequence to be processed, which reduces the redundancy between adjacent frames and increases the computation speed of target tracking. The preset number of frames may be set according to user needs; in this embodiment of the present invention it is 4, i.e. one frame is taken every 4 frames.
In this embodiment of the present invention, the target prediction frame information may be obtained by predicting the position of the target to be tracked with a target prediction network. The target prediction network has already been trained; it may be trained by the user, or its network structure and parameters may be downloaded and then fine-tuned on a sample target data set. The target prediction network may be a network constructed based on the Kalman filter algorithm.
In this embodiment of the present invention, the input of the target prediction network is a frame image of the image sequence to be processed, and the output is the prediction frame information of the target to be tracked in the next frame. The prediction frame information output by the target prediction network may include position information and confidence information of the target to be tracked in the next frame image. The position information may be in the format pre(x, y, w, h), where x and y denote the coordinates of the center point of the predicted frame in the next frame image, and w and h denote its width and height in that image.
It can be understood that, through the target detection network and the target prediction network, when the nth frame image is the input, the target detection frame information and the target prediction frame information corresponding to the nth frame image are output; when the (n+1)th frame image is the input, the target detection frame information and the target prediction frame information corresponding to the (n+1)th frame image are output. The target prediction frame information corresponding to the nth frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)th frame image, and the target prediction frame information corresponding to the (n+1)th frame image as a prediction of the target detection frame information corresponding to the (n+2)th frame image.
102、将第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息进行匹配,并判断匹配结果中是否包括断开轨迹。
在本发明实施例中,第n帧图像对应的目标预测框信息可以理解为是对第n+1帧图像对应的目标检测框信息的预测,上述将第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息进行匹配的目的,可以理解为检测结果是否与预测结果相同或相近,进而判断是否发生误检。具体的,上述匹配结果包括连续轨迹与断开轨迹,当第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息匹配时,则说明没有发生误检,匹配结果为连续轨迹,即将第n+1帧图像对应的目标检测框信息加入之前轨迹中;当第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息不匹配时,则说明发生误检,匹配结果为断开轨迹,即断开第n+1帧图像对应的目标检测框信息与之前轨迹的连接,并以第n+1帧图像对应的目标检测框信息为新轨迹的起点,此时,第n帧图像对应的目标检测框信息作为之前轨迹的终点。
进一步的,可以计算第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息之间的相关度,并判断满足匹配条件的第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息之间是否一一对应,匹 配条件为相关度大于等于第一预设相关度阈值;若满足匹配条件的第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息之间是一一对应,则匹配为连续轨迹;若满足匹配条件的第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息之间不是一一对应,则匹配为断开轨迹。
可以理解的是,通过判断第n+1帧图像对应的目标检测框信息与第n帧图像对应的目标预测框信息之间是否一一对应,可以进一步判断是否存在误检,比如,一个第n+1帧图像对应的目标检测框信息与多个第n帧图像对应的目标预测框信息之间的相关度满足匹配条件,则可以说明一个第n+1帧图像对应的目标检测框信息匹配到多个第n帧图像对应的目标预测框信息情况。或者一个第n帧图像对应的目标预测框信息与多个第n+1帧图像对应的目标检测框信息之间的相关度满足匹配条件,则可以说明一个第n帧图像对应的目标预测框信息匹配到多个第n+1帧图像对应的目标检测框信息情况。上述两种情况即为误检情况,此时匹配为断开轨迹。只有在一个第n帧图像对应的目标预测框信息仅与一个第n+1帧图像对应的目标检测框信息之间的相关度满足匹配条件时(即一一对应),可以认为是不存在误检情况,并分配为同一个ID。
Further, the target detection box information includes detection box parameters and a detection box image, the target prediction box information includes prediction box parameters and a prediction box image, and the correlation includes a box correlation and a feature correlation. The detection box parameters describe the position, shape and size of the target detection box in the corresponding frame image; the detection box image is the content inside the target detection box in the corresponding frame image (also called the target image). Correspondingly, the prediction box parameters describe the position, shape and size of the target prediction box in the corresponding frame image, and the prediction box image is the content inside the target prediction box (also called the target prediction image).
Specifically, the box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image may be calculated from the detection box parameters and the prediction box parameters; the feature correlation between them may be calculated from the detection box image and the prediction box image; and the overall correlation may then be calculated from the box correlation and the feature correlation.
The detection box image can be cropped from the corresponding frame image according to the target detection box information, specifically according to the position information in det(x, y, w, h) format; for example, the detection box image of the n-th frame is cropped from the n-th frame image. Similarly, the prediction box image can be cropped according to the position information in pre(x, y, w, h) format; for example, the prediction box image of the n-th frame is cropped from the (n+1)-th frame image, since it predicts where the target will be in that frame.
Optionally, after the detection box image and the prediction box image have been extracted, their features may be extracted by a feature extraction network, and the similarity between the detection box image features and the prediction box image features is taken as the feature correlation. The two images may also be resized to a preset size, for example 256×128. The feature extraction network may be built on a Re-ID network; the feature extraction network of this embodiment is obtained by making a Re-ID network lightweight. The detection box image and the prediction box image are separately fed into the feature extraction network to extract image features; the network can be expressed as:
f = F(θ_b)
where f is the image feature, F is the feature extraction network, and θ_b are the parameters of the feature extraction network.
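A minimal sketch of this feature-correlation step, assuming center-format boxes, OpenCV for cropping and resizing, and cosine similarity as the comparison between the two feature vectors (the text fixes the 256×128 size but not the similarity measure); `embed` stands for the trained lightweight Re-ID network F(θ_b):

```python
import numpy as np
import cv2

def crop_resized(frame, box, wh=(128, 256)):        # 256x128 (height x width)
    x, y, w, h = box                                # center-format box
    x0, y0 = max(int(x - w / 2), 0), max(int(y - h / 2), 0)
    patch = frame[y0:y0 + int(h), x0:x0 + int(w)]
    return cv2.resize(patch, wh)                    # cv2 takes (width, height)

def feature_correlation(det_frame, det_box, pre_frame, pre_box, embed):
    """dis_feat between a detection crop and a prediction crop."""
    f_det = embed(crop_resized(det_frame, det_box)) # f = F(theta_b) on the crop
    f_pre = embed(crop_resized(pre_frame, pre_box))
    return float(np.dot(f_det, f_pre)
                 / (np.linalg.norm(f_det) * np.linalg.norm(f_pre) + 1e-12))
```

Note that, per the cropping rule above, both crops come from the (n+1)-th frame image: the detection box locates the target there directly, and the prediction box of the n-th frame locates where the target was predicted to be.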
The box correlation describes how closely the target prediction box and the target detection box agree in the two dimensions of shape and distance. Compared with the traditional IOU (intersection over union), the box correlation can cope with tracking over longer inter-frame distances. The traditional IOU is computed between the detection box of the n-th frame and the detection box of the (n+1)-th frame: the intersection area and the union area of the two boxes are computed, and their ratio decides whether the boxes belong to the same target. Clearly, when the detection boxes of two different targets are close together and similar in size, they will also pass this same-target test, so the traditional IOU has a large error when tracking multiple targets. Moreover, as the inter-frame distance grows, the detection boxes of the same target in adjacent frame images may shift in position and change in size; the IOU then changes drastically and false detections become more likely.
In this embodiment of the present invention, the correlation may be the sum of the box correlation and the feature correlation; in some possible embodiments it may also be their weighted sum, with the weighting coefficients determined as required.
Optionally, the detection box parameters may include the detection box center-point coordinates, the detection box area and the detection box aspect ratio, and the prediction box parameters may include the prediction box center-point coordinates, the prediction box area and the prediction box aspect ratio. The detection box parameters can be obtained by converting det(x, y, w, h), and the prediction box parameters by converting pre(x, y, w, h): specifically, det(x, y, w, h) is converted to det(x, y, s, r) and pre(x, y, w, h) is converted to pre(x, y, s, r), where s denotes the box area and r denotes the box aspect ratio, with s = w × h and r = w / h. The box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image can then be computed from the detection box parameters det(x, y, s, r) of the (n+1)-th frame image and the prediction box parameters pre(x, y, s, r) of the n-th frame image.
Specifically, referring to Fig. 2, Fig. 2 is a flowchart of a box correlation calculation method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
201. Calculate the distance correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image from the detection box center-point coordinates and the prediction box center-point coordinates.
In this embodiment of the present invention, the distance correlation may be computed from the squared Euclidean distance. For example, assuming the detection box parameters are det(x_det, y_det, s_det, r_det) and the prediction box parameters are pre(x_pre, y_pre, s_pre, r_pre), the detection box center point is (x_det, y_det) and the prediction box center point is (x_pre, y_pre), and the squared Euclidean distance can be computed as:
dis = (x_det - x_pre)^2 + (y_det - y_pre)^2
where dis denotes the squared Euclidean distance from the detection box center point to the prediction box center point.
The distance correlation can then be computed as:
[dis_pos formula, rendered in the original as an embedded image (PCTCN2021114904-appb-000001); it expresses dis_pos in terms of dis and max_dis]
where dis_pos is the distance correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and max_dis is a preset distance threshold, which may be set to around 0.2.
202. Calculate the area correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image from the detection box area and the prediction box area.
In this embodiment of the present invention, the area correlation may be computed from the area ratio. With the detection box area s_det and the prediction box area s_pre, the area ratio can be computed as:
size = s_det / s_pre
where size denotes the ratio of the detection box area to the prediction box area.
The area correlation can then be computed as:
dis_size = (size - 1.0) / (max_size - 1.0)
where dis_size denotes the area correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and max_size is a preset area-ratio threshold, which may be set to around 1.8.
203. Calculate the shape correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image from the detection box aspect ratio and the prediction box aspect ratio.
In this embodiment of the present invention, the shape correlation may be computed from the ratio of the detection box aspect ratio to the prediction box aspect ratio. With the detection box aspect ratio r_det and the prediction box aspect ratio r_pre, the shape ratio can be computed as:
ratio = r_det / r_pre
where ratio denotes the ratio of the detection box aspect ratio to the prediction box aspect ratio.
The shape correlation can then be computed as:
dis_ratio = (ratio - 1.0) / (max_ratio - 1.0)
where dis_ratio denotes the shape correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and max_ratio is a preset shape-ratio threshold, which may be set to around 1.8.
204. Calculate the box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image based on the distance correlation, the area correlation and the shape correlation.
In this embodiment of the present invention, the box correlation may be the plain sum or a weighted sum of the distance correlation, the area correlation and the shape correlation, with the weighting coefficients determined as required.
Summing the box correlation and the feature correlation then yields the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image:
dis_all = dis_pos + dis_size + dis_ratio + dis_feat
where dis_all denotes the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and dis_feat denotes the feature correlation.
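The following sketch assembles steps 201 to 204 into dis_all. The dis_pos formula appears in the original only as an embedded image; by analogy with dis_size and dis_ratio it is assumed here to be the squared center distance normalized by max_dis, and the threshold values are the ones suggested in the text:

```python
def frame_correlation(det, pre, dis_feat,
                      max_dis=0.2, max_size=1.8, max_ratio=1.8):
    """det, pre: boxes in (x, y, s, r) form; dis_feat: the feature correlation."""
    x_det, y_det, s_det, r_det = det
    x_pre, y_pre, s_pre, r_pre = pre

    dis = (x_det - x_pre) ** 2 + (y_det - y_pre) ** 2   # squared center distance
    dis_pos = dis / max_dis                             # assumed normalization

    size = s_det / s_pre                                # area ratio (step 202)
    dis_size = (size - 1.0) / (max_size - 1.0)

    ratio = r_det / r_pre                               # aspect-ratio ratio (step 203)
    dis_ratio = (ratio - 1.0) / (max_ratio - 1.0)

    return dis_pos + dis_size + dis_ratio + dis_feat    # dis_all (step 204)
```

With max_dis around 0.2, dis_pos only stays small when the center coordinates are normalized; this, like the unweighted sum, is one of the free choices the text leaves open.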
103. If a broken track exists, reconnect the broken track to obtain a reconnection result, and obtain the corresponding target tracking sequence based on the reconnection result.
In this embodiment of the present invention, a track may be formed by several consecutive pieces of target detection box information. After the current track has ended, all finished broken tracks may be traversed for reconnection, specifically according to the target detection box information in each track.
Optionally, referring to Fig. 3, Fig. 3 is a flowchart of a track reconnection method provided by an embodiment of the present invention. As shown in Fig. 3, the method includes the following steps:
301. From the broken tracks, extract the first representative detection box information of a first broken track and the second representative detection box information of a second broken track.
In this embodiment of the present invention, the target detection box information in each track may be quality-assessed, and the target detection box information with the highest quality score is selected as the representative detection box information. Note that "first" and "second" only distinguish whether the representative detection box information belongs to the first broken track or to the second broken track.
302. Calculate the correlation between the first broken track and the second broken track from the first representative detection box information and the second representative detection box information.
In this embodiment of the present invention, the correlation between the first representative detection box information and the second representative detection box information is taken as the correlation between the first broken track and the second broken track; it can be computed as described in step 102.
303. When the correlation between the first broken track and the second broken track is greater than or equal to a second preset correlation threshold, reconnect the first broken track and the second broken track.
In this embodiment of the present invention, when the correlation between the first broken track and the second broken track is greater than or equal to the second preset correlation threshold, the two broken tracks belong to the same target and can be reconnected.
Optionally, it may first be determined whether the first broken track and the second broken track contain the same image frame identifier; only if they do not are they reconnected. The image frame identifier may be the frame number of an image frame. If the two tracks share an image frame identifier, they overlap in time and therefore cannot be broken tracks of the same target.
Optionally, if the correlation between the first broken track and the second broken track of the current target is greater than or equal to the second preset correlation threshold, the first broken track and the second broken track of the current target are reconnected to obtain a first reconnected track; the first reconnected track of the current target is then filtered according to a preset filtering rule, and a second reconnected track is obtained as the reconnection result.
The filtering rule includes at least one of: determining whether the length of the first reconnected track reaches a preset length, determining whether the image quality of the first reconnected track reaches a preset image quality, and determining whether the target size in the first reconnected track reaches a preset target size.
Optionally, first reconnected tracks that are too short may be filtered out by track length: the targets of short reconnected tracks are usually hard to assign to other tracks, and they severely degrade the tracking result.
Optionally, first reconnected tracks whose overall image quality is below the preset image quality may be filtered out, which improves tracking precision to some extent.
Optionally, targets whose size is below the preset target size may be filtered out: overly small targets easily cause false reconnections, so filtering out first reconnected tracks with small targets further improves tracking precision.
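A sketch of the reconnection stage (steps 301 to 303) together with the optional frame-overlap check and the filtering rules above. A track is modeled as a list of (frame_id, box, quality) records, `correlation` is assumed to be the measure from step 102 applied to the two representative records, and every threshold value is a placeholder:

```python
def representative(track):
    return max(track, key=lambda rec: rec[2])      # highest quality score (step 301)

def try_reconnect(track_a, track_b, correlation, second_thresh):
    if {rec[0] for rec in track_a} & {rec[0] for rec in track_b}:
        return None                                # shared frame IDs: tracks overlap
    if correlation(representative(track_a), representative(track_b)) < second_thresh:
        return None                                # not the same target (step 303)
    return sorted(track_a + track_b, key=lambda rec: rec[0])  # first reconnected track

def passes_filter(track, min_len=10, min_quality=0.5, min_area=32 * 32):
    long_enough = len(track) >= min_len            # preset length
    good_enough = sum(rec[2] for rec in track) / len(track) >= min_quality
    big_enough = min(rec[1][2] * rec[1][3] for rec in track) >= min_area  # w * h
    return long_enough and good_enough and big_enough
```

A first reconnected track that passes the filter is kept as the second reconnected track, i.e. as the reconnection result.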
In this embodiment of the present invention, target detection box information and target prediction box information are extracted for each frame image in the image sequence to be processed; the target detection box information corresponding to the (n+1)-th frame image is matched against the target prediction box information corresponding to the n-th frame image, and it is determined whether the matching result includes a broken track; if the broken track exists, the broken track is reconnected to obtain a reconnection result, and the corresponding target tracking sequence is obtained based on the reconnection result. Matching prediction box information against detection box information gives the detection boxes prior information during track tracking and improves their accuracy, and broken tracks can be reconnected, which improves the detection accuracy of target tracking.
It should be noted that the target tracking method provided by the embodiments of the present invention can be applied to devices capable of target tracking, such as mobile phones, monitors, computers and servers.
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present invention. As shown in Fig. 4, the apparatus includes:
an extraction module 401, configured to extract target detection box information and target prediction box information of each frame image in an image sequence to be processed;
a matching module 402, configured to match the target detection box information corresponding to the (n+1)-th frame image against the target prediction box information corresponding to the n-th frame image, and determine whether the matching result includes a broken track;
a reconnection module 403, configured to, if the broken track exists, reconnect the broken track to obtain a reconnection result, and obtain the corresponding target tracking sequence based on the reconnection result.
Optionally, as shown in Fig. 5, the matching module 402 includes:
a first calculation submodule 4021, configured to calculate the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and determine whether the detection boxes and prediction boxes satisfying the matching condition are in one-to-one correspondence, the matching condition being that the correlation is greater than or equal to a first preset correlation threshold;
a first matching submodule 4022, configured to match as a continuous track if the detection boxes and prediction boxes satisfying the matching condition are in one-to-one correspondence;
a second matching submodule 4023, configured to match as a broken track if the detection boxes and prediction boxes satisfying the matching condition are not in one-to-one correspondence.
Optionally, as shown in Fig. 6, the target detection box information includes detection box parameters and a detection box image, the target prediction box information includes prediction box parameters and a prediction box image, and the correlation includes a box correlation and a feature correlation; the first calculation submodule 4021 includes:
a first calculation unit 40211, configured to calculate the box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box parameters and the prediction box parameters;
a second calculation unit 40212, configured to calculate the feature correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box image and the prediction box image;
a third calculation unit 40213, configured to calculate the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the box correlation and the feature correlation.
Optionally, as shown in Fig. 7, the detection box parameters include the detection box center-point coordinates, the detection box area and the detection box aspect ratio, and the prediction box parameters include the prediction box center-point coordinates, the prediction box area and the prediction box aspect ratio; the first calculation unit 40211 includes:
a first calculation subunit 402111, configured to calculate the distance correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box center-point coordinates and the prediction box center-point coordinates;
a second calculation subunit 402112, configured to calculate the area correlation between the two according to the detection box area and the prediction box area;
a third calculation subunit 402113, configured to calculate the shape correlation between the two according to the detection box aspect ratio and the prediction box aspect ratio;
a fourth calculation subunit 402114, configured to calculate the box correlation between the two based on the distance correlation, the area correlation and the shape correlation.
Optionally, as shown in Fig. 8, the reconnection module 403 includes:
an extraction submodule 4031, configured to extract the first representative detection box information of a first broken track and the second representative detection box information of a second broken track respectively;
a second calculation submodule 4032, configured to calculate the correlation between the first broken track and the second broken track according to the first representative detection box information and the second representative detection box information;
a first reconnection submodule 4033, configured to reconnect the first broken track and the second broken track when the correlation between the first broken track and the second broken track is greater than or equal to a second preset correlation threshold.
Optionally, as shown in Fig. 9, the reconnection module 403 further includes:
a judgment submodule 4034, configured to determine whether the first broken track and the second broken track contain the same image frame identifier;
a second reconnection submodule 4035, configured to reconnect the first broken track and the second broken track if the first broken track and the second broken track do not contain the same image frame identifier.
Optionally, as shown in Fig. 10, the first reconnection submodule 4033 includes:
a reconnection unit 40331, configured to, if the correlation between the first broken track and the second broken track of the current target is greater than or equal to the second preset correlation threshold, reconnect the first broken track and the second broken track of the current target to obtain a first reconnected track;
a filtering unit 40332, configured to filter the first reconnected track of the current target according to a preset filtering rule to obtain a second reconnected track as the reconnection result.
Optionally, the filtering rule includes at least one of: determining whether the length of the first reconnected track reaches a preset length, determining whether the image quality of the first reconnected track reaches a preset image quality, and determining whether the target size in the first reconnected track reaches a preset target size.
It should be noted that the target tracking apparatus provided by the embodiments of the present invention can be applied to devices capable of target tracking, such as mobile phones, monitors, computers and servers.
The target tracking apparatus provided by the embodiments of the present invention can implement each process implemented by the target tracking method in the above method embodiments and achieve the same beneficial effects. To avoid repetition, details are not repeated here.
Referring to Fig. 11, Fig. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in Fig. 11, the device includes a memory 1102, a processor 1101, and a computer program stored in the memory 1102 and executable on the processor 1101, wherein:
the processor 1101 is configured to call the computer program stored in the memory 1102 and perform the following steps:
extracting target detection box information and target prediction box information of each frame image in an image sequence to be processed;
matching the target detection box information corresponding to the (n+1)-th frame image against the target prediction box information corresponding to the n-th frame image, and determining whether the matching result includes a broken track;
if the broken track exists, reconnecting the broken track to obtain a reconnection result, and obtaining the corresponding target tracking sequence based on the reconnection result.
Optionally, the matching, performed by the processor 1101, of the target detection box information corresponding to the (n+1)-th frame image against the target prediction box information corresponding to the n-th frame image includes:
calculating the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and determining whether the detection boxes and prediction boxes satisfying the matching condition are in one-to-one correspondence, the matching condition being that the correlation is greater than or equal to a first preset correlation threshold;
if the detection boxes and prediction boxes satisfying the matching condition are in one-to-one correspondence, matching as a continuous track;
if the detection boxes and prediction boxes satisfying the matching condition are not in one-to-one correspondence, matching as a broken track.
Optionally, the target detection box information includes detection box parameters and a detection box image, the target prediction box information includes prediction box parameters and a prediction box image, and the correlation includes a box correlation and a feature correlation; the calculating, performed by the processor 1101, of the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image includes:
calculating the box correlation between the two according to the detection box parameters and the prediction box parameters;
calculating the feature correlation between the two according to the detection box image and the prediction box image;
calculating the correlation between the two according to the box correlation and the feature correlation.
Optionally, the detection box parameters include the detection box center-point coordinates, the detection box area and the detection box aspect ratio, and the prediction box parameters include the prediction box center-point coordinates, the prediction box area and the prediction box aspect ratio; the calculating, performed by the processor 1101, of the box correlation according to the detection box parameters and the prediction box parameters includes:
calculating the distance correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box center-point coordinates and the prediction box center-point coordinates;
calculating the area correlation between the two according to the detection box area and the prediction box area;
calculating the shape correlation between the two according to the detection box aspect ratio and the prediction box aspect ratio;
calculating the box correlation between the two based on the distance correlation, the area correlation and the shape correlation.
Optionally, the reconnecting of the broken track, performed by the processor 1101 if the broken track exists, includes:
extracting the first representative detection box information of a first broken track and the second representative detection box information of a second broken track respectively;
calculating the correlation between the first broken track and the second broken track according to the first representative detection box information and the second representative detection box information;
reconnecting the first broken track and the second broken track when the correlation between the first broken track and the second broken track is greater than or equal to a second preset correlation threshold.
Optionally, the processor 1101 further performs:
determining whether the first broken track and the second broken track contain the same image frame identifier;
reconnecting the first broken track and the second broken track if the first broken track and the second broken track do not contain the same image frame identifier.
Optionally, the reconnecting of the first broken track and the second broken track, performed by the processor 1101 when the correlation between the first broken track and the second broken track is greater than or equal to the second preset correlation threshold, includes:
if the correlation between the first broken track and the second broken track of the current target is greater than or equal to the second preset correlation threshold, reconnecting the first broken track and the second broken track of the current target to obtain a first reconnected track;
filtering the first reconnected track of the current target according to a preset filtering rule to obtain a second reconnected track as the reconnection result.
Optionally, the filtering rule includes at least one of: determining whether the length of the first reconnected track reaches a preset length, determining whether the image quality of the first reconnected track reaches a preset image quality, and determining whether the target size in the first reconnected track reaches a preset target size.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the target tracking method provided by the embodiments of the present invention and achieves the same technical effects; to avoid repetition, details are not repeated here.

Claims (11)

  1. A target tracking method, characterized in that the method comprises the following steps:
    extracting target detection box information and target prediction box information of each frame image in an image sequence to be processed;
    matching the target detection box information corresponding to an (n+1)-th frame image against the target prediction box information corresponding to an n-th frame image, and determining whether a matching result includes a broken track; and
    if the broken track exists, reconnecting the broken track to obtain a reconnection result, and obtaining a corresponding target tracking sequence based on the reconnection result.
  2. The method according to claim 1, characterized in that the matching of the target detection box information corresponding to the (n+1)-th frame image against the target prediction box information corresponding to the n-th frame image comprises:
    calculating a correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image, and determining whether the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image that satisfy a matching condition are in one-to-one correspondence, the matching condition being that the correlation is greater than or equal to a first preset correlation threshold;
    if the target detection box information and the target prediction box information satisfying the matching condition are in one-to-one correspondence, matching as a continuous track; and
    if the target detection box information and the target prediction box information satisfying the matching condition are not in one-to-one correspondence, matching as a broken track.
  3. The method according to claim 2, characterized in that the target detection box information includes detection box parameters and a detection box image, the target prediction box information includes prediction box parameters and a prediction box image, the correlation includes a box correlation and a feature correlation, and the calculating of the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image comprises:
    calculating the box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box parameters and the prediction box parameters;
    calculating the feature correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box image and the prediction box image; and
    calculating the correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the box correlation and the feature correlation.
  4. The method according to claim 3, characterized in that the detection box parameters include detection box center-point coordinates, a detection box area and a detection box aspect ratio, the prediction box parameters include prediction box center-point coordinates, a prediction box area and a prediction box aspect ratio, and the calculating of the box correlation according to the detection box parameters and the prediction box parameters comprises:
    calculating a distance correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box center-point coordinates and the prediction box center-point coordinates;
    calculating an area correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box area and the prediction box area;
    calculating a shape correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image according to the detection box aspect ratio and the prediction box aspect ratio; and
    calculating the box correlation between the target detection box information corresponding to the (n+1)-th frame image and the target prediction box information corresponding to the n-th frame image based on the distance correlation, the area correlation and the shape correlation.
  5. The method according to any one of claims 1 to 4, characterized in that the reconnecting of the broken track if the broken track exists comprises:
    extracting first representative detection box information of a first broken track and second representative detection box information of a second broken track respectively;
    calculating the correlation between the first broken track and the second broken track according to the first representative detection box information and the second representative detection box information; and
    reconnecting the first broken track and the second broken track when the correlation between the first broken track and the second broken track is greater than or equal to a second preset correlation threshold.
  6. The method according to claim 5, characterized in that the method further comprises:
    determining whether the first broken track and the second broken track contain a same image frame identifier; and
    reconnecting the first broken track and the second broken track if the first broken track and the second broken track do not contain a same image frame identifier.
  7. The method according to claim 5, characterized in that the reconnecting of the first broken track and the second broken track when the correlation between the first broken track and the second broken track is greater than or equal to the second preset correlation threshold comprises:
    if the correlation between the first broken track and the second broken track of a current target is greater than or equal to the second preset correlation threshold, reconnecting the first broken track and the second broken track of the current target to obtain a first reconnected track; and
    filtering the first reconnected track of the current target according to a preset filtering rule to obtain a second reconnected track as the reconnection result.
  8. The method according to claim 7, characterized in that the filtering rule includes at least one of: determining whether the length of the first reconnected track reaches a preset length, determining whether the image quality of the first reconnected track reaches a preset image quality, and determining whether the target size in the first reconnected track reaches a preset target size.
  9. A target tracking apparatus, characterized in that the apparatus comprises:
    an extraction module, configured to extract target detection box information and target prediction box information of each frame image in an image sequence to be processed;
    a matching module, configured to match the target detection box information corresponding to an (n+1)-th frame image against the target prediction box information corresponding to an n-th frame image, and determine whether a matching result includes a broken track; and
    a reconnection module, configured to, if the broken track exists, reconnect the broken track to obtain a reconnection result, and obtain a corresponding target tracking sequence based on the reconnection result.
  10. An electronic device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the target tracking method according to any one of claims 1 to 8 when executing the computer program.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the target tracking method according to any one of claims 1 to 8.
PCT/CN2021/114904 2020-12-17 2021-08-27 Target tracking method and apparatus, electronic device and storage medium WO2022127180A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011498405.4 2020-12-17
CN202011498405.4A CN112634326A (zh) 2020-12-17 2020-12-17 Target tracking method and apparatus, electronic device and storage medium
CN202110630720.6 2021-06-07
CN202110630720.6A CN113284168A (zh) 2020-12-17 2021-06-07 Target tracking method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022127180A1 true WO2022127180A1 (zh) 2022-06-23

Family

ID=75316489

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114904 WO2022127180A1 (zh) Target tracking method and apparatus, electronic device and storage medium 2020-12-17 2021-08-27

Country Status (2)

Country Link
CN (2) CN112634326A (zh)
WO (1) WO2022127180A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634326A (zh) * 2020-12-17 2021-04-09 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, electronic device and storage medium
CN113223051A (zh) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Track optimization method, apparatus, device, storage medium and program product
CN113989696B (zh) * 2021-09-18 2022-11-25 北京远度互联科技有限公司 Target tracking method and apparatus, electronic device and storage medium
CN113989694B (zh) * 2021-09-18 2022-10-14 北京远度互联科技有限公司 Target tracking method and apparatus, electronic device and storage medium
CN113989695B (zh) * 2021-09-18 2022-05-20 北京远度互联科技有限公司 Target tracking method and apparatus, electronic device and storage medium
CN114882349B (zh) * 2022-03-29 2024-05-24 青岛海尔制冷电器有限公司 Method for judging the identity of article targets in a refrigerator, refrigerator, and computer storage medium
US11625909B1 (en) * 2022-05-04 2023-04-11 Motional Ad Llc Track segment cleaning of tracked objects
CN115063741B (zh) * 2022-06-10 2023-08-18 嘉洋智慧安全科技(北京)股份有限公司 Target detection method, apparatus, device, medium and product
CN116030059B (zh) * 2023-03-29 2023-06-16 南京邮电大学 Track-based target ID re-authentication matching method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104748750B (zh) * 2013-12-28 2015-12-02 华中科技大学 Model-constrained attitude estimation method and system for on-orbit three-dimensional space targets
CN108509896B (zh) * 2018-03-28 2020-10-13 腾讯科技(深圳)有限公司 Track tracking method, apparatus and storage medium
CN110443833B (zh) * 2018-05-04 2023-09-26 佳能株式会社 Object tracking method and device
CN110853078B (zh) * 2019-10-30 2023-07-04 同济大学 Online multi-target tracking method based on occlusion pairs

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955688A (zh) * 2014-05-20 2014-07-30 楚雄师范学院 Computer-vision-based zebrafish school detection and tracking method
US20160343146A1 (en) * 2015-05-22 2016-11-24 International Business Machines Corporation Real-time object analysis with occlusion handling
CN110163889A (zh) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 Target tracking method, target tracking apparatus and target tracking device
CN111553934A (zh) * 2020-04-24 2020-08-18 哈尔滨工程大学 Multi-ship tracking method using multi-dimensional fusion
CN111709975A (zh) * 2020-06-22 2020-09-25 上海高德威智能交通系统有限公司 Multi-target tracking method and apparatus, electronic device and storage medium
CN112634326A (zh) * 2020-12-17 2021-04-09 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, electronic device and storage medium
CN113284168A (zh) * 2020-12-17 2021-08-20 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, electronic device and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063453A (zh) * 2022-06-24 2022-09-16 南京农业大学 Method, system and storage medium for detecting and analyzing individual stomatal behavior of plant leaves
CN115063453B (zh) * 2022-06-24 2023-08-29 南京农业大学 Method, system and storage medium for detecting and analyzing individual stomatal behavior of plant leaves
CN115695818A (zh) * 2023-01-05 2023-02-03 广东瑞恩科技有限公司 Efficient management method for smart park monitoring data based on the Internet of Things
CN115965657A (zh) * 2023-02-28 2023-04-14 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium and vehicle
CN115965657B (zh) * 2023-02-28 2023-06-02 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium and vehicle
CN117671296A (zh) * 2023-12-19 2024-03-08 珠海市欧冶半导体有限公司 Target tracking method, apparatus, computer device and storage medium
CN117457193A (zh) * 2023-12-22 2024-01-26 之江实验室 Posture health monitoring method and system based on human body key point detection
CN117457193B (zh) * 2023-12-22 2024-04-02 之江实验室 Posture health monitoring method and system based on human body key point detection
CN117876416A (zh) * 2024-03-12 2024-04-12 浙江芯昇电子技术有限公司 Multi-target tracking method, apparatus, device and storage medium
CN117876416B (zh) * 2024-03-12 2024-06-04 浙江芯昇电子技术有限公司 Multi-target tracking method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN112634326A (zh) 2021-04-09
CN113284168A (zh) 2021-08-20

Similar Documents

Publication Publication Date Title
WO2022127180A1 (zh) Target tracking method and apparatus, electronic device and storage medium
US20210248378A1 (en) Spatiotemporal action detection method
WO2021017291A1 (zh) Darkflow-DeepSort-based multi-target tracking and detection method, apparatus and storage medium
CN108388879B (zh) Target detection method, apparatus and storage medium
CN109035304B (zh) Target tracking method, medium, computing device and apparatus
TW202201944A (zh) Maintaining a fixed size for a target object in a frame
CN109635686B (zh) Two-stage pedestrian search method combining face and appearance
CN109977895B (zh) Wild animal video target detection method based on multi-feature-map fusion
US20220174089A1 (en) Automatic identification and classification of adversarial attacks
WO2022142417A1 (zh) Target tracking method and apparatus, electronic device and storage medium
JP2016507834A (ja) System and method for tracking and detecting a target object
GB2409028A (en) Face detection
WO2020233397A1 (zh) Method and apparatus for detecting a target in a video, computing device and storage medium
CN112614187A (zh) Loop closure detection method and apparatus, terminal device and readable storage medium
CN110688940A (zh) Fast face tracking method based on face detection
CN111753590B (zh) Behavior recognition method and apparatus, and electronic device
WO2022082999A1 (zh) Object recognition method and apparatus, terminal device and storage medium
CN112668524A (zh) Multi-target tracking system and method
CN110765903A (zh) Pedestrian re-identification method, apparatus and storage medium
CN112084952B (zh) Video point tracking method based on self-supervised training
CN110610123A (zh) Multi-target vehicle detection method and apparatus, electronic device and storage medium
CN115049954B (zh) Target recognition method and apparatus, electronic device and medium
CN112116567A (zh) No-reference image quality assessment method, apparatus and storage medium
Nishimura et al. SDOF-tracker: Fast and accurate multiple human tracking by skipped-detection and optical-flow
WO2024012367A1 (zh) Visual target tracking method and apparatus, device and storage medium

Legal Events

Date Code Title Description
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21905113; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 21905113; Country of ref document: EP; Kind code of ref document: A1