WO2022142417A1 - Target tracking method and apparatus, electronic device, and storage medium - Google Patents

Target tracking method and apparatus, electronic device, and storage medium

Info

Publication number
WO2022142417A1
Authority
WO
WIPO (PCT)
Prior art keywords: target, detection, frame, missed, image
Prior art date
Application number
PCT/CN2021/114903
Other languages
English (en)
French (fr)
Inventor
王智卓
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2022142417A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Definitions

  • the present invention relates to the field of artificial intelligence, and in particular, to a target tracking method, device, electronic device and storage medium.
  • the multi-target tracking algorithm includes the target detection algorithm and the target ID matching tracking algorithm.
  • the target detection algorithm may miss targets in some cases. For example, in a real scene, when the target is occluded or interfered with, and especially in a crowded scene, the missed detection rate is high, resulting in low target tracking accuracy. Therefore, existing target tracking algorithms have the problem of low target detection accuracy.
  • the embodiment of the present invention provides a target tracking method, which can reduce the missed detection rate of target detection, thereby improving the target detection accuracy rate of multi-target tracking.
  • an embodiment of the present invention provides a target tracking method, and the method includes:
  • according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, it is determined whether a first missed detection situation exists, and the first missed detection situation includes a first missed detection target point;
  • the second missed detection situation includes the second missed detection target point
  • a target tracking trajectory is obtained.
  • the extraction of target count information, target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed includes:
  • the target detection frame information includes the target detection frame and the target detection feature
  • the target prediction frame information of each frame of image is calculated according to the preset prediction network, and the target prediction frame information includes the target prediction frame and the target prediction feature.
  • the detection and re-identification network includes a public network, a detection branch network and a re-identification branch network, the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network.
  • the target detection frame information of each frame of image is calculated according to the preset detection and re-identification network, including:
  • the target detection features implicit in the common features are extracted through the re-identification branch network.
  • calculating the first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and target prediction frame information of each frame of image including:
  • according to the target detection frame information and the target prediction frame information, a unique ID is configured for the target detection frame of each target
  • the first tracking trajectory of each target is obtained.
  • the configuring of a unique ID for the target detection frame of each target according to the target detection frame information and the target prediction frame information includes: calculating the intersection-over-union ratio between the target detection frame of each target in the (n+1)th frame image and the target prediction frame of each target in the nth frame image; calculating the feature similarity between the target detection feature of each target in the (n+1)th frame image and the target prediction feature of each target in the nth frame image; and, based on the intersection-over-union ratio and the feature similarity, configuring a unique ID for the target detection frame of each target.
  • the target count information includes an estimated number of targets, and determining whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image includes:
  • according to the target detection frame information corresponding to the (n+1)th frame image, the number of target detection frames in the (n+1)th frame image is counted
  • if the number of target detection frames is less than the estimated number of targets, it is determined that the first missed detection situation exists.
  • determining whether there is a second missed detection situation according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the n+1th frame image includes:
  • if the first missed detection target point is located within the first target prediction frame, it is determined that the second missed detection situation exists.
  • the first missed detection frame information includes a first missed detection target detection frame
  • the determining the first missed detection frame information according to the first missed detection target point includes:
  • the first target prediction frame is used as the first missed target detection frame of the first missed target point in the n+1 th frame image.
  • the second missed detection frame information includes a second missed detection target detection frame, and the second missed detection frame information is determined according to the second missed detection target point, including:
  • according to the height and width information of the target prediction frame in the nth frame image, and taking the second missed detection target point as the center, a second missed detection target detection frame is configured for the second missed detection target point in the (n+1)th frame image.
  • obtaining the target tracking trajectory based on the first tracking trajectory, the first missed frame information and/or the second missed frame information includes:
  • a first unique ID is configured for the corresponding first missed detection frame information in the (n+1)th frame image, the first unique ID being a unique ID corresponding to the first tracking track
  • according to the unique ID of the first missed detection frame information, the first missed detection frame information is added to the first tracking track with the same unique ID; and/or
  • a second unique ID is configured for the corresponding second missed detection frame information in the (n+1)th frame image, and the second unique ID is different from the unique IDs corresponding to all the first tracking tracks.
  • an embodiment of the present invention further provides a target tracking device, the device comprising:
  • the extraction module is used to extract the target count information, target detection frame information and target prediction frame information of each frame image in the image sequence to be processed;
  • a calculation module used for calculating the first tracking track of each target in the image sequence to be processed according to the target detection frame information and target prediction frame information of each frame of image;
  • the first judgment module is configured to judge whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, the first missed detection situation including a first missed detection target point;
  • the second judgment module is configured to, if the first missed detection situation exists, judge whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, the second missed detection situation including a second missed detection target point;
  • a first determining module configured to determine first missed frame information according to the first missed target point if there is no second missed detection
  • a second determination module configured to determine the second missed frame information according to the second missed target point if there is a second missed detection situation
  • a processing module configured to obtain a target tracking trajectory based on the first tracking trajectory, the first missed frame information and/or the second missed frame information.
  • an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, when the processor executes the computer program The steps in the target tracking method provided by the embodiment of the present invention are implemented.
  • embodiments of the present invention provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the target tracking method provided by the embodiments of the present invention are implemented.
  • the target count information, target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed are extracted;
  • the first tracking trajectory of each target in the image sequence to be processed is calculated according to the target detection frame information and target prediction frame information of each frame of image; according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, it is judged whether a first missed detection situation exists, and the first missed detection situation includes a first missed detection target point; if the first missed detection situation exists, it is judged, according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, whether a second missed detection situation exists, and the second missed detection situation includes a second missed detection target point; if there is no second missed detection situation, the first missed detection frame information is determined according to the first missed detection target point; if there is a second missed detection situation, the second missed detection frame information is determined according to the second missed detection target point; based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information, a target tracking trajectory is obtained.
  • the missed detection rate can be effectively reduced, the target detection accuracy of multi-target tracking can be improved, and the accuracy of multi-target tracking can be improved.
  • FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention.
  • FIG. 1a is a structural diagram of a detection and re-identification network provided by an embodiment of the present invention.
  • Fig. 1b is a schematic diagram of a heat map of a feature map provided by an embodiment of the present invention.
  • 1c is a schematic diagram of a center point offset component provided by an embodiment of the present invention.
  • 1d is a schematic diagram of a detection frame size component provided by an embodiment of the present invention.
  • 1e is a schematic diagram of the output of a crowd counting estimation network provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an extraction module provided by an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a second computing submodule provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a computing module provided by an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a first configuration sub-module provided by an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a first judgment module provided by an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a second judgment module provided by an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a processing module provided by an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 1, the following steps are included:
  • the above-mentioned image sequence to be processed may be video images captured by a camera in real time, for example, video images of a specific monitoring scene captured in real time by a camera installed in that scene; further, the camera may be installed at a certain height in the specific monitoring scene to capture the targets in that scene in real time. It may also be video images uploaded by a user. The above-mentioned image sequence refers to frame images acquired in time series.
  • the above-mentioned image sequence to be processed includes a target to be tracked, and the above-mentioned target to be tracked may be a moving target, and the above-mentioned moving target may be a target such as a pedestrian, a vehicle, and an animal that can generate a moving trajectory.
  • the above-mentioned target to be tracked may be one or more.
  • the above-mentioned target detection frame information can be obtained by detecting the target to be tracked through a target detection network.
  • the above-mentioned target detection network is already trained.
  • the above-mentioned target detection network can be obtained by the user training it on a sample target data set, or by downloading the network structure and parameters of a target detection network and fine-tuning them on a sample target data set.
  • the input of the target detection network is a frame image in the image sequence to be processed
  • the output is the detection frame information of the target to be tracked in the corresponding frame image
  • the detection frame information output by the target detection network may include the position information and confidence information of the target to be tracked in the corresponding frame image.
  • the above position information may be information in the format det(x, y, w, h), wherein the above x and y represent the coordinates of the center point of the detection frame in the corresponding frame image, and the above w and h respectively represent the width and height of the detection frame in the corresponding frame image.
  • the above confidence level information is used to indicate the degree of confidence that the image content in the detection frame is the target to be tracked.
  • the higher the confidence, the higher the degree of confidence that the content of the image in the detection frame is the target to be tracked.
  • the above target detection network may be a network constructed based on the CenterNet target detection algorithm.
  • the above-mentioned target detection network is a detection and re-identification network
  • the above-mentioned target detection frame information includes a target detection frame and a target detection feature.
  • FIG. 1a is a structural diagram of a detection and re-identification network provided by an embodiment of the present invention.
  • the detection and re-identification network includes a public network, a detection branch network, and a re-identification branch network, the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network.
  • the common features of each frame image for the detection branch network and the re-identification branch network can be extracted through the above-mentioned public network; the target detection frame implicit in the common features can be extracted through the above-mentioned detection branch network; and the target detection features implicit in the common features can be extracted through the above-mentioned re-identification branch network.
  • the above-mentioned embodiment of the present invention also provides a fast and robust public network.
  • the implementation structure of the public network is shown in Table 1 below:
  • Conv2d represents a two-dimensional convolutional layer
  • BatchNorm2d represents two-dimensional batch normalization
  • ReLU is the activation function
  • MaxPool2d is two-dimensional max pooling
  • Eps is a small constant added for numerical stability
  • momentum is the momentum update rate
  • heatmap is the hidden feature corresponding to the feature map
  • Size is the hidden feature corresponding to the size of the detection frame
  • center is the hidden feature corresponding to the center point offset
  • id is the hidden feature corresponding to re-identification.
  • the detection branch network may be constructed based on the CenterNet network, and the feature maps, detection frame sizes, and center point offsets of different targets may be output simultaneously.
  • Fig. 1b is a schematic diagram of a heat map of a feature map provided by an embodiment of the present invention
  • Fig. 1c is a schematic diagram of a center point offset component provided by an embodiment of the present invention
  • FIG. 1d is a schematic diagram of a detection frame size component provided by an embodiment of the present invention.
  • the heat map of the feature map includes the center point of the target; in Figure 1c, the center point offset represents the offset of the target's center point coordinates, which can reduce the loss of center point precision caused by the stride of the feature map.
  • in Figure 1d, the detection frame size component consists of the detected height and width offsets.
  • the training of the detection branch network can be performed based on a deep learning framework, for example, the training can be performed based on the Pytorch deep learning framework.
  • the hyperparameters used can be shown in Table 2 below:
  • the above-mentioned target prediction frame information can be obtained by predicting the target position of the target to be tracked through a target prediction network. The above-mentioned target prediction network is already trained; it can be obtained by the user's own training, or by downloading the network structure and parameters of a prediction network and fine-tuning them on a sample target data set. The above target prediction network may be a network constructed based on the Kalman filter algorithm.
  • the input of the above target prediction network is a frame image in the image sequence to be processed
  • the output is the prediction frame information, in the next frame, of the target to be tracked in the corresponding frame image
  • the prediction frame information output by the above target prediction network may include position information and confidence information of the target to be tracked in the next frame of image.
  • the above position information can be information in the format pre(x, y, w, h), wherein the above x and y represent the coordinates of the center point of the predicted frame in the next frame of image, and the above w and h respectively represent the width and height of the predicted frame in the next frame of image.
  • for the nth frame image as input, the target detection frame information corresponding to the nth frame image and the target prediction frame information corresponding to the nth frame image are output;
  • for the (n+1)th frame image as input, the target detection frame information corresponding to the (n+1)th frame image and the target prediction frame information corresponding to the (n+1)th frame image are output.
  • the target prediction frame information corresponding to the nth frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)th frame image
  • the target prediction frame information corresponding to the (n+1)th frame image can be understood as a prediction of the target detection frame information corresponding to the (n+2)th frame image.
  • the target count information of each frame of image may be calculated according to a preset target count estimation network.
  • the above target count estimation network may be a target count estimation network based on the C-CNN algorithm or the M-CNN algorithm.
  • the calculation result of the target counting estimation network includes target counting information.
  • Figure 1e is a schematic diagram of the output of a crowd counting estimation network provided by an embodiment of the present invention.
  • GT represents the ground-truth number of people
  • Pred represents the target count estimation result of the target count estimation network.
  • the target prediction frame information corresponding to the nth frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)th frame image.
  • the purpose of matching the above-mentioned target detection frame information corresponding to the (n+1)th frame image with the target prediction frame information corresponding to the nth frame image can be understood as checking whether the detection result is the same as or similar to the prediction result, and then judging whether a false detection occurs.
  • the above-mentioned first tracking trajectory can be obtained by matching and connecting the target detection frame information and the target prediction frame information through the SORT sorting algorithm.
  • a unique ID can be configured for the target detection frame of each target; according to the unique ID of each target, the first tracking trajectory of each target can be obtained. The target detection frame information can be matched with the target prediction frame information, an ID is set for each matched target detection frame, and the target detection frames of the same target can be set to the same unique ID.
  • the target detection frame information includes a target detection frame and a target detection feature
  • the target prediction frame information includes a target prediction frame and a target prediction feature.
  • the above prediction frame feature can be obtained by acquiring the corresponding target image from the corresponding frame image according to the target prediction frame and then performing feature extraction.
  • the intersection-over-union ratio of the target detection frame of each target in the (n+1)th frame image and the target prediction frame of each target in the nth frame image can be calculated; the feature similarity between the target detection feature of each target in the (n+1)th frame image and the target prediction feature of each target in the nth frame image can be calculated; based on the intersection-over-union ratio and the feature similarity, a unique ID is configured for the target detection frame of each target.
  • the intersection-over-union ratio refers to the ratio of the intersection area of the target detection frame and the target prediction frame to their union area, where the union area is the area of the target detection frame plus the area of the target prediction frame minus their intersection area.
  • the intersection-over-union ratio can be understood as the motion feature similarity, and the feature similarity can be understood as the appearance feature similarity; the total similarity of the motion feature similarity and the appearance feature similarity can be obtained by the following formula: S = λ·IOU(D_i, T_j) + (1 − λ)·SIM(D_i, T_j)
  • D_i and T_j represent the information of the ith target detection frame and the jth target prediction frame, respectively; IOU represents the intersection-over-union ratio of the target detection frame and the target prediction frame; SIM represents the feature similarity between the target detection feature and the target prediction feature; S represents the final similarity; λ is a preset parameter that can be adjusted according to the user's prior: when the user trusts the appearance feature similarity more, λ can be set to a smaller value, and when the user trusts the motion feature similarity more, λ can be set to a larger value.
  • when the similarity is greater than the preset similarity threshold, the target detection frame information matches the target prediction frame information, which further indicates that the target detection frame information and the target prediction frame information belong to the same target. A unique ID is assigned to the target detection frame information corresponding to each target, one target corresponding to one unique ID; through this unique ID, the corresponding target detection frame information is added to the first tracking trajectory of the corresponding target. If there is target detection frame information that does not match the target prediction frame information, it can be judged to be a newly added target, a disappeared target, or a missed target.
  • the above-mentioned newly added target can be understood as follows: there is corresponding target detection frame information in the (n+1)th frame image, and there is no corresponding target prediction frame information in the nth frame image. The above-mentioned disappeared target can be understood as follows: there is corresponding target detection frame information in the (n+1)th frame image, and there is no corresponding target prediction frame information in the (n+1)th frame image. The above-mentioned missed detection target can be understood as follows: there is no corresponding target detection frame information in the (n+1)th frame image, and there is corresponding target prediction frame information in the nth frame image.
  • the above-mentioned first missed detection situation includes the first missed detection target point.
  • the above-mentioned first missed target point can be understood as a target without corresponding target detection frame information.
  • the target count information includes an estimated number of targets, and the above-mentioned (n+1)th frame image is the current frame. The number of target detection frames in the (n+1)th frame image can be counted according to the target detection frame information corresponding to the (n+1)th frame image; it is determined whether the number of target detection frames is less than the estimated number of targets; if so, it is determined that the first missed detection situation exists.
  • for example, through the detection branch network, the target detection frame information of m targets in the (n+1)th frame image, i.e., m target detection frames, can be obtained; through the target count estimation network, the estimated number of targets k in the target count information of the (n+1)th frame image can be obtained. It is then judged whether m is less than k; if m is less than k, the number of target detection frames is less than the estimated number of targets, and the first missed detection situation exists.
  • the above-mentioned second missed detection situation includes a second missed detection target point.
  • the above-mentioned second missed target point can be understood as a target without corresponding target prediction frame information.
  • the above-mentioned first target prediction frame refers to the target prediction frame of each target in the nth frame of image.
  • if the first missed detection target point is located within a first target prediction frame, it means that the first missed detection target point is a target that was predicted in the nth frame image but was not detected in the (n+1)th frame image, i.e., a missed target.
  • if the first missed detection target point is not located within any first target prediction frame, it means that the target is a missed target newly appearing in the (n+1)th frame image; a target newly appearing in the (n+1)th frame image is not predicted from the nth frame image.
  • the first target prediction frame corresponding to the first missed target point may be used as the first missed target detection frame of the first missed target point in the n+1 th frame image.
  • the first missed detection mark may also be marked on the first missed detection target detection frame.
  • according to the height and width information of the target prediction frame in the nth frame image, and taking the second missed detection target point as the center, a second missed detection target detection frame can be configured for the second missed detection target point in the (n+1)th frame image.
  • a second missed detection mark may also be marked on the second missed detection target detection frame. The first missed detection mark and the second missed detection mark are used to distinguish the first missed detection target detection frame from the second missed detection target detection frame.
  • the above-mentioned first tracking track includes a unique ID corresponding to the target, one target corresponds to one first tracking track, and one first tracking track corresponds to one unique ID.
  • a first unique ID can be configured for the corresponding first missed detection frame information in the (n+1)th frame image according to the first missed detection frame information corresponding to the (n+1)th frame image and the target detection frame information corresponding to the nth frame image; the first unique ID is a unique ID corresponding to the first tracking track. According to the unique ID of the first missed detection frame information, the first missed detection frame information is added to the first tracking track with the same unique ID; in this way, the first missed detection frame information can be added to the first tracking trajectory to complete the missed portion of the tracking trajectory and obtain the target tracking trajectory.
  • a second unique ID is configured for the corresponding second missed frame in the n+1 th frame image, and the second unique ID is different from the unique IDs corresponding to all the first tracking tracks. Since the second missed detection box information is the missed detection corresponding to the newly added missed detection target, it is equivalent to a new target, and a second unique ID that is not occupied needs to be allocated to the new target.
  • the target count information, target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed are extracted;
  • according to the target detection frame information and target prediction frame information of each frame of image, the first tracking trajectory of each target in the image sequence to be processed is calculated; whether a first missed detection situation exists is judged according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, and the first missed detection situation includes a first missed detection target point; if the first missed detection situation exists, whether a second missed detection situation exists is judged according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, and the second missed detection situation includes a second missed detection target point; if there is no second missed detection situation, the first missed detection frame information is determined according to the first missed detection target point; if there is a second missed detection situation, the second missed detection frame information is determined according to the second missed detection target point; based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information, a target tracking trajectory is obtained.
  • the missed detection rate can be effectively reduced, the target detection accuracy of multi-target tracking can be improved, and the accuracy of multi-target tracking can be improved.
  • target tracking method provided by the embodiment of the present invention can be applied to devices such as mobile phones, monitors, computers, servers, etc. that can perform target tracking.
  • FIG. 2 is a schematic structural diagram of a target tracking device provided by an embodiment of the present invention. As shown in FIG. 2, the device includes:
  • the extraction module 201 is used to extract the target count information, target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed;
  • the calculation module 202 is used to calculate the first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and target prediction frame information of each frame of image;
  • the first judgment module 203 is configured to judge whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, the first missed detection situation including a first missed detection target point;
  • the second judgment module 204 is configured to, if the first missed detection situation exists, judge whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, the second missed detection situation including a second missed detection target point;
  • the first determination module 205 is configured to determine the first missed frame information according to the first missed target point if there is no second missed detection situation;
  • the second determination module 206 is configured to, if there is a second missed detection situation, determine the second missed detection frame information according to the second missed detection target point;
  • the processing module 207 is configured to obtain a target tracking trajectory based on the first tracking trajectory, the first missed frame information and/or the second missed frame information.
  • the extraction module 201 includes:
  • the first calculation submodule 2011 is used to calculate the target count information of each frame of image according to the preset target count estimation network
  • the second calculation submodule 2012 is configured to calculate the target detection frame information of each frame of images according to a preset detection and re-identification network, and the target detection frame information includes the target detection frame and the target detection feature;
  • the third calculation sub-module 2013 is configured to calculate the target prediction frame information of each frame of images according to the preset prediction network, where the target prediction frame information includes the target prediction frame and the target prediction feature.
  • the detection and re-identification network includes a public network, a detection branch network and a re-identification branch network; the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network. The second calculation sub-module 2012 includes:
  • the first extraction unit 20121 is used to extract the common features of each frame of images for detecting branch networks and re-identifying branch networks through the public network;
  • the second extraction unit 20122 is configured to extract the target detection frame implicit in the common feature through the detection branch network;
  • the third extraction unit 20123 is configured to extract the target detection feature implicit in the common feature through the re-identification branch network.
  • the computing module 202 includes:
  • the first configuration submodule 2021 is used to configure a unique ID for the target detection frame of each target according to the target detection frame information and target prediction frame information;
  • the first association sub-module 2022 is configured to obtain the first tracking trajectory of each target according to the unique ID of each target.
  • the first configuration sub-module 2021 includes:
  • the first calculation unit 20211 is used to calculate the intersection ratio of the target detection frame of each target of the n+1th frame image and the target prediction frame of each target in the nth frame image;
  • the second calculation unit 20212 is used to calculate the feature similarity between the target detection feature of each target in the n+1th frame image and the target prediction feature of each target in the nth frame image;
  • the configuration unit 20213 is configured to configure a unique ID for the target detection frame of each target based on the intersection ratio and the feature similarity.
  • the target count information includes the estimated number of targets
  • the first judgment module 203 includes:
  • a statistics sub-module 2031 configured to count the number of target detection frames in the n+1th frame image according to the target detection frame information corresponding to the n+1th frame image;
  • the first judgment submodule 2032 is used for judging whether the number of the target detection frames is less than the estimated number of targets;
  • the first determination sub-module 2033 is configured to determine that there is a first missed detection situation if the number of the target detection frames is less than the estimated target number.
  • the second judgment module 204 includes:
  • the second judgment sub-module 2041 is used for judging whether the first missed target point is located in the first target prediction frame
  • the second determination sub-module 2042 is configured to determine that there is a second missed detection situation if the first missed detection target point is located within the first target prediction frame.
  • the first determination module 205 is further configured to use the first target prediction frame as the first missed target detection frame of the first missed target point in the n+1 th frame image.
  • the second determination module 206 is further configured to, according to the height and width information of the target prediction frame in the nth frame image and taking the second missed detection target point as the center, configure a second missed detection target detection frame for the second missed detection target point in the (n+1)th frame image.
  • the processing module 207 includes:
  • the second configuration sub-module 2071 is configured to configure a first unique ID for the corresponding first missed detection frame information in the (n+1)th frame image according to the first missed detection frame information corresponding to the (n+1)th frame image and the target detection frame information corresponding to the nth frame image, where the first unique ID is a unique ID corresponding to the first tracking track;
  • a second association sub-module 2072 configured to add the first missed frame information to the first tracking track with the same unique ID according to the unique ID of the first missed frame information;
  • the third configuration sub-module 2073 is configured to configure a second unique ID for the corresponding second missed detection frame information in the (n+1)th frame image, where the second unique ID is different from the unique IDs corresponding to all the first tracking trajectories.
  • target tracking apparatus provided by the embodiment of the present invention can be applied to devices such as mobile phones, monitors, computers, servers, etc., which can perform target tracking.
  • the target tracking device provided by the embodiment of the present invention can implement each process implemented by the target tracking method in the above method embodiments, and can achieve the same beneficial effects. In order to avoid repetition, details are not repeated here.
  • FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 10, it includes: a memory 1002, a processor 1001, and a computer program stored on the memory 1002 and executable on the processor 1001, where:
  • the processor 1001 is used for calling the computer program stored in the memory 1002, and performs the following steps:
  • according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, it is determined whether a first missed detection situation exists, and the first missed detection situation includes a first missed detection target point;
  • the second missed detection situation includes the second missed detection target point
  • a target tracking trajectory is obtained.
  • the extraction of the target count information, target detection frame information and target prediction frame information of each frame of image in the image sequence to be processed performed by the processor 1001 includes:
  • the target detection frame information includes the target detection frame and the target detection feature
  • the target prediction frame information of each frame of image is calculated according to the preset prediction network, and the target prediction frame information includes the target prediction frame and the target prediction feature.
  • the detection and re-identification network includes a public network, a detection branch network and a re-identification branch network, the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network.
  • the calculation of the target detection frame information of each frame of image according to the preset detection and re-identification network performed by the processor 1001 includes:
  • the target detection features implicit in the common features are extracted through the re-identification branch network.
  • calculating the first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and target prediction frame information of each frame of image performed by the processor 1001 includes:
  • according to the target detection frame information and the target prediction frame information, a unique ID is configured for the target detection frame of each target
  • the first tracking trajectory of each target is obtained.
  • the processor 1001 performs the configuring of a unique ID for the target detection frame of each target, including: calculating the intersection-over-union ratio between the target detection frame of each target in the (n+1)th frame image and the target prediction frame of each target in the nth frame image; calculating the feature similarity between the target detection feature of each target in the (n+1)th frame image and the target prediction feature of each target in the nth frame image; and, based on the intersection-over-union ratio and the feature similarity, configuring a unique ID for the target detection frame of each target.
  • the target count information includes an estimated number of targets, and the processor 1001 performs the determining of whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, including:
  • according to the target detection frame information corresponding to the (n+1)th frame image, the number of target detection frames in the (n+1)th frame image is counted
  • if the number of target detection frames is less than the estimated number of targets, it is determined that the first missed detection situation exists.
  • the execution of the processor 1001 to determine whether there is a second missed detection situation according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the n+1th frame image includes:
  • the first missed detection target point is located within the first target prediction frame, it is determined that there is a second missed detection situation.
  • the first missed detection frame information in the steps performed by the processor 1001 includes a first missed detection target detection frame
  • the determining the first missed frame information according to the first missed target point includes:
  • the first target prediction frame is used as the first missed target detection frame of the first missed target point in the n+1 th frame image.
  • the second missed detection frame information in the steps performed by the processor 1001 includes a second missed detection target detection frame, and the second missed detection frame information is determined according to the second missed detection target point, including:
  • according to the height and width information of the target prediction frame in the nth frame image, and taking the second missed detection target point as the center, a second missed detection target detection frame is configured for the second missed detection target point in the (n+1)th frame image.
  • the execution of the processor 1001 to obtain the target tracking trajectory based on the first tracking trajectory, the first missed frame information and/or the second missed frame information includes:
  • a first unique ID is configured for the corresponding first missed detection frame information in the (n+1)th frame image, the first unique ID being a unique ID corresponding to the first tracking track; according to the unique ID of the first missed detection frame information, the first missed detection frame information is added to the first tracking track with the same unique ID; and/or
  • a second unique ID is configured for the corresponding second missed detection frame information in the (n+1)th frame image, and the second unique ID is different from the unique IDs corresponding to all the first tracking tracks.
  • the above electronic device may be a mobile phone, a monitor, a computer, a server and other devices that can be applied to target tracking.
  • the electronic device provided by the embodiments of the present invention can implement the various processes implemented by the target tracking method in the above method embodiments, and can achieve the same beneficial effects. To avoid repetition, details are not repeated here.
  • Embodiments of the present invention also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the target tracking method provided by the embodiments of the present invention are implemented.


Abstract

An embodiment of the present invention provides a target tracking method, the method comprising: extracting target count information, target detection frame information, and target prediction frame information of each frame image in an image sequence to be processed; calculating a first tracking trajectory of each target in the image sequence to be processed; judging whether a first missed detection situation exists; if the first missed detection situation exists, judging whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image; if no second missed detection situation exists, determining first missed detection frame information according to the first missed detection target point; if the second missed detection situation exists, determining second missed detection frame information according to the second missed detection target point; and obtaining a target tracking trajectory based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information. The accuracy of multi-target tracking can thereby be improved.

Description

Target tracking method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 31, 2020, with application number 202011639844.2 and entitled "Target tracking method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of artificial intelligence, and in particular to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
With the rapid growth of urban populations, more and more scenes require additional intelligent cameras to perform video surveillance and analysis tasks. A multi-target tracking algorithm is an indispensable algorithm in an intelligent camera, and a robust, high-precision multi-target tracking algorithm can greatly improve the performance of the entire intelligent video surveillance system. Multi-target tracking algorithms comprise a target detection algorithm and a target ID matching and tracking algorithm. Although multi-target tracking networks already exist, the target detection algorithm may miss targets in some cases. For example, in real scenes where a target is occluded or interfered with, and especially in crowded scenes, the missed detection rate is high, resulting in low target tracking accuracy. Therefore, existing target tracking algorithms suffer from low target detection accuracy.
Summary of the Invention
An embodiment of the present invention provides a target tracking method, which can reduce the missed detection rate of target detection and thereby improve the target detection accuracy of multi-target tracking.
In a first aspect, an embodiment of the present invention provides a target tracking method, the method comprising:
extracting target count information, target detection frame information, and target prediction frame information of each frame image in an image sequence to be processed;
calculating a first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and the target prediction frame information of each frame image;
judging whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, the first missed detection situation including a first missed detection target point;
if the first missed detection situation exists, judging whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, the second missed detection situation including a second missed detection target point;
if no second missed detection situation exists, determining first missed detection frame information according to the first missed detection target point;
if the second missed detection situation exists, determining second missed detection frame information according to the second missed detection target point;
obtaining a target tracking trajectory based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information.
Optionally, the extracting target count information, target detection frame information, and target prediction frame information of each frame image in the image sequence to be processed includes:
calculating the target count information of each frame image according to a preset target count estimation network;
calculating the target detection frame information of each frame image according to a preset detection and re-identification network, the target detection frame information including a target detection frame and a target detection feature;
calculating the target prediction frame information of each frame image according to a preset prediction network, the target prediction frame information including a target prediction frame and a target prediction feature.
Optionally, the detection and re-identification network includes a public network, a detection branch network, and a re-identification branch network, the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network; the calculating the target detection frame information of each frame image according to the preset detection and re-identification network includes:
extracting, through the public network, the common features of each frame image for the detection branch network and the re-identification branch network;
extracting, through the detection branch network, the target detection frame implicit in the common features;
extracting, through the re-identification branch network, the target detection features implicit in the common features.
Optionally, the calculating a first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and the target prediction frame information of each frame image includes:
configuring a unique ID for the target detection frame of each target according to the target detection frame information and the target prediction frame information;
obtaining the first tracking trajectory of each target according to the unique ID of each target.
Optionally, the configuring a unique ID for the target detection frame of each target according to the target detection frame information and the target prediction frame information includes:
calculating the intersection-over-union ratio between the target detection frame of each target in the (n+1)th frame image and the target prediction frame of each target in the nth frame image;
calculating the feature similarity between the target detection feature of each target in the (n+1)th frame image and the target prediction feature of each target in the nth frame image;
configuring a unique ID for the target detection frame of each target based on the intersection-over-union ratio and the feature similarity.
Optionally, the target count information includes an estimated number of targets, and the judging whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image includes:
counting the number of target detection frames in the (n+1)th frame image according to the target detection frame information corresponding to the (n+1)th frame image;
judging whether the number of target detection frames is less than the estimated number of targets;
if the number of target detection frames is less than the estimated number of targets, determining that the first missed detection situation exists.
Optionally, the judging whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image includes:
judging whether the first missed detection target point is located within a first target prediction frame;
if the first missed detection target point is located within the first target prediction frame, determining that the second missed detection situation exists.
Optionally, the first missed detection frame information includes a first missed detection target detection frame, and the determining first missed detection frame information according to the first missed detection target point includes:
using the first target prediction frame as the first missed detection target detection frame of the first missed detection target point in the (n+1)th frame image.
Optionally, the second missed detection frame information includes a second missed detection target detection frame, and the determining second missed detection frame information according to the second missed detection target point includes:
configuring, according to the height and width information of the target prediction frame in the nth frame image and taking the second missed detection target point as the center, a second missed detection target detection frame for the second missed detection target point in the (n+1)th frame image.
Optionally, the obtaining a target tracking trajectory based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information includes:
configuring a first unique ID for the first missed detection frame information corresponding to the (n+1)th frame image according to the first missed detection frame information corresponding to the (n+1)th frame image and the target detection frame information corresponding to the nth frame image, the first unique ID being a unique ID corresponding to the first tracking trajectory;
adding the first missed detection frame information to the first tracking trajectory having the same unique ID according to the unique ID of the first missed detection frame information; and/or
configuring a second unique ID for the second missed detection frame information corresponding to the (n+1)th frame image, the second unique ID being different from the unique IDs corresponding to all first tracking trajectories.
In a second aspect, an embodiment of the present invention further provides a target tracking apparatus, the apparatus comprising:
an extraction module, configured to extract target count information, target detection frame information, and target prediction frame information of each frame image in an image sequence to be processed;
a calculation module, configured to calculate a first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and the target prediction frame information of each frame image;
a first judgment module, configured to judge whether a first missed detection situation exists according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, the first missed detection situation including a first missed detection target point;
a second judgment module, configured to, if the first missed detection situation exists, judge whether a second missed detection situation exists according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, the second missed detection situation including a second missed detection target point;
a first determination module, configured to, if no second missed detection situation exists, determine first missed detection frame information according to the first missed detection target point;
a second determination module, configured to, if the second missed detection situation exists, determine second missed detection frame information according to the second missed detection target point;
a processing module, configured to obtain a target tracking trajectory based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein when the processor executes the computer program, the steps in the target tracking method provided by the embodiments of the present invention are implemented.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the target tracking method provided by the embodiments of the present invention are implemented.
In the embodiments of the present invention, target count information, target detection frame information, and target prediction frame information of each frame image in an image sequence to be processed are extracted; a first tracking trajectory of each target in the image sequence to be processed is calculated according to the target detection frame information and the target prediction frame information of each frame image; whether a first missed detection situation exists is judged according to the target detection frame information corresponding to the (n+1)th frame image and the target count information corresponding to the (n+1)th frame image, the first missed detection situation including a first missed detection target point; if the first missed detection situation exists, whether a second missed detection situation exists is judged according to the target prediction frame information corresponding to the nth frame image and the first missed detection target point corresponding to the (n+1)th frame image, the second missed detection situation including a second missed detection target point; if no second missed detection situation exists, first missed detection frame information is determined according to the first missed detection target point; if the second missed detection situation exists, second missed detection frame information is determined according to the second missed detection target point; and a target tracking trajectory is obtained based on the first tracking trajectory, the first missed detection frame information and/or the second missed detection frame information. Through the target detection frame information and the target count information, whether a first missed detection situation exists can be judged; through the target prediction frame information and the first missed detection target point, whether a second missed detection situation exists can be judged, so that it can be determined whether a missed target is an existing target or a newly appearing target. This can effectively reduce the missed detection rate and improve the target detection accuracy of multi-target tracking, thereby improving the accuracy of multi-target tracking.
Brief Description of the Drawings
FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention;
FIG. 1a is a structural diagram of a detection and re-identification network provided by an embodiment of the present invention;
FIG. 1b is a schematic diagram of a heat map of a feature map provided by an embodiment of the present invention;
FIG. 1c is a schematic diagram of a center point offset component provided by an embodiment of the present invention;
FIG. 1d is a schematic diagram of a detection frame size component provided by an embodiment of the present invention;
FIG. 1e is a schematic diagram of the output of a crowd counting estimation network provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an extraction module provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a second calculation submodule provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a calculation module provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a first configuration submodule provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a first judgment module provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a second judgment module provided by an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a processing module provided by an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
Please refer to FIG. 1. FIG. 1 is a flowchart of a target tracking method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
101. Extract target count information, target detection frame information, and target prediction frame information of each frame image in an image sequence to be processed.
In this embodiment of the present invention, the image sequence to be processed may be video images captured by a camera in real time, for example, video images of a specific monitoring scene captured in real time by a camera installed in that scene; further, the camera may be installed at a certain height in the specific monitoring scene to capture the targets in the scene in real time. The image sequence may also be video images uploaded by a user. The image sequence refers to frame images acquired in time order.
The image sequence to be processed includes targets to be tracked. A target to be tracked may be a moving target, such as a pedestrian, a vehicle, or an animal, that is, any target that can generate a motion trajectory. There may be one or more targets to be tracked.
The target detection frame information may be obtained by detecting the targets to be tracked through a target detection network. The target detection network has already been trained; it may be trained by the user on a sample target data set, or obtained by downloading the network structure and parameters of a target detection network and fine-tuning them on a sample target data set.
In this embodiment of the present invention, the input of the target detection network is a frame image in the image sequence to be processed, and the output is the detection frame information of the targets to be tracked in the corresponding frame image. The detection frame information output by the target detection network may include the position information and confidence information of the targets to be tracked in the corresponding frame image. The position information may be in the format det(x, y, w, h), where x and y denote the center point coordinates of the detection frame in the corresponding frame image, and w and h denote the width and height of the detection frame in the corresponding frame image, respectively. The confidence information indicates the degree of confidence that the image content inside the detection frame is a target to be tracked: the higher the confidence, the more credible it is that the image content inside the detection frame is a target to be tracked. The target detection network may be a network constructed based on the CenterNet target detection algorithm.
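To make the det(x, y, w, h) format concrete, the following is a minimal Python sketch (the function name, array layout, and confidence threshold are illustrative assumptions, not part of the patent) that converts center-format detections to corner coordinates and keeps only confident ones:

```python
import numpy as np

def det_to_corners(dets: np.ndarray, conf_thresh: float = 0.4) -> np.ndarray:
    """dets: (N, 5) rows of (x, y, w, h, confidence), with (x, y) the
    detection-frame center in the image, as described above.
    Returns kept boxes as (x1, y1, x2, y2, confidence)."""
    x, y, w, h, conf = dets.T
    corners = np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2, conf], axis=1)
    return corners[conf >= conf_thresh]  # drop low-confidence detections
```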
Further, the target detection network is a detection and re-identification network, and the target detection frame information includes a target detection frame and a target detection feature. Specifically, please refer to FIG. 1a, which is a structural diagram of a detection and re-identification network provided by an embodiment of the present invention. As shown in FIG. 1a, the detection and re-identification network includes a public network, a detection branch network, and a re-identification branch network; the input of the detection branch network is connected to the output of the public network, and the input of the re-identification branch network is connected to the output of the public network. The common features of each frame image for the detection branch network and the re-identification branch network can be extracted through the public network; the target detection frame implicit in the common features can be extracted through the detection branch network; and the target detection features implicit in the common features can be extracted through the re-identification branch network.
Furthermore, this embodiment of the present invention also provides a fast and robust public network. Specifically, the implementation structure of the public network is shown in Table 1 below:
Index  Layer name  Number of filters  Kernel size  Other
0 Conv2d 16 7*7/1  
1 BatchNorm2d     Eps=1e-05,momentum=0.1
2 ReLU      
3 Conv2d 16 3*3/2  
4 BatchNorm2d     Eps=1e-05,momentum=0.1
5 ReLU      
6 Conv2d 32 3*3/2  
7 BatchNorm2d     Eps=1e-05,momentum=0.1
8 ReLU      
9 Conv2d 64 3*3/2  
10 BatchNorm2d     Eps=1e-05,momentum=0.1
11 ReLU      
12 Conv2d 64 3*3/1  
13 BatchNorm2d     Eps=1e-05,momentum=0.1
14 Conv2d 64 3*3/2  
15 BatchNorm2d     Eps=1e-05,momentum=0.1
16 ReLU      
17 Conv2d 64 3*3/1  
18 BatchNorm2d     Eps=1e-05,momentum=0.1
19 Conv2d 64 1*1/1  
20 BatchNorm2d     Eps=1e-05,momentum=0.1
21 ReLU      
22 MaxPool2d   2*2/2  
23 Conv2d 64 1*1/1  
24 BatchNorm2d     Eps=1e-05,momentum=0.1
25 Conv2d 128 3*3/2  
26 BatchNorm2d     Eps=1e-05,momentum=0.1
27 ReLU      
28 Conv2d 128 3*3/1  
29 BatchNorm2d     Eps=1e-05,momentum=0.1
30 Conv2d 128 3*3/2  
31 BatchNorm2d     Eps=1e-05,momentum=0.1
32 ReLU      
33 Conv2d 128 3*3/1  
34 BatchNorm2d     Eps=1e-05,momentum=0.1
35 Conv2d 128 3*3/2  
36 BatchNorm2d     Eps=1e-05,momentum=0.1
37 Conv2d 128 1*1/1  
38 BatchNorm2d     Eps=1e-05,momentum=0.1
39 ReLU      
40 MaxPool2d   2*2/2  
41 Conv2d 128 1*1/1  
42 BatchNorm2d     Eps=1e-05,momentum=0.1
43 Conv2d 256 3*3/2  
44 BatchNorm2d     Eps=1e-05,momentum=0.1
45 ReLU      
46 Conv2d 256 3*3/1  
47 BatchNorm2d     Eps=1e-05,momentum=0.1
48 Conv2d 256 3*3/2  
49 BatchNorm2d     Eps=1e-05,momentum=0.1
50 ReLU      
51 Conv2d 256 3*3/1  
52 BatchNorm2d     Eps=1e-05,momentum=0.1
53 Conv2d 256 3*3/2  
54 BatchNorm2d     Eps=1e-05,momentum=0.1
55 Conv2d 256 1*1/1  
56 BatchNorm2d     Eps=1e-05,momentum=0.1
57 ReLU      
58 MaxPool2d   2*2/2  
59 Conv2d 256 1*1/1  
60 BatchNorm2d     Eps=1e-05,momentum=0.1
61 Conv2d 256 3*3/2  
62 Conv2d 256 3*3/1 heatmap
63 ReLU      
64 Conv2d 1 1*1/1  
65 Conv2d 256 3*3/1 Size
66 ReLU      
67 Conv2d 2 1*1/1  
68 Conv2d 256 3*3/1 center
69 ReLU      
70 Conv2d 2 1*1/1  
71 Conv2d 256 3*3/1 id
72 ReLU      
73 Conv2d 256 1*1/1  
Table 1
Here, Conv2d denotes a two-dimensional convolutional layer; BatchNorm2d denotes two-dimensional batch normalization; ReLU is the activation function; MaxPool2d is two-dimensional max pooling; Eps is a small constant added for numerical stability; momentum is the momentum update rate; heatmap is the hidden feature corresponding to the feature map; Size is the hidden feature corresponding to the detection frame size; center is the hidden feature corresponding to the center point offset; and id is the hidden feature corresponding to re-identification.
In this embodiment of the present invention, the detection branch network may be constructed based on the CenterNet network and may simultaneously output the feature maps, detection frame sizes, and center point offsets of different targets, as shown in FIG. 1b, FIG. 1c, and FIG. 1d, where FIG. 1b is a schematic diagram of a heat map of a feature map provided by an embodiment of the present invention, FIG. 1c is a schematic diagram of a center point offset component provided by an embodiment of the present invention, and FIG. 1d is a schematic diagram of a detection frame size component provided by an embodiment of the present invention. In FIG. 1b, the heat map of the feature map includes the center points of the targets; in FIG. 1c, the center point offset denotes the offset of a target's center point coordinates, which can compensate for the loss of center point precision caused by the stride of the feature map; in FIG. 1d, the detection frame size component consists of the detected height and width offsets.
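As an illustration of how these three detection-branch outputs could be decoded into detection frames, the following is a hedged sketch; the tensor shapes, the peak threshold, and the stride value of 4 are assumptions rather than values taken from the patent:

```python
import torch

def decode_centernet(heatmap, offset, size, stride=4, thresh=0.3):
    """heatmap: (1, H, W); offset and size: (2, H, W) feature-map tensors.
    Returns a list of (x, y, w, h) detection frames in image coordinates."""
    scores = torch.sigmoid(heatmap[0])
    ys, xs = torch.nonzero(scores > thresh, as_tuple=True)  # candidate center cells
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        dx, dy = offset[0, y, x].item(), offset[1, y, x].item()  # sub-cell center offset
        w, h = size[0, y, x].item(), size[1, y, x].item()        # predicted frame size
        cx, cy = (x + dx) * stride, (y + dy) * stride            # back to image coordinates
        boxes.append((cx, cy, w, h))
    return boxes
```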
The training of the detection branch network may be performed based on a deep learning framework; for example, it may be trained based on the PyTorch deep learning framework. During the training of the detection branch network, the hyperparameters used may be as shown in Table 2 below:
Parameter  Default value  Description
Input_size  1088*608  Input image size
lr  0.0001  Learning rate
epoch  30  Number of training epochs
batch_size  12  Number of images used per training batch
optimizer  Adam  Optimizer
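As a sketch only, the Table 2 hyperparameters could be wired into a PyTorch training loop as follows; the stand-in model, random tensors, and MSE loss are illustrative placeholders, not components disclosed by the patent:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Conv2d(3, 1, 3, padding=1)          # stand-in for the detection branch
dataset = TensorDataset(torch.randn(12, 3, 608, 1088),   # Input_size 1088*608 (Table 2)
                        torch.randn(12, 1, 608, 1088))
loader = DataLoader(dataset, batch_size=12, shuffle=True)  # batch_size = 12 (Table 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, lr = 0.0001 (Table 2)
criterion = nn.MSELoss()                       # illustrative loss, not the patent's

for epoch in range(30):                        # epoch = 30 (Table 2)
    for images, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
```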
在本发明实施例中,上述的目标预测框信息可以通过目标预测网络来对待跟踪目标进行目标位置预测,上述目标预测网络为已经训练好的,具体可以是用户自行训练得到,也可以是下载获取目标检测网络的网络结构与参数,通过样本目标数据集进行微调训练后得到,上述目标预测网络可以是基于卡尔曼滤波算法进行构建的网络。
在本发明实施例中,上述目标预测网络的输入为待处理图像序列中的帧图像,输出为对应帧图像中待跟踪目标在下一帧中的预测框信息,上述目标预测网络输出的预测框信息可以包括待跟踪目标在下一帧图像的位置信息和置信度信息。上述位置信息可以是pre(x,y,w,h)格式的信息,其中,上述的x和y表示检测框在下一帧图像中的中心点坐标,上述w和h分别表示检测框在下一帧图像中的宽和高。
It can be understood that, through the target detection network and the target prediction network, when the n-th frame image is taken as input, the target detection frame information corresponding to the n-th frame image and the target prediction frame information corresponding to the n-th frame image are output; when the (n+1)-th frame image is taken as input, the target detection frame information corresponding to the (n+1)-th frame image and the target prediction frame information corresponding to the (n+1)-th frame image are output. The target prediction frame information corresponding to the n-th frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)-th frame image, and the target prediction frame information corresponding to the (n+1)-th frame image as a prediction of the target detection frame information corresponding to the (n+2)-th frame image.
In the embodiment of the present invention, the target count information of each frame image may be computed by a preset target-count estimation network. The target-count estimation network may be a network based on the C-CNN or M-CNN algorithm. The computation result of the target-count estimation network includes the target count information; specifically, referring to FIG. 1e, which is a schematic output diagram of a crowd-count estimation network provided by an embodiment of the present invention, in FIG. 1e, GT denotes the ground-truth head count and Pred denotes the count estimated by the target-count estimation network.
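C-CNN/M-CNN-style counters typically regress a density map whose integral is the count; assuming that output format (an assumption, since this embodiment does not fix the output head), the estimated target number can be read off as in this short sketch:

    import torch

    def estimated_count(density_map: torch.Tensor) -> int:
        # density_map: (1, 1, H, W) output of the counting network; its sum
        # approximates the number of targets in the frame.
        return int(round(density_map.sum().item()))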
102. According to the target detection frame information and target prediction frame information of each frame image, compute the first tracking trajectory of each target in the image sequence to be processed.
In the embodiment of the present invention, the target prediction frame information corresponding to the n-th frame image can be understood as a prediction of the target detection frame information corresponding to the (n+1)-th frame image. The purpose of matching the target detection frame information corresponding to the (n+1)-th frame image against the target prediction frame information corresponding to the n-th frame image is therefore to check whether the detection result is identical or close to the prediction result, and thus to judge whether a false detection has occurred. The first tracking trajectory may be obtained by matching and linking the target detection frame information with the target prediction frame information through the SORT (Simple Online and Realtime Tracking) algorithm.
Further, according to the target detection frame information and the target prediction frame information, a unique ID may be assigned to the target detection frame of each target, and the first tracking trajectory of each target is then obtained from its unique ID. Concretely, the target detection frame information is matched against the target prediction frame information, each matched target detection frame is given an ID, and the detection frames of the same target are given the same unique ID.
Further, the target detection frame information includes target detection frames and target detection features, and the target prediction frame information includes target prediction frames and target prediction features. The prediction frame features may be obtained by cropping the corresponding target image from the frame image according to the target prediction frame and then extracting features from it. The intersection-over-union between the target detection frames of the targets in the (n+1)-th frame image and the target prediction frames of the targets in the n-th frame image may be computed, and the feature similarity between the target detection features of the targets in the (n+1)-th frame image and the target prediction features of the targets in the n-th frame image may be computed; based on the intersection-over-union and the feature similarity, a unique ID is assigned to the target detection frame of each target. The intersection-over-union is the intersection area of the target detection frame and the target prediction frame divided by their union area, where the union area is the area of the target detection frame plus the area of the target prediction frame minus their intersection area. The intersection-over-union can be regarded as a motion-feature similarity and the feature similarity as an appearance-feature similarity; the total similarity combining the two can be obtained by the following formula:
S = λ·IOU(D_i, T_j) + (1 − λ)·SIM(D_i, T_j)
where D_i and T_j denote the i-th piece of target detection frame information and the j-th piece of target prediction frame information respectively, IOU denotes the intersection-over-union of the target detection frame and the target prediction frame, SIM denotes the feature similarity between the target detection feature and the target prediction feature, S denotes the final similarity, and λ is a preset parameter that can be adjusted according to the user's prior knowledge: when the user trusts the appearance-feature similarity more, λ can be set to a smaller value, and when the user trusts the motion-feature similarity more, λ can be set to a larger value. Through the above formula, the similarity S between each piece of target detection frame information and each piece of target prediction frame information is obtained; it can be represented by a matrix IOU_Matrix, where IOU_Matrix = 1 − S, so that each cell of IOU_Matrix gives the matching cost (one minus the similarity) between one piece of target detection frame information and one piece of target prediction frame information. When the similarity is greater than a preset similarity threshold, the target detection frame information matches the target prediction frame information, which further indicates that they belong to the same target; the target detection frame information of each target is then assigned one unique ID, with one target corresponding to one unique ID, and through this unique ID the corresponding target detection frame information is added to the first tracking trajectory of the corresponding target. If some target detection frame information and target prediction frame information cannot be matched, the case can be judged to be a newly appeared target, a disappeared target or a missed target. A newly appeared target can be understood as one for which corresponding target detection frame information exists in the (n+1)-th frame image but no corresponding target prediction frame information exists in the n-th frame image; a disappeared target can be understood as one for which corresponding target prediction frame information exists in the n-th frame image but which has left the scene and therefore produces no corresponding target detection frame information in the (n+1)-th frame image; a missed target can be understood as one for which no corresponding target detection frame information exists in the (n+1)-th frame image although corresponding target prediction frame information exists in the n-th frame image.
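A sketch of this matching step follows: it fuses the IoU (motion) similarity and the cosine (appearance) similarity into S, builds the cost matrix 1 − S, and solves the assignment with the Hungarian algorithm; the λ and threshold values, the center-format IoU conversion and the use of scipy's solver are illustrative assumptions, not requirements of this embodiment:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou(det, pre):
        # IoU of two (x, y, w, h) center-format frames.
        ax1, ay1 = det[0] - det[2] / 2, det[1] - det[3] / 2
        ax2, ay2 = det[0] + det[2] / 2, det[1] + det[3] / 2
        bx1, by1 = pre[0] - pre[2] / 2, pre[1] - pre[3] / 2
        bx2, by2 = pre[0] + pre[2] / 2, pre[1] + pre[3] / 2
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = det[2] * det[3] + pre[2] * pre[3] - inter
        return inter / union if union > 0 else 0.0

    def match(dets, det_feats, pres, pre_feats, lam=0.5, thresh=0.3):
        # S = lam * IOU + (1 - lam) * SIM; IOU_Matrix = 1 - S is the cost.
        S = np.zeros((len(dets), len(pres)))
        for i, (d, df) in enumerate(zip(dets, det_feats)):
            for j, (p, pf) in enumerate(zip(pres, pre_feats)):
                sim = df @ pf / (np.linalg.norm(df) * np.linalg.norm(pf))
                S[i, j] = lam * iou(d, p) + (1 - lam) * sim
        rows, cols = linear_sum_assignment(1.0 - S)
        return [(i, j) for i, j in zip(rows, cols) if S[i, j] > thresh]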
103. According to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, judge whether a first missed-detection situation exists.
In the embodiment of the present invention, the first missed-detection situation includes first missed target points. A first missed target point can be understood as a target that has no corresponding target detection frame information.
Optionally, the target count information includes an estimated target number, and the (n+1)-th frame image is the current frame. The number of target detection frames in the (n+1)-th frame image may be counted according to the target detection frame information corresponding to the (n+1)-th frame image, and it is judged whether the number of target detection frames is smaller than the estimated target number; if so, it is determined that the first missed-detection situation exists. For example, through the detection branch network, the target detection frame information of m targets in the (n+1)-th frame image, i.e. m target detection frames, can be obtained; through the target-count estimation network, the estimated target number k in the target count information of the (n+1)-th frame image can be obtained. Whether m is smaller than k is then judged; if m < k, the number of target detection frames is smaller than the estimated target number, and the first missed-detection situation exists.
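The check itself is a single comparison; a sketch, with the m detection frames and the estimated number k both taken from the current frame:

    def first_miss_exists(det_frames, estimated_k):
        # m detection frames in frame n+1 versus the estimated count k.
        m = len(det_frames)
        return m < estimated_k  # True: some targets were missed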
104. If the first missed-detection situation exists, judge, according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, whether a second missed-detection situation exists.
In the embodiment of the present invention, the second missed-detection situation includes second missed target points. A second missed target point can be understood as a target that has no corresponding target prediction frame information.
Further, it may be judged whether the first missed target point is located within a first target prediction frame; if the first missed target point is not located within any first target prediction frame, it is determined that the second missed-detection situation exists. The first target prediction frames are the target prediction frames of the targets in the n-th frame image. When the first missed target point is located within a first target prediction frame, the point is a target that was predicted in the n-th frame image but not detected in the (n+1)-th frame image, i.e. a missed existing target. When the first missed target point is not located within any first target prediction frame, the point is a missed target newly appearing in the (n+1)-th frame image, since a target newly appearing in the (n+1)-th frame image cannot have been predicted from the n-th frame image.
105. If the second missed-detection situation does not exist, determine first missed-frame information according to the first missed target point.
In the embodiment of the present invention, if the second missed-detection situation does not exist, the first missed target point is a target that was predicted in the n-th frame image but not detected in the (n+1)-th frame image, i.e. a missed existing target. In this case, the first target prediction frame corresponding to the first missed target point may be taken as the first missed-target detection frame of the first missed target point in the (n+1)-th frame image. In addition, the first missed-target detection frame may be marked with a first missed-detection identifier.
106. If the second missed-detection situation exists, determine second missed-frame information according to the second missed target point.
In the embodiment of the present invention, if the second missed-detection situation exists, the target is a missed target newly appearing in the (n+1)-th frame image; since a target newly appearing in the (n+1)-th frame image cannot have been predicted from the n-th frame image, it is a newly added missed target. In this case, according to the height and width information of the target prediction frames in the n-th frame image, a second missed-target detection frame centered on the second missed target point may be configured for the second missed target point in the (n+1)-th frame image. In addition, the second missed-target detection frame may be marked with a second missed-detection identifier. The first missed-detection identifier and the second missed-detection identifier are used to distinguish first missed-target detection frames from second missed-target detection frames.
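Steps 104 to 106 can be sketched together as below: a missed target point inside some frame-n prediction frame is an existing target (that prediction frame is reused), otherwise it is a new target and a frame is built around the point; taking the mean predicted width/height for the new frame, and a non-empty prediction list, are assumptions:

    import numpy as np

    def point_in_frame(pt, frame):
        x, y, w, h = frame  # center-format prediction frame
        return abs(pt[0] - x) <= w / 2 and abs(pt[1] - y) <= h / 2

    def recover_missed_frame(pt, pred_frames):
        for frame in pred_frames:  # frame-n prediction frames
            if point_in_frame(pt, frame):
                # No second miss: reuse the matching prediction frame (step 105).
                return "first_miss", frame
        # Second miss: new target; build a frame centered on the point using
        # the prediction frames' height/width statistics (step 106).
        w = float(np.mean([f[2] for f in pred_frames]))
        h = float(np.mean([f[3] for f in pred_frames]))
        return "second_miss", (pt[0], pt[1], w, h)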
107. Obtain the target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information.
In the embodiment of the present invention, the first tracking trajectory includes the unique ID corresponding to the target: one target corresponds to one first tracking trajectory, and one first tracking trajectory corresponds to one unique ID.
According to the first missed-frame information in the (n+1)-th frame image and the target detection frame information in the n-th frame image, a first unique ID may be configured for the first missed frame in the (n+1)-th frame image, the first unique ID being a unique ID corresponding to one of the first tracking trajectories; according to the unique ID of the first missed frame, the first missed frame is added to the first tracking trajectory having the same unique ID. In this way, the first missed-frame information is merged into the first tracking trajectory, the missed part of the trajectory is completed, and the target tracking trajectory is obtained.
A second unique ID is configured for the second missed frame in the (n+1)-th frame image, the second unique ID being different from the unique IDs corresponding to all first tracking trajectories. Since the second missed-frame information corresponds to a newly added missed target, it is in effect a new target and must be assigned a second unique ID that is not yet occupied.
In the embodiment of the present invention, the target count information, target detection frame information and target prediction frame information of each frame image in the image sequence to be processed are extracted; the first tracking trajectory of each target in the image sequence to be processed is computed according to the target detection frame information and target prediction frame information of each frame image; whether a first missed-detection situation exists is judged according to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, the first missed-detection situation including first missed target points; if the first missed-detection situation exists, whether a second missed-detection situation exists is judged according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, the second missed-detection situation including second missed target points; if the second missed-detection situation does not exist, first missed-frame information is determined according to the first missed target point; if the second missed-detection situation exists, second missed-frame information is determined according to the second missed target point; and the target tracking trajectory is obtained based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information. Through the target detection frame information and the target count information, whether the first missed-detection situation exists can be judged; through the target prediction frame information and the first missed target points, whether the second missed-detection situation exists can be judged, and hence whether a missed target is an existing target or a newly appeared target. This effectively reduces the missed-detection rate and improves the target-detection accuracy of multi-target tracking, thereby improving the accuracy of multi-target tracking.
It should be noted that the target tracking method provided by the embodiment of the present invention can be applied to devices capable of target tracking, such as mobile phones, monitors, computers and servers.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a target tracking apparatus provided by an embodiment of the present invention. As shown in FIG. 2, the apparatus includes:
an extraction module 201, configured to extract target count information, target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
a calculation module 202, configured to compute, according to the target detection frame information and target prediction frame information of each frame image, a first tracking trajectory of each target in the image sequence to be processed;
a first judgment module 203, configured to judge, according to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, whether a first missed-detection situation exists, the first missed-detection situation including first missed target points;
a second judgment module 204, configured to judge, if the first missed-detection situation exists, whether a second missed-detection situation exists according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, the second missed-detection situation including second missed target points;
a first determination module 205, configured to determine, if the second missed-detection situation does not exist, first missed-frame information according to the first missed target point;
a second determination module 206, configured to determine, if the second missed-detection situation exists, second missed-frame information according to the second missed target point; and
a processing module 207, configured to obtain a target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information.
Optionally, as shown in FIG. 3, the extraction module 201 includes:
a first calculation submodule 2011, configured to compute the target count information of each frame image according to a preset target-count estimation network;
a second calculation submodule 2012, configured to compute the target detection frame information of each frame image according to a preset detection and re-identification network, the target detection frame information including target detection frames and target detection features; and
a third calculation submodule 2013, configured to compute the target prediction frame information of each frame image according to a preset prediction network, the target prediction frame information including target prediction frames and target prediction features.
Optionally, as shown in FIG. 4, the detection and re-identification network includes a common network, a detection branch network and a re-identification branch network, the input of the detection branch network is connected to the output of the common network, and the input of the re-identification branch network is connected to the output of the common network; the second calculation submodule 2012 includes:
a first extraction unit 20121, configured to extract, through the common network, the common features of each frame image for the detection branch network and the re-identification branch network;
a second extraction unit 20122, configured to extract, through the detection branch network, the target detection frames implied in the common features; and
a third extraction unit 20123, configured to extract, through the re-identification branch network, the target detection features implied in the common features.
Optionally, as shown in FIG. 5, the calculation module 202 includes:
a first configuration submodule 2021, configured to assign, according to the target detection frame information and the target prediction frame information, a unique ID to the target detection frame of each target; and
a first association submodule 2022, configured to obtain the first tracking trajectory of each target according to the unique ID of each target.
Optionally, as shown in FIG. 6, the first configuration submodule 2021 includes:
a first calculation unit 20211, configured to compute the intersection-over-union between the target detection frames of the targets in the (n+1)-th frame image and the target prediction frames of the targets in the n-th frame image;
a second calculation unit 20212, configured to compute the feature similarity between the target detection features of the targets in the (n+1)-th frame image and the target prediction features of the targets in the n-th frame image; and
a configuration unit 20213, configured to assign a unique ID to the target detection frame of each target based on the intersection-over-union and the feature similarity.
Optionally, as shown in FIG. 7, the target count information includes an estimated target number, and the first judgment module 203 includes:
a statistics submodule 2031, configured to count the number of target detection frames in the (n+1)-th frame image according to the target detection frame information corresponding to the (n+1)-th frame image;
a first judgment submodule 2032, configured to judge whether the number of target detection frames is smaller than the estimated target number; and
a first determination submodule 2033, configured to determine that the first missed-detection situation exists if the number of target detection frames is smaller than the estimated target number.
Optionally, as shown in FIG. 8, the second judgment module 204 includes:
a second judgment submodule 2041, configured to judge whether the first missed target point is located within the first target prediction frame; and
a second determination submodule 2042, configured to determine that the second missed-detection situation exists if the first missed target point is not located within any first target prediction frame.
Optionally, the first determination module 205 is further configured to take the first target prediction frame as the first missed-target detection frame of the first missed target point in the (n+1)-th frame image.
Optionally, the second determination module 206 is further configured to configure, according to the height and width information of the target prediction frames in the n-th frame image and with the second missed target point as the center, a second missed-target detection frame for the second missed target point in the (n+1)-th frame image.
Optionally, as shown in FIG. 9, the processing module 207 includes:
a second configuration submodule 2071, configured to configure, according to the first missed-frame information in the (n+1)-th frame image and the target detection frame information in the n-th frame image, a first unique ID for the first missed-frame information in the (n+1)-th frame image, the first unique ID being a unique ID corresponding to one of the first tracking trajectories;
a second association submodule 2072, configured to add, according to the unique ID of the first missed-frame information, the first missed-frame information to the first tracking trajectory having the same unique ID; and/or
a third configuration submodule 2073, configured to configure a second unique ID for the second missed-frame information in the (n+1)-th frame image, the second unique ID being different from the unique IDs corresponding to all first tracking trajectories.
It should be noted that the target tracking apparatus provided by the embodiment of the present invention can be applied to devices capable of target tracking, such as mobile phones, monitors, computers and servers.
The target tracking apparatus provided by the embodiment of the present invention can implement every process implemented by the target tracking method in the above method embodiments and achieve the same beneficial effects. To avoid repetition, details are not described here again.
Referring to FIG. 10, FIG. 10 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 10, the electronic device includes a memory 1002, a processor 1001 and a computer program stored on the memory 1002 and executable on the processor 1001, wherein:
the processor 1001 is configured to call the computer program stored in the memory 1002 and execute the following steps:
extracting target count information, target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
computing, according to the target detection frame information and target prediction frame information of each frame image, a first tracking trajectory of each target in the image sequence to be processed;
judging, according to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, whether a first missed-detection situation exists, the first missed-detection situation including first missed target points;
if the first missed-detection situation exists, judging, according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, whether a second missed-detection situation exists, the second missed-detection situation including second missed target points;
if the second missed-detection situation does not exist, determining first missed-frame information according to the first missed target point;
if the second missed-detection situation exists, determining second missed-frame information according to the second missed target point; and
obtaining a target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information.
Optionally, the extracting, executed by the processor 1001, of the target count information, target detection frame information and target prediction frame information of each frame image in the image sequence to be processed includes:
computing the target count information of each frame image according to a preset target-count estimation network;
computing the target detection frame information of each frame image according to a preset detection and re-identification network, the target detection frame information including target detection frames and target detection features; and
computing the target prediction frame information of each frame image according to a preset prediction network, the target prediction frame information including target prediction frames and target prediction features.
Optionally, the detection and re-identification network includes a common network, a detection branch network and a re-identification branch network, the input of the detection branch network is connected to the output of the common network, and the input of the re-identification branch network is connected to the output of the common network; the computing, executed by the processor 1001, of the target detection frame information of each frame image according to the preset detection and re-identification network includes:
extracting, through the common network, the common features of each frame image for the detection branch network and the re-identification branch network;
extracting, through the detection branch network, the target detection frames implied in the common features; and
extracting, through the re-identification branch network, the target detection features implied in the common features.
Optionally, the computing, executed by the processor 1001, of the first tracking trajectory of each target in the image sequence to be processed according to the target detection frame information and target prediction frame information of each frame image includes:
assigning, according to the target detection frame information and the target prediction frame information, a unique ID to the target detection frame of each target; and
obtaining the first tracking trajectory of each target according to the unique ID of each target.
Optionally, the assigning, executed by the processor 1001, of a unique ID to the target detection frame of each target according to the target detection frame information and the target prediction frame information includes:
computing the intersection-over-union between the target detection frames of the targets in the (n+1)-th frame image and the target prediction frames of the targets in the n-th frame image;
computing the feature similarity between the target detection features of the targets in the (n+1)-th frame image and the target prediction features of the targets in the n-th frame image; and
assigning a unique ID to the target detection frame of each target based on the intersection-over-union and the feature similarity.
Optionally, the target count information includes an estimated target number, and the judging, executed by the processor 1001, of whether the first missed-detection situation exists according to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image includes:
counting the number of target detection frames in the (n+1)-th frame image according to the target detection frame information corresponding to the (n+1)-th frame image;
judging whether the number of target detection frames is smaller than the estimated target number; and
if the number of target detection frames is smaller than the estimated target number, determining that the first missed-detection situation exists.
Optionally, the judging, executed by the processor 1001, of whether the second missed-detection situation exists according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image includes:
judging whether the first missed target point is located within the first target prediction frame; and
if the first missed target point is not located within any first target prediction frame, determining that the second missed-detection situation exists.
Optionally, the first missed-frame information includes a first missed-target detection frame, and the determining, executed by the processor 1001, of the first missed-frame information according to the first missed target point includes:
taking the first target prediction frame as the first missed-target detection frame of the first missed target point in the (n+1)-th frame image.
Optionally, the second missed-frame information includes a second missed-target detection frame, and the determining, executed by the processor 1001, of the second missed-frame information according to the second missed target point includes:
configuring, according to the height and width information of the target prediction frames in the n-th frame image and with the second missed target point as the center, a second missed-target detection frame for the second missed target point in the (n+1)-th frame image.
Optionally, the obtaining, executed by the processor 1001, of the target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information includes:
configuring, according to the first missed-frame information in the (n+1)-th frame image and the target detection frame information in the n-th frame image, a first unique ID for the first missed-frame information in the (n+1)-th frame image, the first unique ID being a unique ID corresponding to one of the first tracking trajectories;
adding, according to the unique ID of the first missed-frame information, the first missed-frame information to the first tracking trajectory having the same unique ID; and/or
configuring a second unique ID for the second missed-frame information in the (n+1)-th frame image, the second unique ID being different from the unique IDs corresponding to all first tracking trajectories.
It should be noted that the above electronic device may be a device capable of target tracking, such as a mobile phone, a monitor, a computer or a server.
The electronic device provided by the embodiment of the present invention can implement every process implemented by the target tracking method in the above method embodiments and achieve the same beneficial effects. To avoid repetition, details are not described here again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, every process of the target tracking method provided by the embodiments of the present invention is implemented and the same technical effects can be achieved. To avoid repetition, details are not described here again.

Claims (13)

  1. A target tracking method, characterized by comprising the following steps:
    extracting target count information, target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
    computing, according to the target detection frame information and target prediction frame information of each frame image, a first tracking trajectory of each target in the image sequence to be processed;
    judging, according to the target detection frame information corresponding to an (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, whether a first missed-detection situation exists, the first missed-detection situation comprising first missed target points;
    if the first missed-detection situation exists, judging, according to the target prediction frame information corresponding to an n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, whether a second missed-detection situation exists, the second missed-detection situation comprising second missed target points;
    if the second missed-detection situation does not exist, determining first missed-frame information according to the first missed target point;
    if the second missed-detection situation exists, determining second missed-frame information according to the second missed target point; and
    obtaining a target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information.
  2. The method according to claim 1, characterized in that the extracting of the target count information, target detection frame information and target prediction frame information of each frame image in the image sequence to be processed comprises:
    computing the target count information of each frame image according to a preset target-count estimation network;
    computing the target detection frame information of each frame image according to a preset detection and re-identification network, the target detection frame information comprising target detection frames and target detection features; and
    computing the target prediction frame information of each frame image according to a preset target prediction network, the target prediction frame information comprising target prediction frames and target prediction features.
  3. The method according to claim 2, characterized in that the detection and re-identification network comprises a common network, a detection branch network and a re-identification branch network, the input of the detection branch network is connected to the output of the common network, and the input of the re-identification branch network is connected to the output of the common network; and the computing of the target detection frame information of each frame image according to the preset detection and re-identification network comprises:
    extracting, through the common network, the common features of each frame image for the detection branch network and the re-identification branch network;
    extracting, through the detection branch network, the target detection frames implied in the common features; and
    extracting, through the re-identification branch network, the target detection features implied in the common features.
  4. The method according to claim 3, characterized in that the computing, according to the target detection frame information and target prediction frame information of each frame image, of the first tracking trajectory of each target in the image sequence to be processed comprises:
    assigning, according to the target detection frame information and the target prediction frame information, a unique ID to the target detection frame of each target; and
    obtaining the first tracking trajectory of each target according to the unique ID of each target.
  5. The method according to claim 4, characterized in that the assigning, according to the target detection frame information and the target prediction frame information, of a unique ID to the target detection frame of each target comprises:
    computing the intersection-over-union between the target detection frames of the targets in the (n+1)-th frame image and the target prediction frames of the targets in the n-th frame image;
    computing the feature similarity between the target detection features of the targets in the (n+1)-th frame image and the target prediction features of the targets in the n-th frame image; and
    assigning a unique ID to the target detection frame of each target based on the intersection-over-union and the feature similarity.
  6. The method according to any one of claims 2 to 5, characterized in that the target count information comprises an estimated target number, and the judging, according to the target detection frame information corresponding to the (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, of whether the first missed-detection situation exists comprises:
    counting the number of target detection frames in the (n+1)-th frame image according to the target detection frame information corresponding to the (n+1)-th frame image; and
    judging whether the number of target detection frames is smaller than the estimated target number; and if the number of target detection frames is smaller than the estimated target number, determining that the first missed-detection situation exists.
  7. The method according to claim 6, characterized in that the judging, according to the target prediction frame information corresponding to the n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, of whether the second missed-detection situation exists comprises:
    judging whether the first missed target point is located within the first target prediction frame; and
    if the first missed target point is not located within any first target prediction frame, determining that the second missed-detection situation exists.
  8. The method according to claim 7, characterized in that the first missed-frame information comprises a first missed-target detection frame, and the determining of the first missed-frame information according to the first missed target point comprises:
    taking the first target prediction frame as the first missed-target detection frame of the first missed target point in the (n+1)-th frame image.
  9. The method according to claim 8, characterized in that the second missed-frame information comprises a second missed-target detection frame, and the determining of the second missed-frame information according to the second missed target point comprises:
    configuring, according to the height and width information of the target prediction frames in the n-th frame image and with the second missed target point as the center, a second missed-target detection frame for the second missed target point in the (n+1)-th frame image.
  10. The method according to claim 9, characterized in that the obtaining of the target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information comprises:
    configuring, according to the first missed-frame information in the (n+1)-th frame image and the target detection frame information in the n-th frame image, a first unique ID for the first missed-frame information in the (n+1)-th frame image, the first unique ID being a unique ID corresponding to one of the first tracking trajectories;
    adding, according to the unique ID of the first missed-frame information, the first missed-frame information to the first tracking trajectory having the same unique ID; and/or
    configuring a second unique ID for the second missed-frame information in the (n+1)-th frame image, the second unique ID being different from the unique IDs corresponding to all first tracking trajectories.
  11. A target tracking apparatus, characterized in that the apparatus comprises:
    an extraction module, configured to extract target count information, target detection frame information and target prediction frame information of each frame image in an image sequence to be processed;
    a calculation module, configured to compute, according to the target detection frame information and target prediction frame information of each frame image, a first tracking trajectory of each target in the image sequence to be processed;
    a first judgment module, configured to judge, according to the target detection frame information corresponding to an (n+1)-th frame image and the target count information corresponding to the (n+1)-th frame image, whether a first missed-detection situation exists, the first missed-detection situation comprising first missed target points;
    a second judgment module, configured to judge, if the first missed-detection situation exists, whether a second missed-detection situation exists according to the target prediction frame information corresponding to an n-th frame image and the first missed target points corresponding to the (n+1)-th frame image, the second missed-detection situation comprising second missed target points;
    a first determination module, configured to determine, if the second missed-detection situation does not exist, first missed-frame information according to the first missed target point;
    a second determination module, configured to determine, if the second missed-detection situation exists, second missed-frame information according to the second missed target point; and
    a processing module, configured to obtain a target tracking trajectory based on the first tracking trajectory, the first missed-frame information and/or the second missed-frame information.
  12. An electronic device, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the target tracking method according to any one of claims 1 to 10.
  13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the target tracking method according to any one of claims 1 to 10.
PCT/CN2021/114903 2020-12-31 2021-08-27 Target tracking method and apparatus, electronic device, and storage medium WO2022142417A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011639844.2A CN113191180B (zh) 2020-12-31 2020-12-31 Target tracking method and apparatus, electronic device, and storage medium
CN202011639844.2 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022142417A1 true WO2022142417A1 (zh) 2022-07-07

Family

ID=76972799

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/114903 WO2022142417A1 (zh) 2020-12-31 2021-08-27 Target tracking method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113191180B (zh)
WO (1) WO2022142417A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191180B (zh) * 2020-12-31 2023-05-12 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, electronic device, and storage medium
CN116563769B (zh) * 2023-07-07 2023-10-20 南昌工程学院 Video target recognition and tracking method, system, computer, and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160132728A1 (en) * 2014-11-12 2016-05-12 Nec Laboratories America, Inc. Near Online Multi-Target Tracking with Aggregated Local Flow Descriptor (ALFD)
CN110472594A (zh) * 2019-08-20 2019-11-19 腾讯科技(深圳)有限公司 Target tracking method, information insertion method, and device
CN111179311A (zh) * 2019-12-23 2020-05-19 全球能源互联网研究院有限公司 Multi-target tracking method and apparatus, and electronic device
CN113191180A (zh) * 2020-12-31 2021-07-30 深圳云天励飞技术股份有限公司 Target tracking method and apparatus, electronic device, and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965657A (zh) * 2023-02-28 2023-04-14 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium, and vehicle
CN115965657B (zh) * 2023-02-28 2023-06-02 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium, and vehicle
CN116523970A (zh) * 2023-07-05 2023-08-01 之江实验室 Dynamic three-dimensional target tracking method and apparatus based on secondary implicit matching
CN116523970B (zh) * 2023-07-05 2023-10-20 之江实验室 Dynamic three-dimensional target tracking method and apparatus based on secondary implicit matching
CN117151140A (zh) * 2023-10-27 2023-12-01 安徽容知日新科技股份有限公司 Method and apparatus for recognizing target object identification codes, and computer-readable storage medium
CN117151140B (zh) * 2023-10-27 2024-02-06 安徽容知日新科技股份有限公司 Method and apparatus for recognizing target object identification codes, and computer-readable storage medium

Also Published As

Publication number Publication date
CN113191180A (zh) 2021-07-30
CN113191180B (zh) 2023-05-12

Similar Documents

Publication Publication Date Title
WO2022142417A1 (zh) Target tracking method and apparatus, electronic device, and storage medium
WO2022127180A1 (zh) Target tracking method and apparatus, electronic device, and storage medium
CN109344725B (zh) Online multi-pedestrian tracking method based on a spatio-temporal attention mechanism
CN109035304B (zh) Target tracking method, medium, computing device, and apparatus
CN110008867B (zh) Early-warning method and apparatus based on abnormal person behavior, and storage medium
WO2019218824A1 (zh) Movement trajectory acquisition method and device, storage medium, and terminal
US9767570B2 (en) Systems and methods for computer vision background estimation using foreground-aware statistical models
CN109544592B (zh) Moving-target detection algorithm for camera movement
WO2021139049A1 (zh) Detection method, detection apparatus, monitoring device, and computer-readable storage medium
CN112926410A (zh) Target tracking method and apparatus, storage medium, and intelligent video system
CN105631418A (zh) People counting method and apparatus
CN110688940A (zh) Fast face tracking method based on face detection
CN110992378B (zh) Dynamically updated visual tracking aerial photography method and system based on a rotor flying robot
CN111783524A (zh) Scene-change detection method and apparatus, storage medium, and terminal device
CN111402293A (zh) Vehicle tracking method and apparatus for intelligent transportation
CN107590431B (zh) Quantity statistics method and apparatus based on image recognition
CN110866428A (zh) Target tracking method and apparatus, electronic device, and storage medium
WO2022142416A1 (zh) Target tracking method and related device
CN111241943A (zh) Scene recognition and loop-closure detection method based on background object detection and triplet loss in autonomous-driving scenarios
KR20140141239A (ko) Real-time object tracking method and system applying the mean-shift algorithm
CN115984780B (zh) Method and apparatus for discriminating warehouse entry and exit of industrial solid waste, electronic device, and medium
CN112070035A (zh) Target tracking method and apparatus based on video streams, and storage medium
CN111476132A (zh) Video scene recognition method and apparatus, electronic device, and storage medium
CN116523957A (zh) Multi-target tracking method and system, electronic device, and storage medium
Khan et al. Foreground detection using motion histogram threshold algorithm in high-resolution large datasets

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913164

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913164

Country of ref document: EP

Kind code of ref document: A1