WO2020151167A1 - Target tracking method and device, computer device and readable storage medium - Google Patents


Info

Publication number: WO2020151167A1
Authority: WIPO (PCT)
Application number: PCT/CN2019/091160
Priority date: 2019-01-23 (priority claimed from Chinese application 201910064675.5)
Prior art keywords: target, frame, current image, prediction, target frame
Other languages: French (fr), Chinese (zh)
Inventor: 杨国青 (Yang Guoqing)
Original assignee: 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司; published as WO2020151167A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments

Definitions

  • This application relates to the field of image processing technology, and in particular to a target tracking method, device, computer device and non-volatile readable storage medium.
  • Target tracking refers to tracking moving objects in a video or image sequence (for example, cars and pedestrians in traffic videos) to obtain the position of the moving object in each frame.
  • Target tracking has a wide range of applications in the fields of video surveillance, autonomous driving and video entertainment.
  • current target tracking mainly adopts the track-by-detection architecture.
  • a detector detects the position information of each target on each frame of the video or image sequence, and the target position information of the current frame is then matched with the target position information of the previous frame.
  • however, existing schemes are not robust in target tracking; if the illumination changes, the tracking effect is poor.
  • the first aspect of the present application provides a target tracking method. The method includes: using a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image; obtaining a second target frame in the previous frame of the current image and using a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image; matching the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and updating the position of the target in the current image according to the matching result.
  • a second aspect of the present application provides a target tracking device, the device includes:
  • the detection module is configured to use a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image;
  • the prediction module is configured to obtain the second target frame in the previous frame of the current image, use a predictor to predict the position of the second target frame in the current image, and obtain the prediction frame of the second target frame in the current image;
  • a matching module configured to match a first target frame in the current image with the prediction frame to obtain a matching result between the first target frame and the prediction frame;
  • the update module is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
  • a third aspect of the present application provides a computer device that includes a processor, and the processor is configured to implement the target tracking method when executing computer-readable instructions stored in a memory.
  • a fourth aspect of the present application provides a non-volatile readable storage medium having computer readable instructions stored thereon, and when the computer readable instructions are executed by a processor, the target tracking method is implemented.
  • This application uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image; obtains the second target frame in the previous frame of the current image, and uses a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image; matches the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and updates the position of the target in the current image according to the matching result.
  • This application improves the robustness and scene adaptability of target tracking.
  • Fig. 1 is a flowchart of a target tracking method provided by an embodiment of the present application.
  • Figure 2 is a structural diagram of a target tracking device provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the SiamFC model.
  • the target tracking method of the present application is applied to one or more computer devices.
  • the computer device is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, etc.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • FIG. 1 is a flowchart of a target tracking method provided in Embodiment 1 of the present application.
  • the target tracking method is applied to a computer device.
  • the target tracking method of the present application tracks a specific type of moving object (such as a pedestrian) in a video or image sequence, and obtains the position of the moving object in each frame of the image.
  • the target tracking method can solve the shortcoming that the existing solutions cannot track high-speed moving targets, and improve the robustness of target tracking.
  • the target tracking method includes:
  • Step 101 Use a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image.
  • the predetermined type of target may include pedestrians, cars, airplanes, ships, and so on.
  • the predetermined type of target may be one type of target (for example, pedestrians) or multiple types of targets (for example, pedestrians and cars).
  • the target detector may be a neural network model with classification and regression functions.
  • the target detector may be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
  • the Faster RCNN model includes the Region Proposal Network (RPN) and the Fast Region-based Convolution Neural Network (Fast RCNN).
  • the region proposal network and the Fast RCNN share convolutional layers, and the shared convolutional layers are used to extract a feature map of an image.
  • the region proposal network generates candidate frames of the image according to the feature map, and inputs the generated candidate frames into the Fast RCNN.
  • the Fast RCNN screens and adjusts the candidate frames according to the feature map to obtain the target frames of the image. A sketch of this two-stage flow is given below.
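The following is an illustrative outline only, not the patent's implementation; the names backbone, rpn, and fast_rcnn_head are hypothetical placeholders, not a real library API.

```python
# Illustrative outline of the two-stage Faster RCNN flow described above.
# backbone, rpn and fast_rcnn_head are hypothetical callables, not a real API.

def detect(image, backbone, rpn, fast_rcnn_head):
    """Detect targets in one image with a two-stage detector."""
    feature_map = backbone(image)    # shared convolutional layers
    proposals = rpn(feature_map)     # candidate frames from the region proposal network
    # Fast RCNN screens and adjusts the candidates on the same shared features
    boxes, scores, classes = fast_rcnn_head(feature_map, proposals)
    return boxes, scores, classes
```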
  • before the target detector is used to detect the first target frame of the predetermined type of target in the current image, the target detector is trained using a first training sample set:
  • the shared convolutional layers extract a feature map of each sample image in the first training sample set;
  • the region proposal network obtains candidate frames in each sample image according to the feature map;
  • the Fast RCNN screens and adjusts the candidate frames according to the feature map to obtain the target frame of each sample image.
  • the target frame may include target frames of different types of targets (for example, pedestrians, cars, airplanes, ships, etc.).
  • the Faster RCNN model adopts the ZF framework, and the region proposal network and the Fast RCNN share 5 convolutional layers.
  • the ZF framework is a commonly used network structure proposed by Matthew D. Zeiler and Rob Fergus in the 2013 paper "Visualizing and Understanding Convolutional Networks"; it is a variant of the AlexNet network. ZF is fine-tuned from AlexNet, uses the ReLU activation function and a cross-entropy cost function, and uses smaller convolution kernels to retain more original pixel information.
  • the first training sample set can be used to train the Faster RCNN model according to the following steps:
  • the region proposal network generates many candidate boxes; the several candidate boxes with the highest target classification scores can be screened out and input to the Fast RCNN, to speed up training and detection.
  • the backpropagation algorithm can be used to train the region proposal network, and the network parameters of the region proposal network can be adjusted during training to minimize the loss function.
  • the loss function indicates the difference between the predicted confidence of the candidate frames output by the region proposal network and the true confidence.
  • the loss function can include two parts: a target classification loss and a regression loss.
  • the loss function can be defined as (the standard Faster RCNN form, consistent with the symbols defined below):

    $$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,R(t_i - t_i^*)$$

  • $i$ is the index of the candidate frame in a training batch (mini-batch).
  • $N_{cls}$ is the size of the training batch, such as 256.
  • $p_i$ is the predicted probability that the i-th candidate frame is a target.
  • $p_i^*$ is the ground-truth (GT) label: if the candidate box is positive (that is, the assigned label is a positive label, called a positive candidate box), $p_i^*$ is 1; if the candidate box is negative (that is, the assigned label is a negative label, called a negative candidate box), $p_i^*$ is 0.
  • $\lambda$ is the balance weight, which can be taken as 10.
  • $N_{reg}$ is the number of candidate frames.
  • $R$ is a robust loss function (smooth L1), defined as:

    $$R(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
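As a concrete reading of the formulas above, here is a minimal NumPy sketch. It assumes binary cross-entropy for L_cls and per-coordinate smooth L1 for the regression term, which matches the standard Faster RCNN formulation but is not spelled out in the patent text.

```python
import numpy as np

def smooth_l1(x):
    """R in the text: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    x = np.abs(x)
    return np.where(x < 1, 0.5 * x ** 2, x - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls=256, lam=10.0):
    """Classification + regression loss over one mini-batch.

    p:      (N,) predicted object probabilities
    p_star: (N,) ground-truth labels, 1 for positive boxes, 0 for negative
    t:      (N, 4) predicted box offsets
    t_star: (N, 4) ground-truth box offsets
    """
    eps = 1e-7
    # binary cross-entropy classification term (assumed form of L_cls)
    cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps))
    # regression term, counted only for positive candidate boxes
    reg = p_star * smooth_l1(t - t_star).sum(axis=1)
    n_reg = len(p)
    return cls.sum() / n_cls + lam * reg.sum() / n_reg
```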
  • the training method of the Fast RCNN can refer to the training method of the region proposal network, and will not be repeated here.
  • the method of hard negative mining (HNM) is added to the training of the Fast RCNN.
  • in other embodiments, the target detector may also be another neural network model, such as a Region-based Convolutional Neural Network (RCNN) model or a Fast RCNN model.
  • when the target detector is used to detect a predetermined type of target in an image, the image is input to the target detector; the target detector detects the predetermined type of target in the image and outputs the position of each first target frame of the predetermined type of target in the image. For example, the target detector outputs 6 first target frames in the image.
  • the first target frame is presented in the form of a rectangular frame.
  • the position of the first target frame may be represented by position coordinates, and the position coordinates may include upper left corner coordinates (x, y) and width and height (w, h).
  • the target detector may also output the type of each first target frame, for example, output 5 first target frames of pedestrian type and 1 first target frame of automobile type.
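A small helper matching this (x, y, w, h) convention might look as follows; the Box class and its field names are illustrative, not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class Box:
    # (x, y) is the upper-left corner, (w, h) the width and height
    x: float
    y: float
    w: float
    h: float
    label: str = ""   # e.g. "pedestrian" or "car"

    def corners(self):
        """Return (x1, y1, x2, y2) corner coordinates."""
        return self.x, self.y, self.x + self.w, self.y + self.h
```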
  • Step 102: Obtain a second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image.
  • the second target frame in the previous frame is a target frame obtained by using a target detector to detect a predetermined type of target in the previous frame.
  • predicting the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image means predicting the position of each second target frame in the current image and obtaining the prediction frame of each second target frame in the current image. For example, if four pedestrian target frames are detected in the previous frame of the current image, the positions of the four pedestrian target frames in the current image are predicted (that is, the positions in the current image of the four pedestrians corresponding to the four pedestrian target frames are predicted) to obtain the prediction frames of the four pedestrian target frames in the current image.
  • the predictor may be a deep neural network model.
  • the predictor Before using the predictor to predict the second target frame, the predictor is trained using the second training sample set.
  • the features learned by the predictor are deep features, in which color features (which are sensitive to illumination) account for a relatively small proportion. Therefore, the predictor can overcome the impact of illumination to a certain extent, improving the robustness and scene adaptability of target tracking.
  • the second training sample set may include a large number of sample images of objects with different illumination, deformation, and high-speed motion. Therefore, the predictor can further overcome the influence of illumination, and can overcome the influence of deformation and high-speed movement to a certain extent, so that this application realizes the tracking of high-speed moving targets and improves the robustness of target tracking.
  • a feature pyramid network (Feature Pyramid Network, FPN) can be constructed in the deep neural network model, and the deep neural network model with the feature pyramid network can be used to predict the position of the second target frame in the current image.
  • the feature pyramid network connects the high-level features of low-resolution, high-semantic information and the low-level features of high-resolution, low-semantic information from top to bottom, so that features at all scales have rich semantic information.
  • the connection method of the feature pyramid network is to upsample the higher-level features by a factor of 2 and then combine them with the corresponding lower-level features (which first pass through a 1*1 convolution kernel); the combination method is element-wise addition between pixels.
  • the feature maps used in each layer of prediction are fused with features of different resolutions and different semantic strengths, and the fused feature maps of different resolutions are used for object detection with corresponding resolutions. This ensures that each layer has appropriate resolution and strong semantic features. Constructing a feature pyramid network in the deep neural network model can improve the performance of predicting the second target frame, so that the deformed second target frame can still be better predicted.
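A minimal PyTorch sketch of one such top-down merge step, assuming the higher-level feature already has the output channel count; the class name and channel sizes are illustrative, not from the patent.

```python
import torch.nn.functional as F
from torch import nn

class FPNMerge(nn.Module):
    """One top-down FPN connection: 2x upsample + 1x1 lateral conv + add."""

    def __init__(self, low_channels, out_channels=256):
        super().__init__()
        # 1x1 convolution applied to the lower-level (lateral) feature
        self.lateral = nn.Conv2d(low_channels, out_channels, kernel_size=1)

    def forward(self, p_high, c_low):
        # upsample the higher-level feature by a factor of 2
        up = F.interpolate(p_high, scale_factor=2, mode="nearest")
        # element-wise (per-pixel) addition with the lateral feature
        return up + self.lateral(c_low)
```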
  • the predictor may be a SiamFC network (Fully-Convolutional Siamese Network) model, for example, a SiamFC network model constructed with a feature pyramid network.
  • Figure 4 is a schematic diagram of the SiamFC model.
  • z represents the template image, that is, the second target frame in the previous frame of image
  • x represents the search area, that is, the current image
  • φ represents a feature mapping operation that maps the original image into a feature space; the convolutional layers and pooling layers of a CNN can be used. 6*6*128 represents the feature obtained after z passes through φ: a 128-channel 6*6 feature.
  • 22*22*128 is the feature of x after φ. * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 feature as the kernel, and a 17*17 score map is obtained, which represents the similarity of each position in the search area to the template image.
  • the position in the search area with the highest similarity to the template image is the position of the prediction frame.
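The scoring step can be reproduced with a single cross-correlation, sketched here in PyTorch with random tensors standing in for φ(z) and φ(x); the shared CNN φ itself is omitted.

```python
import torch
import torch.nn.functional as F

# Template feature phi(z) used as a convolution kernel over the search
# feature phi(x): a 22x22 map convolved with a 6x6 kernel gives
# 22 - 6 + 1 = 17, i.e. a 17x17 score map.
z_feat = torch.randn(1, 128, 6, 6)     # stand-in for phi(z)
x_feat = torch.randn(1, 128, 22, 22)   # stand-in for phi(x)

score_map = F.conv2d(x_feat, z_feat)   # shape (1, 1, 17, 17)

# the location of the maximum similarity gives the predicted position
best = score_map.flatten().argmax().item()
row, col = divmod(best, score_map.shape[-1])
```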
  • Step 103 Match the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame.
  • the matching result of the first target frame and the prediction frame may include: a first target frame matches a prediction frame; a first target frame does not match any prediction frame; or a prediction frame does not match any first target frame.
  • the overlap area ratio (Intersection over Union, IOU) of the first target frame and the prediction frame may be calculated, and each pair of matching first target frame and prediction frame may be determined according to the overlap area ratio.
  • the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and a prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4.
  • the prediction frame P1 corresponds to the second target frame B1
  • the prediction frame P2 corresponds to the second target frame B2
  • the prediction frame P3 corresponds to the second target frame B3
  • the prediction frame P4 corresponds to the second target frame B4.
  • for the first target frame A1, calculate the overlap area ratios of A1 with the prediction frames P1, P2, P3, and P4; if the overlap area ratio of A1 and P1 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A1 matches the prediction frame P1.
  • for the first target frame A2, calculate the overlap area ratios of A2 with P1, P2, P3, and P4; if the overlap area ratio of A2 and P2 is the largest and is greater than or equal to the preset threshold (for example, 70%), it is determined that A2 matches P2. For the first target frame A3, calculate the overlap area ratios of A3 with P1, P2, P3, and P4; if the overlap area ratio of A3 and P3 is the largest and is greater than or equal to the preset threshold, it is determined that A3 matches P3. For the first target frame A4, calculate the overlap area ratios of A4 with P1, P2, P3, and P4; if the overlap area ratio of A4 and P4 is the largest and is greater than or equal to the preset threshold, it is determined that A4 matches P4. A sketch of this matching procedure is given below.
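A minimal sketch of the IOU-based matching; the patent does not spell out the exact assignment procedure, so this greedy per-detection version is one plausible reading. Boxes are (x, y, w, h) tuples as defined above.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def match_by_iou(detections, predictions, threshold=0.7):
    """For each detection, pick the prediction with the largest IOU,
    accepting the pair only if that IOU reaches the threshold."""
    matches = {}
    for i, det in enumerate(detections):
        overlaps = [iou(det, pred) for pred in predictions]
        j = int(np.argmax(overlaps))
        if overlaps[j] >= threshold:
            matches[i] = j   # detection index -> prediction index
    return matches
```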
  • the distance between the center point of the first target frame and the prediction frame may be calculated, and each pair of matched first target frame and the prediction frame may be determined according to the distance.
  • the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and the prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4.
  • for the first target frame A1, calculate the distances between the center point of A1 and the center points of the prediction frames P1, P2, P3, and P4. If the distance between the center points of A1 and P1 is the smallest and is less than or equal to a preset distance (for example, 10 pixels), it is determined that the first target frame A1 matches the prediction frame P1.
  • for the first target frame A2, calculate the distances between the center point of A2 and the center points of P1, P2, P3, and P4. If the distance between the center points of A2 and P2 is the smallest and is less than or equal to the preset distance, it is determined that A2 matches P2.
  • for the first target frame A3, calculate the distances between the center point of A3 and the center points of P1, P2, P3, and P4. If the distance between the center points of A3 and P3 is the smallest and is less than or equal to the preset distance, it is determined that A3 matches P3. For the first target frame A4, calculate the distances between the center point of A4 and the center points of P1, P2, P3, and P4; if the distance between the center points of A4 and P4 is the smallest and is less than or equal to the preset distance (for example, 10 pixels), it is determined that A4 matches P4. A corresponding sketch follows this list.
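The center-distance variant differs only in the pairwise score; a corresponding sketch, again with (x, y, w, h) boxes and a pixel threshold:

```python
def center_distance(box_a, box_b):
    """Distance between the centers of two (x, y, w, h) boxes, in pixels."""
    acx, acy = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bcx, bcy = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    return ((acx - bcx) ** 2 + (acy - bcy) ** 2) ** 0.5

def match_by_distance(detections, predictions, max_dist=10.0):
    """Pair each detection with the nearest prediction if it is close enough."""
    matches = {}
    for i, det in enumerate(detections):
        dists = [center_distance(det, pred) for pred in predictions]
        j = min(range(len(dists)), key=dists.__getitem__)
        if dists[j] <= max_dist:
            matches[i] = j   # detection index -> prediction index
    return matches
```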
  • Step 104 According to the matching result of the first target frame and the prediction frame, update the position of the target in the current image.
  • updating the position of the target in the current image may include:
  • if a first target frame matches a prediction frame, using the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
  • if a first target frame does not match any prediction frame, using the position of the first target frame in the current image as the position of a new target;
  • if a prediction frame does not match any first target frame, taking the target corresponding to the prediction frame as a lost target in the current image. These three rules are sketched in code below.
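Putting the three rules together, a hedged sketch of the update step; the track bookkeeping (plain indices standing in for targets) is illustrative rather than specified by the patent.

```python
def update_tracks(detections, predictions, matches):
    """Apply the three update rules above.

    matches maps detection index -> prediction index (from either
    matching scheme shown earlier).
    """
    updated, new_targets, lost = {}, [], []
    matched_preds = set(matches.values())
    for i, det in enumerate(detections):
        if i in matches:
            updated[matches[i]] = det   # matched: detection is the new position
        else:
            new_targets.append(det)     # unmatched detection: a new target
    for j in range(len(predictions)):
        if j not in matched_preds:
            lost.append(j)              # unmatched prediction: target lost
    return updated, new_targets, lost
```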
  • the target tracking method of the first embodiment uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image; obtains the second target frame in the previous frame of the current image and uses the predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image; matches the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and updates the position of the target in the current image according to the matching result.
  • the first embodiment improves the robustness and scene adaptability of target tracking.
  • Fig. 2 is a structural diagram of a target tracking device provided in the second embodiment of the present application.
  • the target tracking device 20 is applied to a computer device.
  • the target tracking device 20 tracks specific types of moving objects (such as pedestrians) in a video or image sequence, and obtains the position of the moving object in each frame of the image.
  • the target tracking device 20 can improve the robustness and scene adaptability of target tracking.
  • the target tracking device 20 may include a detection module 201, a prediction module 202, a matching module 203, and an update module 204.
  • the detection module 201 is configured to use a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image.
  • the predetermined type of target may include pedestrians, cars, airplanes, ships, and so on.
  • the predetermined type of target may be one type of target (for example, pedestrians) or multiple types of targets (for example, pedestrians and cars).
  • the target detector may be a neural network model with classification and regression functions.
  • the target detector may be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
  • the Faster RCNN model includes the Region Proposal Network (RPN) and the Fast Region-based Convolution Neural Network (Fast RCNN).
  • the region proposal network and the Fast RCNN share convolutional layers, and the shared convolutional layers are used to extract a feature map of an image.
  • the region proposal network generates candidate frames of the image according to the feature map, and inputs the generated candidate frames into the Fast RCNN.
  • the Fast RCNN screens and adjusts the candidate frames according to the feature map to obtain the target frames of the image.
  • before the target detector is used to detect the first target frame of the predetermined type of target in the current image, the target detector is trained using a first training sample set:
  • the shared convolutional layers extract a feature map of each sample image in the first training sample set;
  • the region proposal network obtains candidate frames in each sample image according to the feature map;
  • the Fast RCNN screens and adjusts the candidate frames according to the feature map to obtain the target frame of each sample image.
  • the target frame may include target frames of different types of targets (for example, pedestrians, cars, airplanes, ships, etc.).
  • the Faster RCNN model adopts the ZF framework, and the region proposal network and the Fast RCNN share 5 convolutional layers.
  • the first training sample set can be used to train the Faster RCNN model according to the following steps:
  • the region proposal network generates many candidate boxes; the several candidate boxes with the highest target classification scores can be screened out and input to the Fast RCNN, to speed up training and detection.
  • the backpropagation algorithm can be used to train the region proposal network, and the network parameters of the region proposal network can be adjusted during training to minimize the loss function.
  • the loss function indicates the difference between the predicted confidence of the candidate frames output by the region proposal network and the true confidence.
  • the loss function can include two parts: a target classification loss and a regression loss.
  • the loss function can be defined as (the standard Faster RCNN form, consistent with the symbols defined below):

    $$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\,R(t_i - t_i^*)$$

  • $i$ is the index of the candidate frame in a training batch (mini-batch).
  • $N_{cls}$ is the size of the training batch, such as 256.
  • $p_i$ is the predicted probability that the i-th candidate frame is a target.
  • $p_i^*$ is the ground-truth (GT) label: if the candidate box is positive (that is, the assigned label is a positive label, called a positive candidate box), $p_i^*$ is 1; if the candidate box is negative (that is, the assigned label is a negative label, called a negative candidate box), $p_i^*$ is 0.
  • $\lambda$ is the balance weight, which can be taken as 10.
  • $N_{reg}$ is the number of candidate frames.
  • $R$ is a robust loss function (smooth L1), defined as:

    $$R(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
  • the training method of the Fast RCNN can refer to the training method of the region proposal network, and will not be repeated here.
  • the method of hard negative mining (HNM) is added to the training of the Fast RCNN.
  • in other embodiments, the target detector may also be another neural network model, such as a Region-based Convolutional Neural Network (RCNN) model or a Fast RCNN model.
  • when the target detector is used to detect a predetermined type of target in an image, the image is input to the target detector; the target detector detects the predetermined type of target in the image and outputs the position of each first target frame of the predetermined type of target in the image. For example, the target detector outputs 6 first target frames in the image.
  • the first target frame is presented in the form of a rectangular frame.
  • the position of the first target frame may be represented by position coordinates, and the position coordinates may include upper left corner coordinates (x, y) and width and height (w, h).
  • the target detector may also output the type of each first target frame, for example, output 5 first target frames of pedestrian type and 1 first target frame of automobile type.
  • the prediction module 202 is configured to obtain the second target frame in the previous frame of the current image, use a predictor to predict the position of the second target frame in the current image, and obtain the prediction frame of the second target frame in the current image.
  • the second target frame in the previous frame is a target frame obtained by using a target detector to detect a predetermined type of target in the previous frame.
  • predicting the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image means predicting the position of each second target frame in the current image and obtaining the prediction frame of each second target frame in the current image. For example, if four pedestrian target frames are detected in the previous frame of the current image, the positions of the four pedestrian target frames in the current image are predicted (that is, the positions in the current image of the four pedestrians corresponding to the four pedestrian target frames are predicted) to obtain the prediction frames of the four pedestrian target frames in the current image.
  • the predictor may be a deep neural network model.
  • the predictor Before using the predictor to predict the second target frame, the predictor is trained using the second training sample set.
  • the features learned by the predictor are deep features, in which color features (which are sensitive to illumination) account for a relatively small proportion. Therefore, the predictor can overcome the impact of illumination to a certain extent, improving the robustness and scene adaptability of target tracking.
  • the second training sample set may include a large number of sample images of objects with different illumination, deformation, and high-speed motion. Therefore, the predictor can further overcome the influence of illumination, and can overcome the influence of deformation and high-speed movement to a certain extent, so that this application realizes the tracking of high-speed moving targets and improves the robustness of target tracking.
  • a feature pyramid network (Feature Pyramid Network, FPN) can be constructed in the deep neural network model, and the deep neural network model with the feature pyramid network can be used to predict the position of the second target frame in the current image.
  • the feature pyramid network connects the high-level features of low-resolution, high-semantic information and the low-level features of high-resolution, low-semantic information from top to bottom, so that features at all scales have rich semantic information.
  • the connection method of the feature pyramid network is to upsample the higher-level features by a factor of 2 and then combine them with the corresponding lower-level features (which first pass through a 1*1 convolution kernel); the combination method is element-wise addition between pixels.
  • the feature maps used in each layer of prediction are fused with features of different resolutions and different semantic strengths, and the fused feature maps of different resolutions are used for object detection with corresponding resolutions. This ensures that each layer has appropriate resolution and strong semantic features. Constructing a feature pyramid network in the deep neural network model can improve the performance of predicting the second target frame, so that the deformed second target frame can still be better predicted.
  • the predictor may be a SiamFC network (Fully-Convolutional Siamese Network) model, for example, a SiamFC network model constructed with a feature pyramid network.
  • Figure 4 is a schematic diagram of the SiamFC model.
  • z represents the template image, that is, the second target frame in the previous frame of image
  • x represents the search area, that is, the current image
  • φ represents a feature mapping operation that maps the original image into a feature space; the convolutional layers and pooling layers of a CNN can be used.
  • 6*6*128 represents the feature obtained after z passes through φ, a 128-channel 6*6 feature; similarly, 22*22*128 is the feature of x after φ.
  • * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 feature as the kernel, and a 17*17 score map is obtained, which represents the similarity of each position in the search area to the template image.
  • the position in the search area with the highest similarity to the template image is the position of the prediction frame.
  • the matching module 203 is configured to match the first target frame in the current image with the prediction frame to obtain a matching result between the first target frame and the prediction frame.
  • the matching result of the first target frame and the prediction frame may include: a first target frame matches a prediction frame; a first target frame does not match any prediction frame; or a prediction frame does not match any first target frame.
  • the overlap area ratio (Intersection over Union, IOU) of the first target frame and the prediction frame may be calculated, and each pair of matching first target frame and prediction frame may be determined according to the overlap area ratio.
  • the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and a prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4.
  • the prediction frame P1 corresponds to the second target frame B1
  • the prediction frame P2 corresponds to the second target frame B2
  • the prediction frame P3 corresponds to the second target frame B3
  • the prediction frame P4 corresponds to the second target frame B4.
  • for the first target frame A1, calculate the overlap area ratios of A1 with the prediction frames P1, P2, P3, and P4; if the overlap area ratio of A1 and P1 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A1 matches the prediction frame P1.
  • for the first target frame A2, calculate the overlap area ratios of A2 with P1, P2, P3, and P4; if the overlap area ratio of A2 and P2 is the largest and is greater than or equal to the preset threshold (for example, 70%), it is determined that A2 matches P2. For the first target frame A3, calculate the overlap area ratios of A3 with P1, P2, P3, and P4; if the overlap area ratio of A3 and P3 is the largest and is greater than or equal to the preset threshold, it is determined that A3 matches P3. For the first target frame A4, calculate the overlap area ratios of A4 with P1, P2, P3, and P4; if the overlap area ratio of A4 and P4 is the largest and is greater than or equal to the preset threshold, it is determined that A4 matches P4.
  • the distance between the center point of the first target frame and the prediction frame may be calculated, and each pair of matched first target frame and the prediction frame may be determined according to the distance.
  • the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and the prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4.
  • for the first target frame A1, calculate the distances between the center point of A1 and the center points of the prediction frames P1, P2, P3, and P4. If the distance between the center points of A1 and P1 is the smallest and is less than or equal to a preset distance (for example, 10 pixels), it is determined that the first target frame A1 matches the prediction frame P1.
  • for the first target frame A2, calculate the distances between the center point of A2 and the center points of P1, P2, P3, and P4. If the distance between the center points of A2 and P2 is the smallest and is less than or equal to the preset distance, it is determined that A2 matches P2.
  • for the first target frame A3, calculate the distances between the center point of A3 and the center points of P1, P2, P3, and P4. If the distance between the center points of A3 and P3 is the smallest and is less than or equal to the preset distance, it is determined that A3 matches P3. For the first target frame A4, calculate the distances between the center point of A4 and the center points of P1, P2, P3, and P4; if the distance between the center points of A4 and P4 is the smallest and is less than or equal to the preset distance (for example, 10 pixels), it is determined that A4 matches P4.
  • the update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
  • updating the position of the target in the current image may include:
  • if a first target frame matches a prediction frame, using the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
  • if a first target frame does not match any prediction frame, using the position of the first target frame in the current image as the position of a new target;
  • if a prediction frame does not match any first target frame, taking the target corresponding to the prediction frame as a lost target in the current image.
  • This embodiment provides a target tracking device 20.
  • the target tracking is to track a specific type of moving object (such as a pedestrian) in a video or image sequence to obtain the position of the moving object in each frame of the image.
  • the target tracking device 20 uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image; obtains the second target frame in the previous frame of the current image, and uses a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image; matches the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and updates the position of the target in the current image according to the matching result.
  • This embodiment improves the robustness and scene adaptability of target tracking.
  • This embodiment provides a non-volatile readable storage medium having computer readable instructions stored on the non-volatile readable storage medium.
  • when the computer-readable instructions are executed by a processor, the steps of the target tracking method in the above embodiment are implemented, such as steps 101-104 shown in Figure 1:
  • Step 101 Use a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image;
  • Step 102 Obtain a second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
  • Step 103 Match the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame;
  • Step 104 According to the matching result of the first target frame and the prediction frame, update the position of the target in the current image.
  • alternatively, when the computer-readable instructions are executed by the processor, the functions of the modules in the foregoing device embodiment are implemented, for example, modules 201-204 in Figure 2:
  • the detection module 201 uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image;
  • the prediction module 202 obtains the second target frame in the previous frame of the current image, uses a predictor to predict the position of the second target frame in the current image, and obtains the prediction frame of the second target frame in the current image;
  • the matching module 203 is configured to match the first target frame in the current image with the prediction frame to obtain a matching result between the first target frame and the prediction frame;
  • the update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
  • FIG. 3 is a schematic diagram of a computer device provided in Embodiment 4 of this application.
  • the computer device 30 includes a memory 301, a processor 302, and computer-readable instructions 303 stored in the memory 301 and running on the processor 302, such as a target tracking program.
  • when the processor 302 executes the computer-readable instructions 303, the steps in the embodiment of the target tracking method are implemented, for example, steps 101-104 shown in FIG. 1:
  • Step 101 Use a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image;
  • Step 102 Obtain a second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
  • Step 103 Match the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame;
  • Step 104 According to the matching result of the first target frame and the prediction frame, update the position of the target in the current image.
  • alternatively, when the processor 302 executes the computer-readable instructions 303, the functions of each module in the foregoing device embodiment are implemented, for example, modules 201-204 in FIG. 2:
  • the detection module 201 uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image;
  • the prediction module 202 obtains the second target frame in the previous frame of the current image, uses a predictor to predict the position of the second target frame in the current image, and obtains the prediction frame of the second target frame in the current image;
  • the matching module 203 is configured to match the first target frame in the current image with the prediction frame to obtain a matching result between the first target frame and the prediction frame;
  • the update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
  • the computer-readable instruction 303 may be divided into one or more modules, and the one or more modules are stored in the memory 301 and executed by the processor 302 to complete the method .
  • the computer-readable instruction 303 can be divided into the detection module 201, the prediction module 202, the matching module 203, and the update module 204 in FIG. 2.
  • the specific functions of each module refer to the second embodiment.
  • the computer device 30 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the schematic diagram in FIG. 3 is only an example of the computer device 30 and does not constitute a limitation on the computer device 30; it may include more or fewer components than shown, or combine certain components, or have different components.
  • the computer device 30 may also include input and output devices, network access devices, buses, etc.
  • the processor 302 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor 302 may be any conventional processor.
  • the processor 302 is the control center of the computer device 30, and connects the various parts of the entire computer device 30 through various interfaces and lines.
  • the memory 301 may be used to store the computer-readable instructions 303; the processor 302 runs the computer-readable instructions or modules stored in the memory 301 and calls data stored in the memory 301 to implement the various functions of the computer device 30.
  • the memory 301 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system and application programs required by at least one function (such as a sound playback function, an image playback function, etc.);
  • the data storage area may store data created according to the use of the computer device 30 (such as audio data, phone books, etc.).
  • the memory 301 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • if the integrated modules of the computer device 30 are implemented in the form of software function modules and sold or used as independent products, they can be stored in a non-volatile readable storage medium.
  • this application implements all or part of the processes in the above embodiments and methods, which can also be completed by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile readable storage medium, and when executed by the processor, can implement the steps of the foregoing method embodiments.
  • the computer-readable medium may include: any entity or device capable of carrying the computer-readable instructions, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of hardware plus software functional modules.
  • the above-mentioned software function modules are stored in a non-volatile readable storage medium and include several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute part of the steps of the methods described in the embodiments of this application.


Abstract

The present application relates to the technical field of image processing, and provided therein are a target tracking method and device, a computer device, and a storage medium. The target tracking method comprises: using a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image; acquiring a second target frame in a previous frame image of the current image, and using a predictor to predict the position of the second target frame in the current image so as to obtain a prediction frame of the second target frame in the current image; matching the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and according to the matching result of the first target frame and the prediction frame, updating the position of the target in the current image. The present application improves the robustness and scene adaptability of target tracking.

Description

目标跟踪方法、装置、计算机装置及可读存储介质Target tracking method, device, computer device and readable storage medium
本申请要求于2019年01月23日提交中国专利局,申请号为201910064675.5,发明名称为“目标跟踪方法、装置、计算机装置及计算机存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on January 23, 2019, the application number is 201910064675.5, and the invention title is "target tracking method, device, computer device and computer storage medium", the entire content of which is incorporated by reference Incorporated in this application.
技术领域Technical field
本申请涉及图像处理技术领域,具体涉及一种目标跟踪方法、装置、计算机装置及非易失性可读存储介质。This application relates to the field of image processing technology, and in particular to a target tracking method, device, computer device and non-volatile readable storage medium.
背景技术Background technique
目标跟踪是指对视频或图像序列中的运动物体(例如交通视频中的汽车和行人)进行跟踪,得到运动物体在每一帧的位置。目标跟踪在视频监控、自动驾驶和视频娱乐等领域有广泛的应用。Target tracking refers to tracking moving objects in a video or image sequence (for example, cars and pedestrians in traffic videos) to obtain the position of the moving object in each frame. Target tracking has a wide range of applications in the fields of video surveillance, autonomous driving and video entertainment.
目前的目标跟踪主要采用了track by detection架构,在视频或图像序列的每帧图像上通过检测器检测出各个目标的位置信息,然后将当前帧的目标位置信息和前一帧的目标位置信息进行匹配。然而,现有方案目标跟踪的鲁棒性不高,如果光照发生变化,则跟踪效果不佳。The current target tracking mainly adopts the track by detection architecture. The position information of each target is detected by the detector on each frame of the video or image sequence, and then the target position information of the current frame and the target position information of the previous frame are match. However, the existing schemes are not robust in target tracking, and if the illumination changes, the tracking effect is not good.
Summary of the invention
In view of the above, it is necessary to provide a target tracking method and device, a computer device, and a non-volatile readable storage medium that can improve the robustness and scene adaptability of target tracking.
A first aspect of the present application provides a target tracking method, the method including:
using a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image;
acquiring a second target frame in the previous frame of the current image, and using a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
matching the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
A second aspect of the present application provides a target tracking device, the device including:
a detection module, configured to use a target detector to detect a predetermined type of target in a current image to obtain a first target frame in the current image;
a prediction module, configured to acquire a second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
a matching module, configured to match the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
an update module, configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
A third aspect of the present application provides a computer device, which includes a processor configured to implement the target tracking method when executing computer-readable instructions stored in a memory.
A fourth aspect of the present application provides a non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the target tracking method is implemented.
In the present application, a target detector detects a predetermined type of target in a current image to obtain a first target frame in the current image; a second target frame in the previous frame of the current image is acquired, and a predictor predicts the position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image; the first target frame in the current image is matched with the prediction frame to obtain a matching result; and the position of the target is updated in the current image according to that matching result. The present application improves the robustness and scene adaptability of target tracking.
Description of the drawings
Fig. 1 is a flowchart of a target tracking method provided by an embodiment of the present application.
Fig. 2 is a structural diagram of a target tracking device provided by an embodiment of the present application.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of the SiamFC model.
Detailed description
To make the above objectives, features and advantages of this application clearer, the application is described in detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments can be combined with each other.
Many specific details are set forth in the following description to facilitate a full understanding of the present application; the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the application.
Preferably, the target tracking method of the present application is applied in one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touch panel or a voice control device.
Embodiment 1
Fig. 1 is a flowchart of the target tracking method provided in Embodiment 1 of the present application. The target tracking method is applied to a computer device.
The target tracking method of this application tracks moving objects of a specific type (for example, pedestrians) in a video or image sequence to obtain the position of each moving object in every frame. The method overcomes the shortcoming that existing solutions cannot track fast-moving targets, improving the robustness of target tracking.
As shown in Fig. 1, the target tracking method includes:
Step 101: use a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image.
The predetermined type of target may include pedestrians, cars, airplanes, ships, and so on. It may be a single type of target (for example, pedestrians) or multiple types of targets (for example, pedestrians and cars).
The target detector may be a neural network model with classification and regression functions. In this embodiment, the target detector may be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model includes a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN).
The region proposal network and the fast region convolutional neural network share convolutional layers, which are used to extract feature maps of an image. The region proposal network generates candidate boxes of the image according to the feature maps and inputs the generated candidate boxes into the fast region convolutional neural network, which screens and adjusts the candidate boxes according to the feature maps to obtain the target frames of the image.
Before the target detector is used to detect the first target frame of the predetermined type of target in the current image, the target detector is trained with a first training sample set. During training, the convolutional layers extract feature maps of each sample image in the first training sample set, the region proposal network obtains candidate boxes in each sample image according to the feature maps, and the fast region convolutional neural network screens and adjusts the candidate boxes according to the feature maps to obtain the target frames of each sample image. The target frames may include target frames of different types of targets (for example, pedestrians, cars, airplanes, ships, etc.).
In a preferred embodiment, the Faster RCNN model adopts the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers. The ZF framework is a commonly used network structure proposed by Matthew D Zeiler and Rob Fergus in the 2013 paper "Visualizing and Understanding Convolutional Networks"; it is a variant of the AlexNet network, fine-tuned from AlexNet with the ReLU activation function and a cross-entropy cost function, and uses smaller convolution kernels to retain more original pixel information.
In a specific embodiment, the Faster RCNN model can be trained with the first training sample set in the following steps:
(1) initialize the region proposal network with an ImageNet pre-trained model, and train the region proposal network with the first training sample set;
(2) use the region proposal network trained in (1) to generate candidate boxes for each sample image in the first training sample set, and train the fast region convolutional neural network with these candidate boxes; at this point, the region proposal network and the fast region convolutional neural network do not yet share convolutional layers;
(3) initialize the region proposal network with the fast region convolutional neural network trained in (2), and train the region proposal network with the first training sample set;
(4) initialize the fast region convolutional neural network with the region proposal network trained in (3), keep the shared convolutional layers fixed, and train the fast region convolutional neural network with the first training sample set; at this point, the two networks share the same convolutional layers and form a unified network model.
The region proposal network selects many candidate boxes; a number of candidate boxes with the highest target classification scores can be selected and input into the fast region convolutional neural network to speed up training and detection.
The region proposal network can be trained with the backpropagation algorithm, adjusting its network parameters during training to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate boxes predicted by the region proposal network and the true confidence, and can include two parts: a target classification loss and a regression loss.
The loss function can be defined as:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where $i$ is the index of a candidate box in a training mini-batch.
$\frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate boxes. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the i-th candidate box is a target. $p_i^*$ is the ground-truth (GT) label: if the candidate box is positive (i.e., the assigned label is a positive label, called a positive candidate box), $p_i^*$ is 1; if the candidate box is negative (i.e., the assigned label is a negative label, called a negative candidate box), $p_i^*$ is 0. $L_{cls}$ can be computed as the log loss:
$$L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$$
$\lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate boxes. $\lambda$ is a balance weight, which can be set to 10. $N_{reg}$ is the number of candidate boxes. $L_{reg}$ can be computed as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$. $t_i$ is a coordinate vector, $t_i = (t_x, t_y, t_w, t_h)$, representing the 4 parameterized coordinates of the candidate box (for example, the coordinates of its upper-left corner together with its width and height). $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate box, $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (for example, the coordinates of the upper-left corner of the real target box together with its width and height). $R$ is the robust loss function (smooth L1), defined as:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
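As an illustration, the loss defined above can be computed directly. The following is a minimal sketch in PyTorch (an assumed framework; the patent does not name one), in which the tensor shapes and the helper name rpn_loss are purely illustrative:

```python
import torch
import torch.nn.functional as F

def rpn_loss(p, p_star, t, t_star, lam=10.0):
    """Illustrative RPN loss: batch-averaged log loss plus a smooth-L1
    regression loss restricted to positive candidate boxes.
    p: (N,) predicted target probabilities in (0, 1);
    p_star: (N,) float GT labels in {0, 1};
    t, t_star: (N, 4) parameterized box coordinates."""
    n_cls = p.numel()   # training batch size N_cls (e.g. 256)
    n_reg = p.numel()   # number of candidate boxes N_reg
    # L_cls(p, p*) = -log[p* p + (1 - p*)(1 - p)]
    cls_loss = F.binary_cross_entropy(p, p_star, reduction="sum") / n_cls
    # L_reg(t, t*) = smooth_L1(t - t*), summed over the 4 coordinates
    # and weighted by p* so that only positive boxes contribute.
    reg = F.smooth_l1_loss(t, t_star, reduction="none").sum(dim=1)
    reg_loss = lam * (p_star * reg).sum() / n_reg
    return cls_loss + reg_loss

# e.g. loss = rpn_loss(torch.rand(256), torch.ones(256),
#                      torch.rand(256, 4), torch.rand(256, 4))
```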
The training method of the fast region convolutional neural network can refer to that of the region proposal network and is not repeated here.
In this embodiment, a hard negative mining (HNM) method is added to the training of the fast region convolutional neural network. For negative samples that the network wrongly classifies as positive (i.e., hard examples), their information is recorded; in the next training iteration, these negative samples are input into the first training sample set again, and the weight of their loss is increased to strengthen their influence on the classifier. This ensures that harder negative samples are continually classified, so that the features learned by the classifier progress from easy to hard and the covered sample distribution is more diverse.
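A minimal sketch of this mining-and-reweighting step follows; the 0.5 score threshold and the 2x loss boost are assumptions for illustration, since the patent does not specify concrete values:

```python
import torch

def mine_hard_negatives(scores, labels, threshold=0.5):
    """Record negatives (label 0) that the classifier wrongly scores as
    positive, i.e. above `threshold`; these are the hard examples."""
    hard = (labels == 0) & (scores > threshold)
    return hard.nonzero(as_tuple=True)[0]

def reweight_hard_negatives(losses, hard_idx, boost=2.0):
    """In the next training iteration, increase the loss weight of the
    recorded hard negatives to strengthen their influence on the classifier."""
    weights = torch.ones_like(losses)
    weights[hard_idx] = boost
    return (losses * weights).mean()
```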
In other embodiments, the target detector may also be another neural network model, for example a Region-based Convolutional Neural Network (RCNN) model or a Fast Region-based Convolutional Neural Network (Fast RCNN) model.
When the target detector is used to detect a predetermined type of target in an image, the image is input into the target detector, which detects the predetermined type of target and outputs the positions of the first target frames of that type in the image. For example, the target detector outputs 6 first target frames in the image. A first target frame is presented as a rectangular box, and its position can be represented by position coordinates, which may include the upper-left corner coordinates (x, y) and the width and height (w, h).
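For illustration, the detection step might look as follows with an off-the-shelf Faster RCNN from torchvision (a substitution for the ZF-backbone detector described here; the 0.5 score threshold is likewise an assumption):

```python
import torch
import torchvision

# Pre-trained Faster RCNN (torchvision >= 0.13); stands in for the
# trained detector described in the text.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

image = torch.rand(3, 480, 640)        # stand-in for the current frame
with torch.no_grad():
    output = detector([image])[0]      # dict with "boxes", "labels", "scores"

for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
    if score >= 0.5:                   # assumed confidence threshold
        x1, y1, x2, y2 = box.tolist()  # corner format (x1, y1, x2, y2)
        x, y, w, h = x1, y1, x2 - x1, y2 - y1   # (x, y, w, h) as in the text
        print(int(label), (x, y, w, h))
```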
The target detector may also output the type of each first target frame, for example 5 first target frames of the pedestrian type and 1 first target frame of the car type.
Step 102: acquire the second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image.
The second target frame in the previous frame is a target frame obtained by using the target detector to detect the predetermined type of target in the previous frame.
Predicting the position of the second target frame in the current image to obtain its prediction frame means predicting the position of each second target frame in the current image to obtain the prediction frame of each second target frame in the current image. For example, if 4 pedestrian target frames are detected in the previous frame of the current image, the positions of these 4 pedestrian target frames in the current image are predicted (that is, the positions in the current image of the 4 pedestrians corresponding to the 4 pedestrian target frames are predicted), giving the prediction frames of the 4 pedestrian target frames in the current image.
The predictor may be a deep neural network model.
Before the predictor is used to predict the second target frame, the predictor is trained with a second training sample set. The features learned by the predictor are deep features, in which color features account for a relatively small proportion, so the influence of illumination is limited. The predictor can therefore overcome the influence of illumination to a certain extent, improving the robustness and scene adaptability of target tracking. In this embodiment, the second training sample set may include a large number of sample images under different illumination, with deformation, and with fast-moving objects. The predictor can thus further overcome the influence of illumination and, to a certain extent, the influence of deformation and high-speed motion, so that this application achieves tracking of fast-moving targets and improves the robustness of target tracking.
In this embodiment, a Feature Pyramid Network (FPN) can be constructed in the deep neural network model, and the deep neural network model with the feature pyramid network is used to predict the position of the second target frame in the current image. The feature pyramid network makes top-down lateral connections between high-level features with low resolution and rich semantic information and low-level features with high resolution and weak semantic information, so that the features at all scales have rich semantic information. The connection method is to upsample the higher-level feature by a factor of 2 and combine it with the corresponding feature of the preceding layer (which first passes through a 1*1 convolution kernel) by pixel-wise addition. Through such connections, the feature map used for prediction at each level fuses features of different resolutions and different semantic strengths, and the fused feature maps of different resolutions are used for object detection at the corresponding resolutions. This ensures that each level has a suitable resolution as well as strong semantic features. Constructing a feature pyramid network in the deep neural network model can improve the prediction of the second target frame, so that a deformed second target frame is still predicted well.
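The lateral connection described above can be sketched as follows; the channel counts and feature sizes are arbitrary illustrative values:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNMerge(nn.Module):
    """One top-down FPN connection: upsample the higher-level feature by
    2x, pass the lower-level feature through a 1*1 convolution, and
    combine the two by pixel-wise addition."""
    def __init__(self, low_channels, out_channels=256):
        super().__init__()
        self.lateral = nn.Conv2d(low_channels, out_channels, kernel_size=1)

    def forward(self, high, low):
        up = F.interpolate(high, scale_factor=2, mode="nearest")  # 2x upsample
        return up + self.lateral(low)                             # pixel-wise add

# e.g. merge a 256-channel 7x7 top-level map with a 512-channel 14x14 map
merged = FPNMerge(512)(torch.rand(1, 256, 7, 7), torch.rand(1, 512, 14, 14))
```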
In a specific embodiment, the predictor may be a SiamFC (Fully-Convolutional Siamese Network) model, for example a SiamFC network model constructed with a feature pyramid network.
Fig. 4 is a schematic diagram of the SiamFC model. In Fig. 4, z represents the template image, i.e., the second target frame in the previous frame; x represents the search region, i.e., the current image; φ represents a feature mapping operation that maps the original image to a specific feature space, which may use the convolutional and pooling layers of a CNN; 6*6*128 represents the feature obtained after z passes through φ, a 128-channel feature of size 6*6; similarly, 22*22*128 is the feature of x after φ; * represents the convolution operation: the 22*22*128 feature is convolved with the 6*6*128 convolution kernel to obtain a 17*17 score map representing the similarity between each position in the search region and the template image. The position in the search region with the highest similarity to the template image is the position of the prediction frame.
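The scoring step just described reduces to a single cross-correlation; a minimal sketch, with random tensors standing in for the φ embeddings, is:

```python
import torch
import torch.nn.functional as F

z_feat = torch.rand(1, 128, 6, 6)     # template z after φ (128 channels, 6*6)
x_feat = torch.rand(1, 128, 22, 22)   # search region x after φ (128 channels, 22*22)

# Use the template feature as a convolution kernel over the search feature:
# (22 - 6 + 1) = 17, so the result is a 17*17 similarity score map.
score_map = F.conv2d(x_feat, z_feat)  # shape (1, 1, 17, 17)

# The prediction frame is centred at the highest-scoring position.
best = score_map.flatten().argmax().item()
row, col = divmod(best, score_map.shape[-1])
```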
Step 103: match the first target frame in the current image with the prediction frame to obtain a matching result of the first target frame and the prediction frame.
The matching result may be that a first target frame matches a prediction frame, that a first target frame matches none of the prediction frames, or that a prediction frame matches none of the first target frames.
In this embodiment, the overlap ratio (Intersection over Union, IOU) of a first target frame and a prediction frame can be calculated, and each matched pair of first target frame and prediction frame is determined according to the overlap ratio.
For example, the first target frames include first target frames A1, A2, A3 and A4, and the prediction frames include prediction frames P1, P2, P3 and P4, where P1 corresponds to second target frame B1, P2 to B2, P3 to B3, and P4 to B4. For first target frame A1, the overlap ratios of A1 with P1, A1 with P2, A1 with P3 and A1 with P4 are calculated; if the overlap ratio of A1 with P1 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that A1 matches P1. Similarly, if the overlap ratio of A2 with P2 is the largest among A2's overlap ratios with P1 to P4 and is greater than or equal to the preset threshold, A2 matches P2; if the overlap ratio of A3 with P3 is the largest among A3's overlap ratios with P1 to P4 and is greater than or equal to the preset threshold, A3 matches P3; and if the overlap ratio of A4 with P4 is the largest among A4's overlap ratios with P1 to P4 and is greater than or equal to the preset threshold, A4 matches P4.
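As an illustration of this criterion, a plain-Python sketch follows; the greedy per-detection best match and the helper names are assumptions, since the patent only requires the largest overlap ratio at or above the threshold:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_by_iou(targets, predictions, threshold=0.7):
    """For each first target frame, take the prediction frame with the
    largest IoU; the pair matches only if that IoU >= threshold."""
    matches = {}
    for i, t in enumerate(targets):
        if not predictions:
            break
        j = max(range(len(predictions)), key=lambda k: iou(t, predictions[k]))
        if iou(t, predictions[j]) >= threshold:
            matches[i] = j
    return matches
```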
Alternatively, the distance between the center points of a first target frame and a prediction frame can be calculated, and each matched pair of first target frame and prediction frame is determined according to this distance.
For example, with the same first target frames A1 to A4 and prediction frames P1 to P4, for first target frame A1 the center-point distances of A1 to P1, P2, P3 and P4 are calculated; if the distance of A1 to P1 is the smallest and is less than or equal to a preset distance (for example, 10 pixels), it is determined that A1 matches P1. Similarly, if the distance of A2 to P2 is the smallest among A2's center-point distances to P1 to P4 and is less than or equal to the preset distance, A2 matches P2; if the distance of A3 to P3 is the smallest among A3's center-point distances to P1 to P4 and is less than or equal to the preset distance, A3 matches P3; and if the distance of A4 to P4 is the smallest among A4's center-point distances to P1 to P4 and is less than or equal to the preset distance, A4 matches P4.
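The distance-based alternative admits the same kind of sketch, again with illustrative helper names and the 10-pixel example threshold:

```python
import math

def center_distance(a, b):
    """Distance between the centre points of two (x, y, w, h) boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))

def match_by_distance(targets, predictions, max_dist=10.0):
    """Match each first target frame to the prediction frame with the
    nearest centre; accept the pair only if the distance <= max_dist."""
    matches = {}
    for i, t in enumerate(targets):
        if not predictions:
            break
        j = min(range(len(predictions)),
                key=lambda k: center_distance(t, predictions[k]))
        if center_distance(t, predictions[j]) <= max_dist:
            matches[i] = j
    return matches
```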
Step 104: update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
Updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame may include:
if a first target frame matches a prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
if a first target frame matches none of the prediction frames, taking the position of the first target frame in the current image as the position of a new target;
if a prediction frame matches none of the first target frames, treating the target corresponding to the prediction frame as a lost target in the current image.
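Taken together, the three rules amount to the following sketch, which assumes the `matches` mapping produced by a matcher such as the IoU example above (all names are illustrative):

```python
def update_positions(targets, predictions, matches):
    """Apply the three update rules: matched detections update existing
    targets, unmatched detections start new targets, and unmatched
    prediction frames mark their targets as lost."""
    updated, new_targets = {}, []
    for i, box in enumerate(targets):
        if i in matches:
            # Matched: the first target frame's position becomes the
            # updated position of the target behind the prediction frame.
            updated[matches[i]] = box
        else:
            # Unmatched detection: treat it as a newly appeared target.
            new_targets.append(box)
    # Prediction frames matched by no detection correspond to lost targets.
    lost = [j for j in range(len(predictions)) if j not in matches.values()]
    return updated, new_targets, lost
```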
The target tracking method of Embodiment 1 uses a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image; acquires the second target frame in the previous frame of the current image and uses a predictor to predict the position of the second target frame in the current image to obtain its prediction frame; matches the first target frame in the current image with the prediction frame to obtain a matching result; and updates the position of the target in the current image according to the matching result. Embodiment 1 improves the robustness and scene adaptability of target tracking.
Embodiment 2
Fig. 2 is a structural diagram of the target tracking device provided in Embodiment 2 of the present application. The target tracking device 20 is applied to a computer device. The device tracks moving objects of a specific type (for example, pedestrians) in a video or image sequence to obtain the position of each moving object in every frame, and can improve the robustness and scene adaptability of target tracking. As shown in Fig. 2, the target tracking device 20 may include a detection module 201, a prediction module 202, a matching module 203 and an update module 204.
The detection module 201 is configured to use a target detector to detect a predetermined type of target in the current image to obtain the first target frame in the current image.
The predetermined type of target may include pedestrians, cars, airplanes, ships, and so on. It may be a single type of target (for example, pedestrians) or multiple types of targets (for example, pedestrians and cars).
The target detector may be a neural network model with classification and regression functions. In this embodiment, the target detector may be a Faster Region-Based Convolutional Neural Network (Faster RCNN) model.
The Faster RCNN model includes a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast RCNN).
The region proposal network and the fast region convolutional neural network share convolutional layers, which are used to extract feature maps of an image. The region proposal network generates candidate boxes of the image according to the feature maps and inputs the generated candidate boxes into the fast region convolutional neural network, which screens and adjusts the candidate boxes according to the feature maps to obtain the target frames of the image.
Before the target detector is used to detect the first target frame of the predetermined type of target in the current image, the target detector is trained with a first training sample set. During training, the convolutional layers extract feature maps of each sample image in the first training sample set, the region proposal network obtains candidate boxes in each sample image according to the feature maps, and the fast region convolutional neural network screens and adjusts the candidate boxes according to the feature maps to obtain the target frames of each sample image. The target frames may include target frames of different types of targets (for example, pedestrians, cars, airplanes, ships, etc.).
In a preferred embodiment, the Faster RCNN model adopts the ZF framework, and the region proposal network and the fast region convolutional neural network share 5 convolutional layers.
In a specific embodiment, the Faster RCNN model can be trained with the first training sample set in the following steps:
(1) initialize the region proposal network with an ImageNet pre-trained model, and train the region proposal network with the first training sample set;
(2) use the region proposal network trained in (1) to generate candidate boxes for each sample image in the first training sample set, and train the fast region convolutional neural network with these candidate boxes; at this point, the region proposal network and the fast region convolutional neural network do not yet share convolutional layers;
(3) initialize the region proposal network with the fast region convolutional neural network trained in (2), and train the region proposal network with the first training sample set;
(4) initialize the fast region convolutional neural network with the region proposal network trained in (3), keep the shared convolutional layers fixed, and train the fast region convolutional neural network with the first training sample set; at this point, the two networks share the same convolutional layers and form a unified network model.
The region proposal network selects many candidate boxes; a number of candidate boxes with the highest target classification scores can be selected and input into the fast region convolutional neural network to speed up training and detection.
The region proposal network can be trained with the backpropagation algorithm, adjusting its network parameters during training to minimize a loss function. The loss function indicates the difference between the predicted confidence of the candidate boxes predicted by the region proposal network and the true confidence, and can include two parts: a target classification loss and a regression loss.
The loss function can be defined as:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$
where $i$ is the index of a candidate box in a training mini-batch.
$\frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate boxes. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the i-th candidate box is a target. $p_i^*$ is the ground-truth (GT) label: if the candidate box is positive (i.e., the assigned label is a positive label, called a positive candidate box), $p_i^*$ is 1; if the candidate box is negative (i.e., the assigned label is a negative label, called a negative candidate box), $p_i^*$ is 0. $L_{cls}$ can be computed as the log loss:
$$L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$$
$\lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate boxes. $\lambda$ is a balance weight, which can be set to 10. $N_{reg}$ is the number of candidate boxes. $L_{reg}$ can be computed as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$. $t_i$ is a coordinate vector, $t_i = (t_x, t_y, t_w, t_h)$, representing the 4 parameterized coordinates of the candidate box (for example, the coordinates of its upper-left corner together with its width and height). $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate box, $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (for example, the coordinates of the upper-left corner of the real target box together with its width and height). $R$ is the robust loss function (smooth L1), defined as:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
快速区域卷积网络的训练方法可以参照区域建议网络的训练方法,此处不再赘述。The training method of the fast regional convolutional network can refer to the training method of the regional suggestion network, which will not be repeated here.
在本实施例中,在快速区域卷积网络的训练中加入负样本难例挖掘(Hard Negative Mining,HNM)方法。对于被快速区域卷积网络错误地分类为正样本的负样本(即难例),将这些负样本的信息记录下来,在下次迭代训练的过程中,将这些负样本再次输入到第一训练样本集中,并且加大其损失的权重,增强其对分类器的影响,这样能够保证不停的针对更难的负样本进行分类,使得分类器学到的特征由易到难,涵盖的样本分布也更具多样性。In this embodiment, the method of Hard Negative Mining (HNM) is added to the training of the fast area convolutional network. For negative samples (ie difficult cases) that are incorrectly classified as positive samples by the fast area convolutional network, record the information of these negative samples, and input these negative samples into the first training sample again during the next iteration of training Concentrate, and increase the weight of its loss, and enhance its impact on the classifier. This can ensure that the more difficult negative samples are continuously classified, so that the features learned by the classifier are from easy to difficult, and the distribution of samples covered is also More diversity.
在其他的实施例中,所述目标检测器还可以是其他的神经网络模型,例如区域卷积神经网络(RCNN)模型、加快卷积神经网络(Faster RCNN)模型。In other embodiments, the target detector may also be other neural network models, such as a regional convolutional neural network (RCNN) model, or a Faster Convolutional Neural Network (RCNN) model.
利用目标检测器检测图像中的预定类型目标时,将所述图像输入所述目标检测器,所述目标检测器对图像中的预定类型目标进行检测,输出所述图像中的预定类型目标的第一目标框的位置。例如,所述目标检测器输出所述图像中的6个第一目标框。第一目标框以矩形框的形式呈现。第一目标框的位置可以用位置坐标表示,所述位置坐标可以包括左上角坐标(x,y)和宽高(w,h)。When a target detector is used to detect a predetermined type of target in an image, the image is input to the target detector, and the target detector detects the predetermined type of target in the image, and outputs the second target of the predetermined type of target in the image. The position of a target frame. For example, the target detector outputs 6 first target frames in the image. The first target frame is presented in the form of a rectangular frame. The position of the first target frame may be represented by position coordinates, and the position coordinates may include upper left corner coordinates (x, y) and width and height (w, h).
所述目标检测器还可以输出每个第一目标框的类型,例如输出5个行人类型的第一目标框和1个汽车类型的第一目标框。The target detector may also output the type of each first target frame, for example, output 5 first target frames of pedestrian type and 1 first target frame of automobile type.
预测模块202,用于获取所述当前图像的前一帧图像中的第二目标框,利用预测器预测所述第二目标框在所述当前图像中的位置,得到所述第二目标框在所述当前图像中的预测框。The prediction module 202 is configured to obtain the second target frame in the previous frame of the current image, use a predictor to predict the position of the second target frame in the current image, and obtain the second target frame at The prediction box in the current image.
前一帧图像中的第二目标框是利用目标检测器检测前一帧图像中的预定类型目标得到的目标框。The second target frame in the previous frame is a target frame obtained by using a target detector to detect a predetermined type of target in the previous frame.
预测第二目标框在所述当前图像中的位置,得到所述第二目标框在所述当前图像中的预测框是预测每个第二目标框在所述当前图像中的位置,得到每个第二目标框在所述当前图像中的预测框。例如,当前图像的前一帧图像中检测到4个行人目标框,则预测所述4个行人目标框在所述当前图像中的位置(也就是预测4个行人目标框对应的4个行人在所述当前图像中的位置),得到所述4个行人目标框在所述当前图像中的预测框。Predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image is to predict the position of each second target frame in the current image, and obtain each The prediction frame of the second target frame in the current image. For example, if four pedestrian target frames are detected in the previous frame of the current image, the positions of the four pedestrian target frames in the current image are predicted (that is, the four pedestrian target frames corresponding to the four pedestrian target frames are predicted The position in the current image) to obtain the prediction frames of the four pedestrian target frames in the current image.
所述预测器可以是深度神经网络模型。The predictor may be a deep neural network model.
在利用预测器对第二目标框进行预测之前,使用第二训练样本集对所述预测器进行训练。所述预测器学习到的特征为深度特征,其中的颜色特征占比较小,受光照的影响有限。因此,所述预测器在一定程度上可以克服光照带来的影响,提高了目标跟踪的鲁棒性和场景适应性。在本实施例中,第二训练样本集可以包括大量的不同光照、形变和高速运动物体的样本图像。因 此,所述预测器可以进一步克服光照的影响,并且可以在一定程度克服形变、高速运动带来的影响,从而本申请实现了对高速运动的目标进行跟踪,提高了目标跟踪的鲁棒性。Before using the predictor to predict the second target frame, the predictor is trained using the second training sample set. The features learned by the predictor are depth features, in which color features account for a relatively small proportion and are limited by illumination. Therefore, the predictor can overcome the impact of illumination to a certain extent, and improve the robustness and scene adaptability of target tracking. In this embodiment, the second training sample set may include a large number of sample images of objects with different illumination, deformation, and high-speed motion. Therefore, the predictor can further overcome the influence of illumination, and can overcome the influence of deformation and high-speed movement to a certain extent, so that this application realizes the tracking of high-speed moving targets and improves the robustness of target tracking.
在本实施例中,可以在所述深度神经网络模型中构建特征金字塔网络(Feature Pyramid Network,FPN),利用构建有所述特征金字塔网络的所述深度神经网络模型预测所述第二目标框在所述当前图像中的位置。特征金字塔网络把低分辨率、高语义信息的高层特征和高分辨率、低语义信息的低层特征进行自上而下的侧边连接,使得所有尺度下的特征都有丰富的语义信息。特征金字塔网络的连接方法是把高层特征做2倍上采样,然后和对应的前一层特征结合(前一层要经过1*1的卷积核),结合方式就是做像素间的加法。通过这样的连接,每一层预测所用的特征图都融合了不同分辨率、不同语义强度的特征,融合的不同分辨率的特征图分别做对应分辨率大小的物体检测。这样保证了每一层都有合适的分辨率以及强语义特征。在所述深度神经网络模型中构建特征金字塔网络可以提高对第二目标框预测的性能,使得对发生形变的第二目标框仍然得到较好的预测。In this embodiment, a feature pyramid network (Feature Pyramid Network, FPN) can be constructed in the deep neural network model, and the deep neural network model with the feature pyramid network can be used to predict that the second target frame is The position in the current image. The feature pyramid network connects the high-level features of low-resolution, high-semantic information and the low-level features of high-resolution, low-semantic information from top to bottom, so that features at all scales have rich semantic information. The connection method of the feature pyramid network is to upsample the high-level features twice, and then combine them with the corresponding features of the previous layer (the previous layer has to go through a 1*1 convolution kernel), and the combination method is to add between pixels. Through this connection, the feature maps used in each layer of prediction are fused with features of different resolutions and different semantic strengths, and the fused feature maps of different resolutions are used for object detection with corresponding resolutions. This ensures that each layer has appropriate resolution and strong semantic features. Constructing a feature pyramid network in the deep neural network model can improve the performance of predicting the second target frame, so that the deformed second target frame can still be better predicted.
在一具体实施例中,所述预测器可以是SiamFC网络(Fully-Convolutional Siamese Network)模型,例如构建有特征金字塔网络的SiamFC网络模型。图4是SiamFC模型的示意图。In a specific embodiment, the predictor may be a SiamFC network (Fully-Convolutional Siamese Network) model, for example, a SiamFC network model constructed with a feature pyramid network. Figure 4 is a schematic diagram of the SiamFC model.
图4中,z代表的是模板图像,即前一帧图像中的第二目标框;x代表的是搜索区域,即当前图像;φ代表的是一种特征映射操作,将原始图像映射到特定的特征空间,可以采用CNN中的卷积层和池化层;6*6*128代表z经过φ后得到的特征,是一个128通道6*6大小的特征,同理,22*22*128是x经过φ后的特征;*代表卷积操作,22*22*128的特征被6*6*128的卷积核卷积,得到一个17*17的得分图,代表搜索区域中各个位置与模板图像的相似度。搜索区域中与模板图像的相似度相似度最高的位置就是预测框的位置。In Figure 4, z represents the template image, that is, the second target frame in the previous frame of image; x represents the search area, that is, the current image; φ represents a feature mapping operation, which maps the original image to a specific The feature space of, you can use the convolutional layer and pooling layer in CNN; 6*6*128 represents the feature obtained after z passes through φ, which is a 128-channel 6*6 feature, the same, 22*22*128 Is the feature of x after φ; * represents the convolution operation, the feature of 22*22*128 is convolved by the 6*6*128 convolution kernel, and a 17*17 score map is obtained, which represents each position in the search area and The similarity of the template image. The position in the search area with the highest similarity to the template image is the position of the prediction frame.
匹配模块203,用于将所述当前图像中的第一目标框与所述预测框进行匹配,得到所述第一目标框与所述预测框的匹配结果。The matching module 203 is configured to match the first target frame in the current image with the prediction frame to obtain a matching result between the first target frame and the prediction frame.
所述第一目标框与所述预测框的匹配结果可以包括所述第一目标框与所述预测框匹配、所述第一目标框与任意所述预测框不匹配、所述预测框与任意所述第一目标框不匹配。The matching result of the first target frame and the prediction frame may include a match between the first target frame and the prediction frame, a mismatch between the first target frame and any prediction frame, and the prediction frame and any prediction frame. The first target frame does not match.
在本实施例中,可以计算所述第一目标框与所述预测框的重叠面积比例(Intersection over Union,IOU),根据所述重叠面积比例确定每一对匹配的所述第一目标框与所述预测框。In this embodiment, the overlap area ratio (Intersection over Union, IOU) of the first target frame and the prediction frame may be calculated, and each pair of matching first target frame and the prediction frame may be determined according to the overlap area ratio. The prediction box.
例如,第一目标框包括第一目标框A1、第一目标框A2、第一目标框A3、第一目标框A4,预测框包括预测框P1、预测框P2、预测框P3、预测框P4。预测框P1对应第二目标框B1、预测框P2对应第二目标框B2、预测框P3对应第二目标框B3、预测框P4对应第二目标框B4。对于第一目标框A1,计算第一目标框A1与预测框P1、第一目标框A1与预测框P2、第一目标框A1与预测框P3、第一目标框A1与预测框P4的重叠面积比例,若第一目标框A1与预测框P1的重叠面积比例最大且大于或等于预设阈值(例如70%), 则确定第一目标框A1与预测框P1相匹配。类似地,对于第一目标框A2,计算第一目标框A2与预测框P1、第一目标框A2与预测框P2、第一目标框A2与预测框P3、第一目标框A2与预测框P4的重叠面积比例,若第一目标框A2与预测框P2的重叠面积比例最大且大于或等于预设阈值(例如70%),则确定第一目标框A2与预测框P2相匹配;对于第一目标框A3,计算第一目标框A3与预测框P1、第一目标框A3与预测框P2、第一目标框A3与预测框P3、第一目标框A3与预测框P4的重叠面积比例,若第一目标框A3与预测框P3的重叠面积比例最大且大于或等于预设阈值(例如70%),则确定第一目标框A3与预测框P3相匹配;对于第一目标框A4,计算第一目标框A4与预测框P1、第一目标框A4与预测框P2、第一目标框A4与预测框P3、第一目标框A4与预测框P4的重叠面积比例,若第一目标框A4与预测框P4的重叠面积比例最大且大于或等于预设阈值(例如70%),则确定第一目标框A4与预测框P4相匹配。For example, the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and a prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4. The prediction frame P1 corresponds to the second target frame B1, the prediction frame P2 corresponds to the second target frame B2, the prediction frame P3 corresponds to the second target frame B3, and the prediction frame P4 corresponds to the second target frame B4. For the first target frame A1, calculate the overlap area of the first target frame A1 and the prediction frame P1, the first target frame A1 and the prediction frame P2, the first target frame A1 and the prediction frame P3, and the first target frame A1 and the prediction frame P4 If the ratio of the overlapping area of the first target frame A1 and the prediction frame P1 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A1 matches the prediction frame P1. Similarly, for the first target frame A2, calculate the first target frame A2 and the prediction frame P1, the first target frame A2 and the prediction frame P2, the first target frame A2 and the prediction frame P3, the first target frame A2 and the prediction frame P4 If the overlap area ratio of the first target frame A2 and the prediction frame P2 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A2 matches the prediction frame P2; Target frame A3, calculate the overlap area ratio of the first target frame A3 and the prediction frame P1, the first target frame A3 and the prediction frame P2, the first target frame A3 and the prediction frame P3, the first target frame A3 and the prediction frame P4, if The ratio of the overlapping area of the first target frame A3 and the prediction frame P3 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A3 matches the prediction frame P3; for the first target frame A4, the first target frame A4 is calculated. A target frame A4 and the prediction frame P1, the first target frame A4 and the prediction frame P2, the first target frame A4 and the prediction frame P3, the first target frame A4 and the prediction frame P4 overlap area ratio, if the first target frame A4 and If the overlap area ratio of the prediction frame P4 is the largest and is greater than or equal to a preset threshold (for example, 70%), it is determined that the first target frame A4 matches the prediction frame P4.
或者,可以计算所述第一目标框与所述预测框的中心点的距离,根据所述距离确定每一对匹配的所述第一目标框与所述预测框。Alternatively, the distance between the center point of the first target frame and the prediction frame may be calculated, and each pair of matched first target frame and the prediction frame may be determined according to the distance.
例如,在第一目标框包括第一目标框A1、第一目标框A2、第一目标框A3、第一目标框A4,预测框包括预测框P1、预测框P2、预测框P3、预测框P4的例子中,对于第一目标框A1,计算第一目标框A1与预测框P1、第一目标框A1与预测框P2、第一目标框A1与预测框P3、第一目标框A1与预测框P4的中心点的距离,若第一目标框A1与预测框P1的中心点的距离最小且小于或等于预设距离(例如10个像素点),则确定第一目标框A1与预测框P1相匹配。类似地,类似地,对于第一目标框A2,计算第一目标框A2与预测框P1、第一目标框A2与预测框P2、第一目标框A2与预测框P3、第一目标框A2与预测框P4的中心点的距离,若第一目标框A2与预测框P2的中心点的距离最小且小于或等于预设距离(例如10个像素点),则确定第一目标框A2与预测框P2相匹配;对于第一目标框A3,计算第一目标框A3与预测框P1、第一目标框A3与预测框P2、第一目标框A3与预测框P3、第一目标框A3与预测框P4的中心点的距离,若第一目标框A3与预测框P3的中心点的距离最小且小于或等于预设距离(例如10个像素点),则确定第一目标框A3与预测框P3相匹配;对于第一目标框A4,计算第一目标框A4与预测框P1、第一目标框A4与预测框P2、第一目标框A4与预测框P3、第一目标框A4与预测框P4的中心点的距离,若第一目标框A4与预测框P4的中心点的距离最小且小于或等于预设距离(例如10个像素点),则确定第一目标框A4与预测框P4相匹配。For example, the first target frame includes a first target frame A1, a first target frame A2, a first target frame A3, and a first target frame A4, and the prediction frame includes a prediction frame P1, a prediction frame P2, a prediction frame P3, and a prediction frame P4. In the example, for the first target frame A1, calculate the first target frame A1 and the prediction frame P1, the first target frame A1 and the prediction frame P2, the first target frame A1 and the prediction frame P3, the first target frame A1 and the prediction frame The distance between the center point of P4. If the distance between the center point of the first target frame A1 and the prediction frame P1 is the smallest and is less than or equal to the preset distance (for example, 10 pixels), it is determined that the first target frame A1 and the prediction frame P1 are the same match. Similarly, similarly, for the first target frame A2, calculate the first target frame A2 and the prediction frame P1, the first target frame A2 and the prediction frame P2, the first target frame A2 and the prediction frame P3, and the first target frame A2 and the prediction frame P3. The distance between the center point of the prediction frame P4. If the distance between the first target frame A2 and the center point of the prediction frame P2 is the smallest and less than or equal to the preset distance (for example, 10 pixels), the first target frame A2 and the prediction frame are determined P2 matches; for the first target frame A3, calculate the first target frame A3 and the prediction frame P1, the first target frame A3 and the prediction frame P2, the first target frame A3 and the prediction frame P3, the first target frame A3 and the prediction frame The distance between the center point of P4. If the distance between the center point of the first target frame A3 and the prediction frame P3 is the smallest and is less than or equal to the preset distance (for example, 10 pixels), it is determined that the first target frame A3 and the prediction frame P3 are the same Match; for the first target frame A4, calculate the first target frame A4 and the prediction frame P1, the first target frame A4 and the prediction frame P2, the first target frame A4 and the prediction frame P3, the first target frame A4 and the prediction frame P4 The distance of the center point, if the distance between the center point of the first target frame A4 and the prediction frame P4 is the smallest and is less than or equal to the preset distance (for example, 10 pixels), it is determined that the first target frame A4 matches the prediction frame P4.
The update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
Updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame may include:
If the first target frame matches the prediction frame, the position of the first target frame in the current image is taken as the updated position of the target corresponding to the prediction frame;
If the first target frame matches none of the prediction frames, the position of the first target frame in the current image is taken as the position of a new target;
If the prediction frame matches none of the first target frames, the target corresponding to the prediction frame is treated as a lost target in the current image.
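Read together, these three rules amount to a small bookkeeping routine. The sketch below is one hedged interpretation, not code from the patent; the track representation (a dict with id, box, and lost fields) is an assumption made for illustration.

```python
def update_targets(tracks, detections, matches):
    # tracks: existing targets, one per prediction frame, in the same order.
    # detections: first target frames found by the detector in the current image.
    # matches: (detection_index, track_index) pairs from the matching step.
    matched_dets = {d for d, _ in matches}
    matched_tracks = {t for _, t in matches}
    n_existing = len(tracks)

    # Rule 1: a matched first target frame becomes the updated position
    # of the target corresponding to the prediction frame.
    for d, t in matches:
        tracks[t]["box"] = detections[d]
        tracks[t]["lost"] = False

    # Rule 3: a prediction frame matching no first target frame marks
    # its target as lost in the current image.
    for t in range(n_existing):
        if t not in matched_tracks:
            tracks[t]["lost"] = True

    # Rule 2: a first target frame matching no prediction frame starts
    # a new target at the detected position.
    next_id = max((trk["id"] for trk in tracks), default=-1) + 1
    for d, det in enumerate(detections):
        if d not in matched_dets:
            tracks.append({"id": next_id, "box": det, "lost": False})
            next_id += 1
    return tracks
```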
This embodiment provides a target tracking device 20. Target tracking here means tracking moving objects of a specific type (for example, pedestrians) in a video or image sequence to obtain the position of each moving object in every frame. The target tracking device 20 uses a target detector to detect targets of a predetermined type in the current image to obtain the first target frame in the current image; obtains the second target frame in the previous frame of the current image, and uses a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image; matches the first target frame in the current image against the prediction frame to obtain the matching result of the first target frame and the prediction frame; and updates the position of the target in the current image according to that matching result. This embodiment improves the robustness and scene adaptability of target tracking.
Embodiment 3
This embodiment provides a non-volatile readable storage medium storing computer-readable instructions. When the computer-readable instructions are executed by a processor, the steps of the target tracking method embodiment above are implemented, for example steps 101-104 shown in FIG. 1:
Step 101: use a target detector to detect targets of a predetermined type in the current image to obtain the first target frame in the current image;
Step 102: obtain the second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image;
Step 103: match the first target frame in the current image against the prediction frame to obtain the matching result of the first target frame and the prediction frame;
Step 104: update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
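Steps 101-104 chain into a track-by-detection loop over the sequence. The sketch below shows one way such a loop could be wired up; detect and predict are stand-in callables (the document's detector is a faster region-based convolutional neural network and its predictor an FPN-based deep network, neither of which is reproduced here), and it reuses the illustrative helpers sketched earlier.

```python
def track_sequence(frames, detect, predict, max_dist=10.0):
    # frames: iterable of images; detect(frame) -> list of target boxes;
    # predict(tracks, frame) -> one prediction box per existing track.
    tracks = []
    for frame in frames:
        detections = detect(frame)                            # step 101
        predictions = predict(tracks, frame)                  # step 102
        matches = match_by_center_distance(                   # step 103
            detections, predictions, max_dist=max_dist)
        tracks = update_targets(tracks, detections, matches)  # step 104
        yield frame, tracks
```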
Alternatively, when the computer-readable instructions are executed by a processor, the functions of the modules in the device embodiment above are implemented, for example modules 201-204 in FIG. 2:
The detection module 201 uses a target detector to detect targets of a predetermined type in the current image to obtain the first target frame in the current image;
The prediction module 202 obtains the second target frame in the previous frame of the current image, and uses a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image;
The matching module 203 is configured to match the first target frame in the current image against the prediction frame to obtain the matching result of the first target frame and the prediction frame;
The update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
Embodiment 4
FIG. 3 is a schematic diagram of the computer device provided in Embodiment 4 of this application. The computer device 30 includes a memory 301, a processor 302, and computer-readable instructions 303, such as a target tracking program, stored in the memory 301 and executable on the processor 302. When the processor 302 executes the computer-readable instructions 303, the steps of the target tracking method embodiment above are implemented, for example steps 101-104 shown in FIG. 1:
Step 101: use a target detector to detect targets of a predetermined type in the current image to obtain the first target frame in the current image;
Step 102: obtain the second target frame in the previous frame of the current image, and use a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image;
Step 103: match the first target frame in the current image against the prediction frame to obtain the matching result of the first target frame and the prediction frame;
Step 104: update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
Alternatively, when the computer-readable instructions 303 are executed by the processor, the functions of the modules in the device embodiment above are implemented, for example modules 201-204 in FIG. 2:
The detection module 201 uses a target detector to detect targets of a predetermined type in the current image to obtain the first target frame in the current image;
The prediction module 202 obtains the second target frame in the previous frame of the current image, and uses a predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image;
The matching module 203 is configured to match the first target frame in the current image against the prediction frame to obtain the matching result of the first target frame and the prediction frame;
The update module 204 is configured to update the position of the target in the current image according to the matching result of the first target frame and the prediction frame.
Exemplarily, the computer-readable instructions 303 may be divided into one or more modules, and the one or more modules are stored in the memory 301 and executed by the processor 302 to complete the method. For example, the computer-readable instructions 303 may be divided into the detection module 201, the prediction module 202, the matching module 203, and the update module 204 in FIG. 2; for the specific functions of each module, refer to Embodiment 2.
The computer device 30 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. Those skilled in the art will understand that FIG. 3 is merely an example of the computer device 30 and does not limit it; the computer device 30 may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 30 may further include input and output devices, network access devices, buses, and the like.
The processor 302 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor 302 may be any conventional processor. The processor 302 is the control center of the computer device 30 and connects the various parts of the entire computer device 30 through various interfaces and lines.
The memory 301 may be used to store the computer-readable instructions 303. The processor 302 implements the various functions of the computer device 30 by running or executing the computer-readable instructions or modules stored in the memory 301 and by calling the data stored in the memory 301. The memory 301 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the computer device 30 (such as audio data or a phone book). In addition, the memory 301 may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
If the modules integrated in the computer device 30 are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the method embodiments above may also be completed by computer-readable instructions instructing the relevant hardware; the computer-readable instructions may be stored in a non-volatile readable storage medium and, when executed by a processor, implement the steps of each of the method embodiments above. The computer-readable medium may include any entity or device capable of carrying the computer-readable instructions, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the modules is only a logical function division, and there may be other division methods in actual implementation.
The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The software functional modules above are stored in a non-volatile readable storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of this application.
For those skilled in the art, it is obvious that this application is not limited to the details of the exemplary embodiments above and can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, from every point of view, the embodiments should be regarded as exemplary and non-restrictive; the scope of this application is defined by the appended claims rather than by the description above, and all changes falling within the meaning and scope of equivalents of the claims are therefore intended to be embraced by this application. No reference sign in the claims should be construed as limiting the claim concerned. Furthermore, the word "comprise" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or devices recited in the system claims may also be implemented by a single module or device through software or hardware. Words such as "first" and "second" denote names and do not denote any specific order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may still be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.

Claims (20)

  1. A target tracking method, wherein the method comprises:
    using a target detector to detect targets of a predetermined type in a current image to obtain a first target frame in the current image;
    obtaining a second target frame in a previous frame of the current image, and using a predictor to predict a position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
    matching the first target frame in the current image against the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
    updating a position of a target in the current image according to the matching result of the first target frame and the prediction frame.
  2. The method according to claim 1, wherein the target detector is a faster region-based convolutional neural network model comprising a region proposal network and a fast region-based convolutional neural network, and the faster region-based convolutional neural network model is trained according to the following steps before detecting targets of the predetermined type in the image:
    a first training step: initializing the region proposal network with an ImageNet model, and training the region proposal network with a first training sample set;
    a second training step: using the region proposal network trained in the first training step to generate candidate frames for each sample image in the first training sample set, and training the fast region-based convolutional neural network with the candidate frames;
    a third training step: initializing the region proposal network with the fast region-based convolutional neural network trained in the second training step, and training the region proposal network with the first training sample set;
    a fourth training step: initializing the fast region-based convolutional neural network with the region proposal network trained in the third training step, keeping the convolutional layers fixed, and training the fast region-based convolutional neural network with the first training sample set.
  3. The method according to claim 2, wherein the faster region-based convolutional neural network model adopts the ZF framework, and the region proposal network and the fast region-based convolutional neural network share five convolutional layers.
  4. The method according to claim 1, wherein the predictor is a deep neural network model built with a feature pyramid network.
  5. The method according to claim 1, wherein before using the predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image, the method further comprises:
    training the predictor with a second training sample set, the second training sample set comprising sample images of objects under different illumination, undergoing deformation, and moving at high speed.
  6. The method according to claim 1, wherein matching the first target frame in the current image against the prediction frame comprises:
    calculating an overlap-area ratio of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the overlap-area ratio; or
    calculating a distance between center points of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the distance.
  7. The method according to claim 1, wherein updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame comprises:
    if the first target frame matches the prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
    if the first target frame matches none of the prediction frames, taking the position of the first target frame in the current image as the position of a new target; and
    if the prediction frame matches none of the first target frames, treating the target corresponding to the prediction frame as a lost target in the current image.
  8. A target tracking device, wherein the device comprises:
    a detection module configured to use a target detector to detect targets of a predetermined type in a current image to obtain a first target frame in the current image;
    a prediction module configured to obtain a second target frame in a previous frame of the current image, and to use a predictor to predict a position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
    a matching module configured to match the first target frame in the current image against the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
    an update module configured to update a position of a target in the current image according to the matching result of the first target frame and the prediction frame.
  9. A computer device, wherein the computer device comprises a memory and a processor, the memory stores at least one computer-readable instruction, and the processor executes the at least one computer-readable instruction to implement the following steps:
    using a target detector to detect targets of a predetermined type in a current image to obtain a first target frame in the current image;
    obtaining a second target frame in a previous frame of the current image, and using a predictor to predict a position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
    matching the first target frame in the current image against the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
    updating a position of a target in the current image according to the matching result of the first target frame and the prediction frame.
  10. The computer device according to claim 9, wherein the target detector is a faster region-based convolutional neural network model comprising a region proposal network and a fast region-based convolutional neural network, and before using the target detector to detect targets of the predetermined type in the current image to obtain the first target frame in the current image, the processor further executes the at least one computer-readable instruction to implement the following steps:
    a first training step: initializing the region proposal network with an ImageNet model, and training the region proposal network with a first training sample set;
    a second training step: using the region proposal network trained in the first training step to generate candidate frames for each sample image in the first training sample set, and training the fast region-based convolutional neural network with the candidate frames;
    a third training step: initializing the region proposal network with the fast region-based convolutional neural network trained in the second training step, and training the region proposal network with the first training sample set;
    a fourth training step: initializing the fast region-based convolutional neural network with the region proposal network trained in the third training step, keeping the convolutional layers fixed, and training the fast region-based convolutional neural network with the first training sample set.
  11. The computer device according to claim 10, wherein the faster region-based convolutional neural network model adopts the ZF framework, and the region proposal network and the fast region-based convolutional neural network share five convolutional layers.
  12. The computer device according to claim 9, wherein the predictor is a deep neural network model built with a feature pyramid network.
  13. The computer device according to claim 9, wherein before using the predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image, the processor further executes the at least one computer-readable instruction to implement the following step:
    training the predictor with a second training sample set, the second training sample set comprising sample images of objects under different illumination, undergoing deformation, and moving at high speed.
  14. The computer device according to claim 9, wherein matching the first target frame in the current image against the prediction frame comprises:
    calculating an overlap-area ratio of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the overlap-area ratio; or
    calculating a distance between center points of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the distance.
  15. The computer device according to claim 9, wherein updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame comprises:
    if the first target frame matches the prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
    if the first target frame matches none of the prediction frames, taking the position of the first target frame in the current image as the position of a new target; and
    if the prediction frame matches none of the first target frames, treating the target corresponding to the prediction frame as a lost target in the current image.
  16. A non-volatile readable storage medium storing at least one computer-readable instruction, wherein the at least one computer-readable instruction, when executed by a processor, implements the following steps:
    using a target detector to detect targets of a predetermined type in a current image to obtain a first target frame in the current image;
    obtaining a second target frame in a previous frame of the current image, and using a predictor to predict a position of the second target frame in the current image to obtain a prediction frame of the second target frame in the current image;
    matching the first target frame in the current image against the prediction frame to obtain a matching result of the first target frame and the prediction frame; and
    updating a position of a target in the current image according to the matching result of the first target frame and the prediction frame.
  17. The non-volatile readable storage medium according to claim 16, wherein the target detector is a faster region-based convolutional neural network model comprising a region proposal network and a fast region-based convolutional neural network, and before using the target detector to detect targets of the predetermined type in the current image to obtain the first target frame in the current image, the at least one computer-readable instruction, when executed by the processor, further implements the following steps:
    a first training step: initializing the region proposal network with an ImageNet model, and training the region proposal network with a first training sample set;
    a second training step: using the region proposal network trained in the first training step to generate candidate frames for each sample image in the first training sample set, and training the fast region-based convolutional neural network with the candidate frames;
    a third training step: initializing the region proposal network with the fast region-based convolutional neural network trained in the second training step, and training the region proposal network with the first training sample set;
    a fourth training step: initializing the fast region-based convolutional neural network with the region proposal network trained in the third training step, keeping the convolutional layers fixed, and training the fast region-based convolutional neural network with the first training sample set.
  18. The non-volatile readable storage medium according to claim 16, wherein before using the predictor to predict the position of the second target frame in the current image to obtain the prediction frame of the second target frame in the current image, the at least one computer-readable instruction, when executed by the processor, further implements the following step:
    training the predictor with a second training sample set, the second training sample set comprising sample images of objects under different illumination, undergoing deformation, and moving at high speed.
  19. The non-volatile readable storage medium according to claim 16, wherein matching the first target frame in the current image against the prediction frame comprises:
    calculating an overlap-area ratio of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the overlap-area ratio; or
    calculating a distance between center points of the first target frame and the prediction frame, and determining each matched pair of the first target frame and the prediction frame according to the distance.
  20. The non-volatile readable storage medium according to claim 16, wherein updating the position of the target in the current image according to the matching result of the first target frame and the prediction frame comprises:
    if the first target frame matches the prediction frame, taking the position of the first target frame in the current image as the updated position of the target corresponding to the prediction frame;
    if the first target frame matches none of the prediction frames, taking the position of the first target frame in the current image as the position of a new target; and
    if the prediction frame matches none of the first target frames, treating the target corresponding to the prediction frame as a lost target in the current image.
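Claims 2, 10, and 17 recite the same four-step alternating training scheme for the detector. The sketch below is a hedged outline of that scheme only; the make_rpn and make_fast_rcnn factories, their constructor arguments, and the train/propose/weights method names are hypothetical stand-ins introduced for illustration, since the claims do not fix any API.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]

def alternating_training(
    make_rpn: Callable,        # hypothetical: builds a region proposal network
    make_fast_rcnn: Callable,  # hypothetical: builds a fast region-based CNN
    imagenet_weights: object,  # ImageNet-pretrained initialization
    samples: List[object],     # the first training sample set
):
    # Step 1: initialize the region proposal network from the ImageNet model
    # and train it on the first training sample set.
    rpn = make_rpn(init=imagenet_weights)
    rpn.train(samples)

    # Step 2: generate candidate frames for each sample image with the trained
    # RPN, then train the fast region-based CNN on those candidate frames.
    candidates: Dict[int, List[Box]] = {i: rpn.propose(s) for i, s in enumerate(samples)}
    fast_rcnn = make_fast_rcnn(init=imagenet_weights)
    fast_rcnn.train(samples, candidates)

    # Step 3: re-initialize the RPN from the trained fast region-based CNN
    # and train it again on the same sample set.
    rpn = make_rpn(init=fast_rcnn.weights())
    rpn.train(samples)

    # Step 4: initialize the fast region-based CNN from the step-3 RPN, keep
    # the shared convolutional layers fixed, and train it once more. The
    # claims do not spell out regenerating candidate frames here; doing so
    # mirrors the usual alternating-training recipe.
    candidates = {i: rpn.propose(s) for i, s in enumerate(samples)}
    fast_rcnn = make_fast_rcnn(init=rpn.weights(), freeze_shared_conv=True)
    fast_rcnn.train(samples, candidates)
    return rpn, fast_rcnn
```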
PCT/CN2019/091160 (WO2020151167A1, en): Target tracking method and device, computer device and readable storage medium. Priority date: 2019-01-23; filing date: 2019-06-13.

Applications Claiming Priority (2)

CN201910064675.5: priority date 2019-01-23.
CN201910064675.5A (CN109903310A, en): priority date 2019-01-23; filing date 2019-01-23. Title: Method for tracking target, device, computer installation and computer storage medium.

Publications (1)

WO2020151167A1: publication date 2020-07-30.

Family

ID=66944120

Family Applications (1)

PCT/CN2019/091160 (WO2020151167A1, en): priority date 2019-01-23; filing date 2019-06-13. Title: Target tracking method and device, computer device and readable storage medium.

Country Status (2)

CN (1): CN109903310A (en)
WO (1): WO2020151167A1 (en)

Also Published As

CN109903310A (en): publication date 2019-06-18.

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19911600; country of ref document: EP; kind code of ref document: A1.
NENP: Non-entry into the national phase. Ref country code: DE.
122 (EP): PCT application non-entry in European phase. Ref document number: 19911600; country of ref document: EP; kind code of ref document: A1.