WO2023077754A1 - Target tracking method and apparatus, and storage medium - Google Patents


Info

Publication number: WO2023077754A1
Authority: WIPO (PCT)
Prior art keywords: image, target, frame, tracking, detection
Application number: PCT/CN2022/090574
Other languages: French (fr), Chinese (zh)
Inventors: 梁浩 (Liang Hao), 武鹏 (Wu Peng)
Original assignee: 北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Application filed by Beijing Xiaomi Mobile Software Co., Ltd. (北京小米移动软件有限公司)
Publication of WO2023077754A1

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T3/00 Geometric image transformations in the plane of the image
            • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
              • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
          • G06T7/00 Image analysis
            • G06T7/20 Analysis of motion
              • G06T7/207 Analysis of motion for motion estimation over a hierarchy of resolutions
              • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
            • G06T7/70 Determining position or orientation of objects or cameras
          • G06T2200/00 Indexing scheme for image data processing or generation, in general
            • G06T2200/32 Indexing scheme involving image mosaicing

Definitions

  • the present disclosure relates to the technical field of computer vision, and in particular to a target tracking method, device, and storage medium.
  • target tracking means, given an image sequence, finding the targets in the sequence, identifying the same target across different frames, and assigning the same ID to that target in each frame.
  • in related technologies, target tracking is usually performed based on 2D information alone.
  • the present disclosure provides a target tracking method, device and storage medium.
  • a target tracking method including:
  • the image acquisition sequence is obtained according to the images acquired by the image acquisition device at multiple acquisition moments;
  • the first image is any image in the image acquisition sequence except the initial image;
  • the second image is the previous image of the first image in the image acquisition sequence;
  • determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image includes:
  • the tracking result includes the 3D tracking location information.
  • determining a tracking result of tracking the target object with respect to the first image further includes:
  • the tracking result includes the 2D tracking location information.
  • the tracking result includes 3D tracking position information of the target object corresponding to the second image, 2D tracking position information, and motion data of the target object;
  • the predicting the 3D prediction frame of the target object in the three-dimensional space and the 2D prediction frame on the first image according to the tracking result of tracking the target object with respect to the second image includes:
  • the 3D tracking position information and the 2D tracking position information are input into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
  • the motion data includes the rate of change of the target object's position on the image, and the velocity and acceleration of the target object in the target three-dimensional space;
  • the tracker is capable of outputting the 2D predicted frame based on the position change rate and the 2D tracked position information, and outputting the 3D predicted frame based on the velocity, the acceleration, and the 3D tracked position information.
  • the first image includes multiple captured images, and the multiple captured images are images captured by multiple image capture devices at the same capture moment;
  • performing object detection on the first image in the image acquisition sequence to obtain the 3D detection frame of the object on the first image in the target three-dimensional space and the 2D detection frame on the first image includes:
  • before determining the tracking result of tracking the target object with respect to the first image, the method further includes:
  • performing non-maximum suppression processing on the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • a target tracking device including:
  • the acquisition module is configured to acquire an image acquisition sequence, the image acquisition sequence is obtained according to the images acquired by the image acquisition device at multiple acquisition moments;
  • the detection module is configured to perform object detection on the first image in the image acquisition sequence to obtain the 3D detection frame of the object on the first image in the target three-dimensional space and the 2D detection frame on the first image, where the first image is any image in the image acquisition sequence except the initial image;
  • the prediction module is configured to predict a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image according to the tracking result of the target object being tracked with respect to the second image,
  • the second image is a previous image of the first image in the image acquisition sequence;
  • the determination module is configured to determine a tracking result of tracking the target object with respect to the first image according to the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • the determining module is further configured to:
  • the tracking result includes the 3D tracking location information.
  • the determining module is further configured to:
  • the tracking result includes the 2D tracking location information.
  • the tracking result includes 3D tracking position information of the target object corresponding to the second image, 2D tracking position information, and motion data of the target object;
  • the prediction module is further configured to:
  • the 3D tracking position information and the 2D tracking position information are input into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
  • the motion data includes the rate of change of the target object's position on the image, and the velocity and acceleration of the target object in the target three-dimensional space;
  • the tracker is capable of outputting the 2D predicted frame based on the position change rate and the 2D tracked position information, and outputting the 3D predicted frame based on the velocity, the acceleration, and the 3D tracked position information.
  • the first image includes multiple captured images, and the multiple captured images are images captured by multiple image capture devices at the same capture moment;
  • the detection module is further configured to:
  • the determining module is further configured to:
  • a non-maximum suppression process is performed on the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • a computer-readable storage medium on which computer program instructions are stored, and when the program instructions are executed by a processor, the steps of the target tracking method provided in the first aspect of the present disclosure are implemented.
  • a target tracking device including:
  • memory for storing processor-executable instructions
  • the processor is configured as:
  • the image acquisition sequence is obtained according to the images acquired by the image acquisition device at multiple acquisition moments;
  • the first image is any image in the image acquisition sequence except the initial image;
  • the second image is the previous image of the first image in the image acquisition sequence;
  • a computer program product including a computer program executable by a programmable device, the computer program having code portions that, when executed by the programmable device, implement the steps of the target tracking method provided in the first aspect of the present disclosure.
  • the technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the tracking result of the target object is determined through both 3D frame information (for example, the 3D detection frame and the 3D prediction frame) and 2D frame information (for example, the 2D detection frame and the 2D prediction frame).
  • by introducing the 3D frame information, the target tracking method of the present disclosure can continue to perform motion estimation on the target object in the target three-dimensional space for a period of time after the target object is lost, which improves the probability of a successful match after the target object reappears and reduces the ID switching caused by the target being occluded or leaving the field of view, that is, reduces erroneous tracking of the target.
  • combining the 3D frame information and the 2D frame information to determine the tracking result of the target object can improve the tracking accuracy of the target object.
  • Fig. 1 is a flow chart showing a method for tracking a target according to an exemplary embodiment.
  • Fig. 2 is a flow chart of determining a 3D detection frame and a 2D detection frame according to an exemplary embodiment.
  • Fig. 3 is a block diagram of an object tracking device according to an exemplary embodiment.
  • Fig. 4 is a block diagram of a device for target tracking according to an exemplary embodiment.
  • Fig. 5 is a block diagram of a device for target tracking according to an exemplary embodiment.
  • the object tracking method of the present disclosure can be applied to different scenarios. For example, it can be applied to automatic driving scenarios to track targets in images collected by image acquisition devices on vehicles. For another example, it can be applied to a traffic monitoring scene to track a target in an image captured by an image acquisition device in a traffic monitoring system.
  • the application scenarios of the target tracking method mentioned in the present disclosure are only some examples or embodiments of the present disclosure; those of ordinary skill in the art can, without creative effort, also apply the method to other similar scenarios, for example, target tracking for a mobile robot, which is not limited in the present disclosure.
  • 2D information is usually used to track targets in images captured by one or more image acquisition devices.
  • due to the affine transformation involved in projecting targets onto 2D images, it is difficult to accurately track targets on 2D images using only 2D information, which can result in matching the wrong ID to a target; moreover, when 2D information alone is used for target tracking, once a target is lost it is difficult to recover.
  • in addition, targets in images acquired by multiple image acquisition devices are usually tracked separately, which is not only inefficient but also cannot handle targets that overlap across images of adjacent image acquisition devices.
  • Fig. 1 is a flowchart of a target tracking method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
  • Step 110: acquire an image acquisition sequence, which is obtained according to the images acquired by the image acquisition device at multiple acquisition moments.
  • the image acquisition sequence may be obtained according to images acquired by one or more image acquisition devices at multiple acquisition moments.
  • for a single image acquisition device, the acquired image at each acquisition moment in the image acquisition sequence may be the image captured by that device at the acquisition moment; for multiple image acquisition devices, the acquired image at each acquisition moment may be the set of images captured by the multiple devices at that acquisition moment.
  • taking as an example that the images acquired by image acquisition device 1 at acquisition times t1, t2, and t3 are P1, P2, and P3, the image acquisition sequence 1 may be (P1, P2, P3).
  • taking as an example that the images acquired by image acquisition device 1 at acquisition times t1, t2, and t3 are P11, P12, and P13, the images acquired by image acquisition device 2 at those times are P21, P22, and P23, and the images acquired by image acquisition device 3 at those times are P31, P32, and P33, the image acquisition sequence 2 may be ((P11, P21, P31), (P12, P22, P32), (P13, P23, P33)).
  • image capture devices may include, but are not limited to, video cameras and cameras.
  • the image acquisition device can be set at a preset fixed position or in a mobile device, and the preset fixed position and the mobile device can be specifically set according to actual needs.
  • a mobile device could be an autonomous vehicle.
  • the image capture device may be one or more cameras included in the autonomous vehicle.
  • the capture directions of the multiple image capture devices may be different.
  • the acquisition directions of the image acquisition devices 1-3 may be left direction, forward direction, right direction, etc. respectively.
  • the collection directions of multiple image collection devices may be specifically set according to actual conditions, and this disclosure does not impose any limitation on this.
  • an image capture sequence may be acquired based on video captured by one or more image capture devices.
  • the captured images in the image capturing sequence may be image frames included in the video.
  • Step 120: perform object detection on the first image in the image acquisition sequence to obtain the 3D detection frame of the object on the first image in the target three-dimensional space and the 2D detection frame on the first image, where the first image is any image in the image acquisition sequence except the initial image.
  • the object on the first image may refer to one or more objects included in the first image, and the objects may include different types of objects.
  • the objects on the first image may include objects of the pedestrian category and objects of the vehicle category.
  • object detection may be performed on the first image according to a monocular 3D detection algorithm.
  • the monocular 3D detection algorithm may include, but is not limited to, the fully convolutional one-stage monocular 3D object detection method (Fully Convolutional One-Stage Monocular 3D Object Detection, FCOS3D) and the real-time monocular 3D object detection algorithm (Real-time Monocular 3D Object Detection, RTM3D).
  • the 3D detection frame of the object included in the image in the three-dimensional space (for example, the camera coordinate system) of the image acquisition device and the 2D detection frame in the image coordinate system of the image can be simultaneously obtained through the monocular 3D detection algorithm.
  • the 3D detection frame in the three-dimensional space of the image acquisition device obtained by the detection algorithm may be represented by (x, y, z, rot, w, h, l), where (x, y, z) represents the coordinates of the center point of the 3D detection frame in the three-dimensional space of the image acquisition device, rot represents the heading angle of the 3D detection frame, and (w, h, l) represent the width, height, and length of the 3D detection frame, respectively.
  • the 2D detection frame obtained by the detection algorithm may be represented by (x1, y1, x2, y2), where (x1, y1) represents the coordinates of the upper-left corner of the 2D detection frame in the image coordinate system, and (x2, y2) represents the coordinates of the lower-right corner of the 2D detection frame in the image coordinate system.
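  • to make the two parameterizations concrete, the following is a minimal sketch; the class and field names are illustrative and not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    """3D detection frame in a camera (or target) coordinate system."""
    x: float    # center point coordinates
    y: float
    z: float
    rot: float  # heading angle
    w: float    # width
    h: float    # height
    l: float    # length

@dataclass
class Box2D:
    """2D detection frame in an image coordinate system."""
    x1: float   # upper-left corner
    y1: float
    x2: float   # lower-right corner
    y2: float
```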
  • there may be one or more image acquisition devices.
  • the first image may be an acquired image.
  • the first image may be P2 or P3.
  • object detection may be performed on the captured image to obtain the 3D detection frame of the object on the captured image in the three-dimensional space of the image capture device and the 2D detection frame of the object on the captured image, where the 2D detection frame on the captured image is the 2D detection frame in the image coordinate system of the captured image.
  • the 3D detection frame in the three-dimensional space of the image capture device may be determined as the 3D detection frame in the target three-dimensional space, or the 3D detection frame in the three-dimensional space of the image capture device may be determined as The 3D detection frame is mapped to the target coordinate system, and the 3D detection frame in the target coordinate system is determined as the 3D detection frame in the target three-dimensional space.
  • for specific details about the target coordinate system, reference may be made to FIG. 2 and its related descriptions, which will not be repeated here.
  • the first image may include multiple capture images, and the multiple capture images may be images captured by multiple image capture devices at the same capture time.
  • the first image may include P 12 , P 22 and P 32 , or include P 13 , P 23 and P 33 .
  • object detection may be performed on the multiple captured images to obtain a 3D detection frame of the object on each captured image in the three-dimensional space of each image capture device And a 2D detection frame on the captured image.
  • the 3D detection frames in the three-dimensional spaces of the respective image acquisition devices and the 2D detection frames on the respective captured images can then be processed to obtain the 3D detection frames in the target three-dimensional space and the 2D detection frames on the first image.
  • for details about determining the 3D detection frame and the 2D detection frame when the first image includes multiple captured images, refer to FIG. 2 and its related descriptions, which will not be repeated here.
  • Step 130: predict the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image according to the tracking result of tracking the target object with respect to the second image, where the second image is the previous image of the first image in the image acquisition sequence.
  • the first image and the second image may be images acquired by the image acquisition device at different acquisition moments, where the second image is the image acquired at the acquisition moment immediately preceding that of the first image, that is, the second image is the previous image of the first image in the image acquisition sequence.
  • the target object may be one or more objects included in the second image, and the target objects may include objects of different categories. For example, objects of pedestrian class and objects of vehicle class etc.
  • the tracking result of tracking the target object with respect to the second image may include 3D tracking position information, 2D tracking position information and motion data of the target object corresponding to the second image.
  • the manner of determining the 3D tracking position information and the 2D tracking position information reference may be made to the following step 140 and related descriptions, and details are not repeated here.
  • the motion data may include the rate of change of the target object's position on the image and the velocity and acceleration of the target object in the target three-dimensional space.
  • the position change rate of the target object on the image may refer to the position change rate between the 2D tracking information of the target object on the second image and the 2D tracking information of the target object on the first image. For details about determining the position change rate, reference may be made to the relevant description of the tracker below, and details are not repeated here.
  • the velocity and acceleration of the target object in the target three-dimensional space may refer to the velocity and acceleration of the target object at the acquisition moment corresponding to the second image.
  • for example, when the target object is a pedestrian and the second image is acquired at time t2, the velocity and acceleration of the target object in the target three-dimensional space may be the velocity and acceleration of the pedestrian at time t2; it can be understood that these are the walking speed and acceleration of the pedestrian in real space.
  • the velocity and acceleration of the mobile device at the moment of capturing the second image may be determined as the velocity and acceleration of the target object in the target three-dimensional space.
  • the motion data may be the speed and acceleration of the vehicle at time t2. It can be understood that the speed and acceleration are the driving speed and acceleration of the vehicle in real space.
  • the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image may be predicted according to the tracking result of the target object tracked by the tracker on the second image.
  • predicting the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image according to the tracking result of tracking the target object with respect to the second image includes: updating the tracker according to the motion data, and inputting the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
  • the tracker is capable of outputting the 2D prediction frame based on the position change rate and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration, and the 3D tracking position information. In some embodiments, the rotation angle of the 3D prediction frame set by the tracker remains unchanged.
  • the 2D tracking information of the target object may be a 2D detection frame corresponding to the 2D tracking information.
  • the 2D tracking information can be represented by (cx, cy, w, h), where (cx, cy) represents the coordinates of the center point of the corresponding 2D detection frame in the image coordinate system of the first image, and (w, h) represent the width and height of the 2D detection frame.
  • the 3D tracking information of the target object may be a 3D detection frame corresponding to the 3D tracking information.
  • the 3D tracking information can be represented by (x, y, rot), where (x, y) represents the coordinates of the center point of the corresponding 3D detection frame in the target three-dimensional space, and rot represents the rotation angle of the 3D detection frame.
  • the 3D prediction frame of each target object in the target three-dimensional space and the 2D prediction frame on the first image may be predicted according to the tracking results obtained by multiple trackers, each tracking one target object with respect to the second image.
  • the tracker corresponding to the target object may include the state transition function of the target object, and the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image are predicted according to the state transition function and the tracking result of tracking the target object with respect to the second image.
  • the state transition function of the target object included in the corresponding tracker is given by formula (1) (not reproduced in this text), in which (Vcx, Vcy, Vw, Vh) represent the rate of change of the target object's position on the image, corresponding to (cx, cy, w, h); Vx and Vy represent the velocities of the target object along the X-axis and Y-axis directions of the target three-dimensional space, respectively; and ax and ay represent the accelerations of the target object along the X-axis and Y-axis directions of the target three-dimensional space, respectively.
  • the initial value of the rate of change of the target object's position on the image is 0.
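  • since formula (1) is not reproduced here, the following sketch shows one plausible reading of the described state transition, assuming a time step dt between acquisition moments: a constant-velocity model for the 2D state (cx, cy, w, h) and a constant-acceleration model for the planar 3D state (x, y), with the rotation angle rot held fixed. The state layouts are assumptions for illustration:

```python
import numpy as np

def predict_2d(state_2d: np.ndarray, dt: float) -> np.ndarray:
    """Constant-velocity prediction of the 2D state
    [cx, cy, w, h, Vcx, Vcy, Vw, Vh]."""
    cx, cy, w, h, vcx, vcy, vw, vh = state_2d
    return np.array([cx + vcx * dt, cy + vcy * dt,
                     w + vw * dt, h + vh * dt,
                     vcx, vcy, vw, vh])

def predict_3d(state_3d: np.ndarray, dt: float) -> np.ndarray:
    """Constant-acceleration prediction of the 3D state
    [x, y, rot, Vx, Vy, ax, ay]; rot is kept unchanged because the
    detected heading angle has a large uncertainty."""
    x, y, rot, vx, vy, ax, ay = state_3d
    return np.array([x + vx * dt + 0.5 * ax * dt ** 2,
                     y + vy * dt + 0.5 * ay * dt ** 2,
                     rot,
                     vx + ax * dt,
                     vy + ay * dt,
                     ax, ay])
```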
  • take as an example that the second image is P1, which includes target objects 1-5, and the first image is P2, which includes objects 1-5.
  • the following will describe the process of the tracker 1 corresponding to the target object 1 predicting the 3D prediction frame and the 2D prediction frame of the target object 1 for each captured image in the image capture sequence 1 with reference to an example.
  • the 2D detection frame and the 3D detection frame of the target object 1 corresponding to the second image P1 can be obtained.
  • since P1 is the initial image, the position change rate 1 corresponding to target object 1 is 0, and the 3D tracking position information and 2D tracking position information of target object 1 corresponding to the second image P1 are the 3D detection frame and 2D detection frame obtained by object detection (that is, obtained through the above step 120).
  • the velocity 1 and acceleration 1 of target object 1 at the acquisition moment corresponding to the second image P1 can be detected, and tracker 1 is updated according to position change rate 1, velocity 1, and acceleration 1, that is, the above formula (1) is updated; the 3D tracking position information and the 2D tracking position information are then input into the updated tracker 1 to obtain the 3D prediction frame and 2D prediction frame of target object 1 corresponding to the image at the next moment (that is, the first image P2).
  • the 3D detection frames and the 2D detection frames of the objects on the first image P2 can be obtained through object detection, and the object in the first image P2 that belongs to the same target as target object 1 (denoted object 1) can be determined by means of the following step 140.
  • the position change rate 2 of target object 1 between the second image P1 and the first image P2 can then be obtained from the 2D detection frame corresponding to object 1 and the 2D detection frame of target object 1; using position change rate 2 and the velocity and acceleration of target object 1 at the acquisition moment of the first image P2, the 3D prediction frame and the 2D prediction frame of target object 1 corresponding to the third image P3 can be obtained.
  • the position information of the target object 1 on each captured image in the image capture sequence can be obtained, so as to realize the tracking of the target object 1 on the image sequence.
  • the tracker uses a constant velocity model to realize the prediction or motion estimation of the 2D frame, which can reduce the calculation amount of predicting the 2D frame.
  • the tracker uses a uniform acceleration model to realize prediction or motion estimation of the 3D frame, which can improve the prediction accuracy of the 3D frame.
  • the angle of the 3D detection frame obtained through object detection has a large uncertainty, which is why the tracker keeps the rotation angle of the 3D prediction frame unchanged.
  • Step 140: determine a tracking result of tracking the target object with respect to the first image according to the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • in some embodiments, before determining the tracking result of tracking the target object with respect to the first image according to the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame, the method further includes: performing non-maximum suppression processing on the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • by performing non-maximum suppression processing, overlapping 3D frames (that is, 3D detection frames or 3D prediction frames) and 2D frames (that is, 2D detection frames or 2D prediction frames) can be filtered out, preventing overlapping frames from affecting the subsequent matching between 3D frames and between 2D frames; this improves the accuracy of the subsequent matching and thus the accuracy of target tracking.
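  • the patent does not specify the non-maximum suppression variant; the following is a minimal sketch of standard greedy NMS for 2D frames (for 3D frames, the same procedure would apply with a 3D or bird's-eye-view IoU):

```python
import numpy as np

def iou_2d(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two 2D frames given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms_2d(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.5):
    """Greedy non-maximum suppression; returns indices of kept frames."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([iou_2d(boxes[i], boxes[j]) for j in rest])
        order = rest[ious <= thresh]  # drop frames overlapping the kept one
    return keep
```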
  • determining the tracking result of tracking the target object with respect to the first image includes: determining, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to the first intersection ratio and/or distance value between the 3D prediction frame and each 3D detection frame; taking the object corresponding to the target 3D detection frame in the target three-dimensional space as the target object, and taking the position information of the target 3D detection frame in the target three-dimensional space as the 3D tracking position information of the target object, where the tracking result includes the 3D tracking position information.
  • the first intersection ratio may refer to the overlap rate between the 3D prediction frame and the 3D detection frame, that is, the ratio of the intersection to the union of the 3D prediction frame and the 3D detection frame.
  • there may be one or more target objects, and correspondingly, the 3D prediction frame of the target object in the target three-dimensional space may also include one or more.
  • the first intersection ratio matrix may be determined based on the first intersection ratio between each 3D prediction frame and each 3D detection frame.
  • the distance value may be the distance between the center points of the 3D prediction frame and the 3D detection frame in the target three-dimensional space. The distance may include, but is not limited to, Manhattan distance or Euclidean distance, among others.
  • a distance matrix may be determined based on the distance value between each 3D prediction frame and each 3D detection frame.
  • the first intersection ratio matrix or the distance matrix can be solved according to the Hungarian algorithm to determine the matching result between the 3D prediction frames and the 3D detection frames, that is, to determine from the 3D detection frames the target 3D detection frame matching each 3D prediction frame.
  • target objects or objects can be different classes of objects.
  • alternatively, the first intersection ratio matrix and the distance matrix may each be determined and solved separately according to the Hungarian algorithm to determine the matching result between the 3D prediction frames and the 3D detection frames, and thereby the target 3D detection frame matching each 3D prediction frame.
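  • as a sketch of the matching step, SciPy's linear_sum_assignment implements the Hungarian algorithm; the IoU matrix could be built with a routine like iou_2d above (or a 3D/BEV IoU for 3D frames), and the gating threshold min_iou below is an assumption, since the patent only names the algorithm:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_frames(iou_matrix: np.ndarray, min_iou: float = 0.1):
    """Match prediction frames (rows) to detection frames (columns)
    by maximizing the total IoU of the assignment."""
    rows, cols = linear_sum_assignment(-iou_matrix)  # negate to maximize
    matches = []
    unmatched_preds = set(range(iou_matrix.shape[0]))
    unmatched_dets = set(range(iou_matrix.shape[1]))
    for r, c in zip(rows, cols):
        if iou_matrix[r, c] >= min_iou:  # reject weak assignments
            matches.append((r, c))
            unmatched_preds.discard(r)
            unmatched_dets.discard(c)
    return matches, sorted(unmatched_preds), sorted(unmatched_dets)
```

  • when a distance matrix is used instead, it would be passed to the solver without negation, since distances are minimized rather than maximized.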
  • take as an example that the 3D prediction frames of target objects 1-5 are N3D-1, N3D-2, N3D-3, N3D-4, and N3D-5, the first image includes objects 1-5, and the 3D detection frames of objects 1-5 are M3D-1, M3D-2, M3D-3, M3D-4, and M3D-5; if solving the first intersection ratio matrix and the distance matrix according to the Hungarian algorithm yields the matching results (N3D-1, M3D-2), (N3D-2, M3D-1), and (N3D-3, M3D-3), then the target 3D detection frame matching 3D prediction frame N3D-1 is 3D detection frame M3D-2, the target 3D detection frame matching 3D prediction frame N3D-2 is 3D detection frame M3D-1, and the target 3D detection frame matching 3D prediction frame N3D-3 is 3D detection frame M3D-3.
  • determining the tracking result of tracking the target object with respect to the first image further includes: for each 3D detection frame that does not match any 3D prediction frame, determining the 2D detection frame corresponding to the same object as that 3D detection frame, and determining the target 2D detection frame matching the 2D prediction frame according to the second intersection ratio between that 2D detection frame and the 2D prediction frame; taking the object corresponding to the target 2D detection frame on the first image as the target object, and taking the position information of the target 2D detection frame on the first image as the 2D tracking position information of the target object, where the tracking result includes the 2D tracking position information.
  • the method of determining the second intersection ratio between the 2D detection frame and the 2D prediction frame is similar to that of the first intersection ratio, and will not be repeated here.
  • the second intersection ratio matrix can be determined according to the second intersection ratios between the 2D detection frames and the 2D prediction frames, and solved according to the Hungarian algorithm to determine the matching result between the 2D detection frames and the 2D prediction frames, so as to determine the target 2D detection frame matching each 2D prediction frame.
  • continuing the above example, the 3D detection frames that do not match any 3D prediction frame are M3D-4 and M3D-5; the 2D detection frames M2D-4 and M2D-5 corresponding to the same objects as M3D-4 and M3D-5 are determined, the second intersection ratio matrix between M2D-4, M2D-5 and the 2D prediction frames N2D-1, N2D-2, N2D-3, N2D-4, and N2D-5 (that is, the 2D prediction frames of target objects 1-5) is computed, and solving this matrix according to the Hungarian algorithm yields the matching results (M2D-4, N2D-4) and (M2D-5, N2D-5) between the 2D detection frames and the 2D prediction frames.
  • the 3D prediction frame and the 2D prediction frame are the 3D frame and the 2D frame corresponding to the target object in the second image, while the 3D detection frame and the 2D detection frame are the 3D frame and the 2D frame corresponding to the object in the first image.
  • by matching the prediction frames with the detection frames, the target object and the object belonging to the same target in the second image and the first image can be determined, thereby realizing target tracking. For example, taking the above matching result (M2D-4, N2D-4), it can be determined that target object 4 in the second image at the previous moment and object 4 in the first image at the next moment belong to the same target.
  • target objects and objects belonging to the same target may be assigned the same ID.
  • in the above embodiment, the 3D detection frames are first matched with the 3D prediction frames, and then the unmatched 2D detection frames are matched with the 2D prediction frames; that is, a two-stage matching of 3D frames (that is, 3D detection frames and 3D prediction frames) and 2D frames (that is, 2D detection frames and 2D prediction frames) is adopted.
  • in this way, missed matches can be reduced; that is, the matching accuracy between target objects in the image at the previous moment and objects in the image at the next moment is improved, missed matches between target objects and objects are reduced, and the tracking accuracy for the same target across images at different moments is thereby improved.
  • in some embodiments, a new tracker may be created for an object whose 3D detection frame and 2D detection frame are not successfully matched to any prediction frame.
  • in some embodiments, the tracker corresponding to a target object whose 3D prediction frame and 2D prediction frame fail to match for a preset number of times may be discarded.
  • the preset number of times can be set according to actual conditions. As described above, the 3D prediction frame and the 2D prediction frame correspond to two-stage matching; if a target object obtains no matching result in the two-stage matching for the preset number of times, the tracker is considered to have lost the target object, and the tracker corresponding to that target object is discarded.
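  • a minimal sketch of this tracker lifecycle, assuming trackers are kept in a dict keyed by ID and that max_misses stands in for the preset number of times (all names are illustrative, not from the patent):

```python
def update_trackers(trackers: dict, matches: dict, unmatched_detections: list,
                    max_misses: int = 3) -> dict:
    """Tracker lifecycle after two-stage matching.
    `matches` maps tracker ID -> matched detection frame."""
    next_id = max(trackers, default=-1) + 1
    survivors = {}
    for tid, trk in trackers.items():
        if tid in matches:
            trk["misses"] = 0
            trk["box"] = matches[tid]        # take over the matched detection
            survivors[tid] = trk
        else:
            trk["misses"] += 1
            if trk["misses"] < max_misses:   # keep estimating for a while so
                survivors[tid] = trk         # the target can be re-acquired
    for det in unmatched_detections:         # new target: create a new tracker
        survivors[next_id] = {"box": det, "misses": 0}
        next_id += 1
    return survivors
```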
  • in the embodiments of the present disclosure, the tracking result of the target object is determined through both 3D frame and 2D frame information; by introducing the 3D information, the target tracking method of the present disclosure can continue to perform motion estimation on the target object in the target three-dimensional space for a period of time after the target object is lost, which improves the probability of a successful match after the target object reappears and reduces the ID switching caused by the target being occluded or leaving the field of view, that is, reduces erroneous tracking of the target.
  • Fig. 2 is a flowchart of determining a 3D detection frame and a 2D detection frame according to an exemplary embodiment. As shown in Fig. 2, the method includes the following steps.
  • Step 210: perform object detection on the multiple captured images to obtain the 3D detection frame of the object on each captured image in the three-dimensional space of the corresponding image capture device and the 2D detection frame on that captured image.
  • the specific details of step 210 are similar to those of step 120; for details, refer to the above step 120 and its related descriptions, which will not be repeated here.
  • for example, object detection can be performed on captured image P1 to obtain the 3D detection frame of the object on captured image P1 in the three-dimensional space of image capture device 1 and the 2D detection frame in the image coordinate system of captured image P1; the object detection on captured images P2 and P3 is similar and will not be repeated here.
  • Step 220: map the 3D detection frames located in the three-dimensional spaces of different image capture devices to the same target coordinate system according to the extrinsic parameters of each image capture device, where the target three-dimensional space is the space defined by the target coordinate system.
  • the target coordinate system can be determined according to the position set by the image acquisition device.
  • the target coordinate system may be the ego-vehicle coordinate system corresponding to the first image in the image acquisition sequence.
  • the target coordinate system may be a coordinate system determined based on the preset fixed position, and the origin, X axis, Y axis, and Z axis of the coordinate system may be specifically set according to actual conditions .
  • the extrinsic parameters of each image acquisition coordinate system may reflect the pose relationship between the image acquisition coordinate system and the target coordinate system. Extrinsic parameters can include translation parameters and rotation parameters.
  • the external parameters of each image acquisition coordinate system can be obtained by calibrating the image acquisition device. Regarding the calibration of the image acquisition device, reference may be made to related technologies, which will not be repeated here.
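  • a minimal sketch of the mapping, assuming the extrinsics of a camera are given as a rotation matrix R and a translation vector t from that camera's coordinate system to the target coordinate system:

```python
import numpy as np

def camera_to_target(points_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map 3D points (N, 3) from a camera coordinate system into the
    common target coordinate system: p_target = R @ p_cam + t."""
    return points_cam @ R.T + t
```

  • the heading angle of a 3D detection frame must be rotated as well; for a yaw-only extrinsic rotation by angle yaw_cam, the heading would become rot + yaw_cam.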
  • Step 230: stitch the multiple captured images to obtain a stitched image, and map the 2D detection frames on the multiple captured images onto the stitched image, where the 2D detection frame on the first image is the 2D detection frame on the stitched image.
  • the 2D detection frame on the stitched image may refer to the 2D detection frame in the image coordinate system of the stitched image.
  • in some embodiments, the 2D detection frames in the image coordinate systems of the respective captured images can be converted to the image coordinate system of the stitched image, that is, 2D detection frames corresponding to different image coordinate systems are transformed into the same image coordinate system.
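  • the patent does not specify the stitching layout; assuming the captured images are simply concatenated side by side from left to right, mapping a 2D detection frame into the stitched image reduces to a horizontal offset:

```python
def to_stitched(box_2d, image_index: int, widths: list):
    """Shift a 2D frame (x1, y1, x2, y2) from the coordinate system of
    captured image `image_index` into the stitched image."""
    x_off = sum(widths[:image_index])  # total width of images to the left
    x1, y1, x2, y2 = box_2d
    return (x1 + x_off, y1, x2 + x_off, y2)
```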
  • by mapping the 3D detection frames in the three-dimensional spaces of different image acquisition devices to the same target coordinate system and mapping the 2D detection frames on the multiple captured images onto the stitched image, the detection results of the multiple image acquisition devices are fused, and the targets in the multiple captured images can be tracked at the same time, so that only one tracking algorithm is needed to track the same target across the images of different image acquisition devices.
  • this avoids the inefficiency caused by separately tracking the target in the captured image of each image capture device, and reduces ID switching of the same target across different image capture devices.
  • Fig. 3 is a block diagram of an object tracking device 300 according to an exemplary embodiment.
  • the device includes an acquisition module 310 , a detection module 320 , a prediction module 330 and a determination module 340 .
  • the acquisition module 310 is configured to acquire an image acquisition sequence, the image acquisition sequence is obtained according to the acquired images of the image acquisition device at multiple acquisition moments;
  • the detection module 320 is configured to perform object detection on the first image in the image acquisition sequence to obtain the 3D detection frame of the object on the first image in the target three-dimensional space and the 2D detection frame on the first image, where the first image is any image in the image acquisition sequence except the initial image;
  • the prediction module 330 is configured to predict a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image according to the tracking result of the target object being tracked on the second image,
  • the second image is a previous image in the image acquisition sequence of the first image;
  • the determining module 340 is configured to determine a tracking result of tracking the target object with respect to the first image according to the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • the determining module 340 is further configured to:
  • the tracking result includes the 3D tracking location information.
  • the determining module 340 is further configured to:
  • the tracking result includes the 2D tracking location information.
  • the tracking result includes 3D tracking position information of the target object corresponding to the second image, 2D tracking position information, and motion data of the target object;
  • the prediction module 330 is further configured to:
  • the 3D tracking position information and the 2D tracking position information are input into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
  • the motion data includes the rate of change of the target object's position on the image, and the velocity and acceleration of the target object in the target three-dimensional space;
  • the tracker is capable of outputting the 2D predicted frame based on the position change rate and the 2D tracked position information, and outputting the 3D predicted frame based on the velocity, the acceleration, and the 3D tracked position information.
  • the first image includes multiple captured images, and the multiple captured images are images captured by multiple image capture devices at the same capture moment;
  • the detection module 320 is further configured to:
  • the determining module 340 is further configured to:
  • a non-maximum suppression process is performed on the 3D detection frame, the 2D detection frame, the 3D prediction frame, and the 2D prediction frame.
  • the present disclosure also provides a computer-readable storage medium, on which computer program instructions are stored, and when the program instructions are executed by a processor, the steps of the target tracking method provided in the present disclosure are realized.
  • Fig. 4 is a block diagram of an apparatus 400 for object tracking according to an exemplary embodiment.
  • the apparatus 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • the device 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and communication component 416 .
  • the processing component 402 generally controls the overall operations of the device 400, such as those associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 402 may include one or more processors 420 to execute instructions to complete all or part of the steps of the above method for object tracking.
  • processing component 402 may include one or more modules that facilitate interaction between processing component 402 and other components.
  • processing component 402 may include a multimedia module to facilitate interaction between multimedia component 408 and processing component 402 .
  • the memory 404 is configured to store various types of data to support operations at the device 400 . Examples of such data include instructions for any application or method operating on device 400, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 404 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power component 406 provides power to various components of device 400 .
  • Power components 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for device 400 .
  • the multimedia component 408 includes a screen that provides an output interface between the device 400 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or swipe action, but also detect duration and pressure associated with the touch or swipe action.
  • the multimedia component 408 includes a front camera and/or a rear camera. When the device 400 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 410 is configured to output and/or input audio signals.
  • the audio component 410 includes a microphone (MIC), which is configured to receive external audio signals when the device 400 is in operation modes, such as call mode, recording mode and voice recognition mode. Received audio signals may be further stored in memory 404 or sent via communication component 416 .
  • the audio component 410 also includes a speaker for outputting audio signals.
  • the I/O interface 412 provides an interface between the processing component 402 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 414 includes one or more sensors for providing status assessments of various aspects of device 400 .
  • for example, the sensor component 414 can detect the open/closed state of the device 400 and the relative positioning of components (such as the display and keypad of the device 400), and can also detect a change in the position of the device 400 or of a component of the device 400, the presence or absence of user contact with the device 400, the orientation or acceleration/deceleration of the device 400, and a temperature change of the device 400.
  • the sensor assembly 414 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 414 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 416 is configured to facilitate wired or wireless communication between the apparatus 400 and other devices.
  • the device 400 can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 416 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 416 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wide Band (UWB) technology, Bluetooth (BT) technology and other technologies.
  • in an exemplary embodiment, the apparatus 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above target tracking method.
  • in an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 404 including instructions, which can be executed by the processor 420 of the device 400 to implement the above target tracking method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • in another exemplary embodiment, there is also provided a computer program product comprising a computer program executable by a programmable device, the computer program having code portions for performing the above target tracking method when executed by the programmable device.
  • Fig. 5 is a block diagram of an apparatus 500 for object tracking according to an exemplary embodiment.
  • the apparatus 500 may be provided as a server.
  • apparatus 500 includes processing component 522, which further includes one or more processors, and memory resources represented by memory 532 for storing instructions executable by processing component 522, such as application programs.
  • the application program stored in memory 532 may include one or more modules each corresponding to a set of instructions.
  • the processing component 522 is configured to execute instructions to perform the above object tracking method.
  • Device 500 may also include a power component 526 configured to perform power management of device 500 , a wired or wireless network interface 550 configured to connect device 500 to a network, and an input-output (I/O) interface 558 .
  • the apparatus 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a target tracking method and apparatus, and a storage medium. The method comprises: acquiring an image collection sequence; performing object detection on a first image in the image collection sequence to obtain a 3D detection box, in a target three-dimensional space, of an object on the first image, and a 2D detection box of the same on the first image, wherein the first image is any image in the image collection sequence other than the initial image; predicting, according to a tracking result of tracking a target object on the basis of a second image, a 3D prediction box of the target object in the target three-dimensional space and a 2D prediction box of the same on the first image, wherein the second image is the previous image of the first image in the image collection sequence; and determining, according to the 3D detection box, the 2D detection box, the 3D prediction box, and the 2D prediction box, a tracking result of tracking the target object on the basis of the first image. By means of the target tracking method of the present disclosure, the accuracy of tracking a target object can be improved.

Description

目标跟踪方法、装置及存储介质Target tracking method, device and storage medium 技术领域technical field
本公开涉及计算机视觉技术领域,尤其涉及一种目标跟踪方法、装置及存储介质。The present disclosure relates to the technical field of computer vision, and in particular to an object tracking method, device and storage medium.
背景技术Background technique
目标跟踪是通过给定一个图像序列,找到图像序列中的目标,将不同帧的同一目标进行识别,并为不同帧的同一目标赋予ID。相关技术中,通常根据2D信息进行目标跟踪,然而,仅利用2D信息难以对目标进行准确的运动估计,导致出现错误跟踪的情况。Target tracking is to find the target in the image sequence by giving an image sequence, identify the same target in different frames, and assign ID to the same target in different frames. In related technologies, target tracking is usually performed based on 2D information. However, it is difficult to accurately estimate the target's motion only by using 2D information, resulting in false tracking.
发明内容Contents of the invention
为克服相关技术中存在的问题,本公开提供一种目标跟踪方法、装置及存储介质。In order to overcome the problems existing in related technologies, the present disclosure provides a target tracking method, device and storage medium.
According to a first aspect of the embodiments of the present disclosure, a target tracking method is provided, including:

acquiring an image acquisition sequence, where the image acquisition sequence is obtained from images captured by an image capture device at multiple capture moments;

performing object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of each object on the first image and a 2D detection frame of the object on the first image, where the first image is any image in the image acquisition sequence other than the initial image;

predicting, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame of the target object on the first image, where the second image is the image preceding the first image in the image acquisition sequence;

determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.

In some embodiments, determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, the tracking result of tracking the target object with respect to the first image includes:

determining, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to a first intersection-over-union value and/or a distance value between the 3D prediction frame and each 3D detection frame;

taking the object corresponding to the target 3D detection frame in the target three-dimensional space as the target object, and taking the position information of the target 3D detection frame in the target three-dimensional space as 3D tracking position information of the target object, where the tracking result includes the 3D tracking position information.
In some embodiments, determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, the tracking result of tracking the target object with respect to the first image further includes:

for each 3D detection frame not matched to the 3D prediction frame, determining a 2D detection frame corresponding to the same object as that 3D detection frame, and determining a target 2D detection frame matching the 2D prediction frame according to a second intersection-over-union value between that 2D detection frame and the 2D prediction frame;

taking the object corresponding to the target 2D detection frame on the first image as the target object, and taking the position information of the target 2D detection frame on the first image as 2D tracking position information of the target object, where the tracking result includes the 2D tracking position information.

In some embodiments, the tracking result includes 3D tracking position information and 2D tracking position information of the target object corresponding to the second image, as well as motion data of the target object;

predicting, according to the tracking result of tracking the target object with respect to the second image, the 3D prediction frame of the target object in the three-dimensional space and the 2D prediction frame on the first image includes:

updating a tracker according to the motion data;

inputting the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.

In some embodiments, the motion data includes a rate of change of the target object's position on the image, and the velocity and acceleration of the target object in the target three-dimensional space;

the tracker is capable of outputting the 2D prediction frame based on the position change rate and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration and the 3D tracking position information.
In some embodiments, the first image includes multiple captured images, the multiple captured images being images captured by multiple image capture devices at the same capture moment;

performing object detection on the first image in the image acquisition sequence to obtain the 3D detection frame of each object on the first image in the target three-dimensional space and the 2D detection frame on the first image includes:

performing object detection on each of the multiple captured images to obtain, for each object on each captured image, a 3D detection frame in the three-dimensional space of each image capture device and a 2D detection frame on that captured image;

mapping the 3D detection frames located in the three-dimensional spaces of the different image capture devices into the same target coordinate system according to the extrinsic parameters of each image capture device, where the target three-dimensional space is the space defined by the target coordinate system;

stitching the multiple captured images to obtain a stitched image, and mapping the 2D detection frames on the multiple captured images onto the stitched image, where the 2D detection frames on the first image are the 2D detection frames on the stitched image.
In some embodiments, before determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, the tracking result of tracking the target object with respect to the first image, the method further includes:

performing non-maximum suppression on the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames.
According to a second aspect of the embodiments of the present disclosure, a target tracking device is provided, including:

an acquisition module configured to acquire an image acquisition sequence, where the image acquisition sequence is obtained from images captured by an image capture device at multiple capture moments;

a detection module configured to perform object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of each object on the first image and a 2D detection frame on the first image, where the first image is any image in the image acquisition sequence other than the initial image;

a prediction module configured to predict, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, where the second image is the image preceding the first image in the image acquisition sequence;

a determination module configured to determine, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.
In some embodiments, the determination module is further configured to:

determine, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to a first intersection-over-union value and/or a distance value between the 3D prediction frame and each 3D detection frame;

take the object corresponding to the target 3D detection frame in the target three-dimensional space as the target object, and take the position information of the target 3D detection frame in the target three-dimensional space as 3D tracking position information of the target object, where the tracking result includes the 3D tracking position information.

In some embodiments, the determination module is further configured to:

for each 3D detection frame not matched to the 3D prediction frame, determine a 2D detection frame corresponding to the same object as that 3D detection frame, and determine a target 2D detection frame matching the 2D prediction frame according to a second intersection-over-union value between that 2D detection frame and the 2D prediction frame;

take the object corresponding to the target 2D detection frame on the first image as the target object, and take the position information of the target 2D detection frame on the first image as 2D tracking position information of the target object, where the tracking result includes the 2D tracking position information.

In some embodiments, the tracking result includes 3D tracking position information and 2D tracking position information of the target object corresponding to the second image, as well as motion data of the target object;

the prediction module is further configured to:

update a tracker according to the motion data;

input the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.

In some embodiments, the motion data includes a rate of change of the target object's position on the image, and the velocity and acceleration of the target object in the target three-dimensional space;

the tracker is capable of outputting the 2D prediction frame based on the position change rate and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration and the 3D tracking position information.
In some embodiments, the first image includes multiple captured images, the multiple captured images being images captured by multiple image capture devices at the same capture moment;

the detection module is further configured to:

perform object detection on each of the multiple captured images to obtain, for each object on each captured image, a 3D detection frame in the three-dimensional space of each image capture device and a 2D detection frame on that captured image;

map the 3D detection frames located in the three-dimensional spaces of the different image capture devices into the same target coordinate system according to the extrinsic parameters of each image capture device, where the target three-dimensional space is the space defined by the target coordinate system;

stitch the multiple captured images to obtain a stitched image, and map the 2D detection frames on the multiple captured images onto the stitched image, where the 2D detection frames on the first image are the 2D detection frames on the stitched image.

In some embodiments, the determination module is further configured to:

perform non-maximum suppression on the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, where the program instructions, when executed by a processor, implement the steps of the target tracking method provided in the first aspect of the present disclosure.

According to a fourth aspect of the embodiments of the present disclosure, a target tracking device is provided, including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

acquire an image acquisition sequence, where the image acquisition sequence is obtained from images captured by an image capture device at multiple capture moments;

perform object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of each object on the first image and a 2D detection frame on the first image, where the first image is any image in the image acquisition sequence other than the initial image;

predict, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, where the second image is the image preceding the first image in the image acquisition sequence;

determine, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.

According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided. The computer program product contains a computer program executable by a programmable device, and the computer program, when executed by the programmable device, implements the steps of the target tracking method provided in the first aspect of the present disclosure.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: the tracking result of the target object is determined from both 3D frame information (for example, the 3D detection frame and the 3D prediction frame) and 2D frame information (for example, the 2D detection frame and the 2D prediction frame). Introducing 3D frame information allows the target tracking method of the present disclosure to continue estimating the target object's motion in the target three-dimensional space for a period of time after the target object is lost, which increases the probability of a successful match after the target object reappears and reduces ID switches caused by missed detections or the target leaving the field of view, that is, it reduces erroneous tracking. At the same time, combining 3D frame information and 2D frame information to determine the tracking result of the target object improves the tracking accuracy of the target object.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Description of the Drawings

The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of a target tracking method according to an exemplary embodiment.

Fig. 2 is a flowchart of determining 3D detection frames and 2D detection frames according to an exemplary embodiment.

Fig. 3 is a block diagram of a target tracking device according to an exemplary embodiment.

Fig. 4 is a block diagram of a device for target tracking according to an exemplary embodiment.

Fig. 5 is a block diagram of a device for target tracking according to an exemplary embodiment.
Detailed Description

Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
In some embodiments, the target tracking method of the present disclosure can be applied to different scenarios. For example, it can be applied to autonomous driving scenarios to track targets in images captured by image capture devices on a vehicle. As another example, it can be applied to traffic monitoring scenarios to track targets in images captured by image capture devices in a traffic monitoring system. It should be understood that the application scenarios of the target tracking method mentioned in the present disclosure are merely some examples or embodiments of the present disclosure; those of ordinary skill in the art can, without creative effort, apply the target tracking method to other similar scenarios, for example, target tracking for mobile robots, and the present disclosure places no limitation on this.
In the related art, 2D information is usually used to track targets in images captured by one or more image capture devices. However, because projecting a target onto a 2D image involves an affine transformation, it is difficult to accurately estimate the target's motion on the 2D image; as a result, the target cannot be accurately tracked using 2D information alone, and the target may be matched to a wrong ID. Moreover, when tracking with 2D information, once a target is lost it is difficult to recover. In addition, the related art usually runs tracking separately on the images captured by each of multiple image capture devices, which is not only inefficient but also cannot handle targets that overlap across images from adjacent image capture devices.
Fig. 1 is a flowchart of a target tracking method according to an exemplary embodiment. As shown in Fig. 1, the method includes the following steps.
Step 110: acquire an image acquisition sequence, where the image acquisition sequence is obtained from images captured by an image capture device at multiple capture moments.

In some embodiments, the image acquisition sequence may be obtained from images captured by one or more image capture devices at multiple capture moments. For a single image capture device, the captured image at each capture moment in the image acquisition sequence may be the image captured by that device at that moment; for multiple image capture devices, the captured images at each capture moment in the image acquisition sequence may be the images captured by the multiple devices at that moment.
For example, take a single image capture device, image capture device 1, whose captured images at capture moments t1, t2 and t3 are P1, P2 and P3; then image acquisition sequence 1 may be (P1, P2, P3). Take multiple image capture devices, image capture devices 1-3, as an example: if the images captured by image capture device 1 at capture moments t1, t2 and t3 are P11, P12 and P13, the images captured by image capture device 2 at t1, t2 and t3 are P21, P22 and P23, and the images captured by image capture device 3 at t1, t2 and t3 are P31, P32 and P33, then image acquisition sequence 2 may be (P11 P21 P31, P12 P22 P32, P13 P23 P33).
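Purely as an illustration of this grouping (not part of the disclosed method), the following minimal Python sketch shows one way such an image acquisition sequence could be organized in code; the class and field names are hypothetical assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Frame:
    device_id: int      # which image capture device produced the image
    timestamp: float    # capture moment (e.g., t1, t2, t3)
    image_path: str     # placeholder for the actual image data

def build_acquisition_sequence(frames: List[Frame]) -> List[List[Frame]]:
    """Group frames by capture moment so that each element of the sequence
    holds all images captured at the same moment (one per device)."""
    by_time: Dict[float, List[Frame]] = {}
    for f in frames:
        by_time.setdefault(f.timestamp, []).append(f)
    # Sort the groups by capture moment; sort each group by device id.
    return [sorted(by_time[t], key=lambda f: f.device_id)
            for t in sorted(by_time)]

# Example: three devices, three capture moments -> image acquisition sequence 2.
frames = [Frame(d, t, f"P{d}{i + 1}.png")
          for d in (1, 2, 3) for i, t in enumerate((1.0, 2.0, 3.0))]
sequence = build_acquisition_sequence(frames)
assert [f.image_path for f in sequence[0]] == ["P11.png", "P21.png", "P31.png"]
```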
In some embodiments, the image capture device may include, but is not limited to, a video camera or a still camera. The image capture device may be installed at a preset fixed position or on a mobile device; the preset fixed position and the mobile device may be chosen according to actual needs. For example, the mobile device may be an autonomous vehicle. In some embodiments, the image capture device may be one or more cameras included in an autonomous vehicle.

In some embodiments, when there are multiple image capture devices, their capture directions may differ. For example, still taking image capture devices 1-3 as an example, their capture directions may be leftward, forward and rightward, respectively. It is worth noting that the capture directions of the multiple image capture devices may be set according to the actual situation, and the present disclosure places no limitation on this.
In some embodiments, the image acquisition sequence may be obtained from video captured by one or more image capture devices. For example, the captured images in the image acquisition sequence may be image frames included in the video.

Step 120: perform object detection on the first image in the image acquisition sequence to obtain the 3D detection frame, in the target three-dimensional space, of each object on the first image and the 2D detection frame on the first image, where the first image is any image in the image acquisition sequence other than the initial image.
In some embodiments, the objects on the first image may refer to one or more targets included in the first image, and the objects may include targets of different categories. For example, if the first image is a road-scene image, the objects on the first image may include targets of the pedestrian category and targets of the vehicle category.

In some embodiments, object detection may be performed on the first image using a monocular 3D detection algorithm. In some embodiments, the monocular 3D detection algorithm may include, but is not limited to, Fully Convolutional One-Stage Monocular 3D Object Detection (FCOS3D) and Real-time Monocular 3D Object Detection (RTM3D).

A monocular 3D detection algorithm can simultaneously produce, for each object in an image, a 3D detection frame in the three-dimensional space of the image capture device (for example, the camera coordinate system) and a 2D detection frame in the image coordinate system of that image. In some embodiments, the 3D detection frame in the three-dimensional space of the image capture device may be represented as (x, y, z, rot, w, h, l), where (x, y, z) denotes the coordinates of the center point of the 3D detection frame in the three-dimensional space of the image capture device, rot denotes the heading angle of the 3D detection frame, and (w, h, l) denote the width, height and length of the 3D detection frame, respectively. The 2D detection frame may be represented as (x1, y1, x2, y2), where (x1, y1) denotes the coordinates of the top-left corner of the 2D detection frame in the image coordinate system and (x2, y2) denotes the coordinates of the bottom-right corner.
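For concreteness, here is a minimal sketch of how the two box parameterizations described above could be represented; the type and method names are illustrative assumptions rather than part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    # Center of the 3D detection frame in the capture device's 3D space.
    x: float
    y: float
    z: float
    rot: float  # heading angle of the 3D frame
    w: float    # width
    h: float    # height
    l: float    # length

@dataclass
class Box2D:
    # Top-left and bottom-right corners in the image coordinate system.
    x1: float
    y1: float
    x2: float
    y2: float

    def center_size(self):
        """Convert to the (cx, cy, w, h) form used by the tracker state."""
        w, h = self.x2 - self.x1, self.y2 - self.y1
        return (self.x1 + w / 2, self.y1 + h / 2, w, h)
```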
As mentioned above, there may be one or more image capture devices. For a single image capture device, the first image may be a single captured image; for example, still taking the above image acquisition sequence 1 as an example, the first image may be P2 or P3. In some embodiments, when the first image is a single captured image, object detection may be performed on that captured image to obtain the 3D detection frame, in the three-dimensional space of the image capture device, of each object on the captured image, and the 2D detection frame on the captured image, i.e., the 2D detection frame in the image coordinate system of the captured image. In some embodiments, when the first image is a single captured image, the 3D detection frame in the three-dimensional space of the image capture device may be taken directly as the 3D detection frame in the target three-dimensional space, or the 3D detection frame in the three-dimensional space of the image capture device may be mapped into the target coordinate system and the 3D detection frame in the target coordinate system taken as the 3D detection frame in the target three-dimensional space. For details of the target coordinate system, see Fig. 2 and its related description, which are not repeated here.

For multiple image capture devices, the first image may include multiple captured images, which may be images captured by the multiple image capture devices at the same capture moment. For example, still taking the above image acquisition sequence 2 as an example, the first image may include P12, P22 and P32, or P13, P23 and P33. In some embodiments, when the first image consists of multiple captured images, object detection may be performed on each of them to obtain the 3D detection frame, in the three-dimensional space of each image capture device, of each object on each captured image, and the 2D detection frame on that captured image. In some embodiments, the 3D detection frames in the three-dimensional space of each image capture device and the 2D detection frames on each captured image may be processed separately to obtain the 3D detection frames in the target three-dimensional space and the 2D detection frames on the first image. For details of obtaining the 3D detection frames and 2D detection frames when the first image consists of multiple captured images, see Fig. 2 and its related description, which are not repeated here.
Step 130: predict, according to the tracking result of tracking the target object with respect to the second image, the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image, where the second image is the image preceding the first image in the image acquisition sequence.

In some embodiments, the first image and the second image may be images captured by the image capture device at different capture moments, where the second image is the image obtained at the capture moment immediately preceding that of the first image, i.e., the second image is the image preceding the first image in the image acquisition sequence.

In some embodiments, the target object may be one or more targets included in the second image, and the target objects may include targets of different categories, for example, targets of the pedestrian category and targets of the vehicle category. In some embodiments, the tracking result of tracking the target object with respect to the second image may include the 3D tracking position information and 2D tracking position information of the target object corresponding to the second image, as well as motion data of the target object. For how the 3D tracking position information and 2D tracking position information are determined, see step 140 below and its related description, which are not repeated here.

In some embodiments, the motion data may include the rate of change of the target object's position on the image, as well as the velocity and acceleration of the target object in the target three-dimensional space. The rate of change of the target object's position on the image may refer to the rate of change between the target object's 2D tracking information on the second image and its 2D tracking information on the first image. For details of determining the position change rate, see the description of the tracker below, which is not repeated here.
In some embodiments, the velocity and acceleration of the target object in the target three-dimensional space may refer to the velocity and acceleration of the target object at the capture moment corresponding to the second image. For example, if the target object is a pedestrian and the capture moment corresponding to the second image is t2, the velocity and acceleration of the target object in the target three-dimensional space may be the pedestrian's velocity and acceleration at time t2; understandably, these are the pedestrian's walking velocity and acceleration in real space. In some embodiments, when the image capture device is installed on a mobile device, the velocity and acceleration of the mobile device at the capture moment of the second image may be taken as the velocity and acceleration of the target object in the target three-dimensional space. For example, if the mobile device is an autonomous vehicle, the motion data may be the vehicle's velocity and acceleration at time t2, i.e., the vehicle's driving velocity and acceleration in real space.

In some embodiments, the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image may be predicted according to the tracking result produced by a tracker when tracking the target object with respect to the second image. In some embodiments, predicting the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image according to the tracking result of tracking the target object with respect to the second image includes: updating the tracker according to the motion data; and inputting the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
In some embodiments, the tracker is capable of outputting the 2D prediction frame based on the position change rate and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration and the 3D tracking position information. In some embodiments, the value of the rotation angle of the 3D prediction frame set by the tracker remains unchanged.

In some embodiments, the 2D tracking information of the target object may be the 2D detection frame corresponding to that 2D tracking information. In some embodiments, the 2D tracking information may be represented as (cx, cy, w, h), where (cx, cy) denotes the coordinates of the center point of the corresponding 2D detection frame in the image coordinate system of the first image, and (w, h) denote the width and height of that 2D detection frame. In some embodiments, the 3D tracking information of the target object may be the 3D detection frame corresponding to that 3D tracking information. In some embodiments, the 3D tracking information may be represented as (x, y, rot), where (x, y) denotes the coordinates of the center point of the corresponding 3D detection frame in the target three-dimensional space and rot denotes the rotation angle of that 3D detection frame.

In some embodiments, the 3D prediction frame of each target object in the target three-dimensional space and its 2D prediction frame on the first image may be predicted according to the tracking results of multiple trackers, each tracking one target object with respect to the second image. In some embodiments, for any target object, the tracker corresponding to that target object may include the state transition function of that target object, and the 3D prediction frame of that target object in the target three-dimensional space and its 2D prediction frame on the first image are predicted according to that state transition function and the tracking result of tracking that target object with respect to the second image.
In some embodiments, for any target object, the state transition function of that target object included in its corresponding tracker is the following formula (1):
cx(t+1) = cx(t) + V_cx    cy(t+1) = cy(t) + V_cy    w(t+1) = w(t) + V_w    h(t+1) = h(t) + V_h
x(t+1) = x(t) + V_x·Δt + (1/2)·a_x·Δt²    y(t+1) = y(t) + V_y·Δt + (1/2)·a_y·Δt²
rot(t+1) = rot(t)    (1)
where the meanings of (cx, cy, w, h) and (x, y, rot) are as described above and are not repeated here; (V_cx, V_cy, V_w, V_h) denote the rate of change of the target object's position on the image, corresponding to (cx, cy, w, h); V_x and V_y denote the target object's velocity along the X-axis and Y-axis of the target three-dimensional space, respectively; a_x and a_y denote the target object's acceleration along the X-axis and Y-axis of the target three-dimensional space, respectively; and Δt denotes the interval between adjacent capture moments.
In some embodiments, when the second image is the initial image in the image acquisition sequence, the target object's position change rate on the image takes the initial value 0. As an example, still taking the above image acquisition sequence 1, (P1, P2, P3), suppose the first image is P2, which includes target objects 1-5, and the second image is P1, which includes objects 1-5. The following describes, with reference to this example, how tracker 1 corresponding to target object 1 predicts the 3D prediction frames and 2D prediction frames of target object 1 for each captured image in image acquisition sequence 1. First, by performing object detection on the second image P1, the 2D detection frame and 3D detection frame of target object 1 corresponding to the second image P1 can be obtained. Since the second image P1 is the initial image, position change rate 1 of target object 1 is 0, and the 3D tracking position information and 2D tracking position information of target object 1 corresponding to the second image P1 are the 3D detection frame and 2D detection frame obtained by object detection (i.e., obtained through step 120 above). Meanwhile, velocity 1 and acceleration 1 of target object 1 at the capture moment corresponding to the second image P1 can be measured. Tracker 1 is updated according to position change rate 1, velocity 1 and acceleration 1, i.e., formula (1) above is updated; inputting the 3D tracking position information and 2D tracking position information into updated tracker 1 yields the 3D prediction frame and 2D prediction frame of target object 1 for the image at the next moment (i.e., the first image P2). Further, the 3D detection frames and 2D detection frames of the objects on the first image P2 can be obtained by object detection, and through step 140 described below, the object in the first image P2 that belongs to the same target as target object 1 can be found, say object 1. From the 2D detection frame corresponding to object 1 and the 2D detection frame of target object 1, position change rate 2 of target object 1 between the second image P1 and the first image P2 can be obtained; from position change rate 2 and the velocity and acceleration of target object 1 at the capture moment of the first image P2, the 3D prediction frame and 2D prediction frame of target object 1 for the third image P3 can be obtained. It can thus be seen that, through tracker 1 corresponding to target object 1, the position information of target object 1 on each captured image in the image acquisition sequence can be obtained, realizing tracking of target object 1 over the image sequence.
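The per-object prediction step can be sketched as follows. This is a minimal illustration of a constant-velocity 2D / uniform-acceleration 3D motion model consistent with formula (1), not the patented implementation itself; in practice a Kalman filter with this state transition could play the role of the tracker, and all names here are assumptions:

```python
from dataclasses import dataclass

@dataclass
class TrackerState:
    # 2D tracking position information and its per-frame rate of change.
    cx: float; cy: float; w: float; h: float
    v_cx: float = 0.0; v_cy: float = 0.0; v_w: float = 0.0; v_h: float = 0.0
    # 3D tracking position information (center and rotation angle).
    x: float = 0.0; y: float = 0.0; rot: float = 0.0
    # Velocity and acceleration in the target three-dimensional space.
    v_x: float = 0.0; v_y: float = 0.0; a_x: float = 0.0; a_y: float = 0.0

def predict(s: TrackerState, dt: float) -> TrackerState:
    """One prediction step: constant velocity for the 2D frame,
    uniform acceleration for the 3D frame, static rotation angle."""
    return TrackerState(
        # 2D: constant-velocity update of the (cx, cy, w, h) box.
        cx=s.cx + s.v_cx, cy=s.cy + s.v_cy,
        w=s.w + s.v_w, h=s.h + s.v_h,
        v_cx=s.v_cx, v_cy=s.v_cy, v_w=s.v_w, v_h=s.v_h,
        # 3D: uniform-acceleration update of the (x, y) center.
        x=s.x + s.v_x * dt + 0.5 * s.a_x * dt * dt,
        y=s.y + s.v_y * dt + 0.5 * s.a_y * dt * dt,
        rot=s.rot,  # rotation angle held constant (static model)
        v_x=s.v_x + s.a_x * dt, v_y=s.v_y + s.a_y * dt,
        a_x=s.a_x, a_y=s.a_y,
    )
```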
In the embodiments of this specification, having the tracker output the 2D prediction frame based on the position change rate and the 2D tracking position information, i.e., using a constant-velocity model for 2D frame prediction or motion estimation, reduces the computation required to predict the 2D frame. Having the tracker output the 3D prediction frame based on the velocity, acceleration and 3D tracking position information, i.e., using a uniform-acceleration model for 3D frame prediction or motion estimation, improves the prediction accuracy of the 3D frame. Moreover, the angle of a 3D detection frame obtained by object detection carries considerable uncertainty; by keeping the rotation angle of the 3D prediction frame in the tracker unchanged, a static model is used to smooth the rotation angle, further improving the prediction accuracy of the 3D frame.

Step 140: determine, according to the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames, the tracking result of tracking the target object with respect to the first image.
In some embodiments, before determining, according to the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames, the tracking result of tracking the target object with respect to the first image, the method further includes: performing non-maximum suppression on the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames. By performing non-maximum suppression, overlapping 3D frames (i.e., 3D detection frames or 3D prediction frames) and 2D frames (i.e., 2D detection frames or 2D prediction frames) can be filtered out, preventing overlapping frames from interfering with the subsequent matching between 3D frames and between 2D frames, which improves the accuracy of the subsequent matching and thus the accuracy of target tracking.
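As a minimal sketch of this non-maximum suppression step for 2D boxes (the 3D case is analogous with a 3D IoU), where the threshold value and box format are illustrative assumptions:

```python
def iou_2d(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou_2d(boxes[best], boxes[i]) < iou_thresh]
    return keep
```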
In some embodiments, determining, according to the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames, the tracking result of tracking the target object with respect to the first image includes: determining, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to a first intersection-over-union value and/or a distance value between the 3D prediction frame and each 3D detection frame; taking the object corresponding to the target 3D detection frame in the target three-dimensional space as the target object, and taking the position information of the target 3D detection frame in the target three-dimensional space as the 3D tracking position information of the target object, where the tracking result includes the 3D tracking position information.

In some embodiments, the first intersection-over-union value may refer to the overlap ratio between the 3D prediction frame and the 3D detection frame, i.e., the ratio of the intersection to the union of the 3D prediction frame and the 3D detection frame. In some embodiments, there may be one or more target objects, and correspondingly there may be one or more 3D prediction frames of the target objects in the target three-dimensional space. In some embodiments, for multiple 3D prediction frames, a first intersection-over-union matrix may be determined based on the first intersection-over-union value between each 3D prediction frame and each 3D detection frame. In some embodiments, the distance value may be the distance between the center points of the 3D prediction frame and the 3D detection frame in the target three-dimensional space; the distance may include, but is not limited to, the Manhattan distance or the Euclidean distance. In some embodiments, for multiple 3D prediction frames, a distance matrix may be determined based on the distance value between each 3D prediction frame and each 3D detection frame. In the embodiments of this specification, the first intersection-over-union matrix or the distance matrix may be solved by the Hungarian algorithm to determine the matching result between the 3D prediction frames and the 3D detection frames, thereby determining from the 3D detection frames the target 3D detection frame matching each 3D prediction frame.
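A minimal sketch of this first-stage association: build a cost matrix from the center distance (a 3D IoU matrix could be used instead, as described above) between each 3D prediction frame and each 3D detection frame, then solve it with the Hungarian algorithm, here via scipy. The gating threshold is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_3d(pred_centers, det_centers, max_dist=2.0):
    """First-stage matching on 3D frames using center distance as cost.
    pred_centers, det_centers: arrays of shape (P, 2) and (D, 2) holding
    (x, y) centers in the target 3D space. Returns (matches,
    unmatched_pred, unmatched_det)."""
    if len(pred_centers) == 0 or len(det_centers) == 0:
        return [], list(range(len(pred_centers))), list(range(len(det_centers)))
    # Euclidean distance matrix between every prediction/detection pair.
    cost = np.linalg.norm(
        pred_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]
    matched_p = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_pred = [i for i in range(len(pred_centers)) if i not in matched_p]
    unmatched_det = [j for j in range(len(det_centers)) if j not in matched_d]
    return matches, unmatched_pred, unmatched_det
```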
As mentioned above, the target objects and objects may be targets of different categories. In some embodiments, for target objects and objects of the vehicle category, the first intersection-over-union matrix may be determined based on the first intersection-over-union value between each 3D prediction frame and each 3D detection frame; for target objects and objects of the pedestrian category, the distance matrix may be determined based on the distance value between each 3D prediction frame and each 3D detection frame. In the embodiments of this specification, the first intersection-over-union matrix and the distance matrix may each be solved by the Hungarian algorithm to determine the matching results between the 3D prediction frames and the 3D detection frames, thereby determining from the 3D detection frames the target 3D detection frame matching each 3D prediction frame.

As an example, suppose the second image includes target objects 1-5, whose 3D prediction frames are N3D-1, N3D-2, N3D-3, N3D-4 and N3D-5, and the first image includes objects 1-5, whose 3D detection frames are M3D-1, M3D-2, M3D-3, M3D-4 and M3D-5. If solving the first intersection-over-union matrix and the distance matrix with the Hungarian algorithm yields the matching results N3D-1-M3D-2, N3D-2-M3D-1 and N3D-3-M3D-3, then the target 3D detection frame matching 3D prediction frame N3D-1 is 3D detection frame M3D-2, the target 3D detection frame matching 3D prediction frame N3D-2 is 3D detection frame M3D-1, and the target 3D detection frame matching 3D prediction frame N3D-3 is 3D detection frame M3D-3.
In some embodiments, determining, according to the 3D detection frames, the 2D detection frames, the 3D prediction frames and the 2D prediction frames, the tracking result of tracking the target object with respect to the first image further includes: for each 3D detection frame not matched to any 3D prediction frame, determining the 2D detection frame corresponding to the same object as that 3D detection frame, and determining, according to the second intersection-over-union value between that 2D detection frame and the 2D prediction frame, the target 2D detection frame matching the 2D prediction frame; taking the object corresponding to the target 2D detection frame on the first image as the target object, and taking the position information of the target 2D detection frame on the first image as the 2D tracking position information of the target object, where the tracking result includes the 2D tracking position information.

The second intersection-over-union value between a 2D detection frame and a 2D prediction frame is determined in a manner similar to the first intersection-over-union value and is not repeated here. In some embodiments, a second intersection-over-union matrix may be determined from the second intersection-over-union values between the 2D detection frames and the 2D prediction frames, and solved by the Hungarian algorithm to determine the matching results between the 2D detection frames and the 2D prediction frames, thereby determining the target 2D detection frame matching each 2D prediction frame.
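The second-stage fallback can be sketched in the same style: only the 3D detection frames left unmatched by the first stage take part, represented by their corresponding 2D detection frames, and the cost is 1 − IoU so that the Hungarian solver maximizes overlap. The function name and IoU gate are illustrative assumptions, and `iou_2d` is the helper sketched above:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_2d(pred_boxes, det_boxes, min_iou=0.3):
    """Second-stage matching on 2D frames: pred_boxes are the 2D prediction
    frames of still-unmatched target objects, det_boxes the 2D detection
    frames corresponding to 3D detection frames left over from stage one."""
    if not pred_boxes or not det_boxes:
        return []
    # Second intersection-over-union matrix, turned into a cost matrix.
    cost = np.array([[1.0 - iou_2d(p, d) for d in det_boxes]
                     for p in pred_boxes])
    rows, cols = linear_sum_assignment(cost)
    # Keep only the pairs whose IoU clears the gate.
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]
```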
As an example, continuing the previous example: the 3D detection frames not matched to any 3D prediction frame include M3D-4 and M3D-5. The 2D detection frames M2D-4 and M2D-5 corresponding to the same objects as M3D-4 and M3D-5 are determined, and a second intersection-over-union matrix is built between the 2D detection frames M2D-4 and M2D-5 and the 2D prediction frames N2D-1, N2D-2, N2D-3, N2D-4 and N2D-5 (i.e., the 2D prediction frames of target objects 1-5). Solving this second intersection-over-union matrix with the Hungarian algorithm yields the matching results M2D-4-N2D-4 and M2D-5-N2D-5 between the 2D detection frames and the 2D prediction frames.

The 3D prediction frames and 2D prediction frames are the 3D and 2D frames corresponding to the target objects in the second image, and the 3D detection frames and 2D detection frames are the 3D and 2D frames corresponding to the objects in the first image. By matching the prediction frames against the detection frames, the target objects in the second image and the objects in the first image that belong to the same targets can be determined, realizing target tracking. For example, from the matching result M2D-4-N2D-4 above, it can be determined that target object 4 in the second image at the previous moment and object 4 in the first image at the next moment belong to the same target. In some embodiments, target objects and objects belonging to the same target may be assigned the same ID.

In the embodiments of this specification, the 3D detection frames are matched against the 3D prediction frames first, and then the 2D detection frames against the 2D prediction frames: a two-stage matching. Matching with 3D frames (i.e., 3D detection frames and 3D prediction frames) is highly accurate, with a small probability of false matches, while matching with 2D frames (i.e., 2D detection frames and 2D prediction frames) keeps the probability of missed matches low. Therefore, the two-stage matching achieves high-accuracy matching while reducing missed matches; that is, it improves the matching accuracy between the target objects in the image at the previous moment and the objects in the image at the next moment and reduces missed matches between them, thereby improving the tracking accuracy of the same target across images at different moments.
In some embodiments, a new tracker may be created for a target object for which neither the 3D prediction frame nor the 2D prediction frame is successfully matched. In some embodiments, a tracker corresponding to a target object whose 3D prediction frame and 2D prediction frame have failed to match for a preset number of times may be discarded; the preset number may be set according to the actual situation. As can be seen from the above, the 3D prediction frames and 2D prediction frames correspond to a two-stage matching; if a target object obtains no matching result in the two-stage matching for the preset number of times, the tracker can be considered to have lost the target object, and the tracker of that target object is discarded.
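A minimal sketch of the tracker lifecycle described above: each track counts consecutive failures across the two-stage matching, new trackers are started for unmatched objects, and a tracker is discarded once it misses a preset number of times. The counter names and the preset value are illustrative assumptions:

```python
class Track:
    _next_id = 0

    def __init__(self, state):
        self.id = Track._next_id      # the same target keeps the same ID
        Track._next_id += 1
        self.state = state            # tracker state (e.g., TrackerState above)
        self.misses = 0               # consecutive two-stage match failures

def update_tracks(tracks, matched_ids, new_states, max_misses=3):
    """matched_ids: ids matched in either matching stage this frame;
    new_states: states of objects matched by no prediction frame."""
    survivors = []
    for t in tracks:
        t.misses = 0 if t.id in matched_ids else t.misses + 1
        if t.misses < max_misses:     # keep tracks within the preset budget
            survivors.append(t)       # otherwise the tracker is discarded
    # Create a new tracker for each object left unmatched by both stages.
    survivors.extend(Track(s) for s in new_states)
    return survivors
```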
In the embodiments of the present disclosure, the tracking result of a target object is determined from both the 3D frame and 2D frame information. Introducing the 3D information allows the target tracking method of the present disclosure to continue estimating the target object's motion in three-dimensional space for a period of time after the target object is lost, which raises the probability of a successful match once the target object reappears and reduces ID switches caused by missed detections or by the target leaving the field of view, i.e., reduces erroneous tracking of the target.
Fig. 2 is a flowchart of determining 3D detection frames and 2D detection frames according to an exemplary embodiment. As shown in Fig. 2, the method includes the following steps.
Step 210: perform object detection on each of the plurality of captured images to obtain, for the objects on each captured image, a 3D detection frame in the three-dimensional space of the corresponding image capture device and a 2D detection frame on that captured image.

The specific details of step 210 are similar to those of step 120; for details, refer to step 120 above and its related description, which will not be repeated here.

As an example, still taking the aforementioned captured images P1, P2 and P3, object detection can be performed on captured image P1 to obtain the 3D detection frames of the objects on P1 in the three-dimensional space of image capture device 1 and the 2D detection frames in the image coordinate system of P1; the object detection on captured images P2 and P3 is similar to that on P1 and will not be repeated here.
Step 220: map the 3D detection frames located in the three-dimensional spaces of the different image capture devices into the same target coordinate system according to the extrinsic parameters of each image capture device, the target three-dimensional space being the space defined by the target coordinate system.

In some embodiments, the target coordinate system may be determined according to where the image capture devices are installed. For example, if the image capture devices are mounted on an autonomous vehicle, the target coordinate system may be the ego-vehicle coordinate system corresponding to the first image of the image acquisition sequence. As another example, if the image capture devices are installed at preset fixed positions, the target coordinate system may be a coordinate system determined based on a preset fixed position, whose origin and X, Y and Z axes may be set according to the actual situation.

In some embodiments, the extrinsic parameters of each image capture device reflect the pose relationship between that device's coordinate system and the target coordinate system. The extrinsic parameters may include translation parameters and rotation parameters. In some embodiments, the extrinsic parameters of each image capture device may be obtained by calibrating the device; for the calibration of image capture devices, refer to the related art, which will not be repeated here.
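As an illustration of applying such extrinsics, the sketch below assumes the calibration of each device is given as a rotation matrix R and translation vector t such that p_target = R p_camera + t. The disclosure does not fix this parameterization; the variable names and numeric values are placeholders, and yaw and box dimensions are omitted.

```python
# Illustrative mapping of a 3D box center from a camera's coordinate
# system into the shared target coordinate system.
import numpy as np

def camera_to_target(center_cam, R, t):
    """Transform a 3D point from camera coordinates to target coordinates."""
    return R @ np.asarray(center_cam) + np.asarray(t)

# Example: a box center 10 m in front of camera 1, expressed in the
# target frame using that camera's calibrated extrinsics (placeholders).
R1 = np.eye(3)
t1 = np.array([1.5, 0.0, 1.2])
center_target = camera_to_target([0.0, 0.0, 10.0], R1, t1)
```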
Step 230: stitch the plurality of captured images to obtain a stitched image, and map the 2D detection frames on the plurality of captured images onto the stitched image, the 2D detection frames on the first image being the 2D detection frames on the stitched image.

In some embodiments, the 2D detection frames on the stitched image refer to the 2D detection frames in the image coordinate system of the stitched image. By mapping the 2D detection frames from the multiple captured images onto the stitched image, the 2D detection frames in the image coordinate system of each captured image are converted into the image coordinate system of the stitched image; that is, 2D detection frames originally expressed in different image coordinate systems are converted into the same image coordinate system.
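For a simple left-to-right stitching layout, mapping a 2D detection frame into the stitched image's coordinate system reduces to shifting its x-coordinates by the summed widths of the images to its left. The layout is an assumption; the disclosure does not fix a stitching scheme.

```python
# Illustrative box remapping for a horizontal stitch.
def box_to_stitched(box, image_index, image_widths):
    """box: (x1, y1, x2, y2) in the source image's coordinate system."""
    x_offset = sum(image_widths[:image_index])
    x1, y1, x2, y2 = box
    return (x1 + x_offset, y1, x2 + x_offset, y2)

# A box in the second image (index 1) of three 1920-pixel-wide images:
print(box_to_stitched((100, 50, 300, 400), 1, [1920, 1920, 1920]))
# -> (2020, 50, 2220, 400)
```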
In the embodiments of the present specification, by mapping the 3D detection frames located in the three-dimensional spaces of different image capture devices into the same target coordinate system, and by mapping the 2D detection frames on the multiple captured images onto the stitched image, the detection results of the multiple captured images from the multiple image capture devices are fused, so that the targets in the multiple captured images can be tracked simultaneously. The tracking algorithm therefore needs to be executed only once to track the same target across images captured by different devices, which avoids the inefficiency of tracking targets in each device's images separately and reduces ID switches of the same target across different image capture devices.
Fig. 3 is a block diagram of a target tracking device 300 according to an exemplary embodiment. Referring to Fig. 3, the device includes an acquisition module 310, a detection module 320, a prediction module 330 and a determination module 340.

The acquisition module 310 is configured to acquire an image acquisition sequence, the image acquisition sequence being obtained from images captured by an image capture device at a plurality of capture moments.

The detection module 320 is configured to perform object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of an object on the first image and a 2D detection frame on the first image, the first image being any image in the image acquisition sequence other than the initial image of the sequence.

The prediction module 330 is configured to predict, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, the second image being the previous image of the first image in the image acquisition sequence.

The determination module 340 is configured to determine, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.
In some embodiments, the determination module 340 is further configured to:

determine, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to a first intersection-over-union value and/or a distance value between the 3D prediction frame and each 3D detection frame;

take the object in the target three-dimensional space corresponding to the target 3D detection frame as the target object, and take the position information of the target 3D detection frame in the target three-dimensional space as 3D tracking position information of the target object, the tracking result including the 3D tracking position information.

In some embodiments, the determination module 340 is further configured to:

for a 3D detection frame among the 3D detection frames that is not matched to the 3D prediction frame, determine a 2D detection frame corresponding to the same object as that 3D detection frame, and determine a target 2D detection frame matching the 2D prediction frame according to a second intersection-over-union value between that 2D detection frame and the 2D prediction frame;

take the object on the first image corresponding to the target 2D detection frame as the target object, and take the position information of the target 2D detection frame on the first image as 2D tracking position information of the target object, the tracking result including the 2D tracking position information.
In some embodiments, the tracking result includes 3D tracking position information and 2D tracking position information of the target object corresponding to the second image, and motion data of the target object;

the prediction module 330 is further configured to:

update a tracker according to the motion data;

input the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.

In some embodiments, the motion data includes a rate of change of the target object's position on the image, and a velocity and an acceleration of the target object in the target three-dimensional space;

the tracker is capable of outputting the 2D prediction frame based on the rate of change of position and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration and the 3D tracking position information.
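The prediction just described (2D from a position change rate, 3D from velocity and acceleration) can be made concrete with simple kinematics over the inter-frame interval dt. This is a minimal sketch under those assumptions, not the disclosure's exact tracker; a practical tracker (for example a Kalman filter) would also maintain state uncertainty.

```python
# Minimal constant-rate / constant-acceleration prediction step.
import numpy as np

def predict_2d(pos_2d, rate_2d, dt):
    """pos_2d: (x1, y1, x2, y2); rate_2d: its per-second rate of change."""
    return np.asarray(pos_2d) + np.asarray(rate_2d) * dt

def predict_3d(center_3d, velocity, acceleration, dt):
    """Second-order motion update for the 3D box center."""
    c = np.asarray(center_3d)
    v = np.asarray(velocity)
    a = np.asarray(acceleration)
    return c + v * dt + 0.5 * a * dt ** 2
```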
In some embodiments, the first image includes a plurality of captured images, the plurality of captured images being images captured by a plurality of image capture devices at the same capture moment;

the detection module 320 is further configured to:

perform object detection on each of the plurality of captured images to obtain, for the objects on each captured image, a 3D detection frame in the three-dimensional space of the corresponding image capture device and a 2D detection frame on that captured image;

map the 3D detection frames located in the three-dimensional spaces of different image capture devices into the same target coordinate system according to the extrinsic parameters of each image capture device, the target three-dimensional space being the space defined by the target coordinate system;

stitch the plurality of captured images to obtain a stitched image, and map the 2D detection frames on the plurality of captured images onto the stitched image, the 2D detection frames on the first image being the 2D detection frames on the stitched image.
In some embodiments, the determination module 340 is further configured to:

perform non-maximum suppression processing on the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame.
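As an illustration of this suppression step, the sketch below applies greedy non-maximum suppression to scored 2D frames, reusing the iou_2d helper sketched earlier. The (box, score) input format and the threshold are assumptions; the 3D case is analogous with a 3D overlap measure.

```python
# Illustrative greedy non-maximum suppression over scored 2D boxes.
def nms(boxes_scores, iou_threshold=0.5):
    keep = []
    remaining = sorted(boxes_scores, key=lambda bs: bs[1], reverse=True)
    while remaining:
        best = remaining.pop(0)          # highest-scoring box survives
        keep.append(best)
        # Drop every lower-scoring box that overlaps the survivor too much.
        remaining = [bs for bs in remaining
                     if iou_2d(best[0], bs[0]) < iou_threshold]
    return keep
```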
With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.

The present disclosure also provides a computer-readable storage medium on which computer program instructions are stored, the program instructions, when executed by a processor, implementing the steps of the target tracking method provided by the present disclosure.
Fig. 4 is a block diagram of a device 400 for target tracking according to an exemplary embodiment. For example, the device 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to Fig. 4, the device 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.

The processing component 402 generally controls the overall operation of the device 400, such as operations associated with display, telephone calls, data communication, camera operation and recording operation. The processing component 402 may include one or more processors 420 to execute instructions so as to complete all or part of the steps of the above target tracking method. In addition, the processing component 402 may include one or more modules to facilitate interaction between the processing component 402 and other components; for example, the processing component 402 may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.

The memory 404 is configured to store various types of data to support operation at the device 400. Examples of such data include instructions for any application or method operated on the device 400, contact data, phonebook data, messages, pictures, videos, and so on. The memory 404 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.

The power component 406 provides power to the various components of the device 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the device 400.

The multimedia component 408 includes a screen providing an output interface between the device 400 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. When the device 400 is in an operating mode such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a microphone (MIC) configured to receive external audio signals when the device 400 is in an operating mode such as a call mode, a recording mode or a voice recognition mode. The received audio signals may be further stored in the memory 404 or sent via the communication component 416. In some embodiments, the audio component 410 also includes a speaker for outputting audio signals.

The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.

The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the device 400. For example, the sensor component 414 may detect the open/closed state of the device 400 and the relative positioning of components, such as the display and keypad of the device 400; the sensor component 414 may also detect a change in position of the device 400 or a component of the device 400, the presence or absence of user contact with the device 400, the orientation or acceleration/deceleration of the device 400, and a change in temperature of the device 400. The sensor component 414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 416 is configured to facilitate wired or wireless communication between the device 400 and other devices. The device 400 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements, for performing the above target tracking method.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 404 including instructions executable by the processor 420 of the device 400 to complete the above target tracking method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In another exemplary embodiment, a computer program product is also provided, the computer program product including a computer program executable by a programmable device, the computer program having code portions for performing the above target tracking method when executed by the programmable device.
Fig. 5 is a block diagram of a device 500 for target tracking according to an exemplary embodiment. For example, the device 500 may be provided as a server. Referring to Fig. 5, the device 500 includes a processing component 522, which further includes one or more processors, and memory resources represented by a memory 532 for storing instructions executable by the processing component 522, such as applications. The application stored in the memory 532 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 522 is configured to execute the instructions to perform the above target tracking method.

The device 500 may also include a power component 526 configured to perform power management of the device 500, a wired or wireless network interface 550 configured to connect the device 500 to a network, and an input/output (I/O) interface 558. The device 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art from consideration of the specification and practice of the disclosure. The present application is intended to cover any variations, uses or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise constructions described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A target tracking method, comprising:
    acquiring an image acquisition sequence, the image acquisition sequence being obtained from images captured by an image capture device at a plurality of capture moments;
    performing object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of an object on the first image and a 2D detection frame on the first image, the first image being any image in the image acquisition sequence other than the initial image of the sequence;
    predicting, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, the second image being the previous image of the first image in the image acquisition sequence;
    determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.
2. The target tracking method according to claim 1, wherein the determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image comprises:
    determining, from the 3D detection frames, a target 3D detection frame matching the 3D prediction frame according to a first intersection-over-union value and/or a distance value between the 3D prediction frame and each 3D detection frame;
    taking the object in the target three-dimensional space corresponding to the target 3D detection frame as the target object, and taking position information of the target 3D detection frame in the target three-dimensional space as 3D tracking position information of the target object, the tracking result including the 3D tracking position information.
3. The target tracking method according to claim 2, wherein the determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image further comprises:
    for a 3D detection frame among the 3D detection frames that is not matched to the 3D prediction frame, determining a 2D detection frame corresponding to the same object as that 3D detection frame, and determining a target 2D detection frame matching the 2D prediction frame according to a second intersection-over-union value between that 2D detection frame and the 2D prediction frame;
    taking the object on the first image corresponding to the target 2D detection frame as the target object, and taking position information of the target 2D detection frame on the first image as 2D tracking position information of the target object, the tracking result including the 2D tracking position information.
4. The target tracking method according to claim 1, wherein the tracking result includes 3D tracking position information and 2D tracking position information of the target object corresponding to the second image, and motion data of the target object;
    the predicting, according to the tracking result of tracking the target object with respect to the second image, the 3D prediction frame of the target object in the target three-dimensional space and the 2D prediction frame on the first image comprises:
    updating a tracker according to the motion data;
    inputting the 3D tracking position information and the 2D tracking position information into the updated tracker to obtain the 3D prediction frame and the 2D prediction frame output by the tracker.
5. The target tracking method according to claim 4, wherein the motion data includes a rate of change of the target object's position on the image, and a velocity and an acceleration of the target object in the target three-dimensional space;
    the tracker is capable of outputting the 2D prediction frame based on the rate of change of position and the 2D tracking position information, and outputting the 3D prediction frame based on the velocity, the acceleration and the 3D tracking position information.
6. The target tracking method according to claim 1, wherein the first image includes a plurality of captured images, the plurality of captured images being images captured by a plurality of image capture devices at the same capture moment;
    the performing object detection on the first image in the image acquisition sequence to obtain the 3D detection frame, in the target three-dimensional space, of the object on the first image and the 2D detection frame on the first image comprises:
    performing object detection on each of the plurality of captured images to obtain, for the objects on each captured image, a 3D detection frame in the three-dimensional space of the corresponding image capture device and a 2D detection frame on that captured image;
    mapping the 3D detection frames located in the three-dimensional spaces of different image capture devices into the same target coordinate system according to extrinsic parameters of each image capture device, the target three-dimensional space being the space defined by the target coordinate system;
    stitching the plurality of captured images to obtain a stitched image, and mapping the 2D detection frames on the plurality of captured images onto the stitched image, the 2D detection frames on the first image being the 2D detection frames on the stitched image.
7. The target tracking method according to claim 1, wherein before the determining, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, the tracking result of tracking the target object with respect to the first image, the method further comprises:
    performing non-maximum suppression processing on the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame.
8. A target tracking device, comprising:
    an acquisition module configured to acquire an image acquisition sequence, the image acquisition sequence being obtained from images captured by an image capture device at a plurality of capture moments;
    a detection module configured to perform object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of an object on the first image and a 2D detection frame on the first image, the first image being any image in the image acquisition sequence other than the initial image of the sequence;
    a prediction module configured to predict, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, the second image being the previous image of the first image in the image acquisition sequence;
    a determination module configured to determine, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.
9. A target tracking device, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to:
    acquire an image acquisition sequence, the image acquisition sequence being obtained from images captured by an image capture device at a plurality of capture moments;
    perform object detection on a first image in the image acquisition sequence to obtain a 3D detection frame, in a target three-dimensional space, of an object on the first image and a 2D detection frame on the first image, the first image being any image in the image acquisition sequence other than the initial image of the sequence;
    predict, according to a tracking result of tracking a target object with respect to a second image, a 3D prediction frame of the target object in the target three-dimensional space and a 2D prediction frame on the first image, the second image being the previous image of the first image in the image acquisition sequence;
    determine, according to the 3D detection frame, the 2D detection frame, the 3D prediction frame and the 2D prediction frame, a tracking result of tracking the target object with respect to the first image.
10. A computer-readable storage medium on which computer program instructions are stored, wherein the program instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-7.
11. A computer program product comprising a computer program executable by a programmable device, wherein the computer program has code portions for implementing the steps of the method according to any one of claims 1-7 when executed by the programmable device.
PCT/CN2022/090574 2021-11-05 2022-04-29 Target tracking method and apparatus, and storage medium WO2023077754A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111308752.0 2021-11-05
CN202111308752.0A CN114549578A (en) 2021-11-05 2021-11-05 Target tracking method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023077754A1 true WO2023077754A1 (en) 2023-05-11

Family

ID=81668543

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090574 WO2023077754A1 (en) 2021-11-05 2022-04-29 Target tracking method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN114549578A (en)
WO (1) WO2023077754A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765031A (en) * 2024-02-21 2024-03-26 四川盎芯科技有限公司 image multi-target pre-tracking method and system for edge intelligent equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116468967B (en) * 2023-04-18 2024-04-16 北京百度网讯科技有限公司 Sample image screening method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276783A (en) * 2019-04-23 2019-09-24 上海高重信息科技有限公司 A kind of multi-object tracking method, device and computer system
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN112184772A (en) * 2020-09-30 2021-01-05 深兰人工智能(深圳)有限公司 Target tracking method and device
US20210112238A1 (en) * 2020-12-22 2021-04-15 Intel Corporation Method and system of image processing with multi-object multi-view association
CN112883819A (en) * 2021-01-26 2021-06-01 恒睿(重庆)人工智能技术研究院有限公司 Multi-target tracking method, device, system and computer readable storage medium
CN113468950A (en) * 2021-05-12 2021-10-01 东风汽车股份有限公司 Multi-target tracking method based on deep learning in unmanned driving scene

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6310288B2 (en) * 2014-03-20 2018-04-11 日本ユニシス株式会社 Image processing apparatus and three-dimensional object tracking method
US10634778B2 (en) * 2014-10-21 2020-04-28 Texas Instruments Incorporated Camera assisted tracking of objects in a radar system
CN108509848B (en) * 2018-02-13 2019-03-05 视辰信息科技(上海)有限公司 The real-time detection method and system of three-dimension object
CN110163889A (en) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 Method for tracking target, target tracker, target following equipment
CN110782492B (en) * 2019-10-08 2023-03-28 三星(中国)半导体有限公司 Pose tracking method and device
CN113228103A (en) * 2020-07-27 2021-08-06 深圳市大疆创新科技有限公司 Target tracking method, device, unmanned aerial vehicle, system and readable storage medium
CN112507949A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Target tracking method and device, road side equipment and cloud control platform

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276783A (en) * 2019-04-23 2019-09-24 上海高重信息科技有限公司 A kind of multi-object tracking method, device and computer system
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN112184772A (en) * 2020-09-30 2021-01-05 深兰人工智能(深圳)有限公司 Target tracking method and device
US20210112238A1 (en) * 2020-12-22 2021-04-15 Intel Corporation Method and system of image processing with multi-object multi-view association
CN112883819A (en) * 2021-01-26 2021-06-01 恒睿(重庆)人工智能技术研究院有限公司 Multi-target tracking method, device, system and computer readable storage medium
CN113468950A (en) * 2021-05-12 2021-10-01 东风汽车股份有限公司 Multi-target tracking method based on deep learning in unmanned driving scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765031A (en) * 2024-02-21 2024-03-26 四川盎芯科技有限公司 image multi-target pre-tracking method and system for edge intelligent equipment
CN117765031B (en) * 2024-02-21 2024-05-03 四川盎芯科技有限公司 Image multi-target pre-tracking method and system for edge intelligent equipment

Also Published As

Publication number Publication date
CN114549578A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
US9953506B2 (en) Alarming method and device
CN106651955B (en) Method and device for positioning target object in picture
CN111105454B (en) Method, device and medium for obtaining positioning information
WO2020156341A1 (en) Method and apparatus for detecting moving target, and electronic device and storage medium
WO2023077754A1 (en) Target tracking method and apparatus, and storage medium
KR101712301B1 (en) Method and device for shooting a picture
JP2017538300A (en) Unmanned aircraft shooting control method, shooting control apparatus, electronic device, computer program, and computer-readable storage medium
WO2019006769A1 (en) Following-photographing method and device for unmanned aerial vehicle
CN110853095B (en) Camera positioning method and device, electronic equipment and storage medium
WO2022021872A1 (en) Target detection method and apparatus, electronic device, and storage medium
CN114267041B (en) Method and device for identifying object in scene
CN113643356A (en) Camera pose determination method, camera pose determination device, virtual object display method, virtual object display device and electronic equipment
WO2022099988A1 (en) Object tracking method and apparatus, electronic device, and storage medium
US20230048952A1 (en) Image registration method and electronic device
CN110012208B (en) Photographing focusing method and device, storage medium and electronic equipment
US20220345621A1 (en) Scene lock mode for capturing camera images
CN114430457A (en) Shooting method, shooting device, electronic equipment and storage medium
WO2019233299A1 (en) Mapping method and apparatus, and computer readable storage medium
WO2023240401A1 (en) Camera calibration method and apparatus, and readable storage medium
CN114898074A (en) Three-dimensional information determination method and device, electronic equipment and storage medium
CN117974772A (en) Visual repositioning method, device and storage medium
CN117115244A (en) Cloud repositioning method, device and storage medium
CN118154678A (en) Image processing method, device, medium, equipment and chip
CN117710779A (en) Stability coefficient determination method and device, electronic equipment and storage medium
CN115861431A (en) Camera registration method and device, communication equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22888793

Country of ref document: EP

Kind code of ref document: A1