WO2021036373A1 - Target tracking method, device and computer-readable storage medium - Google Patents

Target tracking method, device and computer-readable storage medium

Info

Publication number
WO2021036373A1
WO2021036373A1 · PCT/CN2020/092556 · CN2020092556W
Authority
WO
WIPO (PCT)
Prior art keywords
target
position information
current frame
bounding box
frame
Prior art date
Application number
PCT/CN2020/092556
Other languages
English (en)
French (fr)
Inventor
朱兆琪
董玉新
陈宇
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2021036373A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a target tracking method, device, and computer-readable storage medium.
  • Target tracking technology is currently an important research direction in the field of computer vision.
  • Target tracking technology can be applied to various fields such as video surveillance, human-computer interaction, and unmanned driving.
  • Target tracking determines, in consecutive video frames, the target to be tracked and the target's position in each frame, so as to obtain the motion trajectory of the target.
  • a technical problem to be solved by the present disclosure is: how to improve the efficiency of target detection and tracking in the target tracking process.
  • a target tracking method is provided, which includes: acquiring position information of a target in a current frame of a video; determining a detection area corresponding to the target in the next frame of the video according to the position information of the target in the current frame, where the detection area corresponding to the target is part of the global image of the next frame; detecting the target in the detection area corresponding to the target in the next frame; and associating the target detected in the next frame with the information of the target in the current frame so as to track the target.
  • the position information of the target includes coordinate information of the bounding box of the target
  • determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame includes: determining the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width; and, according to the expanded coordinate information, taking the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame includes: determining the difference between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame; When the difference between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to the first preset difference, the detection area corresponding to the target in the next frame is determined according to the position information of the target in the current frame.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • detecting the target in the detection area corresponding to the target in the next frame includes: inputting the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determining the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining the image in the bounding box as the target.
  • a target tracking method is provided, including: acquiring position information of a target in a current frame of a video; determining the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame; in the case where the gap is less than or equal to a first preset gap, determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame, where the detection area corresponding to the target is part of the global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame; in the case where the gap is greater than the first preset gap, detecting the target in the global image of the next frame; and associating the target detected in the next frame with the information of the target in the current frame so as to track the target.
  • the position information of the target includes coordinate information of the bounding box of the target
  • determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame includes: determining the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width; and, according to the expanded coordinate information, taking the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • detecting the target in the detection area corresponding to the target in the next frame includes: inputting the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determining the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining the image in the bounding box as the target.
  • a target tracking device is provided, which includes: an information acquisition module for acquiring position information of a target in a current frame of a video; a detection area determination module for determining a detection area corresponding to the target in the next frame of the video according to the position information of the target in the current frame, where the detection area corresponding to the target is part of the global image of the next frame; a target detection module for detecting the target in the detection area corresponding to the target in the next frame; and an information association module for associating the target detected in the next frame with the information of the target in the current frame so as to track the target.
  • the position information of the target includes coordinate information of the bounding box of the target; the detection area determination module is used to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • the detection area determination module is used to determine the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame, and, in the case where the gap is less than or equal to the first preset gap, determine the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • the target detection module is used to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
  • a target tracking device is provided, including: an acquisition module for acquiring position information of a target in a current frame of a video; a gap determination module for determining the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame; a first detection module for determining, in the case where the gap is less than or equal to a first preset gap, the detection area corresponding to the target in the next frame according to the position information of the target in the current frame, where the detection area corresponding to the target is part of the global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame; a second detection module for detecting the target in the global image of the next frame in the case where the gap is greater than the first preset gap; and an association module for associating the target detected in the next frame with the information of the target in the current frame so as to track the target.
  • the first detection module is configured to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • the first detection module is used to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
  • a target tracking device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to execute the target tracking method of any of the foregoing embodiments based on instructions stored in the memory.
  • a computer-readable storage medium having a computer program stored thereon, wherein the program is executed by a processor to implement the target tracking method of any of the foregoing embodiments.
  • based on the position information of the target in the current frame of the video, a part of the global image of the next frame is determined as the detection area corresponding to the target, and the target is detected in that detection area to realize tracking of the target. Since detection is performed on only a part of the global image, the amount of data processed by the computer is reduced, so the efficiency of detecting and tracking the target during target tracking is improved.
  • Fig. 1 shows a schematic flowchart of a target tracking method according to some embodiments of the present disclosure.
  • FIG. 2 shows a schematic diagram of determining a target detection area in some embodiments of the present disclosure.
  • FIG. 3 shows a schematic flowchart of a target tracking method according to other embodiments of the present disclosure.
  • Fig. 4 shows a schematic structural diagram of a target tracking device according to some embodiments of the present disclosure.
  • Fig. 5 shows a schematic structural diagram of a target tracking device according to other embodiments of the present disclosure.
  • FIG. 6 shows a schematic structural diagram of a target tracking device according to still other embodiments of the present disclosure.
  • FIG. 7 shows a schematic structural diagram of a target tracking device according to still other embodiments of the present disclosure.
  • in some current target tracking algorithms, for each frame, the target is detected in the global image of that frame, which results in low detection and tracking efficiency and long processing time; this solution is proposed to address that problem. The following describes some embodiments of the target tracking method of the present disclosure with reference to FIG. 1.
  • Fig. 1 is a flowchart of some embodiments of the disclosed target tracking method. As shown in Fig. 1, the method of this embodiment includes: steps S102 to S108.
  • step S102 the position information of the target in the current frame of the video is acquired.
  • the camera continuously collects image frames during data collection, which constitute a video stream.
  • for example, OpenCV is used to parse the camera's video stream to obtain the information of each frame of the video, and target detection and related logic calculations are performed on each frame's image, thereby realizing tracking of one or more targets (for example, human faces).
  • in the case of multiple targets, the method of this embodiment can be executed for each target.
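  • for illustration, a minimal Python sketch of this frame-by-frame loop, using the OpenCV (cv2) library named above; the capture source (index 0) is an assumption and could equally be a video file path or stream URL:

```python
import cv2

cap = cv2.VideoCapture(0)  # 0 = default camera; a file path or RTSP URL also works
while cap.isOpened():
    ok, frame = cap.read()  # frame is a BGR image as a NumPy array
    if not ok:
        break
    # per-frame target detection and the related tracking logic would run here
cap.release()
```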
  • the position information of the target is, for example, coordinate information of the bounding box of the target.
  • the bounding box of the target in the current frame can be the output of a pre-trained target detection model after the image of the current frame (which can be the global image of the current frame, or the image of the detection area corresponding to the target in the current frame determined based on the previous frame) is input into the model.
  • the target detection model can be an existing model.
  • the target detection model is a cascade CNN (cascade convolutional neural network) model.
  • the target detection model may also be other models, as long as it is a model that performs target detection in the global image of each frame, it can be optimized by applying the solution of the present disclosure, and it is not limited to the examples given.
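  • for the sketches that follow, any detector with the interface "image in, list of (x, y, w, h) boxes out" will do; purely as an illustrative stand-in (the patent's example for faces is a cascade CNN, not this Haar cascade), one can write:

```python
import cv2

# OpenCV ships Haar cascade files; this one detects frontal faces
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect(image):
    """Return a list of (x, y, w, h) face boxes for a BGR image."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return [tuple(int(v) for v in box) for box in face_cascade.detectMultiScale(gray)]
```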
  • step S104 the detection area corresponding to the target in the next frame of the video is determined according to the position information of the target in the current frame.
  • the detection area corresponding to the target may belong to a part of the global image of the next frame.
  • in some embodiments, the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width are used to determine the expanded coordinate information of the target's bounding box in the current frame; according to the expanded coordinate information of the target's bounding box in the current frame, the area represented by the same coordinate information in the next frame is used as the detection area corresponding to the target.
  • as shown in FIG. 2, after the bounding box 104 of the target 102 in the current frame 100 is determined, the bounding box 104 is expanded according to a preset extension length and a preset extension width to obtain the expanded bounding box 106; according to the coordinate information of the expanded bounding box 106, the image at the same position in the next frame 200 is used as the detection area 108 corresponding to the target 102.
  • the preset extension length and the preset extension width may be determined according to the moving speed of the target and the time interval between the current frame and the next frame.
  • for example, the maximum moving speed corresponding to each category of target can be obtained statistically, and the product of the maximum moving speed of the target in the current frame and the time interval between the current frame and the next frame can be determined. The target's bounding box in the current frame is then extended along both length directions by a length equal to this product, and along both width directions by a width equal to this product.
  • Different types of targets can correspond to different preset extension lengths and preset extension widths.
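  • a minimal sketch of this expansion rule; the helper name and the clipping of the expanded area to the frame borders are assumptions (the detection area must remain part of the global image):

```python
def expand_bbox(box, v_max, dt, frame_w, frame_h):
    """Expand an (x, y, w, h) box by v_max * dt on every side, clipped to the frame.

    v_max is the statistical maximum moving speed of the target's category
    (pixels per second) and dt the inter-frame interval (seconds), so their
    product plays the role of the preset extension length and width.
    """
    x, y, w, h = box
    margin = v_max * dt
    x0 = max(0.0, x - margin)
    y0 = max(0.0, y - margin)
    x1 = min(float(frame_w), x + w + margin)
    y1 = min(float(frame_h), y + h + margin)
    return (x0, y0, x1 - x0, y1 - y0)
```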
  • step S106 the target is detected in the detection area corresponding to the target in the next frame.
  • in some embodiments, the image of the detection area corresponding to the target in the next frame is input into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model. In the case where there is exactly one bounding box, the image in the bounding box is determined as the target.
  • if only one target is tracked and the target detection model outputs only one bounding box, the image in the bounding box can be directly determined as the target. It is also possible to further compare the features of the image in the bounding box with the features of the target in the current frame to determine whether the image in the bounding box is the target.
  • in some embodiments, the image of the detection area corresponding to the target in the next frame is input into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there are multiple bounding boxes, for each bounding box, the position information of the bounding box is compared with the position information of the target in the current frame, and if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, the image in the bounding box is determined as the target.
  • based on the principle that a target's position does not move much between adjacent frames, the position information of each bounding box in the next frame is compared with the position information of each target in the current frame to determine which target each bounding box corresponds to, which improves the efficiency of target determination.
  • for each bounding box and each target in the current frame, the distance between the center coordinate of the bounding box and the center coordinate of that target's bounding box in the current frame can be calculated as the gap between the bounding box's position information and that target's position information.
  • the second preset gap is determined, for example, according to the moving speed of the target and the time interval between the current frame and the next frame.
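  • one plausible reading of this comparison, as a sketch: match each target to the detector box whose center is nearest to the target's current-frame center, accepting the match only under the second preset gap:

```python
import math

def center(box):
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def match_box_to_target(boxes, target_box, second_gap):
    """Return the box whose center is closest to target_box's center,
    provided the distance is below second_gap; otherwise None."""
    cx, cy = center(target_box)
    best, best_d = None, float("inf")
    for box in boxes:
        bx, by = center(box)
        d = math.hypot(bx - cx, by - cy)
        if d < best_d:
            best, best_d = box, d
    return best if best_d < second_gap else None
```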
  • alternatively, the features of the image in each bounding box can be compared with the features of each target in the current frame to determine the target corresponding to each bounding box; the features of the images and of the targets can be extracted by the target detection model.
  • the target corresponding to each bounding box can be determined according to the distance between the feature vector of the image in each bounding box and the feature vector of each target in the current frame.
  • after the position information of the target is detected in the detection area, it can be converted into the position information of the target in the global image of the next frame. That is, the coordinate information of the target's bounding box is coordinate-converted from the detection area into the coordinate information of the target's bounding box in the global image of the next frame, so as to determine the target's position information in each frame and realize tracking of the target.
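  • the coordinate conversion amounts to offsetting by the detection area's top-left corner in the global image, as in this sketch:

```python
def to_global(local_box, region_origin):
    """Convert an (x, y, w, h) box detected inside the cropped detection
    area into global-image coordinates of the next frame."""
    x, y, w, h = local_box
    ox, oy = region_origin  # top-left corner of the detection area in the global image
    return (x + ox, y + oy, w, h)
```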
  • step S108 the target detected in the next frame is associated with the target information in the current frame, so as to track the target.
  • the information of the target is, for example, the target's identifier (ID, name, etc.), and may also include description information of the target.
  • for example, when tracking a human face, the attributes of the face box in the current frame (for example, the face's gender, age, ID number, etc.) can be inherited in the next frame.
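  • a sketch of this association step; the track-record fields (id, gender, age) are illustrative assumptions about what the target's information might hold:

```python
def associate(track, matched_box):
    """Carry the target's identity and description info into the next frame;
    only the position is refreshed, everything else is inherited."""
    new_track = dict(track)        # inherit id, name, face attributes, ...
    new_track["box"] = matched_box
    return new_track
```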
  • in the method of the above embodiment, based on the position information of the target in the current frame of the video, a part of the global image of the next frame is determined as the detection area corresponding to the target, and the target is detected in that detection area to realize tracking of the target. Since detection is performed on only a part of the global image, the amount of data processed by the computer is reduced, thus improving the efficiency of detecting and tracking the target during target tracking.
  • Fig. 3 is a flowchart of other embodiments of the disclosed target tracking method. As shown in FIG. 3, the method of this embodiment includes: performing steps S302 to S314 for each of one or more targets.
  • step S302 the position information of the target in the current frame of the video and the position information of the target in the previous frame are acquired.
  • step S304 the difference between the position information of the target in the previous frame of the video and the position information of the target in the current frame is determined.
  • the target location information includes: coordinate information of the center point of the bounding box of the target.
  • the difference between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • the target detection model outputs the coordinate information of the target's bounding box as $(x, y, w, h)$, where $(x, y)$ is the position coordinate of the upper-left corner of the bounding box, and $w$ and $h$ are the width and height of the bounding box, respectively.
  • assuming the current frame is the $k$-th frame ($k$ a positive integer), the coordinate information of the target's bounding box in the current frame can be written as $(x_k, y_k, w_k, h_k)$, so the center point of the bounding box in the current frame is $\left(x_k + \frac{w_k}{2},\; y_k + \frac{h_k}{2}\right)$.
  • the coordinate information of the bounding box in the previous frame can be written as $(x_{k-1}, y_{k-1}, w_{k-1}, h_{k-1})$, with center point $\left(x_{k-1} + \frac{w_{k-1}}{2},\; y_{k-1} + \frac{h_{k-1}}{2}\right)$.
  • the distance between the center point of the bounding box in the previous frame and the center point of the bounding box in the current frame can then be expressed as
$$d = \sqrt{\left(x_k + \tfrac{w_k}{2} - x_{k-1} - \tfrac{w_{k-1}}{2}\right)^2 + \left(y_k + \tfrac{h_k}{2} - y_{k-1} - \tfrac{h_{k-1}}{2}\right)^2}.$$
  • step S306 when the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to the first preset gap, the detection area corresponding to the target in the next frame is determined according to the position information of the target in the current frame.
  • that is, if the position of the target changes little between the previous frame and the current frame, the detection area corresponding to the target is determined in the next frame.
  • the method for determining the target detection area can refer to the foregoing embodiment.
  • step S308 the target is detected in the detection area corresponding to the target in the next frame.
  • step S310 if the gap between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset gap, the target is detected in the global image of the next frame.
  • that is, if the target's position changes significantly between the previous frame and the current frame, target detection is performed in the global image of the next frame. This avoids a large change in the target's position preventing the target from being accurately detected within the detection area, thereby further improving detection accuracy.
  • step S312 the target detected in the next frame is associated with the target information in the current frame, so as to track the target.
  • step S314 the next frame is updated to be the current frame, and the process returns to step S302 to restart execution.
  • experiments by the inventor show that, compared with existing tracking algorithms that perform target detection in the global image, the target tracking algorithm of the present disclosure increases computation speed by a factor of 3 to 4.
  • in the solution of the above embodiments, the change in the target's position between adjacent frames determines whether the target is detected in the next frame within the detection area corresponding to the target or in the global image.
  • the solution of the foregoing embodiments can thus improve detection and tracking efficiency while ensuring detection accuracy; a minimal end-to-end sketch follows.
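  • pulling the pieces together, a hedged end-to-end sketch of the FIG. 3 flow for a single target, reusing the detect(), center(), expand_bbox(), match_box_to_target() and to_global() helpers sketched above; the recovery behavior when no box matches is an assumption, since the text does not specify it:

```python
import math

def track_one_target(frames, detect, first_gap, second_gap, v_max, dt):
    """Track one target across an iterable of frames (NumPy BGR images)."""
    prev_box = None
    cur_box = detect(frames[0])[0]  # initial detection on the global image
    for frame in frames[1:]:
        frame_h, frame_w = frame.shape[:2]
        if prev_box is None:
            small_move = True  # no history yet: take the regional path
        else:
            (cx, cy), (px, py) = center(cur_box), center(prev_box)
            small_move = math.hypot(cx - px, cy - py) <= first_gap
        if small_move:
            # steps S306/S308: detect only inside the expanded region
            rx, ry, rw, rh = (int(v) for v in expand_bbox(cur_box, v_max, dt, frame_w, frame_h))
            boxes = detect(frame[ry:ry + rh, rx:rx + rw])
            local_cur = (cur_box[0] - rx, cur_box[1] - ry, cur_box[2], cur_box[3])
            hit = match_box_to_target(boxes, local_cur, second_gap)
            next_box = to_global(hit, (rx, ry)) if hit else None
        else:
            # step S310: large position change, fall back to the global image
            next_box = match_box_to_target(detect(frame), cur_box, second_gap)
        if next_box is None:
            found = detect(frame)               # assumed recovery: re-detect globally
            next_box = found[0] if found else cur_box
        prev_box, cur_box = cur_box, next_box   # step S314: advance one frame
    return cur_box
```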
  • the present disclosure also provides a target tracking device, which is described below with reference to FIG. 4.
  • Fig. 4 is a structural diagram of some embodiments of the target tracking device of the present disclosure.
  • the device 40 of this embodiment includes: an information acquisition module 410, a detection area determination module 420, a target detection module 430, and an information association module 440.
  • the information acquisition module 410 is used to acquire the position information of the target in the current frame of the video.
  • the detection area determination module 420 is configured to determine the detection area corresponding to the target in the next frame of the video according to the position information of the target in the current frame, and the detection area corresponding to the target belongs to a part of the global image of the next frame.
  • the location information of the target includes: coordinate information of the bounding box of the target.
  • the detection area determination module 420 is configured to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, the preset extension length and the preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • the detection area determination module 420 is used to determine the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame, and, in the case where the gap is less than or equal to the first preset gap, determine the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target.
  • the difference between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • the target detection module 430 is used to detect the target in the detection area corresponding to the target in the next frame.
  • the target detection module 430 is used to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
  • the information association module 440 is used for associating the detected target in the next frame with the information of the target in the current frame, so as to track the target.
  • the present disclosure also provides a target tracking device, which is described below with reference to FIG. 5.
  • Fig. 5 is a structural diagram of some embodiments of the target tracking device of the present disclosure.
  • the apparatus 50 of this embodiment includes: an acquisition module 510, a gap determination module 520, a first detection module 530, a second detection module 540, and an association module 550.
  • the obtaining module 510 is used to obtain the position information of the target in the current frame of the video;
  • the gap determination module 520 is used to determine the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame.
  • the method for calculating the difference between the position information of the target in the previous frame and the position information of the target in the current frame can refer to the foregoing embodiment.
  • the position information of the target includes: coordinate information of the center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  • the first detection module 530 is configured to determine, in the case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to the first preset gap, the detection area corresponding to the target in the next frame according to the position information of the target in the current frame; the detection area corresponding to the target belongs to a part of the global image of the next frame, and the target is detected in the detection area corresponding to the target in the next frame.
  • the first detection module 530 is configured to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, the preset extension length and the preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
  • the first detection module 530 is configured to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
  • the second detection module 540 is configured to detect the target in the global image of the next frame when the difference between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset difference.
  • the associating module 550 is used for associating the target detected in the next frame with the target information in the current frame, so as to track the target.
  • the target tracking apparatus in the embodiments of the present disclosure may be implemented by various computing devices or computer systems, which are described below in conjunction with FIG. 6 and FIG. 7.
  • Fig. 6 is a structural diagram of some embodiments of the target tracking device of the present disclosure.
  • the device 60 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610.
  • the processor 620 is configured to execute the target tracking method in any of the embodiments of the present disclosure based on instructions stored in the memory 610.
  • the memory 610 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
  • the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
  • Fig. 7 is a structural diagram of other embodiments of the target tracking device of the present disclosure.
  • the apparatus 70 of this embodiment includes a memory 710 and a processor 720, which are similar to the memory 610 and the processor 620, respectively. It may also include an input/output interface 730, a network interface 740, a storage interface 750, and so on. These interfaces 730, 740 and 750, as well as the memory 710 and the processor 720, may be connected via a bus 760, for example.
  • the input and output interface 730 provides connection interfaces for input and output devices such as a display, a mouse, a keyboard, and a touch screen.
  • the network interface 740 provides a connection interface for various networked devices, for example, it can be connected to a database server or a cloud storage server.
  • the storage interface 750 provides a connection interface for external storage devices such as SD cards and U disks.
  • those skilled in the art should understand that the embodiments of the present disclosure can be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • these computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps configured to implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a target tracking method and device and a computer-readable storage medium, and relates to the field of computer technology. The method of the present disclosure includes: acquiring position information of a target in a current frame of a video; determining, according to the position information of the target in the current frame, a detection area corresponding to the target in the next frame of the video, the detection area corresponding to the target belonging to a part of the global image of the next frame; detecting the target in the detection area corresponding to the target in the next frame; and associating the target detected in the next frame with information of the target in the current frame so as to track the target.

Description

Target tracking method, device and computer-readable storage medium
CROSS-REFERENCE TO RELATED APPLICATION
This application is based on and claims priority to the CN application No. 201910794278.3 filed on August 27, 2019, the disclosure of which is hereby incorporated into the present application in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of computer technology, and in particular to a target tracking method, a target tracking device, and a computer-readable storage medium.
BACKGROUND
Target tracking technology is currently an important research direction in the field of computer vision. Target tracking technology can be applied in various fields such as video surveillance, human-computer interaction, and autonomous driving.
Target tracking determines, in consecutive video frames, the target to be tracked and the target's position in each frame, so as to obtain the motion trajectory of the target.
SUMMARY
The inventors found that in some current target tracking algorithms, the target is detected in the global image of every frame, resulting in low detection and tracking efficiency and long processing time.
One technical problem to be solved by the present disclosure is: how to improve the efficiency of detecting and tracking a target during target tracking.
According to some embodiments of the present disclosure, a target tracking method is provided, comprising: acquiring position information of a target in a current frame of a video; determining a detection area corresponding to the target in a next frame of the video according to the position information of the target in the current frame, the detection area corresponding to the target belonging to a part of a global image of the next frame; detecting the target in the detection area corresponding to the target in the next frame; and associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
In some embodiments, the position information of the target comprises coordinate information of a bounding box of the target, and determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises: determining expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width; and taking, in the next frame, the area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
In some embodiments, determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises: determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame; and in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to a first preset gap, determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
In some embodiments, the position information of the target comprises coordinate information of a center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
In some embodiments, detecting the target in the detection area corresponding to the target in the next frame comprises: inputting an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where there is one bounding box, determining the image in the bounding box as the target; and in a case where there are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and in a case where the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining the image in the bounding box as the target.
According to other embodiments of the present disclosure, a target tracking method is provided, comprising: acquiring position information of a target in a current frame of a video; determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame; in a case where the gap is less than or equal to a first preset gap, determining a detection area corresponding to the target in a next frame according to the position information of the target in the current frame, the detection area corresponding to the target belonging to a part of a global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame; in a case where the gap is greater than the first preset gap, detecting the target in the global image of the next frame; and associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
In some embodiments, the position information of the target comprises coordinate information of a bounding box of the target, and determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises: determining expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width; and taking, in the next frame, the area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
In some embodiments, the position information of the target comprises coordinate information of a center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
In some embodiments, detecting the target in the detection area corresponding to the target in the next frame comprises: inputting an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where there is one bounding box, determining the image in the bounding box as the target; and in a case where there are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and in a case where the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining the image in the bounding box as the target.
According to still other embodiments of the present disclosure, a target tracking device is provided, comprising: an information acquisition module for acquiring position information of a target in a current frame of a video; a detection area determination module for determining a detection area corresponding to the target in a next frame of the video according to the position information of the target in the current frame, the detection area corresponding to the target belonging to a part of a global image of the next frame; a target detection module for detecting the target in the detection area corresponding to the target in the next frame; and an information association module for associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
In some embodiments, the position information of the target comprises coordinate information of a bounding box of the target; the detection area determination module is used to determine expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, take the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
In some embodiments, the detection area determination module is used to determine a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame, and, in a case where the gap is less than or equal to a first preset gap, determine the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
In some embodiments, the position information of the target comprises coordinate information of a center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
In some embodiments, the target detection module is used to input an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where there is one bounding box, determine the image in the bounding box as the target; and in a case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, in a case where the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determine the image in the bounding box as the target.
According to still further embodiments of the present disclosure, a target tracking device is provided, comprising: an acquisition module for acquiring position information of a target in a current frame of a video; a gap determination module for determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame; a first detection module for, in a case where the gap is less than or equal to a first preset gap, determining a detection area corresponding to the target in a next frame according to the position information of the target in the current frame, the detection area corresponding to the target belonging to a part of a global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame; a second detection module for, in a case where the gap is greater than the first preset gap, detecting the target in the global image of the next frame; and an association module for associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
In some embodiments, the first detection module is used to determine expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, take the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
In some embodiments, the position information of the target comprises coordinate information of a center point of the bounding box of the target; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
In some embodiments, the first detection module is used to input an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where there is one bounding box, determine the image in the bounding box as the target; and in a case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, in a case where the gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determine the image in the bounding box as the target.
According to yet other embodiments of the present disclosure, a target tracking device is provided, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute the target tracking method of any of the foregoing embodiments based on instructions stored in the memory.
According to still other embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, wherein, when executed by a processor, the program implements the target tracking method of any of the foregoing embodiments.
In the present disclosure, based on the position information of the target in the current frame of the video, a part of the global image of the next frame is determined as the detection area corresponding to the target, and the target is detected in that detection area to realize tracking of the target. Since detection is performed on only a part of the global image, the amount of data processed by the computer is reduced, which improves the efficiency of detecting and tracking the target during target tracking.
Other features and advantages of the present disclosure will become clear from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings described here are used to provide a further understanding of the present disclosure and constitute a part of this application. The illustrative embodiments of the present disclosure and their description are configured to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
FIG. 1 shows a schematic flowchart of a target tracking method according to some embodiments of the present disclosure.
FIG. 2 shows a schematic diagram of determining a target detection area in some embodiments of the present disclosure.
FIG. 3 shows a schematic flowchart of a target tracking method according to other embodiments of the present disclosure.
FIG. 4 shows a schematic structural diagram of a target tracking device according to some embodiments of the present disclosure.
FIG. 5 shows a schematic structural diagram of a target tracking device according to other embodiments of the present disclosure.
FIG. 6 shows a schematic structural diagram of a target tracking device according to still other embodiments of the present disclosure.
FIG. 7 shows a schematic structural diagram of a target tracking device according to yet other embodiments of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings of the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as a limitation on the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
This solution is proposed to address the problem that, in some current target tracking algorithms, the target is detected in the global image of every frame, resulting in low detection and tracking efficiency and long processing time. Some embodiments of the target tracking method of the present disclosure are described below with reference to FIG. 1.
FIG. 1 is a flowchart of some embodiments of the target tracking method of the present disclosure. As shown in FIG. 1, the method of this embodiment includes steps S102 to S108.
In step S102, position information of a target in a current frame of a video is acquired.
A camera continuously collects image frames during data collection, which constitute a video stream. For example, OpenCV is used to parse the camera's video stream to obtain the information of each frame of the video, and target detection and related logic calculations are performed on each frame's image, thereby realizing tracking of one or more targets (for example, human faces). In the case of multiple targets, the method of this embodiment can be executed for each target.
The position information of the target is, for example, coordinate information of the target's bounding box. The bounding box of the target in the current frame may be the output of a pre-trained target detection model after the image of the current frame (which may be the global image of the current frame, or the image of the detection area corresponding to the target in the current frame determined based on the previous frame) is input into the model. The target detection model may be an existing model; for example, when the target is a human face, the target detection model is a cascade CNN (cascaded convolutional neural network) model. The target detection model may also be another model: any model that performs target detection in the global image of each frame can be optimized by applying the solution of the present disclosure, and the model is not limited to the examples given.
In step S104, a detection area corresponding to the target in the next frame of the video is determined according to the position information of the target in the current frame.
The detection area corresponding to the target belongs to a part of the global image of the next frame. In some embodiments, the expanded coordinate information of the target's bounding box in the current frame is determined according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width; according to the expanded coordinate information, the area represented by the same coordinate information in the next frame is used as the detection area corresponding to the target.
As shown in FIG. 2, after the bounding box 104 of the target 102 in the current frame 100 is determined, the bounding box 104 is expanded according to the preset extension length and the preset extension width to obtain the expanded bounding box 106; according to the coordinate information of the expanded bounding box 106, the image at the same position in the next frame 200 is used as the detection area 108 corresponding to the target 102. The preset extension length and preset extension width may be determined according to the target's moving speed and the time interval between the current frame and the next frame. For example, the maximum moving speed corresponding to each category of target may be obtained statistically, and the product of the maximum moving speed of the target in the current frame and the time interval between the current frame and the next frame determined; the target's bounding box in the current frame is extended along both length directions by a length equal to this product, and along both width directions by a width equal to this product. Different categories of targets may correspond to different preset extension lengths and preset extension widths.
Since the interval between frames is very short, the target's movement distance is also short; therefore, determining the detection area in the next frame in the above manner is fairly accurate.
In step S106, the target is detected in the detection area corresponding to the target in the next frame.
The image of the detection area corresponding to the target can be cropped out of the global image of the next frame. In some embodiments, the image of the detection area corresponding to the target in the next frame is input into the target detection model to obtain position information of one or more bounding boxes output by the target detection model. In the case where there is one bounding box, the image in the bounding box is determined as the target.
If only one target is tracked and the target detection model outputs only one bounding box, the image in the bounding box can be directly determined as the target. The features of the image in the bounding box can also be further compared with the features of the target in the current frame to determine whether the image in the bounding box is the target.
In some embodiments, the image of the detection area corresponding to the target in the next frame is input into the target detection model to obtain position information of one or more bounding boxes output by the target detection model; in the case where there are multiple bounding boxes, for each bounding box, the position information of the bounding box is compared with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, the image in the bounding box is determined as the target.
Based on the principle that a target's position does not move much between adjacent frames, the position information of each bounding box in the next frame is compared with the position information of each target in the current frame to determine which target each bounding box corresponds to, which improves the efficiency of target determination. For each bounding box and each target in the current frame, the distance between the center coordinate of the bounding box and the center coordinate of that target's bounding box in the current frame can be calculated as the gap between the bounding box's position information and that target's position information. The second preset gap is determined, for example, according to the target's moving speed and the time interval between the current frame and the next frame.
Alternatively, the features of the image in each bounding box can be compared with the features of each target in the current frame to determine the target corresponding to each bounding box. The features of the images and of the targets can be extracted by the target detection model. The target corresponding to each bounding box can be determined according to the distance between the feature vector of the image in each bounding box and the feature vector of each target in the current frame.
After the target's position information is detected in the detection area corresponding to the target, it can be converted into the target's position information in the global image of the next frame. That is, the coordinate information of the target's bounding box is coordinate-converted from the detection area into the coordinate information of the target's bounding box in the global image of the next frame, so as to determine the target's position information in each frame and realize tracking of the target.
In step S108, the target detected in the next frame is associated with the information of the target in the current frame, so as to track the target.
The information of the target is, for example, the target's identifier (ID, name, etc.), and may also include description information of the target. For example, when tracking a human face, the attributes of the face box in the current frame (for example, the face's gender, age, ID number, etc.) can be inherited in the next frame.
In the method of the above embodiment, based on the position information of the target in the current frame of the video, a part of the global image of the next frame is determined as the detection area corresponding to the target, and the target is detected in that detection area to realize tracking of the target. Since detection is performed on only a part of the global image, the amount of data processed by the computer is reduced, which improves the efficiency of detecting and tracking the target during target tracking.
Other embodiments of the target tracking algorithm of the present disclosure, which relative to the foregoing embodiments can further improve the accuracy of target detection, are described below with reference to FIG. 3.
FIG. 3 is a flowchart of other embodiments of the target tracking method of the present disclosure. As shown in FIG. 3, the method of this embodiment includes performing steps S302 to S314 for each of one or more targets.
In step S302, position information of the target in the current frame of the video and position information of the target in the previous frame are acquired.
In step S304, the gap between the position information of the target in the previous frame of the video and the position information of the target in the current frame is determined.
In some embodiments, the position information of the target includes coordinate information of the center point of the target's bounding box. The gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the target's bounding box in the previous frame and the center point of the target's bounding box in the current frame.
For example, the target detection model outputs the coordinate information of the target's bounding box as $(x, y, w, h)$, where $(x, y)$ is the position coordinate of the upper-left corner of the bounding box, and $w$ and $h$ are the width and height of the bounding box, respectively. Assuming the current frame is the $k$-th frame ($k$ a positive integer), the coordinate information of the target's bounding box in the current frame can be written as $(x_k, y_k, w_k, h_k)$, so the center point of the bounding box in the current frame is
$$\left(x_k + \tfrac{w_k}{2},\; y_k + \tfrac{h_k}{2}\right).$$
The coordinate information of the bounding box in the previous frame can be written as $(x_{k-1}, y_{k-1}, w_{k-1}, h_{k-1})$, and the center point of the bounding box in the previous frame is
$$\left(x_{k-1} + \tfrac{w_{k-1}}{2},\; y_{k-1} + \tfrac{h_{k-1}}{2}\right).$$
The distance between the center point of the bounding box in the previous frame and the center point of the bounding box in the current frame can be expressed as
$$d = \sqrt{\left(x_k + \tfrac{w_k}{2} - x_{k-1} - \tfrac{w_{k-1}}{2}\right)^2 + \left(y_k + \tfrac{h_k}{2} - y_{k-1} - \tfrac{h_{k-1}}{2}\right)^2}.$$
In step S306, in the case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to the first preset gap, the detection area corresponding to the target in the next frame is determined according to the position information of the target in the current frame.
That is, if the target's position changes little between the previous frame and the current frame, the detection area corresponding to the target is determined in the next frame. For the method of determining the detection area, refer to the foregoing embodiments.
In step S308, the target is detected in the detection area corresponding to the target in the next frame.
In step S310, in the case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset gap, the target is detected in the global image of the next frame.
If the target's position changes significantly between the previous frame and the current frame, target detection is performed in the global image of the next frame. This avoids a large position change preventing the target from being accurately detected within the detection area, further improving detection accuracy.
In step S312, the target detected in the next frame is associated with the information of the target in the current frame, so as to track the target.
In step S314, the next frame is updated to be the current frame, and the process returns to step S302 to restart execution.
Experiments by the inventor show that, compared with existing tracking algorithms that perform target detection in the global image, the target tracking algorithm of the present disclosure increases computation speed by a factor of 3 to 4.
In the solution of the above embodiment, the change in the target's position between adjacent frames determines whether the target is detected in the next frame within the detection area corresponding to the target or in the global image. This improves detection and tracking efficiency while ensuring detection accuracy.
The present disclosure also provides a target tracking device, described below with reference to FIG. 4.
FIG. 4 is a structural diagram of some embodiments of the target tracking device of the present disclosure. As shown in FIG. 4, the device 40 of this embodiment includes: an information acquisition module 410, a detection area determination module 420, a target detection module 430, and an information association module 440.
The information acquisition module 410 is used to acquire position information of a target in the current frame of a video.
The detection area determination module 420 is used to determine the detection area corresponding to the target in the next frame of the video according to the position information of the target in the current frame; the detection area corresponding to the target belongs to a part of the global image of the next frame.
In some embodiments, the position information of the target includes coordinate information of the target's bounding box. The detection area determination module 420 is used to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
In some embodiments, the detection area determination module 420 is used to determine the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame, and, in the case where the gap is less than or equal to the first preset gap, determine the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
In some embodiments, the position information of the target includes coordinate information of the center point of the target's bounding box. The gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the target's bounding box in the previous frame and the center point of the target's bounding box in the current frame.
The target detection module 430 is used to detect the target in the detection area corresponding to the target in the next frame.
In some embodiments, the target detection module 430 is used to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
The information association module 440 is used to associate the target detected in the next frame with the information of the target in the current frame, so as to track the target.
The present disclosure also provides a target tracking device, described below with reference to FIG. 5.
FIG. 5 is a structural diagram of some embodiments of the target tracking device of the present disclosure. As shown in FIG. 5, the device 50 of this embodiment includes: an acquisition module 510, a gap determination module 520, a first detection module 530, a second detection module 540, and an association module 550.
The acquisition module 510 is used to acquire position information of a target in the current frame of a video.
The gap determination module 520 is used to determine the gap between the position information of the target in the previous frame of the current frame and the position information of the target in the current frame.
For the method of calculating the gap between the position information of the target in the previous frame and the position information of the target in the current frame, refer to the foregoing embodiments. In some embodiments, the position information of the target includes coordinate information of the center point of the target's bounding box; the gap between the position information of the target in the previous frame and the position information of the target in the current frame is the distance between the center point of the target's bounding box in the previous frame and the center point of the target's bounding box in the current frame.
The first detection module 530 is used to determine, in the case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to the first preset gap, the detection area corresponding to the target in the next frame according to the position information of the target in the current frame; the detection area corresponding to the target belongs to a part of the global image of the next frame, and the target is detected in the detection area corresponding to the target in the next frame.
For the method of determining the detection area corresponding to the target, refer to the foregoing embodiments. In some embodiments, the first detection module 530 is used to determine the expanded coordinate information of the target's bounding box in the current frame according to the coordinate information of the target's bounding box in the current frame, a preset extension length and a preset extension width, and, according to the expanded coordinate information, use the area represented by the same coordinate information in the next frame as the detection area corresponding to the target.
In some embodiments, the first detection module 530 is used to input the image of the detection area corresponding to the target in the next frame into the target detection model to obtain the position information of one or more bounding boxes output by the target detection model; in the case where there is one bounding box, determine the image in the bounding box as the target; and in the case where there are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and, if the gap between the position information of the bounding box and the position information of the target in the current frame is less than the second preset gap, determine the image in the bounding box as the target.
The second detection module 540 is used to detect the target in the global image of the next frame in the case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset gap.
The association module 550 is used to associate the target detected in the next frame with the information of the target in the current frame, so as to track the target.
The target tracking devices in the embodiments of the present disclosure may each be implemented by various computing devices or computer systems, described below in conjunction with FIG. 6 and FIG. 7.
FIG. 6 is a structural diagram of some embodiments of the target tracking device of the present disclosure. As shown in FIG. 6, the device 60 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to execute the target tracking method in any of the embodiments of the present disclosure based on instructions stored in the memory 610.
The memory 610 may include, for example, system memory, a fixed non-volatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader (Boot Loader), a database, and other programs.
FIG. 7 is a structural diagram of other embodiments of the target tracking device of the present disclosure. As shown in FIG. 7, the device 70 of this embodiment includes a memory 710 and a processor 720, similar to the memory 610 and the processor 620, respectively. It may also include an input/output interface 730, a network interface 740, a storage interface 750, and so on. These interfaces 730, 740 and 750, as well as the memory 710 and the processor 720, may be connected via a bus 760, for example. The input/output interface 730 provides a connection interface for input/output devices such as a display, mouse, keyboard, and touch screen. The network interface 740 provides a connection interface for various networked devices; for example, it can connect to a database server or a cloud storage server. The storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable non-transitory storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device configured to implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps configured to implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (20)

  1. A target tracking method, comprising:
    acquiring position information of a target in a current frame of a video;
    determining a detection area corresponding to the target in a next frame of the video according to the position information of the target in the current frame, wherein the detection area corresponding to the target belongs to a part of a global image of the next frame;
    detecting the target in the detection area corresponding to the target in the next frame; and
    associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
  2. The target tracking method according to claim 1, wherein the position information of the target comprises coordinate information of a bounding box of the target, and the determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises:
    determining expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width; and
    taking, in the next frame, an area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
  3. The target tracking method according to claim 1, wherein the determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises:
    determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame; and
    in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to a first preset gap, determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
  4. The target tracking method according to claim 3, wherein the position information of the target comprises coordinate information of a center point of the bounding box of the target; and
    the gap between the position information of the target in the previous frame and the position information of the target in the current frame is a distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  5. The target tracking method according to claim 1, wherein the detecting the target in the detection area corresponding to the target in the next frame comprises:
    inputting an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model;
    in a case where the one or more bounding boxes are one bounding box, determining an image in the bounding box as the target; and
    in a case where the one or more bounding boxes are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and in a case where a gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining an image in the bounding box as the target.
  6. A target tracking method, comprising:
    acquiring position information of a target in a current frame of a video;
    determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame;
    in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to a first preset gap, determining a detection area corresponding to the target in a next frame according to the position information of the target in the current frame, wherein the detection area corresponding to the target belongs to a part of a global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame;
    in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset gap, detecting the target in the global image of the next frame; and
    associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
  7. The target tracking method according to claim 6, wherein the position information of the target comprises coordinate information of a bounding box of the target, and the determining the detection area corresponding to the target in the next frame according to the position information of the target in the current frame comprises:
    determining expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width; and
    taking, in the next frame, an area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
  8. The target tracking method according to claim 6, wherein the position information of the target comprises coordinate information of a center point of the bounding box of the target; and
    the gap between the position information of the target in the previous frame and the position information of the target in the current frame is a distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  9. The target tracking method according to claim 6, wherein the detecting the target in the detection area corresponding to the target in the next frame comprises:
    inputting an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model;
    in a case where the one or more bounding boxes are one bounding box, determining an image in the bounding box as the target; and
    in a case where the one or more bounding boxes are multiple bounding boxes, for each bounding box, comparing the position information of the bounding box with the position information of the target in the current frame, and in a case where a gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determining an image in the bounding box as the target.
  10. A target tracking device, comprising:
    an information acquisition module for acquiring position information of a target in a current frame of a video;
    a detection area determination module for determining a detection area corresponding to the target in a next frame of the video according to the position information of the target in the current frame, wherein the detection area corresponding to the target belongs to a part of a global image of the next frame;
    a target detection module for detecting the target in the detection area corresponding to the target in the next frame; and
    an information association module for associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
  11. The target tracking device according to claim 10, wherein the position information of the target comprises coordinate information of a bounding box of the target; and
    the detection area determination module is used to determine expanded coordinate information of the bounding box of the target in the current frame according to the coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width, and take, in the next frame, an area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
  12. The target tracking device according to claim 10, wherein
    the detection area determination module is used to determine a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame, and, in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to a first preset gap, determine the detection area corresponding to the target in the next frame according to the position information of the target in the current frame.
  13. The target tracking device according to claim 12, wherein the position information of the target comprises coordinate information of a center point of the bounding box of the target; and
    the gap between the position information of the target in the previous frame and the position information of the target in the current frame is a distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  14. The target tracking device according to claim 10, wherein
    the target detection module is used to input an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where the one or more bounding boxes are one bounding box, determine an image in the bounding box as the target; and in a case where the one or more bounding boxes are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and in a case where a gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determine an image in the bounding box as the target.
  15. A target tracking device, comprising:
    an acquisition module for acquiring position information of a target in a current frame of a video;
    a gap determination module for determining a gap between position information of the target in a previous frame of the current frame and the position information of the target in the current frame;
    a first detection module for, in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is less than or equal to a first preset gap, determining a detection area corresponding to the target in a next frame according to the position information of the target in the current frame, wherein the detection area corresponding to the target belongs to a part of a global image of the next frame, and detecting the target in the detection area corresponding to the target in the next frame;
    a second detection module for, in a case where the gap between the position information of the target in the previous frame and the position information of the target in the current frame is greater than the first preset gap, detecting the target in the global image of the next frame; and
    an association module for associating the target detected in the next frame with information of the target in the current frame, so as to track the target.
  16. The target tracking device according to claim 15, wherein
    the first detection module is used to determine expanded coordinate information of the bounding box of the target in the current frame according to coordinate information of the bounding box of the target in the current frame, a preset extension length and a preset extension width, and take, in the next frame, an area represented by the same coordinate information as the detection area corresponding to the target, according to the expanded coordinate information of the bounding box of the target in the current frame.
  17. The target tracking device according to claim 15, wherein the position information of the target comprises coordinate information of a center point of the bounding box of the target; and
    the gap between the position information of the target in the previous frame and the position information of the target in the current frame is a distance between the center point of the bounding box of the target in the previous frame and the center point of the bounding box of the target in the current frame.
  18. The target tracking device according to claim 15, wherein
    the first detection module is used to input an image of the detection area corresponding to the target in the next frame into a target detection model to obtain position information of one or more bounding boxes output by the target detection model; in a case where the one or more bounding boxes are one bounding box, determine an image in the bounding box as the target; and in a case where the one or more bounding boxes are multiple bounding boxes, for each bounding box, compare the position information of the bounding box with the position information of the target in the current frame, and in a case where a gap between the position information of the bounding box and the position information of the target in the current frame is less than a second preset gap, determine an image in the bounding box as the target.
  19. A target tracking device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute the target tracking method according to any one of claims 1-9 based on instructions stored in the memory.
  20. A computer-readable storage medium having a computer program stored thereon, wherein, when executed by a processor, the program implements the steps of the method according to any one of claims 1-9.
PCT/CN2020/092556 2019-08-27 2020-05-27 Target tracking method, device and computer-readable storage medium WO2021036373A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910794278.3 2019-08-27
CN201910794278.3A CN111798487A (zh) 2019-08-27 Target tracking method, device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021036373A1 true WO2021036373A1 (zh) 2021-03-04

Family

ID=72805439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092556 WO2021036373A1 (zh) Target tracking method, device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN111798487A (zh)
WO (1) WO2021036373A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347955A (zh) * 2020-11-12 2021-02-09 上海影卓信息科技有限公司 Method, system and medium for fast object recognition in video based on frame prediction
CN113706555A (zh) * 2021-08-12 2021-11-26 北京达佳互联信息技术有限公司 Video frame processing method and device, electronic equipment and storage medium
CN113808162B (zh) * 2021-08-26 2024-01-23 中国人民解放军军事科学院军事医学研究院 Target tracking method and device, electronic equipment and storage medium
CN113689460B (zh) * 2021-09-02 2023-12-15 广州市奥威亚电子科技有限公司 Video target object tracking and detection method, device, equipment and storage medium
CN114220163B (zh) * 2021-11-18 2023-01-06 北京百度网讯科技有限公司 Human body pose estimation method and device, electronic equipment and storage medium
CN114554300B (zh) * 2022-02-28 2024-05-07 合肥高维数据技术有限公司 Video watermark embedding method based on a specific target
CN116883915B (zh) * 2023-09-06 2023-11-21 常州星宇车灯股份有限公司 Target detection method and detection system based on association between preceding and following frame images

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103584888A (zh) * 2013-12-02 2014-02-19 深圳市恩普电子技术有限公司 Ultrasonic target motion tracking method
CN105654512A (zh) * 2015-12-29 2016-06-08 深圳羚羊微服机器人科技有限公司 Target tracking method and device
CN107103268A (zh) * 2016-02-23 2017-08-29 中国移动通信集团浙江有限公司 Target tracking method and device
CN107886048A (zh) * 2017-10-13 2018-04-06 西安天和防务技术股份有限公司 Target tracking method and system, storage medium and electronic terminal
US20180130216A1 (en) * 2016-11-07 2018-05-10 Nec Laboratories America, Inc. Surveillance system using deep network flow for multi-object tracking
CN108805901A (zh) * 2018-05-04 2018-11-13 北京航空航天大学 Multi-core DSP-based parallel computing and fusion method for fast visual target detection and tracking
CN109671103A (zh) * 2018-12-12 2019-04-23 易视腾科技股份有限公司 Target tracking method and device
CN109840919A (zh) * 2019-01-21 2019-06-04 长安大学 Improved tracking method based on TLD
CN110188719A (zh) * 2019-06-04 2019-08-30 北京字节跳动网络技术有限公司 Target tracking method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250863B (zh) * 2016-08-09 2019-07-26 北京旷视科技有限公司 Object tracking method and device
CN107292911B (zh) * 2017-05-23 2021-03-30 南京邮电大学 Multi-target tracking method based on multi-model fusion and data association
CN109754412B (zh) * 2017-11-07 2021-10-01 北京京东乾石科技有限公司 Target tracking method, target tracking device and computer-readable storage medium
CN108960090B (zh) * 2018-06-20 2023-05-30 腾讯科技(深圳)有限公司 Video image processing method and device, computer-readable medium and electronic equipment
CN109325961B (zh) * 2018-08-27 2021-07-09 北京悦图数据科技发展有限公司 Multi-target tracking method and device for UAV video
CN109635657B (zh) * 2018-11-12 2023-01-06 平安科技(深圳)有限公司 Target tracking method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111798487A (zh) 2020-10-20

Similar Documents

Publication Publication Date Title
WO2021036373A1 (zh) Target tracking method, device and computer-readable storage medium
US11668571B2 (en) Simultaneous localization and mapping (SLAM) using dual event cameras
WO2021139484A1 (zh) Target tracking method and device, electronic equipment and storage medium
US9672634B2 (en) System and a method for tracking objects
US11037325B2 (en) Information processing apparatus and method of controlling the same
CN109543641B (zh) Multi-target deduplication method for real-time video, terminal device and storage medium
US9582711B2 (en) Robot cleaner, apparatus and method for recognizing gesture
Wang et al. Point cloud and visual feature-based tracking method for an augmented reality-aided mechanical assembly system
JP6454984B2 (ja) Hand position determination method and device based on depth images
JP2016099982A (ja) Action recognition device, action learning device, method, and program
WO2019057197A1 (zh) Visual tracking method and device for moving target, electronic equipment and storage medium
JP6487642B2 (ja) Finger shape detection method, program therefor, storage medium of the program, and system for detecting finger shape
EP4158528A1 (en) Tracking multiple objects in a video stream using occlusion-aware single-object tracking
CN111382637A (zh) Pedestrian detection and tracking method and device, terminal equipment and medium
CN111161325A (zh) Three-dimensional multi-target tracking method based on Kalman filtering and LSTM
Wang et al. Immersive human–computer interactive virtual environment using large-scale display system
Kerdvibulvech Hand tracking by extending distance transform and hand model in real-time
CN111986229A (zh) Video target detection method and device, and computer system
US11314968B2 (en) Information processing apparatus, control method, and program
Wang et al. Online adaptive multiple pedestrian tracking in monocular surveillance video
Bhuvaneswari et al. TRACKING MANUALLY SELECTED OBJECT IN VIDEOS USING COLOR HISTOGRAM MATCHING.
CN112230801A (zh) Kalman smoothing method applied to touch trajectories, memory and device
JP2021077177A (ja) Motion recognition device, motion recognition method and motion recognition program
CN111539995B (zh) Multi-target tracking method based on feature point trajectories
Starkov et al. Moving Object Detection in Video Streams Received from a Moving Camera.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20856351

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.06.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20856351

Country of ref document: EP

Kind code of ref document: A1