CN113112526B - Target tracking method, device, equipment and medium

Info

Publication number: CN113112526B
Authority: CN (China)
Prior art keywords: target, tracking, targets, detected, detection
Legal status: Active
Application number: CN202110462193.2A
Other languages: Chinese (zh)
Other versions: CN113112526A
Inventors: 路金诚, 张伟, 谭啸, 孙昊
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110462193.2A
Publication of CN113112526A
Priority to PCT/CN2022/075128
Application granted
Publication of CN113112526B

Classifications

    • G06T7/246 Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods involving models
    • G06F18/22 Pattern recognition; Matching criteria, e.g. proximity measures
    • G06N3/045 Neural networks; Combinations of networks
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

The disclosure provides a target tracking method, device, equipment and medium, relating to the field of artificial intelligence, in particular to computer vision and deep learning technology, and applicable to intelligent traffic or smart city scenarios. The implementation scheme is as follows: matching the tracking target with at least one detection target; and in response to determining that the at least one detection target does not include a detection target that matches at least one of the at least a portion of tracking targets, performing the following matching operation for each of the at least one tracking target: acquiring motion parameters of the tracking target corresponding to a history matching video frame; determining a predicted position parameter of the tracking target in a current video frame to be detected based on the motion parameters; and determining whether at least a portion of the detection targets include a detection target that matches the tracking target based at least on the predicted position parameter of the tracking target and the respective position parameters of at least a portion of the detection targets.

Description

Target tracking method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning techniques applicable in intelligent traffic or smart city scenarios, and more particularly to a target tracking method, apparatus, electronic device, computer-readable storage medium and computer program product.
Background
Artificial intelligence is the discipline that studies how to make a computer mimic certain human mental processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it covers both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies and the like.
With the continued growth of the total number of expressway vehicles and the total highway mileage in China, road traffic management faces new challenges. Conventional expressway cameras are deployed 3-5 km apart, cannot guarantee full coverage of the road area, and suffer from a fixed field of view and high labor and material costs. Unmanned aerial vehicles, motorcycles and other vehicles offer strong maneuverability, a large field of view and flexible deployment, so mounting surveillance cameras on such vehicles compensates for the shortcomings of traditional video surveillance and has a positive effect on building an all-round, three-dimensional and intuitive monitoring system, realizing intelligent traffic management and improving the response speed to emergencies.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a target tracking method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided a target tracking method including: acquiring at least one detection target in a current video frame to be detected; for each of at least a portion of one or more tracked targets, matching the tracked target with each of the at least one detected target to determine whether the at least one detected target includes a detected target that matches the tracked target; and in response to determining that the at least one detected target does not include a detected target that matches at least one of the at least a portion of tracked targets, performing the following matching operation for each of the at least one tracked target: acquiring a history matching video frame comprising the tracked target, and motion parameters of the tracked target corresponding to the history matching video frame; determining a predicted position parameter of the tracked target in the current video frame to be detected based on the motion parameters; and determining whether at least a portion of the detection targets include a detection target matching the tracked target based at least on the predicted position parameter of the tracked target in the current video frame to be detected and the respective position parameters of at least a portion of the detection targets in the at least one detection target.
According to another aspect of the present disclosure, there is provided a target tracking apparatus including: an acquisition unit configured to acquire at least one detection target in a current video frame to be detected; a first matching unit configured to match, for each of at least a portion of the one or more tracking targets, the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target that matches the tracking target; a second matching unit configured to perform a matching operation on each of the at least one tracking target in response to determining that the at least one detection target does not include a detection target matching at least one of the at least one part of the tracking targets, wherein the second matching unit includes: a first acquisition subunit configured to acquire a history matching video frame including a tracking target, and a motion parameter of the tracking target corresponding to the history matching video frame; a first determining subunit configured to determine, based on the motion parameter, a predicted position parameter of the tracking target in the current video frame to be detected; and a second determination subunit configured to determine whether at least a part of the detection targets include detection targets matching the tracking target based on at least the predicted position parameter of the tracking target in the current video frame to be detected and the respective position parameters of at least a part of the detection targets.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the target tracking method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described target tracking method.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the above-described object tracking method.
According to one or more embodiments of the present disclosure, first stage matching is performed on one or more obtained tracking targets and detection targets in a current video frame to be detected, and then second stage matching is performed on tracking targets which are not successfully matched with any detection target in the matching results, so as to obtain a final matching result. The second stage matching is performed based on the position parameters of the tracking target in the current video frame to be detected and the position parameters of the detection target, which are obtained according to the motion parameters of the tracking target corresponding to the history matching video frame. Therefore, the tracking target and the detection target are subjected to two-stage matching based on different matching mechanisms, so that the tracking target and the detection target which are not successfully matched in the first stage can be further matched through the second stage matching, and the accuracy of matching the tracking target and the detection target is improved. In addition, the predicted position parameter of the tracking target in the current video frame to be detected is predicted by using the motion parameter of the tracking target in the history matching video frame, so that the second-stage matching is performed based on the predicted position parameter of the tracking target and the position parameter of the detection target, the matching between the tracking target and the detection target can be performed based on more sufficient information, and the accuracy of multi-target tracking can be further improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a flow chart of a target tracking method according to an exemplary embodiment of the present disclosure;
FIGS. 2a-2b illustrate application scenario diagrams according to exemplary embodiments of the present disclosure;
FIG. 3 illustrates a flow chart of a first stage matching according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a flowchart of a target tracking method according to an exemplary embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of a second stage matching according to an exemplary embodiment of the present disclosure;
FIG. 6 shows a block diagram of a target tracking device according to an embodiment of the present disclosure; and
Fig. 7 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
In the related art, tracking is completed by computing parameters such as the position of a target in the preceding and following frames, the position of the detection frame in which the target is detected, and the overlap ratio (intersection-over-union). When there is rapid relative motion between the camera and the target, for example in drone-mounted camera or vehicle-mounted camera shooting scenes, the large uncertainty of the displacement of the target frame makes it difficult to accurately associate targets across the preceding and following frames.
In order to solve the above problems, the present disclosure first performs a first stage matching on one or more obtained tracking targets and a detection target in a current video frame to be detected, and then performs a second stage matching on a tracking target that is not successfully matched with any detection target in the matching result, so as to obtain a final matching result. The second stage matching is performed based on the position parameters of the tracking target in the current video frame to be detected and the position parameters of the detection target, which are obtained according to the motion parameters of the tracking target corresponding to the history matching video frame. Therefore, the tracking target and the detection target are subjected to two-stage matching based on different matching mechanisms, so that the tracking target and the detection target which are not successfully matched in the first stage can be further matched through the second stage matching, and the accuracy of matching the tracking target and the detection target is improved. In addition, the predicted position parameter of the tracking target in the current video frame to be detected is predicted by using the motion parameter of the tracking target in the history matching video frame, so that the second-stage matching is performed based on the predicted position parameter of the tracking target and the position parameter of the detection target, the matching between the tracking target and the detection target can be performed based on more sufficient information, and the accuracy of multi-target tracking can be further improved.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
According to an aspect of the present disclosure, a target tracking method is provided. As shown in fig. 1, the target tracking method may include: step S101, at least one detection target in a current video frame to be detected is obtained; step S102, for each of at least a portion of the one or more tracking targets, matching the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target matched with the tracking target; and step S103 of performing, for each of the at least one tracking targets, the following matching operation in response to determining that the at least one detection target does not include a detection target that matches at least one tracking target of the at least one part of tracking targets: step S104, acquiring a history matching video frame comprising the tracking target, and moving parameters of the tracking target corresponding to the history matching video frame; step S105, determining a predicted position parameter of the tracking target in the current video frame to be detected based on the motion parameter; and step S106, determining whether at least one part of the detection targets comprise detection targets matched with the tracking target at least based on the predicted position parameters of the tracking target in the current video frame to be detected and the position parameters corresponding to at least one part of the detection targets. Therefore, the tracking target and the detection target are subjected to two-stage matching based on different matching mechanisms, so that the tracking target and the detection target which are not successfully matched in the first stage can be further matched through the second stage matching, and the accuracy of matching the tracking target and the detection target is improved. In addition, the predicted position parameter of the tracking target in the current video frame to be detected is predicted by using the motion parameter of the tracking target in the history matching video frame, so that the second-stage matching is performed based on the predicted position parameter of the tracking target and the position parameter of the detection target, the matching between the tracking target and the detection target can be performed based on more sufficient information, and the accuracy of multi-target tracking can be further improved.
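Illustratively, the control flow of steps S101 to S106 can be sketched as the following simplified Python fragment. It is not the patented implementation: the Track/Detection structures and the helpers first_stage_match, predict_position and second_stage_match are placeholder assumptions whose concrete variants are discussed in later paragraphs.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Track:
    track_id: int
    feature: List[float]                         # appearance (ReID) feature, e.g. a 24-dim colour histogram
    center: Tuple[float, float]                  # last known centre position
    velocity: Tuple[float, float] = (0.0, 0.0)   # motion parameters from history matching frames
    confirmed: bool = False

@dataclass
class Detection:
    feature: List[float]
    center: Tuple[float, float]

def first_stage_match(tracks, detections):
    """Placeholder for the appearance-based (ReID) matching of step S102."""
    # A real version would build a feature cost matrix; this stub matches nothing.
    return [], list(tracks), list(range(len(detections)))

def predict_position(track):
    """Step S105 (simplified): predict the centre from the track's motion parameters."""
    return (track.center[0] + track.velocity[0], track.center[1] + track.velocity[1])

def second_stage_match(track, predicted_center, detections, max_dist=50.0):
    """Step S106 (simplified): position-based matching against the remaining detections."""
    best, best_d = None, max_dist
    for j, det in enumerate(detections):
        d = ((det.center[0] - predicted_center[0]) ** 2 +
             (det.center[1] - predicted_center[1]) ** 2) ** 0.5
        if d < best_d:
            best, best_d = j, d
    return best

def track_frame(tracks, detections):
    confirmed = [t for t in tracks if t.confirmed]
    unconfirmed = [t for t in tracks if not t.confirmed]
    # Step S102: first-stage matching is performed over the confirmed tracks only.
    matches, unmatched_tracks, unmatched_dets = first_stage_match(confirmed, detections)
    # Steps S104-S106: second-stage matching over the leftover tracks, plus the
    # unconfirmed tracks, which go straight to this stage (cf. step S405 below).
    still_unmatched = []
    for track in unmatched_tracks + unconfirmed:
        pred = predict_position(track)
        j = second_stage_match(track, pred, [detections[k] for k in unmatched_dets])
        if j is not None:
            matches.append((track.track_id, unmatched_dets.pop(j)))
        else:
            still_unmatched.append(track.track_id)
    return matches, still_unmatched, unmatched_dets
```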
According to some embodiments, the current video frame to be detected may be taken by, for example, a camera onboard the drone. Through using unmanned aerial vehicle machine-mounted camera, can cover the region that fixed camera was difficult to shoot better to realize the target tracking task in a plurality of fields such as wisdom security protection, rescue and relief work, behavioral analysis, wisdom traffic, wisdom city.
According to some embodiments, the detection target may be a vehicle. Therefore, the unmanned aerial vehicle is used for monitoring the road vehicles, real-time road inspection is realized, and an image sequence or video image can be acquired on site at the first time when traffic accidents or other emergency situations occur, so that the target tracking task of a plurality of vehicles is completed.
Fig. 2a and 2b illustrate exemplary application scenarios of the present disclosure according to some embodiments, such as two video frames captured consecutively by a drone flying above the scene. Three cars 18, 33 and 12 are detected in fig. 2a, while a car whose front portion has just entered the frame on the left is not yet detected; fig. 2b detects four cars: in addition to 18, 33 and 12, there is a car 25 showing only its front portion. It can be seen from this scenario that when the drone travels in the same direction as the two vehicles (33, 12) on the right side of the picture, their positions change little between the two frames, whereas the two vehicles (18, 25) traveling toward the drone have a larger relative speed, so their positions change correspondingly more between the two frames.
According to some embodiments, the at least one detection target may be obtained by performing target detection on the current video frame to be detected. For example, target detection can be performed on the current video frame to be detected using a trained deep learning neural network, so that each of at least one vehicle in the current video frame to be detected obtains a corresponding detection frame.
According to some embodiments, the one or more tracking targets may be, for example, vehicles detected in a historical video frame, for example, vehicles detected in a previous frame, vehicles that appear in a plurality of historical video frames but are not detected in a previous frame, or preset vehicles that are main tracking targets, which is not limited herein.
According to some embodiments, step S102 of matching, for each of at least a portion of the one or more tracking targets, the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target matched with the tracking target may, for example, be cascade matching based on the ReID features of the detection targets and the ReID features of the tracking targets, thereby obtaining a cascade matching result of the detection targets and the tracking targets. The cascade matching results may include, for example, associated tracks, unassociated tracks and unassociated detections, corresponding respectively to successfully matched tracking target-detection target pairs, tracking targets that were not successfully matched, and detection targets that were not successfully matched.
According to some embodiments, the at least a portion of tracking targets may be confirmed tracking targets, and the one or more tracking targets may further include unconfirmed tracking targets. An unconfirmed tracking target may be, for example, a tracking target that has appeared consecutively fewer than a preset number of times in the historical video frames, and a confirmed tracking target may be, for example, a tracking target that has appeared consecutively more than the preset number of times in the historical video frames and whose time since its last appearance is less than a preset duration. The preset number and the preset duration can be set as required: the smaller the preset number and the shorter the preset duration, the fewer historical tracking targets are retained in the model and the fewer tracking targets need to be matched, so the performance of the model is better but its accuracy decreases; the larger the preset number and the longer the preset duration, the more historical tracking targets are retained in the model and the more tracking targets need to be matched, so the accuracy of the model improves but its performance decreases. The preset number may be, for example, 1, 3, 5 or another number, and the preset duration may be, for example, 1 frame, 3 frames, 5 frames or another number of frames, which is not limited herein.
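Illustratively, the following sketch (with assumed parameter values and field names) shows how the preset number of consecutive appearances and the preset duration could be used to manage the confirmed/unconfirmed state of a tracking target; it is a simplified reading of the description above, not the patented implementation.

```python
PRESET_HITS = 3      # preset number: consecutive frames the target must appear in (assumed value)
PRESET_MAX_AGE = 5   # preset duration, in frames, since the last successful match (assumed value)

class TrackState:
    def __init__(self):
        self.consecutive_hits = 0      # consecutive history frames containing the target
        self.frames_since_update = 0   # frames elapsed since the target was last matched
        self.confirmed = False

    def mark_hit(self):
        """Call when the track is matched to a detection in the current frame."""
        self.consecutive_hits += 1
        self.frames_since_update = 0
        if self.consecutive_hits >= PRESET_HITS:
            self.confirmed = True      # appeared consecutively at least the preset number of times

    def mark_miss(self):
        """Call when the track finds no matching detection in the current frame."""
        self.consecutive_hits = 0
        self.frames_since_update += 1

    def is_active_confirmed(self):
        # A confirmed track takes part in first-stage matching only while the time
        # since its last appearance stays within the preset duration.
        return self.confirmed and self.frames_since_update <= PRESET_MAX_AGE
```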
According to some embodiments, as shown in fig. 3, step S102, for each of at least a portion of the one or more tracking targets, matching the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target that matches the tracking target may include: step S1021, obtaining a color histogram of each tracking target; step S1022, obtaining a color histogram of each detection target; and step S1023, matching each tracking target with each detection target of the at least one detection targets based on the color histogram of each tracking target and the color histogram of each detection target to determine whether the at least one detection target includes a detection target matched with the tracking target. Therefore, the color histogram is used as a matching basis for matching the tracking target and the detection target, so that the performance of the model can be greatly improved on the basis of keeping high accuracy, and the overall performance of the model is improved.
A color histogram is a statistical table reflecting the color distribution of the pixels of an image or of a region in an image. Its abscissa represents the different gray levels or colors, and its ordinate represents the number of pixels, or the proportion of pixels, corresponding to that gray level or color. The color histogram is cheap to compute and is invariant to translation, rotation and scaling, so it does not change when a vehicle moves, turns, etc. in the drone road-inspection scene. Illustratively, the 24-dimensional color histogram feature of the detection box region is used as the ReID feature of the vehicle, and is calculated as follows:

p_i = c_i / Σ_j c_j,  P = (p_1, p_2, ..., p_24)

where P is the color histogram feature, c_i represents the number of pixels of color i, and p_i represents the proportion of pixels of color i.
According to some embodiments, step S1023 of matching each tracking target with each of the at least one detection target based on the color histogram of each tracking target and the color histogram of each detection target to determine whether the at least one detection target includes a detection target matched with the tracking target may include: calculating a cosine-distance cost matrix between the color histogram of each tracking target and the color histogram of each detection target, and determining, based on the cost matrix, whether the at least one detection target includes a detection target matched with the tracking target. For example, a cost matrix entry smaller than a threshold may be regarded as a successful association between the corresponding detection target and tracking target. In this way, a cost matrix over every tracking target and every detection target is obtained by computing the cosine distance between their color histograms, so that the matching relationship between tracking targets and detection targets can be determined; using the cosine distance and the cost matrix meets the performance requirements of the drone road-inspection scene while preserving the accuracy of the matching model.
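As a concrete illustration, the sketch below computes a 24-dimensional colour histogram and the cosine-distance cost matrix with NumPy, followed by a simple threshold-based association. The 8-bins-per-RGB-channel quantisation and the greedy assignment are assumptions made for the example; the text above fixes only the 24-dimensional feature, the cosine distance and the threshold test.

```python
import numpy as np

def color_histogram(patch, bins_per_channel=8):
    """24-dim colour histogram of a detection-box crop (HxWx3 uint8 array).

    Each channel is quantised into `bins_per_channel` bins; the raw counts are
    concatenated and normalised to pixel proportions (p_i = c_i / sum_j c_j).
    """
    hists = [np.histogram(patch[..., c], bins=bins_per_channel, range=(0, 256))[0]
             for c in range(3)]
    p = np.concatenate(hists).astype(np.float64)
    return p / max(p.sum(), 1.0)

def cosine_cost_matrix(track_feats, det_feats, eps=1e-12):
    """cost[i, j] = cosine distance between track i's and detection j's histograms."""
    T = np.asarray(track_feats, dtype=np.float64)
    D = np.asarray(det_feats, dtype=np.float64)
    T = T / (np.linalg.norm(T, axis=1, keepdims=True) + eps)
    D = D / (np.linalg.norm(D, axis=1, keepdims=True) + eps)
    return 1.0 - T @ D.T

def associate(cost, threshold=0.2):
    """Greedy association: accept the cheapest pairs whose cost is below the threshold."""
    matches, used_rows, used_cols = [], set(), set()
    pairs = sorted(((i, j) for i in range(cost.shape[0]) for j in range(cost.shape[1])),
                   key=lambda ij: cost[ij])
    for i, j in pairs:
        if cost[i, j] >= threshold:
            break
        if i not in used_rows and j not in used_cols:
            matches.append((i, j))
            used_rows.add(i)
            used_cols.add(j)
    return matches

# Example: two tracks and two detections built from random crops.
rng = np.random.default_rng(0)
crops = [rng.integers(0, 256, (32, 32, 3), dtype=np.uint8) for _ in range(4)]
track_feats = [color_histogram(c) for c in crops[:2]]
det_feats = [color_histogram(c) for c in crops[2:]]
print(associate(cosine_cost_matrix(track_feats, det_feats), threshold=0.2))
```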
According to some embodiments, as shown in fig. 4, the target tracking method further comprises: step S403, in response to determining that the at least one detection target includes a detection target matching one of the at least a portion of tracking targets, updating the tracking target based on the detection target. Steps S401 to S402 and step S404 in fig. 4 are similar to steps S101 to S103 in fig. 1 and are not repeated here. Thus, real-time updating of the tracking target is realized by updating the associated tracking target after the first-stage matching. Updating the tracking target may include, for example, updating the ReID of the tracking target based on the ReID of the detection target, and may include updating the detection frame position, the detection frame width and height, the movement direction, the movement speed and other motion parameters of the tracking target in the current video frame, and so on. It will be appreciated that the predicted position parameter of the tracking target in the next frame may be determined immediately after the tracking target and the detection target are successfully associated, or the relevant parameters of the tracking target may be updated when the detection targets of the next frame are matched, which is not limited herein.
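A minimal sketch of such an update is shown below; the dictionary keys and the blended update of the ReID feature are illustrative assumptions rather than the patented procedure.

```python
def update_track(track, det, frame_interval=1.0, feat_momentum=0.9):
    """Update an associated track from its matched detection (illustrative field names)."""
    # Motion parameters: movement direction/speed estimated from the centre displacement.
    dx, dy = det["cx"] - track["cx"], det["cy"] - track["cy"]
    track["vx"], track["vy"] = dx / frame_interval, dy / frame_interval
    # Detection-frame position and width/height in the current video frame.
    track["cx"], track["cy"], track["w"], track["h"] = det["cx"], det["cy"], det["w"], det["h"]
    # ReID feature: blend the track's feature with the detection's feature.
    track["feature"] = [feat_momentum * t + (1.0 - feat_momentum) * d
                        for t, d in zip(track["feature"], det["feature"])]
    return track

track = {"cx": 100.0, "cy": 50.0, "w": 40.0, "h": 20.0, "vx": 0.0, "vy": 0.0,
         "feature": [1.0, 0.0]}
det = {"cx": 105.0, "cy": 50.0, "w": 42.0, "h": 21.0, "feature": [0.8, 0.2]}
print(update_track(track, det)["vx"])  # 5.0 pixels per frame
```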
According to some embodiments, in step S103, in response to determining that the at least one detection target does not include a detection target matching at least one of the at least a portion of tracking targets, a matching operation is performed for each of the at least one tracking target. The at least one tracking target may be the unassociated tracks that were not successfully matched with any detection target after cascade matching, so that second-stage matching can be performed on these unassociated tracks.
According to some embodiments, as shown in fig. 4, the target tracking method may further include: step S405, directly performing the second-stage matching operation on the unconfirmed tracking targets. Thus an unconfirmed tracking target undergoes at most one matching pass, namely the second-stage matching, which avoids the heavy consumption of computing resources that would be caused by too many tracking targets participating in the first-stage matching, and further improves the performance of the target tracking model.
According to some embodiments, in step S104, a history matching video frame including the tracking target is acquired, together with the motion parameters of the tracking target corresponding to that history matching video frame. The history matching video frame may be the last frame in which the tracking target appeared, or may be several historical video frames, which is not limited herein. The motion parameters of the tracking target corresponding to the history matching video frame may include the movement direction, movement speed, etc. of the tracking target in the history matching video frame. In addition, the detection frame of the tracking target in the history matching video frame can be obtained, including parameters such as the center position of the detection frame and its width and height.
According to some embodiments, in step S105, the predicted position parameter of the tracking target in the current video frame to be detected is determined based on the motion parameters; for example, the predicted position parameter of the tracking target in the current video frame to be detected may be updated using a Kalman filtering method. The predicted position parameters may include, for example, the predicted center-point position of the detection frame of the tracking target in the current video frame to be detected, the predicted width and height of the detection frame, and the like.
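For illustration, a minimal constant-velocity Kalman predict step is sketched below; the state layout, noise values and single-frame time step are assumptions, since the description only notes that Kalman filtering may be used.

```python
import numpy as np

def kalman_predict(x, P, q=1.0):
    """One constant-velocity predict step; state x = [cx, cy, w, h, vx, vy]."""
    dt = 1.0                       # one frame between updates
    F = np.eye(6)
    F[0, 4] = dt                   # cx <- cx + vx * dt
    F[1, 5] = dt                   # cy <- cy + vy * dt
    Q = q * np.eye(6)              # process-noise covariance (illustrative value)
    x_pred = F @ x                 # predicted centre position and box size
    P_pred = F @ P @ F.T + Q       # predicted covariance
    return x_pred, P_pred

# Example: a track last seen at (100, 50) with a 40x20 box, moving 5 px/frame to the right.
x0 = np.array([100.0, 50.0, 40.0, 20.0, 5.0, 0.0])
P0 = np.eye(6)
x1, _ = kalman_predict(x0, P0)
print(x1[:4])  # predicted [cx, cy, w, h] in the current frame to be detected -> [105, 50, 40, 20]
```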
According to some embodiments, step S106 may be, for example: determining whether at least a portion of the detection targets include a detection target matching the tracking target based on the motion parameters of the tracking target corresponding to the history matching video frame, the predicted position parameter of the tracking target in the current video frame to be detected, and the respective position parameters of at least a portion of the detection targets in the at least one detection target in the current video frame to be detected. In this way the motion parameters of the tracking target also serve as a basis for matching, which further improves matching accuracy and avoids incorrect matches caused by vehicles in different lanes having similar positions in the preceding and following frames.
According to some embodiments, determining whether at least a portion of the detection targets include a detection target matching the tracking target may include: determining a center-point weighted distance between each detection target in the at least a portion of detection targets and the tracking target, wherein the center-point weighted distance is calculated based on the movement direction of the tracking target, the direction of the line connecting the tracking target and the detection target, the distance between the tracking target and the detection target, and a preset weight; and determining whether the at least a portion of detection targets include a detection target matching the tracking target based on the center-point weighted distance between each of the at least a portion of detection targets and the tracking target. By using this matching method based on the center-point weighted distance, the weighted distance calculated from the movement direction of the tracking target, the direction of the line connecting the tracking target and the detection target, the distance between them and the preset weight can serve as the matching cost, providing a more specific and more logical matching basis and further improving matching accuracy. Illustratively, the center-point weighted distance may be calculated by the following formula:
d = (w_0 + cosθ) · d_0

where d_0 is the Euclidean distance between the center points of the detection target and the tracking target, θ is the angle between the center-point connecting line and the movement direction of the vehicle, and w_0 is the base weight coefficient.
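A direct transcription of this formula into code might look as follows; the value of w_0 and the handling of zero-length vectors are assumptions for the example.

```python
import math

def weighted_center_distance(track_center, track_velocity, det_center, w0=1.5):
    """Centre-point weighted distance d = (w0 + cos(theta)) * d0."""
    dx, dy = det_center[0] - track_center[0], det_center[1] - track_center[1]
    d0 = math.hypot(dx, dy)                          # Euclidean distance between centre points
    speed = math.hypot(track_velocity[0], track_velocity[1])
    if d0 == 0.0 or speed == 0.0:
        return w0 * d0                               # no usable direction information
    cos_theta = (dx * track_velocity[0] + dy * track_velocity[1]) / (d0 * speed)
    return (w0 + cos_theta) * d0

# A track at the origin moving to the right at 5 px/frame, compared with two detections:
print(weighted_center_distance((0.0, 0.0), (5.0, 0.0), (10.0, 0.0)))   # (1.5 + 1) * 10 = 25.0
print(weighted_center_distance((0.0, 0.0), (5.0, 0.0), (-10.0, 0.0)))  # (1.5 - 1) * 10 = 5.0
```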
According to some embodiments, as shown in fig. 5, the matching operation may further include: step S504, in response to determining that the at least a portion of detection targets include a detection target matching one of the at least one tracking target, updating the tracking target based on that detection target. Steps S501 to S503 in fig. 5 are similar to steps S104 to S106 in fig. 1 and are not repeated here. Thus, after a detection target and a tracking target are successfully matched, the tracking target is updated based on the detection target, further realizing real-time updating of the tracking target.
According to some embodiments, updating the tracking target based on the detection target may include: determining the current video frame to be detected as a history matching video frame of the tracking target; and in response to determining that the tracking target is an unconfirmed tracking target and that the tracking target has been detected in more than the preset number of consecutive history matching video frames including the current video frame to be detected, updating the tracking target to a confirmed tracking target. It will be appreciated that updating the associated tracking target after the second-stage matching is similar to updating the associated tracking target after the first-stage matching, and is not repeated here.
According to some embodiments, as shown in fig. 5, the matching operation may further include: step S505, in response to determining that the at least a portion of detection targets include one or more detection targets that match none of the at least one tracking target, determining the one or more detection targets as unconfirmed tracking targets. Thus, detection targets that remain unmatched after the two-stage matching are set as unconfirmed tracking targets, realizing real-time updating of the tracking targets, so that every detection target detected in the current video frame to be detected can serve as a tracking target for the next frame.
According to some embodiments, as shown in fig. 5, the matching operation may further include: step S506, for any tracking target in the at least a portion of tracking targets, in response to determining that the at least a portion of detection targets do not include a detection target matching the tracking target, determining whether to delete the tracking target according to the non-updated duration of the tracking target. The non-updated duration is the time interval from the history matching video frame in which the tracking target was last detected to the current video frame to be detected. Thus, for a confirmed tracking target, whether to delete it is determined according to its non-updated duration, which ensures that the tracking targets do not include targets that have long gone without updates or unconfirmed targets that appear only briefly, further realizing real-time updating of the tracking targets.
According to some embodiments, step S506 may include: deleting the tracking target in response to determining that the at least a portion of detection targets do not include a detection target matching the tracking target and that the tracking target is an unconfirmed tracking target; determining the non-updated duration of the tracking target in response to determining that the at least a portion of detection targets do not include a detection target matching the tracking target and that the tracking target is a confirmed tracking target; deleting the tracking target in response to its non-updated duration being greater than a preset duration; and updating the tracking target in response to its non-updated duration being less than or equal to the preset duration. Through these steps, a tracking target that remains unmatched after the two-stage matching is deleted or updated based on whether it is confirmed and on how long it has gone without updates, which improves the performance of the model and further realizes real-time updating of the tracking targets.
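The branch structure of step S506 can be summarised in a few lines of code; the dictionary fields and the frame-based preset duration below are assumptions for illustration.

```python
def handle_unmatched_track(track, tracks, max_age_frames=5):
    """Decide whether to delete or keep a track that found no match after both stages."""
    if not track["confirmed"]:
        tracks.remove(track)                    # unconfirmed and unmatched: delete immediately
        return
    track["frames_since_update"] += 1           # confirmed: measure the non-updated duration
    if track["frames_since_update"] > max_age_frames:
        tracks.remove(track)                    # exceeded the preset duration: delete
    # otherwise the confirmed track is kept and may be re-matched in a later frame

tracks = [{"confirmed": False, "frames_since_update": 0},
          {"confirmed": True, "frames_since_update": 2}]
for t in list(tracks):
    handle_unmatched_track(t, tracks)
print(len(tracks))  # 1: the unconfirmed track is deleted, the confirmed one is kept
```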
According to another aspect of the present disclosure, there is also provided a target tracking apparatus. As shown in fig. 6, the object tracking device 600 may include: an acquiring unit 610 configured to acquire at least one detection target in a current video frame to be detected; a first matching unit 620 configured to match, for each of at least a portion of the one or more tracking targets, the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target that matches the tracking target; a second matching unit 630 configured to perform a matching operation on each of the at least one tracking target in response to determining that the at least one detection target does not include a detection target matching at least one tracking target of the at least one part of tracking targets, wherein the second matching unit 630 includes: a first acquisition subunit 631 configured to acquire history matching video frames including tracking targets, and motion parameters of the tracking targets corresponding to the history matching video frames; a first determining subunit 632 configured to determine, based on the motion parameter, a predicted position parameter of the tracking target in the current video frame to be detected; and a second determining subunit 633 configured to determine whether at least a part of the detection targets includes a detection target matching the tracking target based on at least the predicted position parameter of the tracking target in the current video frame to be detected and the respective position parameter of at least a part of the detection targets.
The operations of the units 610 to 630 and the sub units 631 to 633 of the object tracking apparatus 600 are similar to those of the steps S101 to S106 described above, and will not be described here.
According to embodiments of the present disclosure, there is also provided an electronic device, a readable storage medium and a computer program product.
Referring to fig. 7, a block diagram of an electronic device 700, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the device 700 to exchange information/data with other devices over computer networks, such as the internet, and/or various telecommunication networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as the target tracking method. For example, in some embodiments, the target tracking method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the object tracking method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the target tracking method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. It should be noted that, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (16)

1. A target tracking method, comprising:
acquiring at least one detection target in a current video frame to be detected;
for each of at least a portion of one or more tracked targets, matching the tracked target with each of the at least one detected target to determine whether the at least one detected target includes a detected target that matches the tracked target, wherein the at least a portion of tracked targets are confirmed tracked targets that characterize tracked targets that occur consecutively more than a preset number of times in a plurality of historical video frames and occur less than a preset length of time since last time, the one or more tracked targets further including unconfirmed tracked targets that characterize tracked targets that occur consecutively less than a preset number of times in the plurality of historical video frames;
In response to determining that the at least one detected target does not include a detected target that matches at least one of the at least a portion of tracked targets, performing the following matching operation for each of the at least one tracked target:
acquiring a history matching video frame comprising the tracking target, and motion parameters of the tracking target corresponding to the history matching video frame;
determining a predicted position parameter of the tracking target in the current video frame to be detected based on the motion parameter; and
determining whether the at least one part of detection targets comprise detection targets matched with the tracking targets based on at least the predicted position parameters of the tracking targets in the current video frame to be detected and the position parameters corresponding to at least one part of detection targets in the at least one detection target; and
and directly executing the matching operation on the unconfirmed tracked targets.
2. The method of claim 1, wherein the matching operation further comprises:
in response to determining that the at least a portion of the detected targets include one or more detected targets that do not match each of the at least one tracked target, the one or more detected targets are determined to be unacknowledged tracked targets.
3. The method of claim 1, wherein determining whether the at least a portion of the detected targets includes a detected target that matches the tracked target is based on a motion parameter of the tracked target corresponding to the history matching video frame, a predicted position parameter of the tracked target in the current video frame to be detected, and a respective position parameter of at least a portion of the detected targets in the current video frame to be detected.
4. The method of claim 3, wherein determining whether the at least a portion of the detection targets include detection targets that match the tracking target comprises:
determining a center point weighted distance between each detection target in the at least one part of detection targets and the tracking target, wherein the center point weighted distance is calculated based on a movement direction of the tracking target, a connecting line direction of the tracking target and the detection target, a distance between the tracking target and the detection target and a preset weight; and
based on the weighted distance of each of the at least a portion of the detection targets from the tracking target center point, it is determined whether the at least a portion of the detection targets include detection targets that match the tracking target.
5. The method of claim 1, wherein the matching operation further comprises:
for any tracking target in the at least one part of tracking targets, in response to determining that the at least one part of detection targets do not comprise detection targets matched with the tracking targets, determining whether to delete the tracking targets according to the non-updated time length of the tracking targets, wherein the non-updated time length is the time interval from the history matching video frame where the tracking target is detected last time to the current video frame to be detected.
6. The method of claim 1, wherein the matching operation further comprises:
in response to determining that the at least a portion of the detected targets includes a detected target that matches one of the at least one tracked target, the tracked target is updated based on the detected target.
7. The method of claim 6, wherein updating the tracking target based on the detection target comprises:
determining the current video frame to be detected as a history matching video frame of the tracking target; and
in response to determining that the tracking target is an unacknowledged tracking target and the tracking target is detected in a plurality of consecutive history matching video frames greater than a preset number including the current video frame to be detected, updating the tracking target to an acknowledged tracking target.
8. The method of claim 1, further comprising:
in response to determining that the at least one detection target includes a detection target that matches one of the at least a portion of tracking targets, updating the tracking target based on the detection target.
9. The method of claim 1, wherein, for each of at least a portion of the one or more tracking targets, matching the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target that matches the tracking target comprises:
acquiring a color histogram of each tracking target;
acquiring a color histogram of each detection target; and
matching the tracking target with each of the at least one detection target based on the color histogram of each tracking target and the color histogram of each detection target, to determine whether the at least one detection target includes a detection target that matches the tracking target.
10. The method of claim 9, wherein matching the tracking target with each of the at least one detection target based on the color histogram of each tracking target and the color histogram of each detection target to determine whether the at least one detection target includes a detection target that matches the tracking target comprises:
calculating a cosine distance cost matrix between the color histogram of each tracking target and the color histogram of each detection target, and determining, based on the cost matrix, whether the at least one detection target includes a detection target that matches the tracking target.
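For reference, a minimal sketch of the color-histogram comparison in claims 9 and 10, assuming per-channel histograms and a cost matrix of cosine distances; the bin count, normalization, and the subsequent assignment step are assumptions not fixed by the claims.

```python
import numpy as np

def color_histogram(image_patch, bins=16):
    """Per-channel color histogram of a cropped target, concatenated and
    L1-normalised. The bin count is an illustrative choice."""
    hists = [np.histogram(image_patch[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image_patch.shape[-1])]
    hist = np.concatenate(hists).astype(np.float64)
    return hist / max(hist.sum(), 1e-12)

def cosine_cost_matrix(track_hists, det_hists):
    """Cost matrix of cosine distances (1 - cosine similarity) between
    tracking-target histograms (rows) and detection-target histograms (columns)."""
    T = np.stack(track_hists)
    D = np.stack(det_hists)
    T = T / np.maximum(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)
    D = D / np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)
    return 1.0 - T @ D.T

# The matching itself could then be solved on the cost matrix with, e.g.,
# the Hungarian algorithm:
#   from scipy.optimize import linear_sum_assignment
#   rows, cols = linear_sum_assignment(cost)
```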
11. The method of claim 1, wherein the at least one detection target is obtained by performing target detection on the current video frame to be detected.
12. The method of claim 1, wherein the current video frame to be detected is captured by an onboard camera of a drone.
13. The method of claim 1, wherein the detection target is a vehicle.
14. An object tracking device comprising:
an acquisition unit configured to acquire at least one detection target in a current video frame to be detected;
a first matching unit configured to match, for each of at least a portion of one or more tracking targets, the tracking target with each of the at least one detection target to determine whether the at least one detection target includes a detection target that matches the tracking target, wherein the at least a portion of tracking targets are confirmed tracking targets, a confirmed tracking target being a tracking target that has appeared consecutively more than a preset number of times in a plurality of historical video frames and whose most recent appearance is within a preset length of time, and wherein the one or more tracking targets further include unconfirmed tracking targets, an unconfirmed tracking target being a tracking target that has appeared consecutively fewer than the preset number of times in the plurality of historical video frames;
a second matching unit configured to perform a matching operation on each of at least one tracking target in response to determining that the at least one detection target does not include a detection target that matches the at least one tracking target of the at least a portion of tracking targets, wherein the second matching unit comprises:
a first acquisition subunit configured to acquire a history matching video frame including the tracking target, and a motion parameter of the tracking target corresponding to the history matching video frame;
a first determining subunit configured to determine, based on the motion parameter, a predicted position parameter of the tracking target in the current video frame to be detected; and
a second determining subunit configured to determine, based at least on the predicted position parameter of the tracking target in the current video frame to be detected and the respective position parameters of at least a portion of detection targets among the at least one detection target, whether the at least a portion of detection targets includes a detection target that matches the tracking target; and
a third matching unit configured to directly perform the matching operation on each unconfirmed tracking target.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 13.
16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 13.
CN202110462193.2A 2021-04-27 2021-04-27 Target tracking method, device, equipment and medium Active CN113112526B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110462193.2A CN113112526B (en) 2021-04-27 2021-04-27 Target tracking method, device, equipment and medium
PCT/CN2022/075128 WO2022227771A1 (en) 2021-04-27 2022-01-29 Target tracking method and apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110462193.2A CN113112526B (en) 2021-04-27 2021-04-27 Target tracking method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113112526A (en) 2021-07-13
CN113112526B (en) 2023-09-22

Family

ID=76721884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462193.2A Active CN113112526B (en) 2021-04-27 2021-04-27 Target tracking method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113112526B (en)
WO (1) WO2022227771A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113112526B (en) * 2021-04-27 2023-09-22 北京百度网讯科技有限公司 Target tracking method, device, equipment and medium
CN115848358B (en) * 2023-01-19 2023-05-26 禾多科技(北京)有限公司 Vehicle parking method, device, electronic equipment and computer readable medium
CN116863368A (en) * 2023-06-06 2023-10-10 深圳启示智能科技有限公司 Artificial intelligent identification terminal

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9269012B2 (en) * 2013-08-22 2016-02-23 Amazon Technologies, Inc. Multi-tracker object tracking
CN108053427B (en) * 2017-10-31 2021-12-14 深圳大学 Improved multi-target tracking method, system and device based on KCF and Kalman
US10685244B2 (en) * 2018-02-27 2020-06-16 Tusimple, Inc. System and method for online real-time multi-object tracking
US10917557B2 (en) * 2018-04-16 2021-02-09 United States Of America As Represented By The Secretary Of The Air Force Human-automation collaborative tracker of fused object
CN110472496B (en) * 2019-07-08 2022-10-11 长安大学 Traffic video intelligent analysis method based on target detection and tracking
CN110517292A (en) * 2019-08-29 2019-11-29 京东方科技集团股份有限公司 Method for tracking target, device, system and computer readable storage medium
CN113112526B (en) * 2021-04-27 2023-09-22 北京百度网讯科技有限公司 Target tracking method, device, equipment and medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316317A (en) * 2017-05-23 2017-11-03 深圳市深网视界科技有限公司 A kind of pedestrian's multi-object tracking method and device
CN110751674A (en) * 2018-07-24 2020-02-04 北京深鉴智能科技有限公司 Multi-target tracking method and corresponding video analysis system
CN109410243A (en) * 2018-10-16 2019-03-01 中电科新型智慧城市研究院有限公司 Based on the overlapping multi-object tracking method with template matching in region
CN110675432A (en) * 2019-10-11 2020-01-10 智慧视通(杭州)科技发展有限公司 Multi-dimensional feature fusion-based video multi-target tracking method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A fast correlation tracking algorithm based on center-distance weighting; Guo Lisheng; Optics & Optoelectronic Technology (02); pp. 94-98 *

Also Published As

Publication number Publication date
WO2022227771A1 (en) 2022-11-03
CN113112526A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN113112526B (en) Target tracking method, device, equipment and medium
US11643076B2 (en) Forward collision control method and apparatus, electronic device, program, and medium
CN109087510B (en) Traffic monitoring method and device
JP7252188B2 (en) Image processing system, image processing method and program
US9911053B2 (en) Information processing apparatus, method for tracking object and program storage medium
CN112419722B (en) Traffic abnormal event detection method, traffic control method, device and medium
CN113420682B (en) Target detection method and device in vehicle-road cooperation and road side equipment
CN105374049B (en) Multi-corner point tracking method and device based on sparse optical flow method
WO2021093011A1 (en) Unmanned vehicle driving decision-making method, unmanned vehicle driving decision-making device, and unmanned vehicle
CN113313053B (en) Image processing method, device, apparatus, medium, and program product
EP3955218A2 (en) Lane line detection method and apparatus, electronic device, computer storage medium, and computer program product
US20190290493A1 (en) Intelligent blind guide method and apparatus
CN113392794B (en) Vehicle line crossing identification method and device, electronic equipment and storage medium
CN113177968A (en) Target tracking method and device, electronic equipment and storage medium
CN116363628A (en) Mark detection method and device, nonvolatile storage medium and computer equipment
CN114333409B (en) Target tracking method, device, electronic equipment and storage medium
CN113112525A (en) Target tracking method, network model, and training method, device, and medium thereof
CN113920273B (en) Image processing method, device, electronic equipment and storage medium
EP3930306A2 (en) Method and apparatus for stabilizing image, roadside device and cloud control platform
CN113516013A (en) Target detection method and device, electronic equipment, road side equipment and cloud control platform
CN112749707A (en) Method, apparatus, and medium for object segmentation using neural networks
CN112102412A (en) Method and system for detecting visual anchor point in unmanned aerial vehicle landing process
CN110673642A (en) Unmanned aerial vehicle landing control method and device, computer equipment and storage medium
CN116228834B (en) Image depth acquisition method and device, electronic equipment and storage medium
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant