CN113177968A - Target tracking method and device, electronic equipment and storage medium

Target tracking method and device, electronic equipment and storage medium

Info

Publication number: CN113177968A
Application number: CN202110462257.9A
Authority: CN (China)
Prior art keywords: tracking, frame, target, matching, predicted
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 辛颖, 冯原, 李超, 张滨, 王云浩, 王晓迪, 谷祎, 龙翔, 彭岩, 郑弘晖, 贾壮, 韩树民
Current Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Abstract

The disclosure provides a target tracking method and apparatus, an electronic device, and a storage medium. It relates to the field of artificial intelligence, in particular to computer vision, deep learning, and related technical fields, and can be applied to intelligent cloud and safety inspection scenarios. The specific implementation scheme is as follows: acquiring a current video frame; determining a detection frame from the current video frame, the detection frame containing a target to be tracked; determining a plurality of predicted tracking frames corresponding to the target to be tracked, the plurality of predicted tracking frames respectively having a plurality of pieces of corresponding predicted tracking information; determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame; and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to the plurality of pieces of predicted tracking information. The scheme can effectively improve the accuracy and efficiency of target tracking in safety inspection scenarios, thereby effectively meeting the real-time requirements of target tracking in such scenarios.

Description

Target tracking method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to computer vision, deep learning, and related technical fields; it can be applied to intelligent cloud and safety inspection scenarios, and in particular relates to a target tracking method and apparatus, an electronic device, and a storage medium.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning, deep learning, big data processing, and knowledge graph technologies.
With the continuous development of Artificial Intelligence (AI) technology, production scenes are commonly inspected for safety using computer vision, deep learning, and other AI technologies to identify violations such as smoking, making phone calls, falling down, throwing objects, and the like.
Disclosure of Invention
A target tracking method and apparatus, an electronic device, a storage medium, and a computer program product are provided.
According to a first aspect, there is provided a target tracking method, comprising: acquiring a current video frame; determining a detection frame from the current video frame, the detection frame comprising: a target to be tracked; determining a plurality of prediction tracking frames corresponding to a target to be tracked, wherein the plurality of prediction tracking frames respectively have a plurality of corresponding prediction tracking information; determining a target tracking frame from a plurality of prediction tracking frames according to the detection frame; and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
According to a second aspect, there is provided a target tracking apparatus comprising: the acquisition module is used for acquiring a current video frame; a first determining module for determining a detection frame from a current video frame, the detection frame comprising: a target to be tracked; the second determination module is used for determining a plurality of predicted tracking frames corresponding to the target to be tracked, and the plurality of predicted tracking frames are respectively provided with a plurality of corresponding predicted tracking information; the third determining module is used for determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame; and the tracking module is used for tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a target tracking method as set forth by embodiments of the present disclosure.
According to a fourth aspect, a non-transitory computer-readable storage medium is presented having stored thereon computer instructions for causing a computer to perform a target tracking method as presented by embodiments of the present disclosure.
According to a fifth aspect, a computer program product is presented, comprising a computer program which, when executed by a processor, implements the object tracking method presented by an embodiment of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a YOLOv3 model in the related art;
FIG. 3 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a schematic flow chart diagram of a target tracking method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 8 is a schematic diagram according to a sixth embodiment of the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing a target tracking method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to assist understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present disclosure.
It should be noted that the execution subject of the target tracking method of this embodiment is a target tracking apparatus. The apparatus may be implemented in software and/or hardware and may be configured in an electronic device, where the electronic device may include, but is not limited to, a terminal, a server, and the like.
The embodiment of the disclosure relates to the field of artificial intelligence, in particular to computer vision, deep learning, and related technical fields, and can be applied to intelligent cloud and safety inspection scenarios to improve the accuracy and efficiency of target tracking in safety inspection, thereby effectively meeting the real-time requirements of target tracking in such scenarios.
Artificial Intelligence, abbreviated AI, is a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Deep learning learns the intrinsic laws and representation levels of sample data; the information obtained during learning is very helpful for interpreting data such as text, images, and sounds. The ultimate goal of deep learning is to enable machines to analyze and learn like humans, recognizing data such as text, images, and sounds.
Computer vision means using cameras and computers in place of human eyes to perform machine vision tasks such as identification, tracking, and measurement of targets, with further image processing so that the processed result is more suitable for human observation or for transmission to instruments for detection.
In a safety inspection scenario, for example the safe production environment of a factory, workers need to be inspected for helmet wearing, smoking, phone calls, and the like. It should be noted that, in such a scenario, workers are generally tracked as targets to ensure safe operation.
As shown in fig. 1, the target tracking method includes:
S101: Acquiring the current video frame.
A frame of video for which detection processing is currently performed may be referred to as a current video frame, where the current video frame belongs to a video.
That is to say, the acquired video may include one or more video frames. When the target tracking method of the embodiment of the present disclosure is executed, each frame in the video may be processed in turn for target tracking, so the video frame on which detection processing is being performed at the current time point may be referred to as the current video frame.
The video may be collected by cameras in a factory inspection scene; for example, the target tracking apparatus may read the real-time video stream of each monitoring camera in the inspection scene and extract each video frame in turn to obtain the current video frame.
In other embodiments, the video may be a video acquired in real time, or a video acquired at a fixed time, or may also be a recorded video acquired from a database, which is not limited in this respect.
In other embodiments, the videos may be collected by, for example and without limitation, various cameras (e.g., a head-mounted camera), an unmanned aerial vehicle, a robot, an AI chip, and any other possible monitoring device in a factory inspection scene, without limitation.
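As a simple illustration of this acquisition step, the following sketch reads frames one by one from a monitoring-camera stream with OpenCV; the stream URL and the choice of OpenCV are assumptions made for the example, not something specified by the disclosure.

    import cv2

    def read_frames(stream_url):
        """Yield frames one by one from a monitoring-camera stream; each
        yielded frame plays the role of the current video frame."""
        cap = cv2.VideoCapture(stream_url)
        try:
            while True:
                ok, frame = cap.read()     # frame: BGR image array
                if not ok:
                    break                  # stream ended or dropped
                yield frame
        finally:
            cap.release()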
S102: Determining a detection frame from the current video frame, the detection frame comprising a target to be tracked.
After the current video frame is obtained, target detection and identification can be performed on it, so that one or more detection frames are identified from the current video frame, each containing a corresponding target to be tracked.
That is to say, the embodiment of the present disclosure supports tracking one or more targets, and each of the one or more targets may be referred to as a target to be tracked, which is not limited herein.
The target to be tracked may be a target framed by a local image region carrying some specific behavior features.
For example, assuming the current video frame contains a whole-body image of worker A, then for helmet-wearing detection the head framed within the head region of worker A identified from the current video frame may be regarded as the target to be tracked.
For another example, for smoking detection on worker A, both the mouth framed within the mouth region and the hand framed within the hand region identified in the current video frame may be taken as targets to be tracked, and so on, without limitation.
In other embodiments, any object (e.g., a person, a vehicle, an animal, and any other object) identified from the current video frame may also be referred to as an object to be tracked, without limitation, according to the need of the actual object tracking scene.
In combination with the above factory inspection scene and the requirements of safety inspection, the embodiment of the present disclosure may take a worker's target framed by a local image region carrying specific behavior characteristics as the target to be tracked, without limitation.
The embodiments of the present disclosure may determine, from the current video frame, a detection frame containing an object to be tracked (i.e., the head, mouth, and hand of the worker), and the detection frame may be determined according to, for example, the size and the position of the object to be tracked in the current video frame, which is not limited thereto.
In some embodiments, a deep learning based object detection method may be employed to determine the detection box from the current video frame.
For example, a deep learning two-stage object detection method (Two-Stage) may be used to determine the detection frame from the current video frame. Two-stage methods include, for example, the spatial-pyramid-pooling deep network (SPPNet), Fast R-CNN, Faster R-CNN (Towards Real-Time Object Detection with Region Proposal Networks), and the Feature Pyramid Network (FPN).
For another example, the detection frame may be determined by a one-stage object detection method (One-Stage), represented by the You Only Look Once (YOLO) family of detectors.
In other embodiments, the detection frame may be determined from the current video frame in any other possible manner, which is not limited to this.
In the embodiment of the present disclosure, a target detection algorithm based on YOLOv3 may be used to determine the detection frame from the current video frame. The YOLOv3 model can balance speed and accuracy by changing the size of its model structure. As shown in fig. 2, which is a schematic structural diagram of a YOLOv3 model in the related art, grid 1, grid 2, and grid 3 represent the detection grids of the YOLOv3 model, and scale 1, scale 2, and scale 3 represent the detection scales into which the current video frame is divided. Determining the detection frame with the YOLOv3 model can effectively improve the efficiency and accuracy of detection-frame identification and analysis.
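As an illustration, a minimal YOLOv3 detection step with OpenCV's DNN module might look like the sketch below; the configuration/weight file names, the 416x416 input size, and both thresholds are assumptions made for the example, not values specified by the disclosure.

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    out_names = net.getUnconnectedOutLayersNames()

    def detect(frame, conf_thr=0.5, nms_thr=0.4):
        h, w = frame.shape[:2]
        blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                     (0, 0, 0), True, False)
        net.setInput(blob)
        boxes, scores = [], []
        for out in net.forward(out_names):        # one output per scale
            for det in out:                       # [cx, cy, bw, bh, obj, cls...]
                conf = float(det[5:].max() * det[4])
                if conf < conf_thr:
                    continue
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                scores.append(conf)
        keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thr, nms_thr)
        return [(boxes[i], scores[i]) for i in np.array(keep).flatten()]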
S103: and determining a plurality of predicted tracking frames corresponding to the target to be tracked, wherein the plurality of predicted tracking frames respectively have a plurality of corresponding predicted tracking information.
After the detection frame is determined from the current video frame, a plurality of predictive tracking frames corresponding to the target to be tracked can be determined, and each predictive tracking frame is configured to have corresponding predictive tracking information.
The predictive tracking frame may be a detection frame predicted for the target to be tracked, and it may exist in one or more video frames after the current video frame. That is, in the embodiment of the present disclosure, a detection frame containing the target to be tracked is first detected from the current video frame; matching is then applied subsequently to determine the most accurate tracking frame from the plurality of predictive tracking frames, thereby realizing target tracking over the whole video.
A tracking frame predicted in a subsequent video frame may be referred to as a predictive tracking frame. The predicted tracking information corresponding to a predictive tracking frame may specifically be its relative position information within the video frame to which it belongs and/or its actual tracking state. The tracking state may be confirmed or unconfirmed: a confirmed tracking state indicates that the predictive tracking frame has been successfully matched with a detection frame, while an unconfirmed tracking state may indicate that the predictive tracking frame has not been successfully matched with a detection frame or has not yet been matched at all, without limitation.
It is understood that a video is composed of consecutive video frames, and the current video frame of the embodiment of the present disclosure may be followed by a plurality of consecutive video frames. For a clearer description, the current video frame may be understood as the first video frame and the subsequent frames as the second, third, ..., and nth video frames, where n is a positive integer greater than 1; the first through nth video frames together form the whole video.
For example, determining a plurality of predicted tracking frames corresponding to the target to be tracked means determining the relative position information the target may have in the second through nth video frames, each position being relative to the video frame to which it belongs. In an actual detection scenario, one or more targets are generally detected in each video frame, so the predicted tracking frames corresponding to the target to be tracked may likewise number one or more. In the embodiment of the present disclosure, the plurality of predicted tracking frames corresponding to the target to be tracked are determined first, each having its corresponding predicted tracking information, so as to assist in subsequently determining, from among them, the predicted tracking frame adapted to the target to be tracked; see the examples that follow.
S104: and determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame.
After the plurality of predictive tracking frames corresponding to the target to be tracked are determined, the target tracking frame may be determined from them according to the detection frame. The predictive tracking frame determined to be adapted to the target to be tracked may be referred to as the target tracking frame; being adapted to the target means that the target framed by the target tracking frame and the target to be tracked are the same target, for example the head, hand, or mouth of the same worker.
Optionally, in some embodiments, determining the target tracking frame from the plurality of predictive tracking frames according to the detection frame may proceed as follows: the detection frame is matched against each predictive tracking frame according to the plurality of pieces of predictive tracking information, yielding a plurality of matching results respectively corresponding to the predictive tracking frames; a target matching result satisfying a set condition is then selected from these matching results, and the predictive tracking frame corresponding to it is taken as the target tracking frame. In this way, the matching calculation between the detection frame and the predictive tracking frames references the corresponding predictive tracking information, and the set condition assists in selecting the target tracking frame corresponding to the target matching result. This can effectively improve the accuracy with which the target tracking frame is determined, raise its determination efficiency, and optimize the target tracking process.
After the plurality of predictive tracking frames corresponding to the target to be tracked are determined, the detection frame and the plurality of predictive tracking frames may be respectively matched according to the plurality of predictive tracking information, so as to obtain a plurality of matching results respectively corresponding to the plurality of predictive tracking frames.
The matching result may be presented, for example, in the form of a score, which reflects the degree of association between the detection frame and each predictive tracking frame; the set condition may then be that the matching score is greater than a score threshold. Here, the detection frame, the plurality of predictive tracking frames, and the corresponding pieces of predictive tracking information may be input into a pre-trained matching model to obtain the matching scores output by that model, without limitation.
The setting condition, such as but not limited to a threshold condition or any other possible condition, may be specifically set adaptively according to the actual target tracking requirement, which is not limited in this respect.
The embodiment of the disclosure may compare a plurality of matching results with the set condition, where a matching result satisfying the set condition may be referred to as a target matching result, the number of the target matching results may be one or more, and then a predicted tracking frame corresponding to the target matching result is used as the target tracking frame.
It should be understood that the above example is only illustrative of the method for determining the target tracking frame, and in practical applications, the target tracking frame may also be determined in any other possible manner, for example, the target tracking frame may also be determined according to the position of the predicted tracking frame in the video frame and the position of the detection frame in the current video frame, and the determination manner of the target tracking frame is not particularly limited herein.
S105: and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
The predicted tracking information corresponding to the target tracking frame may be referred to as the target tracking information. In combination with the above example, the target tracking information may include a confirmed or unconfirmed tracking state, and may further include the relative position information of the target tracking frame within the video frame to which it belongs, without limitation.
Therefore, the target to be tracked can be tracked according to the target tracking information corresponding to the target tracking frame. That is, the embodiment of the present disclosure supports tracking the target to be tracked according to that information and analyzing the behavior of the worker to whom the target belongs, based on the target inside the target tracking frame in each tracked video frame, for example analyzing whether the tracked person wears a safety helmet, smokes, or makes a phone call, without limitation.
In the embodiment, the accuracy and the efficiency of target tracking under a safety inspection scene can be effectively improved by obtaining the current video frame, determining the detection frame containing the target to be tracked from the current video frame, determining the plurality of prediction tracking frames corresponding to the target to be tracked, determining the target tracking frame from the plurality of prediction tracking frames according to the detection frame, and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, so that the real-time requirement of target tracking in the safety inspection scene is effectively met.
Fig. 3 is a schematic diagram according to a second embodiment of the present disclosure.
As shown in fig. 3, the target tracking method includes:
S301: Acquiring the current video frame.
S302: Determining a detection frame from the current video frame, the detection frame comprising a target to be tracked.
S303: Determining a plurality of predicted tracking frames corresponding to the target to be tracked, the plurality of predicted tracking frames respectively having a plurality of pieces of corresponding predicted tracking information.
For the description of S301 to S303, reference may be made to the above embodiments, which are not described herein again.
S304: and performing cascade matching on the detection frame and the first predictive tracking frame according to the plurality of pieces of predictive tracking information.
That is to say, the embodiment of the present disclosure may adopt a targeted matching manner for the detection frame and the plurality of prediction tracking frames, so that personalized information of different prediction tracking frames can be considered in the matching process to ensure the matching accuracy.
The first predicted trace frame belongs to a plurality of predicted trace frames, and the number of the first predicted trace frames may be one or more.
In the present disclosure, the first predictive tracking frame may be, among the plurality of predictive tracking frames, a tracking frame whose predictive tracking information includes a confirmed tracking state. That is, since the predictive tracking information may include either a confirmed tracking state or an unconfirmed tracking state, the embodiment of the present disclosure can determine the first predictive tracking frame based on the predictive tracking information, which improves the efficiency of determining the first predictive tracking frame, optimizes the determination process, and ensures its accuracy.
That is to say, in this embodiment, the detection frame and the first predictive tracking frame may be cascade-matched according to the plurality of pieces of predictive tracking information; a cascade matching algorithm from the related art may be used. In this way, the first predictive tracking frames in the confirmed tracking state are matched with the detection frame in time, which guarantees matching accuracy while greatly improving matching efficiency.
S305: and matching the detection frame with the second prediction tracking frame according to the combination of the plurality of pieces of prediction tracking information and the cross-over ratio information, wherein the cross-over ratio information is the cross-over ratio information between the detection frame and the second prediction tracking frame, and the first prediction tracking frame and the second prediction tracking frame are the same or different.
That is to say, the embodiment of the present disclosure may further match the detection frame and the second predictive tracking frame in combination with the intersection-over-union information between them. Because this information is fused into the matching process, the matching logic becomes flexible enough to adapt to the requirements of the application scenario; the matching process is optimized, the information represented by the matching result is more comprehensive, and matching accuracy and effect are effectively improved.
The second predicted trace frame belongs to a plurality of predicted trace frames, and the number of the second predicted trace frames may be one or more.
In the present disclosure, before the detection frame and the second predictive tracking frame are matched according to the plurality of pieces of predictive tracking information combined with the intersection-over-union information, the second predictive tracking frame may be determined from the plurality of predictive tracking frames. Optionally, a first predictive tracking frame for which cascade matching was unsuccessful may be used as a second predictive tracking frame; and/or, if the predictive tracking information of a frame includes an unconfirmed tracking state, that predictive tracking frame is determined to be a second predictive tracking frame. In this way, the second predictive tracking frame can be determined based on the predictive tracking information, improving its determination efficiency and optimizing the determination process; and since first predictive tracking frames that failed cascade matching are matched again as second predictive tracking frames, no information is lost, which improves the integrity and comprehensiveness of the matching.
Intersection over Union (IoU) is a criterion for measuring the accuracy of detecting a given object in a particular dataset: whenever a task outputs a predicted region, that output can be measured with IoU. The intersection-over-union information between the detection frame and the second predictive tracking frame describes their overlap ratio, that is, the degree to which the two frames overlap, and can be used as reference content when matching them.
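A minimal IoU computation for two axis-aligned boxes, written here as a standalone Python sketch (the (x1, y1, x2, y2) box format is an assumption made for the example):

    def iou(box_a, box_b):
        """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)   # intersection corners
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((ax2 - ax1) * (ay2 - ay1)
                 + (bx2 - bx1) * (by2 - by1) - inter)
        return inter / union if union > 0 else 0.0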
In the embodiment of the disclosure, cascade matching of the first prediction tracking frame and the detection frame in the confirmed tracking state is supported, matching of the second prediction tracking frame and the detection frame in the unconfirmed tracking state is supported by combining the cross-over comparison information, and matching of the first prediction tracking frame and the detection frame in the unsuccessfully cascade matching is supported by combining the cross-over comparison information, so that the integrity and the comprehensiveness of matching can be effectively ensured, target tracking loss is avoided, target tracking accuracy is guaranteed, and a target tracking effect is improved.
In the embodiment of the present disclosure, the first predictive tracking frame and the second predictive tracking frame may be the same or different; the specific division may be configured adaptively according to the actual target tracking scenario. If the first and second predictive tracking frames are divided in any other possible manner, the detection frame may likewise be matched against them in any other possible manner, with adaptive adjustment, without limitation.
In the embodiments of the present disclosure, the ways of matching the detection frame and the predictive tracking frames include, but are not limited to, cascade matching and/or matching with intersection-over-union information; any other possible manner may also be used, for example matching with an artificial intelligence model, or matching in an engineered, rule-based way.
The "first" and "second" in the above embodiments of the present disclosure are used for distinguishing fields, and are not used as limitations to possible implementations of the embodiments of the present disclosure.
S306: and selecting a target matching result meeting the set condition from the plurality of matching results, and taking the predicted tracking frame corresponding to the target matching result as a target tracking frame.
S307: and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
For the description of S306-S307, reference may be made to the above embodiments, which are not described herein again.
In this embodiment, by obtaining the current video frame, determining the detection frame including the target to be tracked from the current video frame, determining the plurality of predictive tracking frames corresponding to the target to be tracked, determining the target tracking frame from the plurality of predictive tracking frames according to the detection frame, and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, the accuracy and efficiency of target tracking in the safety inspection scene can be effectively improved, so that the real-time requirement of target tracking in the safety inspection scene is effectively met. In addition, the first predictive tracking frames in the confirmed tracking state can be cascade-matched with the detection frame, the second predictive tracking frames in the unconfirmed tracking state can be matched with the detection frame in combination with the intersection-over-union information, and the first predictive tracking frames for which cascade matching was unsuccessful can also be matched in combination with the intersection-over-union information. Determining the first and second predictive tracking frames based on the tracking state information speeds up and optimizes the calculation; using the cascade matching algorithm together with intersection-over-union matching effectively guarantees the integrity and comprehensiveness of matching, avoids losing the tracked target, ensures tracking accuracy, and improves the tracking effect.
Fig. 4 is a schematic diagram according to a third embodiment of the present disclosure.
As shown in fig. 4, the target tracking method includes:
S401: Acquiring the current video frame.
S402: Determining a detection frame from the current video frame, the detection frame comprising a target to be tracked.
S403: Determining a plurality of predicted tracking frames corresponding to the target to be tracked, the plurality of predicted tracking frames respectively having a plurality of pieces of corresponding predicted tracking information.
For the description of S401 to S403, reference may be made to the above embodiments, which are not described herein again.
S404: and determining depth information corresponding to the target to be tracked in the detection frame.
In this embodiment, when performing cascade matching between the detection frame and the first predictive tracking frame according to the plurality of pieces of predictive tracking information, the depth information corresponding to the target to be tracked in the detection frame may first be determined and used to assist the cascade matching.
The depth information of the target to be tracked within the video frame to which it belongs may be referred to as the depth information corresponding to the target to be tracked; it may also be understood as a depth feature, and includes, for example and without limitation, the pixels, position, size, and image features of the tracked object in the detection frame.
When determining the depth information corresponding to the target to be tracked, a Time of Flight (TOF) technique may be used: a sensor emits modulated near-infrared light, which is reflected when it meets an object; by calculating the time difference or phase difference between emission and reflection, the sensor converts this into the distance of the photographed scene, thereby generating the depth information, without limitation.
S405: and generating a first cosine distance matrix by combining the predicted tracking information corresponding to the first predicted tracking frame according to the depth information.
After the depth information corresponding to the target to be tracked in the detection frame is determined, a first cosine distance matrix may be generated from the depth information in combination with the predicted tracking information corresponding to the first predictive tracking frame. The first cosine distance matrix may include a plurality of elements, each representing the similarity of a corresponding feature (for example, an image feature or a position feature) between the detection frame and the first predictive tracking frame, without limitation.
The corresponding features between the detection frame and the first prediction tracking frame may be, for example, color features and relative position features between corresponding pixel points in the image area covered by the detection frame and the image area covered by the first prediction tracking frame, which is not limited herein.
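As a hedged sketch of this step, the following builds a cosine distance matrix between detection features and the stored feature set of each first predictive tracking frame; the feature layout and the min-over-set convention are assumptions for the example rather than details given by the disclosure.

    import numpy as np

    def cosine_distance_matrix(det_feats, track_feat_sets):
        """det_feats: (n_dets, d) appearance features of the new detections.
        track_feat_sets: one (n_i, d) stored feature set per tracking frame.
        Entry (i, j): smallest cosine distance between detection j and any
        stored feature of tracking frame i."""
        det = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
        rows = []
        for feats in track_feat_sets:
            tr = feats / np.linalg.norm(feats, axis=1, keepdims=True)
            rows.append((1.0 - tr @ det.T).min(axis=0))
        return np.vstack(rows)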
S406: a first similarity between the detection box and the first predictive tracking box is determined.
The overall similarity determined between the detection frame and the first predictive tracking frame may be referred to as the first similarity; it may be characterized using the Mahalanobis distance, or determined in any other possible manner, without limitation. Whereas each element of the first cosine distance matrix represents the similarity of one corresponding feature between the detection frame and the first predictive tracking frame, the first similarity describes their similarity as a whole.
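A minimal sketch of such a Mahalanobis-based similarity, assuming the mean and covariance of a tracking frame come from a Kalman filter state as described later in this disclosure:

    import numpy as np

    def mahalanobis_sq(measurement, mean, cov):
        """Squared Mahalanobis distance between a detection measurement and a
        predicted track state (mean, cov)."""
        d = measurement - mean
        return float(d @ np.linalg.solve(cov, d))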
S407: and updating a first target element in the first cosine distance matrix according to the first similarity to obtain a first target matrix, wherein the element value of the first target element is greater than the value corresponding to the maximum first similarity in the plurality of first similarities.
After the first similarity is determined, updating a first target element in the first cosine distance matrix according to the first similarity to obtain a first target matrix.
An element of the first cosine distance matrix whose value is greater than the value corresponding to the largest of the first similarities, that is, an element greater than the maximum Mahalanobis distance, is referred to as a first target element.
In this embodiment, the first target element in the first cosine distance matrix may be updated, and the updated first cosine distance matrix is referred to as a first target matrix.
S408: and processing the first target matrix by adopting a Hungarian algorithm to carry out cascade matching.
And further, processing the first target matrix by adopting a Hungarian algorithm to carry out cascade matching.
Therefore, in the embodiment of the disclosure, the detection frame and the first predictive tracking frame may be subjected to cascade matching based on the determined first cosine distance matrix and the overall similarity between the detection frame and the first predictive tracking frame, so that the cascade matching is assisted to be completed based on the similarity, the matching accuracy is ensured, the operation complexity of the cascade matching is reduced, and the overall target tracking efficiency is improved. In addition, the cosine distance matrix, the Mahalanobis distance algorithm and the Hungary algorithm are adopted to carry out cascade matching calculation between the detection frame and the first prediction tracking frame, so that the cascade matching process can be optimized, and the cascade matching accuracy between the detection frame and the first prediction tracking frame is improved.
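Combining the two previous sketches, the gate-and-assign step might be implemented as follows; the Hungarian step uses scipy's linear_sum_assignment, and both the gating threshold (a chi-square quantile commonly used in DeepSORT-style trackers) and the placeholder value are illustrative assumptions.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    GATE_VALUE = 1e5  # stand-in for "infeasible assignment"

    def gate_and_match(cost, maha_sq, gate_thr=9.4877):
        """cost: (n_tracks, n_dets) cosine distance matrix.
        maha_sq: (n_tracks, n_dets) squared Mahalanobis distances.
        Returns (track_index, detection_index) pairs that pass the gate."""
        cost = cost.copy()
        cost[maha_sq > gate_thr] = GATE_VALUE  # update the "target elements"
        rows, cols = linear_sum_assignment(cost)
        return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < GATE_VALUE]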
S409: and generating a second cosine distance matrix according to the intersection ratio information.
In the embodiment of the present disclosure, when the detection frame and the second predictive tracking frame are matched according to the multiple pieces of predictive tracking information in combination with the cross-over ratio information, a second cosine distance matrix may be first generated according to the cross-over ratio information.
The cosine distance matrix generated from the intersection-over-union (IoU) information between the detection frame and the second predictive tracking frame may be referred to as the second cosine distance matrix. It may include a plurality of elements, each representing the IoU information between the detection frame and a second predictive tracking frame; for example, each row of the second cosine distance matrix may represent the IoU information between one second predictive tracking frame and each detection result.
S410: a second similarity between the detection frame and the second predictive tracking frame is determined.
The overall similarity determined between the detection frame and the second predictive tracking frame may be referred to as the second similarity; it may likewise be characterized using the Mahalanobis distance, or determined in any other possible manner, without limitation. As before, each element of the second cosine distance matrix represents the similarity of one corresponding feature between the detection frame and the second predictive tracking frame, while the second similarity describes their similarity as a whole.
S411: and updating second target elements in the second cosine distance matrix according to the second similarity to obtain a second target matrix, wherein the element value of the second target elements is greater than the value corresponding to the maximum second similarity in the plurality of second similarities.
After the second similarity is determined, the second target elements in the second cosine distance matrix are updated according to the second similarity to obtain the second target matrix.
An element of the second cosine distance matrix whose value is greater than the value corresponding to the largest of the second similarities, that is, an element greater than the maximum Mahalanobis distance, is referred to as a second target element.
In this embodiment, the second target element in the second cosine distance matrix may be updated, and the updated second cosine distance matrix is referred to as a second target matrix.
S412: the second target matrix is processed using the hungarian algorithm for matching.
And further, processing the second target matrix by adopting a Hungarian algorithm to perform matching.
Therefore, in the embodiment of the disclosure, the cross-over ratio information matching between the detection frame and the second prediction tracking frame may be performed based on the second cosine distance matrix determined according to the cross-over ratio information and the overall similarity between the detection frame and the second prediction tracking frame, so that matching is assisted and completed based on the similarity degree, the matching accuracy is ensured, the matching operation complexity is reduced, and the overall target tracking efficiency is improved. In addition, the cosine distance matrix is adopted, the cross-comparison information and the Hungarian algorithm are combined to carry out matching calculation between the detection box and the second prediction tracking box, so that the matching process can be optimized, and the matching accuracy between the detection box and the second prediction tracking box is improved.
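A corresponding sketch of the IoU-based stage, reusing the iou() helper above; using (1 - IoU) as the cost entry is a common convention and an assumption here, since the disclosure only states that the matrix is built from the intersection-over-union information.

    import numpy as np

    def iou_cost_matrix(track_boxes, det_boxes):
        """Rows: second predictive tracking frames; columns: detection frames.
        Entry (i, j) = 1 - IoU(track i, detection j)."""
        m = np.zeros((len(track_boxes), len(det_boxes)))
        for i, tb in enumerate(track_boxes):
            for j, db in enumerate(det_boxes):
                m[i, j] = 1.0 - iou(tb, db)
        return m

The resulting matrix can then be gated and assigned with the same Hungarian step as in the cascade-matching sketch above.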
It should be understood that the above example is only an exemplary illustration of the matching process in combination with the cross-comparison information, and the matching of the detection frame and the prediction tracking frame may be implemented in any other possible manner in practical operation, and is not limited in detail herein.
S413: and selecting a target matching result meeting the set condition from the plurality of matching results, and taking the predicted tracking frame corresponding to the target matching result as a target tracking frame.
S414: and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
For the description of S413-S414, reference may be made to the above embodiments, which are not described herein again.
In the embodiment, the accuracy and the efficiency of target tracking under a safety inspection scene can be effectively improved by obtaining the current video frame, determining the detection frame containing the target to be tracked from the current video frame, determining the plurality of prediction tracking frames corresponding to the target to be tracked, determining the target tracking frame from the plurality of prediction tracking frames according to the detection frame, and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, so that the real-time requirement of target tracking in the safety inspection scene is effectively met. In addition, in the operation of carrying out cascade matching and matching by combining cross-over ratio information, the cosine distance matrix and the similarity can be combined for matching, so that matching is assisted and completed based on the similarity, the matching accuracy is guaranteed, meanwhile, the matching operation complexity is reduced, and the overall target tracking efficiency is improved. In addition, various algorithms such as a cosine distance matrix, a mahalanobis distance algorithm and a hungarian algorithm are adopted in the matching process, the accuracy of cascade matching and combination cross-comparison matching is respectively ensured, and the overall accuracy of the matching process is further improved.
Fig. 5 is a schematic diagram according to a fourth embodiment of the present disclosure.
As shown in fig. 5, the target tracking method includes:
S501: Acquiring the current video frame.
S502: Determining a detection frame from the current video frame, the detection frame comprising a target to be tracked.
For the description of S501-S502, reference may be made to the above embodiments, which are not described herein again.
S503: and performing tracking frame prediction on the target to be tracked by adopting a Kalman filtering algorithm to obtain a plurality of candidate tracking frames.
Optionally, in some embodiments, when determining the plurality of predicted tracking frames corresponding to the target to be tracked, tracking frame prediction may first be performed on the target to be tracked using a Kalman filter algorithm to obtain a plurality of candidate tracking frames.
A tracking frame obtained by preliminarily predicting the target to be tracked with the Kalman filter algorithm may be referred to as a candidate tracking frame. That is, prediction is performed with the Kalman filter algorithm for each subsequent video frame (which may be, for example, any of the second through nth video frames, without limitation), so that one or more candidate tracking frames may be obtained for each such frame.
S504: a plurality of confidences corresponding to the plurality of candidate tracking frames, respectively, is determined.
After tracking frame prediction with the Kalman filter algorithm yields the plurality of candidate tracking frames, the plurality of confidences respectively corresponding to them may be determined; a confidence may be used to represent the probability that a candidate tracking frame will successfully match the detection frame.
S505: and determining a target confidence coefficient from the confidence coefficients, and taking a candidate tracking frame corresponding to the target confidence coefficient as a predicted tracking frame.
In some embodiments, for example, the target confidence may be determined from the multiple confidences by using a threshold, or the target confidence may also be determined by using any other possible manner, and then the candidate tracking frame corresponding to the target confidence is used as the predicted tracking frame.
Therefore, in the embodiment, the Kalman filtering algorithm is adopted to preliminarily predict the target to be tracked to obtain a plurality of candidate tracking frames, so that the comprehensiveness of the predicted tracking frames can be ensured, and the loss of information is avoided. And further, fine-grained screening is carried out on the candidate tracking frame according to the confidence coefficient, so that the range of the predicted tracking frame can be reduced, and the resource occupation of later-stage matching calculation can be reduced. And moreover, the predictive tracking frame is determined according to the confidence coefficient, and the accuracy of the predictive tracking frame can be ensured.
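A minimal constant-velocity Kalman prediction and confidence-filtering sketch follows; the 8-dimensional state layout (center position, aspect ratio, height, and their velocities) follows a common DeepSORT-style convention and is an assumption, not something the disclosure spells out.

    import numpy as np

    dt = 1.0                      # one frame step
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)    # position part += velocity part * dt
    Q = np.eye(8) * 1e-2          # process noise covariance (illustrative)

    def kf_predict(mean, cov):
        """Propagate one track state a frame ahead; mean[:4] = (cx, cy, a, h)
        defines the predicted candidate tracking frame."""
        return F @ mean, F @ cov @ F.T + Q

    def select_predicted(candidates, confidences, thr=0.5):
        # keep only candidate tracking frames whose confidence passes the
        # threshold; the survivors serve as the predicted tracking frames
        return [c for c, s in zip(candidates, confidences) if s >= thr]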
It should be understood that the above examples are only illustrative for determining multiple predicted trace frames, and other alternative ways or algorithms may be used to determine multiple predicted trace frames in practical applications, and are not limited in this respect.
S506: and determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame.
S507: and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to a plurality of pieces of predicted tracking information.
For the description of S506-S507, reference may be made to the above embodiments, which are not described herein again.
In the embodiment, the accuracy and the efficiency of target tracking in a safety inspection scene can be effectively improved by acquiring the current video frame, determining the detection frame containing the target to be tracked from the current video frame, determining the plurality of prediction tracking frames corresponding to the target to be tracked, determining the target tracking frame from the plurality of prediction tracking frames according to the detection frame, and tracking the target to be tracked according to the target tracking information corresponding to the target tracking frame, so that the real-time requirement of target tracking in the safety inspection scene is effectively met. And moreover, a Kalman filtering algorithm is adopted to preliminarily predict the target to be tracked to obtain a plurality of candidate tracking frames, so that the comprehensiveness of the predicted tracking frames can be ensured, and the loss of information is avoided. And further, fine-grained screening is carried out on the candidate tracking frame according to the confidence coefficient, so that the range of the predicted tracking frame can be reduced, and the resource occupation of later-stage matching calculation can be reduced. And moreover, the predictive tracking frame is determined according to the confidence coefficient, and the accuracy of the predictive tracking frame can be ensured.
For example, in practical applications the target tracking method provided by the embodiment of the present disclosure may be implemented with a target detection model and a target tracking model. Fig. 6 is a schematic flowchart of a target tracking method according to an embodiment of the present disclosure. As shown in fig. 6, for each video frame of a factory inspection video, the position and confidence of each tracked object (e.g., head, hand, and mouth) and the image features of each detection frame may be obtained through YOLOv3 (the target detection model).
Further, the detection frames are filtered by confidence, and duplicate boxes on the same target are eliminated through non-maximum suppression (NMS) to obtain the final detection frames. The target tracking method of the embodiments of the present disclosure may then be applied to these detection frames: predicted tracking frames for the video frames following the current video frame are predicted with Kalman filtering, parameters and feature sets are updated for each predicted tracking frame, predicted tracking frames in a confirmed tracking state are distinguished from those in an unconfirmed tracking state, and the detection frames are matched against each predicted tracking frame, so that target disappearance and new-target appearance can be judged. The method comprises the following steps:
1) Cascade matching of the predicted tracking frames in a confirmed tracking state (see the sketch after step 2): for the predicted tracking frames with the same disappearance time (the number of frames since the last successful match), a cosine distance matrix is computed between the depth information of each target newly detected in the current video frame and the feature set stored in each predicted tracking frame. On the basis of this feature cosine distance matrix, the Mahalanobis distance between each predicted tracking frame and each detection frame is computed, and the elements of the cosine distance matrix whose Mahalanobis distance exceeds the maximum allowed value are set to a value above the matching threshold so that they cannot be assigned. The Hungarian algorithm then performs assignment with the cosine distance matrix as input; after assignment, the results are divided into unmatched detections, unmatched tracks, and matched track-detection pairs.
2) Matching based on intersection-over-union (IoU) information between the predicted tracking frames not matched in cascade matching, the predicted tracking frames in an unconfirmed tracking state, and the detection frames not matched in cascade matching. This is implemented by computing a cost matrix (the second cosine distance matrix), in which each row represents the IoU information between one predicted tracking frame and each detection result. Elements of the matrix larger than the maximum allowed distance are set to that maximum, and the Hungarian algorithm then performs assignment with the matrix as input; after assignment, the matched detection frames, unmatched detection frames, and unmatched predicted tracking frames are counted in turn. During matching, parameters are updated: unmatched predicted tracking frames may be deleted, unmatched detection frames are initialized as new predicted tracking frames, and the feature sets of the retained predicted tracking frames are updated.
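As a concrete illustration of step 1), the following is a minimal sketch of DeepSORT-style cascade matching: an appearance cost from cosine distances, motion gating with the Mahalanobis distance, and assignment via the Hungarian algorithm (SciPy's linear_sum_assignment). The gating threshold (the chi-square 0.95 quantile for 4 degrees of freedom), the cost threshold, and all function and variable names are assumptions for illustration, not taken from this disclosure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

GATE = 9.4877  # chi-square 0.95 quantile, 4 degrees of freedom (assumed gating value)

def cascade_match(track_features, det_features, mahalanobis, max_cosine=0.2):
    """One round of cascade matching between stored tracks and new detections.

    track_features: list of (K_i, D) appearance feature sets, one per predicted track
    det_features:   (M, D) appearance features of the newly detected targets
    mahalanobis:    (T, M) motion distances between predicted tracks and detections
    Returns (matches, unmatched_tracks, unmatched_detections)."""
    det_features = np.asarray(det_features, dtype=float)
    mahalanobis = np.asarray(mahalanobis, dtype=float)
    T, M = len(track_features), len(det_features)
    cost = np.zeros((T, M))
    for t, feats in enumerate(track_features):
        # smallest cosine distance between the track's feature set and each detection
        sims = np.asarray(feats, dtype=float) @ det_features.T  # assumes L2-normalized features
        cost[t] = 1.0 - sims.max(axis=0)
    # gate out pairs whose motion distance exceeds the threshold: they can never match
    cost[mahalanobis > GATE] = max_cosine + 1e-5
    rows, cols = linear_sum_assignment(cost)  # Hungarian assignment
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cosine]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [t for t in range(T) if t not in matched_t]
    unmatched_dets = [d for d in range(M) if d not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```

Running this once per disappearance-time group, from most recently matched tracks to least, yields the cascade behavior described above.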
Fig. 7 is a schematic diagram according to a fifth embodiment of the present disclosure.
As shown in fig. 7, the target tracking device 70 includes:
an obtaining module 701, configured to obtain a current video frame;
a first determining module 702, configured to determine a detection frame from a current video frame, the detection frame comprising: a target to be tracked;
a second determining module 703, configured to determine multiple predicted tracking frames corresponding to the target to be tracked, where the multiple predicted tracking frames respectively have multiple corresponding predicted tracking information;
a third determining module 704, configured to determine a target tracking frame from the plurality of predicted tracking frames according to the detection frame; and
the tracking module 705 is configured to track the target to be tracked according to target tracking information corresponding to the target tracking frame, where the target tracking information belongs to multiple pieces of predicted tracking information.
Optionally, in some embodiments of the present disclosure, as shown in fig. 8, fig. 8 is a schematic diagram according to a sixth embodiment of the present disclosure, and the target tracking apparatus 80 includes: an obtaining module 801, a first determining module 802, a second determining module 803, a third determining module 804, and a tracking module 805, wherein the third determining module 804 includes:
a matching submodule 8041, configured to match the detection frame and the multiple prediction tracking frames according to the multiple pieces of prediction tracking information, so as to obtain multiple matching results corresponding to the multiple prediction tracking frames, respectively;
the selecting sub-module 8042 is configured to select a target matching result that meets a set condition from the multiple matching results, and use a predicted tracking frame corresponding to the target matching result as a target tracking frame.
Optionally, in some embodiments of the present disclosure, the matching sub-module 8041 is specifically configured to: perform cascade matching on the detection frame and the first predicted tracking frame according to the plurality of pieces of predicted tracking information; and match the detection frame with the second predicted tracking frame according to the plurality of pieces of predicted tracking information combined with intersection-over-union (IoU) information, where the IoU information is the IoU information between the detection frame and the second predicted tracking frame, and the first predicted tracking frame and the second predicted tracking frame are the same or different.
Optionally, in some embodiments of the present disclosure, the matching sub-module 8041 is specifically configured to: if the predicted tracking information includes a confirmed tracking state, determine the corresponding predicted tracking frame as the first predicted tracking frame.
Optionally, in some embodiments of the present disclosure, the matching sub-module 8041 is specifically configured to: take the first predicted tracking frame that is not successfully cascade-matched as the second predicted tracking frame; and/or, if the predicted tracking information includes an unconfirmed tracking state, determine the corresponding predicted tracking frame as the second predicted tracking frame.
Optionally, in some embodiments of the present disclosure, the matching sub-module 8041 is specifically configured to: determine depth information corresponding to the target to be tracked in the detection frame; generate a first cosine distance matrix according to the depth information in combination with the predicted tracking information corresponding to the first predicted tracking frame; determine a first similarity between the detection frame and the first predicted tracking frame; update a first target element in the first cosine distance matrix according to the first similarity to obtain a first target matrix, where the element value of the first target element is greater than the value corresponding to the maximum first similarity among the plurality of first similarities; and process the first target matrix with the Hungarian algorithm to perform cascade matching.
Optionally, in some embodiments of the present disclosure, the matching sub-module 8041 is specifically configured to: generate a second cosine distance matrix according to the intersection-over-union (IoU) information; determine a second similarity between the detection frame and the second predicted tracking frame; update a second target element in the second cosine distance matrix according to the second similarity to obtain a second target matrix, where the element value of the second target element is greater than the value corresponding to the maximum second similarity among the plurality of second similarities; and process the second target matrix with the Hungarian algorithm to perform matching.
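As an illustration of the sub-module's IoU-based matching described above, here is a minimal sketch that builds a (1 - IoU) cost matrix between predicted tracking frames and detection frames and assigns with the Hungarian algorithm; the box format, the cost threshold, and all names are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_cost_matrix(tracks, dets):
    """Cost matrix of (1 - IoU) between predicted tracking frames and
    detection frames; boxes are [x1, y1, x2, y2]."""
    tracks = np.asarray(tracks, dtype=float)
    dets = np.asarray(dets, dtype=float)
    cost = np.ones((len(tracks), len(dets)))
    for t in range(len(tracks)):
        x1 = np.maximum(tracks[t, 0], dets[:, 0])
        y1 = np.maximum(tracks[t, 1], dets[:, 1])
        x2 = np.minimum(tracks[t, 2], dets[:, 2])
        y2 = np.minimum(tracks[t, 3], dets[:, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_t = (tracks[t, 2] - tracks[t, 0]) * (tracks[t, 3] - tracks[t, 1])
        area_d = (dets[:, 2] - dets[:, 0]) * (dets[:, 3] - dets[:, 1])
        cost[t] = 1.0 - inter / (area_t + area_d - inter + 1e-9)
    return cost

def iou_match(tracks, dets, max_cost=0.7):
    """Assign detections to tracks by IoU; pairs above max_cost stay unmatched."""
    cost = iou_cost_matrix(tracks, dets)
    rows, cols = linear_sum_assignment(cost)  # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```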
Optionally, in some embodiments of the present disclosure, as shown in fig. 8, the second determining module 803 includes:
the prediction submodule 8031 is configured to perform tracking frame prediction on a target to be tracked by using a kalman filtering algorithm to obtain multiple candidate tracking frames;
a first determining submodule 8032, configured to determine a plurality of confidence levels corresponding to the plurality of candidate tracking frames, respectively; and
the second determining sub-module 8033 is configured to determine a target confidence from the multiple confidences, and use a candidate tracking frame corresponding to the target confidence as a predicted tracking frame.
It is understood that the target tracking device 80 in fig. 8 of the present embodiment may have the same functions and structures as the target tracking device 70 in the foregoing embodiment, with the obtaining module 801, the first determining module 802, the second determining module 803, the third determining module 804, and the tracking module 805 corresponding to the obtaining module 701, the first determining module 702, the second determining module 703, the third determining module 704, and the tracking module 705 in the foregoing embodiment, respectively.
It should be noted that the foregoing explanation of the target tracking method is also applicable to the target tracking apparatus of the present embodiment, and is not repeated herein.
In this embodiment, the current video frame is acquired, a detection frame containing the target to be tracked is determined from it, multiple predicted tracking frames corresponding to the target to be tracked are determined, a target tracking frame is determined from them according to the detection frame, and the target is tracked according to the target tracking information corresponding to the target tracking frame. This effectively improves the accuracy and efficiency of target tracking in a safety inspection scene, and thereby effectively meets the real-time requirement of target tracking in such a scene.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 is a block diagram of an electronic device for implementing the target tracking method of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the methods and processes described above, for example, the target tracking method.
For example, in some embodiments, the target tracking method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the target tracking method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the target tracking method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the target tracking methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, a special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility existing in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A target tracking method, comprising:
acquiring a current video frame;
determining a detection frame from the current video frame, the detection frame comprising: a target to be tracked;
determining a plurality of prediction tracking frames corresponding to the target to be tracked, wherein the plurality of prediction tracking frames respectively have a plurality of corresponding prediction tracking information;
determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame; and
and tracking the target to be tracked according to target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to the plurality of pieces of predicted tracking information.
2. The method of claim 1, wherein said determining a target tracking frame from the plurality of predicted tracking frames according to the detection frame comprises:
according to the plurality of pieces of predicted tracking information, matching the detection frame and the plurality of predicted tracking frames respectively to obtain a plurality of matching results corresponding to the plurality of predicted tracking frames respectively;
and selecting a target matching result meeting set conditions from the matching results, and taking a prediction tracking frame corresponding to the target matching result as the target tracking frame.
3. The method of claim 2, wherein said matching the detection frame and the plurality of predicted tracking frames, respectively, according to the plurality of pieces of predicted tracking information comprises:
according to the plurality of pieces of predicted tracking information, cascade matching is carried out on the detection frame and the first predicted tracking frame;
and matching the detection frame with a second predicted tracking frame according to the plurality of pieces of predicted tracking information combined with intersection-over-union (IoU) information, wherein the IoU information is the IoU information between the detection frame and the second predicted tracking frame, and the first predicted tracking frame and the second predicted tracking frame are the same or different.
4. The method of claim 3, further comprising, before said cascade matching of the detection frame and the first predicted tracking frame according to the plurality of pieces of predicted tracking information:
if the predicted tracking information includes a confirmed tracking state, determining the corresponding predicted tracking frame as the first predicted tracking frame.
5. The method of claim 3, further comprising, before said matching of the detection frame and the second predicted tracking frame according to the plurality of pieces of predicted tracking information combined with intersection-over-union (IoU) information:
taking the first predicted tracking frame that is not successfully cascade-matched as the second predicted tracking frame; and/or
if the predicted tracking information includes an unconfirmed tracking state, determining the corresponding predicted tracking frame as the second predicted tracking frame.
6. The method of claim 3, wherein said cascade matching of the detection frame and the first predicted tracking frame according to the plurality of pieces of predicted tracking information comprises:
determining depth information corresponding to the target to be tracked in the detection frame;
generating a first cosine distance matrix according to the depth information by combining with the predicted tracking information corresponding to the first predicted tracking frame;
determining a first similarity between the detection frame and the first predicted tracking frame;
updating a first target element in the first cosine distance matrix according to the first similarity to obtain a first target matrix, wherein the element value of the first target element is greater than the value corresponding to the maximum first similarity among the plurality of first similarities; and
and processing the first target matrix by adopting a Hungarian algorithm to carry out cascade matching.
7. The method of claim 3, wherein said matching of the detection frame and the second predicted tracking frame according to the plurality of pieces of predicted tracking information combined with intersection-over-union (IoU) information comprises:
generating a second cosine distance matrix according to the IoU information;
determining a second similarity between the detection frame and the second predictive tracking frame;
updating a second target element in the second cosine distance matrix according to the second similarity to obtain a second target matrix, wherein the element value of the second target element is greater than the value corresponding to the maximum second similarity in the plurality of second similarities; and
processing the second target matrix using the Hungarian algorithm to perform the matching.
8. The method of any of claims 1-7, wherein the determining a plurality of predicted tracking frames corresponding to the target to be tracked comprises:
adopting a Kalman filtering algorithm to predict a tracking frame of the target to be tracked so as to obtain a plurality of candidate tracking frames;
determining a plurality of confidence degrees respectively corresponding to the plurality of candidate tracking frames; and
and determining a target confidence degree from the confidence degrees, and taking a candidate tracking frame corresponding to the target confidence degree as the predicted tracking frame.
9. An object tracking device, comprising:
the acquisition module is used for acquiring a current video frame;
a first determining module, configured to determine a detection frame from the current video frame, where the detection frame includes: a target to be tracked;
a second determining module, configured to determine multiple predicted tracking frames corresponding to the target to be tracked, where the multiple predicted tracking frames respectively have multiple corresponding predicted tracking information;
a third determining module, configured to determine a target tracking frame from the plurality of predicted tracking frames according to the detection frame; and
and the tracking module is used for tracking the target to be tracked according to target tracking information corresponding to the target tracking frame, wherein the target tracking information belongs to the plurality of pieces of predicted tracking information.
10. The apparatus of claim 9, wherein the third determining means comprises:
the matching sub-module is used for respectively matching the detection frame and the plurality of predicted tracking frames according to the plurality of pieces of predicted tracking information to obtain a plurality of matching results respectively corresponding to the plurality of predicted tracking frames;
and the selection submodule is used for selecting a target matching result meeting set conditions from the plurality of matching results and taking a prediction tracking frame corresponding to the target matching result as the target tracking frame.
11. The apparatus according to claim 10, wherein the matching submodule is specifically configured to:
according to the plurality of pieces of predicted tracking information, cascade matching is carried out on the detection frame and the first predicted tracking frame;
and matching the detection frame with a second predicted tracking frame according to the plurality of pieces of predicted tracking information combined with intersection-over-union (IoU) information, wherein the IoU information is the IoU information between the detection frame and the second predicted tracking frame, and the first predicted tracking frame and the second predicted tracking frame are the same or different.
12. The apparatus according to claim 11, wherein the matching sub-module is specifically configured to:
if the predicted tracking information includes a confirmed tracking state, determining the corresponding predicted tracking frame as the first predicted tracking frame.
13. The apparatus according to claim 11, wherein the matching sub-module is specifically configured to:
taking the first predicted tracking frame that is not successfully cascade-matched as the second predicted tracking frame; and/or
if the predicted tracking information includes an unconfirmed tracking state, determining the corresponding predicted tracking frame as the second predicted tracking frame.
14. The apparatus according to claim 11, wherein the matching sub-module is specifically configured to:
determining depth information corresponding to the target to be tracked in the detection frame;
generating a first cosine distance matrix according to the depth information by combining with the predicted tracking information corresponding to the first predicted tracking frame;
determining a first similarity between the detection frame and the first predicted tracking frame;
updating a first target element in the first cosine distance matrix according to the first similarity to obtain a first target matrix, wherein the element value of the first target element is greater than the value corresponding to the maximum first similarity among the plurality of first similarities; and
and processing the first target matrix by adopting a Hungarian algorithm to carry out cascade matching.
15. The apparatus according to claim 11, wherein the matching sub-module is specifically configured to:
generating a second cosine distance matrix according to the IoU information;
determining a second similarity between the detection frame and the second predictive tracking frame;
updating a second target element in the second cosine distance matrix according to the second similarity to obtain a second target matrix, wherein the element value of the second target element is greater than the value corresponding to the maximum second similarity in the plurality of second similarities; and
processing the second target matrix using the Hungarian algorithm to perform the matching.
16. The apparatus of any of claims 9-15, wherein the second determining means comprises:
the prediction submodule is used for predicting the tracking frame of the target to be tracked by adopting a Kalman filtering algorithm so as to obtain a plurality of candidate tracking frames;
a first determining sub-module, configured to determine a plurality of confidence levels corresponding to the plurality of candidate tracking frames, respectively; and
and the second determining submodule is used for determining a target confidence coefficient from the confidence coefficients and taking a candidate tracking frame corresponding to the target confidence coefficient as the predicted tracking frame.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202110462257.9A 2021-04-27 2021-04-27 Target tracking method and device, electronic equipment and storage medium Pending CN113177968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110462257.9A CN113177968A (en) 2021-04-27 2021-04-27 Target tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110462257.9A CN113177968A (en) 2021-04-27 2021-04-27 Target tracking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113177968A true CN113177968A (en) 2021-07-27

Family

ID=76926796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462257.9A Pending CN113177968A (en) 2021-04-27 2021-04-27 Target tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113177968A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816690A (en) * 2018-12-25 2019-05-28 北京飞搜科技有限公司 Multi-target tracking method and system based on depth characteristic
CN111428607A (en) * 2020-03-19 2020-07-17 浙江大华技术股份有限公司 Tracking method and device and computer equipment
CN111640140A (en) * 2020-05-22 2020-09-08 北京百度网讯科技有限公司 Target tracking method and device, electronic equipment and computer readable storage medium
CN112419368A (en) * 2020-12-03 2021-02-26 腾讯科技(深圳)有限公司 Method, device and equipment for tracking track of moving target and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792634A (en) * 2021-09-07 2021-12-14 北京易航远智科技有限公司 Target similarity score calculation method and system based on vehicle-mounted camera
CN113989694A (en) * 2021-09-18 2022-01-28 北京远度互联科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN113989694B (en) * 2021-09-18 2022-10-14 北京远度互联科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN114241002A (en) * 2021-12-14 2022-03-25 中国电信股份有限公司 Target tracking method, system, device and medium based on cloud edge cooperation
CN114241002B (en) * 2021-12-14 2024-02-02 中国电信股份有限公司 Target tracking method, system, equipment and medium based on cloud edge cooperation
CN116091552A (en) * 2023-04-04 2023-05-09 上海鉴智其迹科技有限公司 Target tracking method, device, equipment and storage medium based on deep SORT
CN116703968A (en) * 2023-04-20 2023-09-05 北京百度网讯科技有限公司 Visual tracking method, device, system, equipment and medium for target object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination