CN112991393A - Target detection and tracking method and device, electronic equipment and storage medium - Google Patents

Target detection and tracking method and device, electronic equipment and storage medium

Info

Publication number: CN112991393A
Application number: CN202110405768.7A
Authority: CN (China)
Prior art keywords: target, detection, tracking, frame, image
Legal status: Pending (assumed status, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 马向军, 马原
Current and original assignee: Beijing Pengsi Technology Co ltd
Application filed by: Beijing Pengsi Technology Co ltd
Priority application: CN202110405768.7A


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis > G06T7/20 Analysis of motion > G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments > G06T7/248 involving reference images or patches
    • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details > G06T2207/20081 Training; Learning
    • G06T2207/20 Special algorithmic details > G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing > G06T2207/30196 Human being; Person
    • G06T2207/30 Subject of image; Context of image processing > G06T2207/30196 Human being; Person > G06T2207/30201 Face

Abstract

The embodiments of the disclosure provide a target detection and tracking method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: receiving an image sequence, the image sequence comprising a plurality of image frames whose image acquisition times are consecutive; dividing the image sequence into periods, identifying the Nth image frame of each period as a detection frame, and identifying a predetermined number of consecutive image frames following the detection frame as tracking frames belonging to the same period as the detection frame; and performing target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information. In the target detection and tracking process, targets are detected from the detection frame, and the detected targets are tracked in the predetermined number of tracking frames that belong to the same period as the detection frame. This technical solution increases the processing speed of target detection and tracking while ensuring real-time performance, so that the target detection and tracking scheme is suitable for on-site real-time snapshot scenarios.

Description

Target detection and tracking method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image technologies, and in particular, to a target detection and tracking method and apparatus, an electronic device, and a storage medium.
Background
With the development of Internet technology, target detection and tracking is being applied in more and more scenarios. Target detection and tracking is an important component of image processing technology and comprises two subtasks: target detection and target tracking. Target detection can be understood as the process of detecting a target object in an image and then classifying it. Target tracking can be understood as the process of continuously obtaining the motion state of a target in subsequent frames, taking a certain image frame of an image sequence as the starting point and using a target given either by manual selection or by target detection.
Although target detection alone can reliably obtain the positions of all targets and label their categories, its processing speed is slow. Target tracking alone requires the initial position of the target to be tracked to be set manually and cannot handle newly appearing targets; although it is fast, it cannot by itself be applied to practical scenarios.
The prior art does contain schemes that integrate target detection and target tracking into the same system, for example schemes that perform sampled detection on an image frame sequence and then track according to the detection results. However, the target tracking speed of such schemes is slow and cannot achieve multi-channel real-time tracking, so they cannot be applied to real-time snapshot scenarios such as traffic monitoring. Therefore, how to increase the real-time processing speed of target tracking so as to meet the requirements of real-time snapshot scenarios such as traffic is one of the technical problems to be solved in this field.
Disclosure of Invention
The embodiment of the disclosure provides a target detection and tracking method and device, electronic equipment and a computer-readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a target detection and tracking method, including:
receiving an image sequence, wherein the image sequence comprises a plurality of image frames with continuous image acquisition time;
the image sequence is divided into periods, the Nth image frame of each period is identified as a detection frame, and a predetermined number of image frames which are continuous after the detection frame are identified as tracking frames which belong to the same period as the detection frame; wherein N is a positive integer;
performing target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information; in the target detection and tracking process, a target is detected from the detection frame, and the detected target is tracked in the predetermined number of tracking frames belonging to the same period as the detection frame.
Further, in the target detection and tracking process, a target that was tracked in the previous period but is not successfully matched in the detection frame of the current period is determined to be a tracking-completed target, and incomplete tracking targets continue to be tracked in the tracking frames of the current period, wherein the incomplete tracking targets include targets tracked in the previous period that belong to the same target as a target detected in the detection frame of the current period, and targets detected in the detection frame of the current period that were not tracked in the previous period.
Further, performing target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information comprises the following steps:
acquiring an unprocessed image frame with the earliest image acquisition time from the image sequence;
when the image frame is a detection frame, detecting a target from the detection frame;
when the image frame is a tracking frame, the target detected in the detection frame in the same period is tracked in the tracking frame.
Further, the detecting a target from the detection frame when the image frame is a detection frame includes:
detecting a first target from the detection frame;
matching the first target with a second target tracked in the previous period;
and taking a second target that is not matched with any first target as a tracking-completed target, and outputting the target detection and tracking information obtained by tracking the tracking-completed target in the previous period.
Further, the detecting a target from the detection frame when the image frame is a detection frame includes:
acquiring a target detection frame from the detection frame by using a pre-trained target detection model;
acquiring a target category corresponding to the image in the target detection frame by using a pre-trained target recognition model;
and screening the target from the target detection box according to the target category.
Further, screening the target from the target detection box according to the target category includes:
screening candidate detection frames with the target category as a preset category from the target detection frames;
determining a candidate detection frame whose confidence output by the target detection model is greater than a second confidence threshold as the target detection frame corresponding to the target; the second confidence threshold is greater than a first confidence threshold, and the first confidence threshold is the confidence threshold used by the target detection model to detect target detection frames from a detection frame.
Further, the target is a human body or a human face.
Further, performing target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information comprises the following steps:
carrying out human body detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human body detection and tracking information, and carrying out human face detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human face detection and tracking information;
associating face detection and tracking information and human body detection and tracking information belonging to the same person;
and outputting target detection and tracking information in each image frame according to the correlation result.
Further, associating the face detection and tracking information and the body detection and tracking information belonging to the same person, comprising:
determining an overlapping area of a human face and a human body in the same image frame according to the human face detection and tracking information and the human body detection and tracking information;
establishing an initial association relationship between the human face detection and tracking information and the human body detection and tracking information according to the area size of the overlapping region;
on the basis of the initial association relationship, determining the face detection and tracking information and the human body detection and tracking information that have a matching relationship by using the Hungarian algorithm;
for the face detection and tracking information and the human body detection and tracking information that have a matching relationship, determining the similarity between the face in the face detection and tracking information and the face in the human body detection and tracking information in an image frame;
and establishing a final target association relationship between the human body detection and tracking information and the human face detection and tracking information, wherein the similarity is greater than a similarity threshold value.
In a second aspect, an embodiment of the present invention provides an object detecting and tracking apparatus, including:
a receiving module configured to receive an image sequence comprising a plurality of image frames that are temporally consecutive in image acquisition;
a dividing module configured to divide the image sequence into periods, identify an nth image frame of each period as a detection frame, and identify a predetermined number of image frames following the detection frame as tracking frames belonging to the same period as the detection frame; wherein N is a positive integer;
the processing module is configured to perform target detection and tracking processing on the image sequence according to the sequence of image acquisition time to obtain target detection and tracking information; in the process of target detection and tracking, a target is detected from the detection frame, and the detected target is tracked from a preset number of tracking frames belonging to the same period with the detection frame.
These functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.
In one possible design, the apparatus includes a memory configured to store one or more computer instructions that enable the apparatus to perform the corresponding method, and a processor configured to execute the computer instructions stored in the memory. The apparatus may also include a communication interface for the apparatus to communicate with other devices or a communication network.
In a third aspect, the disclosed embodiments provide an electronic device, comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the method of any one of the above aspects.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions for use by any one of the above apparatuses, the computer instructions, when executed by a processor, being configured to implement the method of any one of the above aspects.
In a fifth aspect, the disclosed embodiments provide a computer program product comprising computer instructions that, when executed by a processor, implement the method of any one of the above aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
in the process of performing target detection and tracking on the acquired image sequence, the image sequence is divided into a plurality of periods in the order of image acquisition time, and then target detection is performed only on the Nth image frame of each period (N is a positive integer, for example the first frame), namely the detection frame, while target tracking is performed on the subsequent frames of the same period, namely the tracking frames. Since the relatively slow detection processing is performed on only one frame per period and the remaining frames require only the faster tracking processing, the processing speed of target detection and tracking is increased and real-time performance can be ensured.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
fig. 1 shows a flow diagram of a target detection and tracking method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram illustrating an implementation flow of target detection and tracking processing according to an embodiment of the present disclosure.
Fig. 3 shows a schematic implementation flow diagram of a target detection and tracking process according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a target detecting and tracking device according to an embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an electronic device suitable for implementing a target detection and tracking method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, actions, components, parts, or combinations thereof, and do not preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The details of the embodiments of the present disclosure are described in detail below with reference to specific embodiments.
Fig. 1 shows a flow diagram of a target detection and tracking method according to an embodiment of the present disclosure. As shown in fig. 1, the target detecting and tracking method includes the following steps:
in step S101, receiving an image sequence, the image sequence including a plurality of image frames whose image acquisition times are continuous;
in step S102, the image sequence is divided into periods, the nth image frame of each period is identified as a detection frame, and a predetermined number of image frames following the detection frame are identified as tracking frames belonging to the same period as the detection frame; wherein N is a positive integer;
in step S103, performing target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information; in the target detection and tracking process, a target is detected from the detection frame, and the detected target is tracked in the predetermined number of tracking frames belonging to the same period as the detection frame.
In this embodiment, the image sequence may be a plurality of images with continuous image acquisition time acquired by the image acquisition device in real time on the acquisition site. The image acquisition device can be a camera fixedly installed on an acquisition site, such as a building, a road and the like, or a movable image acquisition device, such as a handheld electronic device with an image acquisition function, a vehicle and the like. The image acquisition device can continuously acquire images or videos of the target site and transmit the acquired images or videos to the processing device for processing. The image sequence may be a plurality of image frames which are continuously acquired by the image acquisition device and have a chronological order relationship, or a plurality of continuous image frames in a section of video acquired by the image acquisition device.
After receiving the image sequence acquired by the image acquisition device in real time, the processing device may store the image sequence in the cache according to the order of the image acquisition time. For example, it may be stored in a first-in-first-out manner in the buffer queue.
In some embodiments, in order to increase the processing speed, the image sequence is divided into cycles, each cycle includes one detection frame and a predetermined number of tracking frames following the detection frame, wherein the nth frame (N is a positive integer) of each cycle may be identified as the detection frame, and the other frames are identified as the tracking frames within the cycle. For example, a cycle comprises a total of 10 image frames, wherein the 1 st frame may be identified as a detection frame and the following 9 frames as tracking frames. In some embodiments, the buffer size may also be set to a size that stores a plurality of image frames in one cycle, for example, the buffer may be a size that stores 10 image frames.
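Purely as a non-limiting illustration of the period division described above (not part of the claimed method), the following Python sketch labels each frame of a sequence as a detection frame or a tracking frame, assuming N = 1 and a cycle length of 10 frames; the function name and parameters are invented for the example.

    from typing import List, Tuple

    def label_cycles(num_frames: int, cycle_len: int = 10, n: int = 1) -> List[Tuple[int, str]]:
        """Label each frame index as a 'detection' or 'tracking' frame.

        Within every cycle of cycle_len frames, the n-th frame (1-based) is the
        detection frame and the remaining frames are tracking frames of that cycle.
        """
        labels = []
        for idx in range(num_frames):
            pos_in_cycle = idx % cycle_len + 1          # 1-based position inside the cycle
            role = "detection" if pos_in_cycle == n else "tracking"
            labels.append((idx, role))
        return labels

    # Example: 20 frames with 10 frames per cycle -> frames 0 and 10 are detection frames.
    print(label_cycles(20))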
In the implementation process of the embodiment of the present disclosure, two processing flows may be included, that is, a processing flow of image receiving and caching and a target detection and tracking flow, and the two processing flows may be executed in parallel by different processing threads respectively. In the processing flow of image receiving and buffering, each time an image frame is received, the corresponding thread may first determine whether the current buffer is full, that is, whether an image sequence of one cycle has been received, if so, discard the current frame, and if not, store the currently received image frame into the buffer queue.
In the target detection and tracking processing flow, the image frames can be extracted from the buffer queue according to the sequence of image acquisition time, namely the image frames are extracted from the buffer queue according to the first-in first-out principle. When one image frame is extracted from the buffer queue, the corresponding thread can perform target detection and tracking processing on the image frame. In the processing procedure, it may be determined whether the image frame is a detection frame in one period, for example, a first frame of one period, and if the image frame is a detection frame, a preset target detection mode, for example, a pre-trained target detection model, may be used to detect a target from the detection frame. The target may be one or more of a human, a vehicle, an animal, a building, etc., and may be determined according to actual needs, and is not limited herein. For example, in a road traffic scenario, the target may be a person or a vehicle.
After an object is detected from a detection frame, the detected object may be tracked in a tracking frame following the detection frame. That is, for the image frames in the same period, the target is detected from the detection frame by using a target detection method, and then the target is tracked from the tracking frame in the period by using a target tracking method, for example, a pre-trained target tracking model.
The following illustrates the implementation process of the target detection and tracking method in this embodiment.
Fig. 2 is a schematic diagram illustrating an implementation flow of target detection and tracking processing according to an embodiment of the present disclosure. As shown in fig. 2, assuming that a buffer queue sized to store one cycle of image frames is allocated on the processing device, the processing device may start a main thread and a sub-thread, used respectively to implement the processing flow of image receiving and buffering and the processing flow of target detection and tracking.
A main thread may be started in advance for the processing flow of image receiving and buffering. After the main thread receives an image frame, it judges whether the buffer queue is full; if the buffer queue is full, the currently received image frame is discarded, and if it is not full, the currently received image frame is stored into the buffer queue and a semaphore is sent to the sub-thread, the semaphore being used to notify the sub-thread that an image frame has just been stored into the buffer queue.
A sub-thread may likewise be started in advance for the processing flow of target detection and tracking. After detecting that an image frame has been received from the outside, the sub-thread first judges whether the buffer is empty; if the buffer is empty, the sub-thread continues to wait, and if it is not empty, one image frame is taken out of the buffer queue for processing. It should be noted that image frames are taken out of the buffer queue on a first-in first-out basis, that is, image frames with earlier image acquisition times are taken out and processed first, and image frames with later image acquisition times are taken out and processed later.
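The main-thread/sub-thread cooperation described above can be sketched, for illustration only, with Python's standard threading and queue modules; here the bounded queue plays the role of the one-cycle buffer and also acts as the notification mechanism, frames are dropped when the buffer is full, and all names and values are assumptions made for the example (the per-frame processing performed by the sub-thread is described in the following paragraphs).

    import queue
    import threading
    import time

    CYCLE_LEN = 10                                   # assumed buffer size: one cycle of frames
    buffer_q = queue.Queue(maxsize=CYCLE_LEN)        # FIFO buffer queue

    def main_thread(camera_frames):
        """Receive frames and cache them; drop a frame when the buffer is full."""
        for frame in camera_frames:
            try:
                buffer_q.put_nowait(frame)           # stores the frame and wakes the sub-thread
            except queue.Full:
                pass                                 # buffer full: discard the current frame

    def sub_thread(process_frame, stop_event):
        """Take frames out in first-in-first-out order and run detection/tracking."""
        while not stop_event.is_set():
            try:
                frame = buffer_q.get(timeout=0.1)    # wait while the buffer is empty
            except queue.Empty:
                continue
            process_frame(frame)                     # detection-frame or tracking-frame handling

    stop = threading.Event()
    threading.Thread(target=sub_thread, args=(print, stop), daemon=True).start()
    main_thread(range(25))                           # toy "camera" producing 25 frames
    time.sleep(0.5)                                  # let the sub-thread drain the buffer
    stop.set()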
In the process of processing the fetched image frame, the sub-thread first judges whether the current image frame is a detection frame. If it is a detection frame, the sub-thread detects targets from it using a preset target detection mode, so as to determine one or more targets appearing in the detection frame; if it is a tracking frame, the targets detected in the detection frame of the current period are tracked using a target tracking mode, so as to obtain the information of these targets in the tracking frame. The target detection and tracking information obtained by detection and tracking in the detection frame and the tracking frames may include, but is not limited to, the position information of the target in the image frame, the image frame identifier, and the like; for example, when the target is a human face, the target detection and tracking information may include the face frame in the image frame whose identifier is X.
In some embodiments, during the process of storing the image frames in the buffer queue, the main thread may identify the image frames as detection frames or tracking frames according to the order in which they are received; for example, the first received image frame is identified as a detection frame, the subsequent predetermined number of image frames are identified as tracking frames, and the next image frame after those is identified as the detection frame of the next period, and so on in a loop. In other embodiments, when the sub-thread extracts the image frames from the buffer queue, the image frames may be identified as detection frames or tracking frames according to the order of extraction; for example, the first extracted image frame is identified as a detection frame, the predetermined number of subsequent image frames are identified as tracking frames, and the next image frame is identified as the detection frame of the next period, and so on in a loop.
In the process of performing target detection and tracking on the acquired image sequence, the image sequence is divided into a plurality of periods in the order of image acquisition time, and then target detection is performed only on the Nth image frame of each period (N is a positive integer, and may for example be the first frame), namely the detection frame, while target tracking is performed on the subsequent frames of the same period, namely the tracking frames. Since the relatively slow detection processing is performed on only one frame per period and the remaining frames require only the faster tracking processing, the processing speed of target detection and tracking is increased and real-time performance can be ensured.
In an optional implementation manner of this embodiment, in the target detection and tracking process, a target that was tracked in the previous cycle but is not successfully matched in the detection frame of the current cycle is determined to be a tracking-completed target, and incomplete tracking targets continue to be tracked in the tracking frames of the current cycle. The incomplete tracking targets include targets tracked in the previous cycle that belong to the same target as a target detected in the detection frame of the current cycle, and targets detected in the detection frame of the current cycle that were not tracked in the previous cycle.
In this alternative implementation, for the detection frame of each period, one or more targets may be detected by using a predetermined target detection manner, such as a pre-trained target detection model, and each target may be represented by target information that includes its position information in the image frame. For the first period in the image sequence, targets can be detected from the first frame, namely the detection frame, and the targets detected from the detection frame are tracked in the subsequent tracking frames. For the other periods, after targets are detected from the detection frame of the period, these targets may be matched against the targets tracked in the previous period by using a preset matching strategy, so as to determine whether two matched targets belong to the same target. If a target tracked in the previous period can be matched in the detection frame of the current period and is determined to belong to the same target, that target can be considered to still exist in the current period; it is determined to be an incomplete tracking target, and tracking continues in the current period. In some embodiments, a Hungarian matching algorithm may be used to perform correlation matching between the targets tracked in the previous period and the targets detected in the detection frame of the current period, and a ReID (pedestrian re-identification) deep learning model may be used to compute the matching confidence used within the Hungarian matching algorithm to judge whether two targets are the same target. If a target tracked in the previous period is not matched in the detection frame of the current period, that is, one or more targets tracked in the previous period are not detected in the detection frame of the current period, such a target may be considered to have left the image acquisition field of view and its tracking may be ended; it is therefore determined to be a tracking-completed target and is not tracked in the tracking frames of the current period. It should be noted that the target detection and tracking information of a tracking-completed target can be obtained from the tracking results of the previous period, and this information may be output.
It should be noted that, besides the completed tracking target and the incomplete tracking target, there is a type of target, that is, a target that newly appears in the current cycle. The target newly appearing in the current cycle is a target detected in the detection frame of the current cycle and not tracked in the previous cycle. The newly appeared target in the current period can also be regarded as an incomplete tracking target, and tracking can be performed in the tracking frame of the current period.
In an optional implementation manner of this embodiment, step S103, that is, the step of performing target detection and tracking processing on the image sequence according to the order of image acquisition time to obtain target detection and tracking information, further includes the following steps:
acquiring an unprocessed image frame with the earliest image acquisition time from the image sequence;
when the image frame is a detection frame, detecting a target from the detection frame;
when the image frame is a tracking frame, the target detected in the detection frame in the same period is tracked in the tracking frame.
In this alternative implementation, the image sequence may include a plurality of consecutive image frames that have been received and stored in a buffer queue. In the process of target detection and tracking, the unprocessed image frames in the image sequence are subjected to target detection and tracking according to the sequence of image acquisition. Therefore, the image frame with the earliest image acquisition time can be obtained from the unprocessed image frames for target detection and tracking processing. In the processing process, whether the image frame is a detection frame or not is determined, and if the image frame is the detection frame, target detection processing can be performed, namely, a preset target detection mode is used for detecting a target from the image frame. If the tracking frame is the tracking frame, the target detected in the detection frame of the period in which the tracking frame is located can be tracked in a target tracking mode. In the practical application process, after the image frame is judged to be the detection frame, the image frame can be sent to a pre-trained target detection model for target detection, and after the image frame is judged to be the tracking frame, the image frame can be sent to the pre-trained target tracking model for target tracking. After the target detection model detects the target, the information of the detected target can be sent to the target tracking model, so that the target tracking model can perform target tracking on a subsequent tracking frame according to the newly received information of the target.
In an optional implementation manner of this embodiment, when the image frame is a detection frame, the step of detecting a target from the detection frame further includes the following steps:
detecting a first target from the detection frame;
matching the first target with a second target tracked in the previous period;
and taking a second target that is not matched with any first target as a tracking-completed target, and outputting the target detection and tracking information obtained by tracking the tracking-completed target in the previous period.
In this optional implementation manner, after one or more first targets are detected from a detection frame in a target detection manner, and when the period in which the detection frame is located is not the first period, correlation matching may be performed between the first targets detected in the detection frame and the second targets tracked in the previous period. The matching process may be understood as the process of determining whether a first target detected in the detection frame of the current period and a second target tracked in the previous period are the same target. In some embodiments, when the target is a person, the correlation matching can be performed using an IOU-based Hungarian matching algorithm, and a ReID (pedestrian re-identification) deep learning model can be used to judge the matching confidence within the Hungarian matching algorithm. By applying the pedestrian re-identification algorithm to the sequential tracking problem, this method reduces the problem of broken tracks.
As described above, if the second target tracked in the previous cycle can be matched in the detection frame of the current cycle, the second target can be considered to be still present in the current cycle, the second target can be determined as an incomplete tracking target, and tracking in the current cycle can be continued.
If the second target tracked in the previous period is not matched in the detection frame in the current period, that is, the first target matched with the second target does not exist in the detection frame in the current period, the second target may be considered to have left the image acquisition site, and the tracking of the second target may be finished, so that the second target may be determined as a tracking-finished target, and the tracking-finished target may not be tracked in the tracking frame in the current period. It should be noted that the target detection and tracking information of the completed tracking target may be obtained according to the tracking result in the previous cycle, and the target detection and tracking information of the completed tracking target may be output.
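For illustration only, the following sketch shows the decision logic described above for a detection frame that is not in the first period; the greedy IoU matching used here is a simplified stand-in for the Hungarian/ReID matching of this embodiment, and the threshold and data layout are assumptions.

    def split_targets(detections, prev_tracks, iou_thresh=0.3):
        """Classify targets at a detection frame of the current period.

        detections : list of (x1, y1, x2, y2) boxes found in the current detection frame
        prev_tracks: dict {track_id: (x1, y1, x2, y2)} of targets tracked in the previous period
        Returns (incomplete, completed, new) where
          incomplete = {track_id: box} targets to keep tracking in this period,
          completed  = [track_id] targets considered to have left the scene,
          new        = [box] detections that start new targets this period.
        """
        def iou(a, b):
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
            union = area(a) + area(b) - inter
            return inter / union if union > 0 else 0.0

        incomplete, used = {}, set()
        for tid, tbox in prev_tracks.items():
            best, best_iou = None, iou_thresh
            for i, dbox in enumerate(detections):
                if i not in used and iou(tbox, dbox) >= best_iou:
                    best, best_iou = i, iou(tbox, dbox)
            if best is not None:
                used.add(best)
                incomplete[tid] = detections[best]       # same target: keep tracking
        completed = [tid for tid in prev_tracks if tid not in incomplete]
        new = [d for i, d in enumerate(detections) if i not in used]
        return incomplete, completed, new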
In an optional implementation manner of this embodiment, when the image frame is a detection frame, the step of detecting a target from the detection frame further includes the following steps:
acquiring a target detection frame from the detection frame by using a pre-trained target detection model;
acquiring a target category corresponding to the image in the target detection frame by using a pre-trained target recognition model;
and screening the target from the target detection box according to the target category.
In this optional implementation, when the currently extracted image frame is a detection frame, a target detection frame in the image frame may be detected by using a target detection model. The object detection model may output an object detection box in the image frame, and the image within the object detection box may be considered to be the image area where the object is located. In order to enable the target detection and tracking to be more efficient and the tracking result to be more accurate, the pre-trained target recognition model can be used for recognizing the target type of the image in the detected target detection frame. And then screening a target to be tracked finally according to the target category of the image in the target detection frame, wherein the target category can be divided into a plurality of categories such as complete targets, incomplete targets, non-targets and the like based on the actual application requirements, and in some embodiments, the target detection frame corresponding to the category with complete targets can be screened out to serve as a target to be tracked continuously subsequently. And the screened target detection frame is used as a target detection frame corresponding to the final target and is provided for the target tracking model, so that the target tracking model tracks the target from a tracking frame subsequent to the detection frame. In this way, the false detection rate of the target can be reduced.
In an optional implementation manner of this embodiment, the step of screening out the target from the target detection box according to the target category further includes the following steps:
screening candidate detection frames with the target category as a preset category from the target detection frames;
determining a candidate detection frame whose confidence output by the target detection model is greater than a second confidence threshold as the target detection frame corresponding to the target; the second confidence threshold is greater than a first confidence threshold, and the first confidence threshold is the confidence threshold used by the target detection model to detect target detection frames from a detection frame.
In this optional implementation manner, in order to further improve the efficiency and accuracy of target detection and tracking, the candidate detection frames screened out for the target category may be further screened. In this embodiment, after the target type in each target detection frame is identified by the target identification model, the target detection frame corresponding to the target type with a complete target may be determined as a candidate detection frame, and then the target to be tracked finally is screened from the candidate detection frame.
In some embodiments, the screening condition may be based on the confidence of the candidate detection frames output by the target detection model: candidate detection frames whose confidence is greater than the second confidence threshold are determined to be the target detection frames corresponding to the targets. The second confidence threshold is set to be greater than the first confidence threshold, where the first confidence threshold is the confidence threshold used by the target detection model to determine whether a certain region in the image frame is a target detection frame; it can be understood that the confidence of every target detection frame output by the target detection model is greater than the first confidence threshold. In the embodiment of the disclosure, the target detection model first screens out target detection frames using the smaller first confidence threshold, and the candidate detection frames are then filtered using the larger second confidence threshold. On the one hand, the first confidence threshold reduces the missed-detection rate of target detection; on the other hand, the second confidence threshold improves the accuracy of target detection.
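A minimal sketch of the two-threshold screening described above, assuming the detection model has already applied the lower threshold F1 and returns boxes with confidences, and that the recognition model returns a category label for each box; the names and values below are assumptions.

    PRESET_CATEGORIES = {"complete target"}          # assumed set of categories regarded as valid targets
    F1 = 0.3                                         # first (lower) threshold applied inside the detector
    F2 = 0.6                                         # second (higher) threshold used for the final screening
    assert F2 > F1

    def screen_targets(detections):
        """Keep only detections whose category is a preset category and whose
        detector confidence exceeds the second threshold F2.

        detections: list of dicts such as
            {"box": (x1, y1, x2, y2), "conf": 0.72, "category": "complete target"}
        (the detector is assumed to have already applied the lower threshold F1).
        """
        candidates = [d for d in detections if d["category"] in PRESET_CATEGORIES]
        return [d for d in candidates if d["conf"] > F2]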
The following examples are given.
Fig. 3 shows a schematic implementation flow diagram of a target detection and tracking process according to an embodiment of the present disclosure. As shown in fig. 3, the sub-thread performing the target detection and tracking process may extract the image frames from the buffer queue in the order in which they were buffered. If a detection frame is extracted, it is input to the target detection model to detect the targets in the image frame. In some embodiments, the first confidence threshold used by the target detection model to determine whether something is a target may be set to f1 (0 ≤ f1 ≤ 1); if d target detection boxes corresponding to d targets are obtained, d being an integer greater than or equal to 0, the d target detection boxes are recorded as R1. Then, the region corresponding to each target detection box in the image frame is cropped out, and each cropped region is input into the target recognition model to identify the target category. Taking a human body as an example, the output of the target recognition model may include the following 8 categories:
a. complete human body;
b. relatively complete human body;
c. complete riding human body;
d. incomplete riding human body;
e. upper body of a human body;
f. lower body of a human body;
g. human stump;
h. not a human body.
The object class of each object detection box in the image frame can be obtained by the object recognition model and recorded as R2.
Target detection frames whose targets are complete are selected from R2; taking the 8 categories above as an example, the target detection frames whose target categories are a, b, c, d and e are selected from R2. The selected target detection frames are then further screened according to the confidence output by the target detection model, the screening principle being that the confidence must be greater than a second confidence threshold f2 (f2 > f1), so as to obtain the screened target detection frames. The screened target detection frames satisfy the condition for creating new targets, so a new target can be created for each screened target detection frame, a target tracker is initialized, and a target identifier is assigned to each new target; the target tracker may use any known target tracking algorithm. Image frames then continue to be extracted from the buffer queue, and if a tracking frame is extracted, tracking continues. Within the period, the image frame in which the target is most complete and has the largest area is recorded as the optimal frame of that target, together with the target identifier of the optimal frame, until the tracking frame is the last frame of the period, at which point the target information of each target tracked in the last frame is recorded. The target tracking algorithm may utilize, for example, the KCF algorithm of the prior art.
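The per-period flow in this example might be sketched as follows (illustrative only): detect on the detection frame, keep only complete-target categories above the second threshold f2, create a tracker and an identifier per new target, and record each target's optimal frame over the tracking frames of the period. The helpers detect, classify and create_tracker are placeholders for the detection model, the recognition model and any known tracking algorithm such as KCF, and the use of box area alone as the optimality criterion is an assumption made for brevity.

    def process_cycle(frames, detect, classify, create_tracker, f2=0.6,
                      keep_classes=("a", "b", "c", "d", "e")):
        """Run one period: detect on the first frame, then track on the rest.

        frames:        image frames of one period (frame 0 is the detection frame)
        detect(img):   -> list of (box, confidence) pairs above the first threshold f1
        classify(img, box): -> category letter for the cropped region
        create_tracker(img, box): -> tracker object with an update(img) -> box method
        Returns {target_id: {"best_frame": idx, "best_box": box, "last_box": box}}
        """
        det_frame = frames[0]
        boxes = [(b, c) for (b, c) in detect(det_frame)
                 if classify(det_frame, b) in keep_classes and c > f2]

        area = lambda b: max(0, b[2] - b[0]) * max(0, b[3] - b[1])
        targets = {}
        for tid, (box, _) in enumerate(boxes):               # assign a target identifier
            targets[tid] = {"tracker": create_tracker(det_frame, box),
                            "best_frame": 0, "best_box": box, "last_box": box}

        for idx, frame in enumerate(frames[1:], start=1):    # tracking frames of the period
            for tid, t in targets.items():
                box = t["tracker"].update(frame)
                t["last_box"] = box
                if area(box) > area(t["best_box"]):          # record the optimal frame per target
                    t["best_frame"], t["best_box"] = idx, box

        return {tid: {k: v for k, v in t.items() if k != "tracker"}
                for tid, t in targets.items()}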
In the above processing procedure, target detection frames that do not satisfy the new-target condition can be deleted directly. For target detection frames that satisfy the new-target condition: if the current period is the first period of the target detection and tracking processing, the related information of the newly created targets is put directly into the tracking queue, and target tracking is performed by the target tracking algorithm; if the current period is not the first period, the targets detected in the detection frame of the current period are matched against the targets tracked in the previous period. If a target is matched, tracking continues with the target tracking algorithm; if it is not matched, it is judged whether the target tracked in the previous period has left the image acquisition field of view: if a target tracked in the previous period is not detected in the current detection frame, it is considered to have left, and otherwise it is considered not to have left.
If a target has left, the optimal target detection and tracking information corresponding to that target is recorded and used to extract the target features, and the target features are stored in a feature library. Taking a person as the target, for example, the optimal target detection and tracking information may be used to extract the ReID features of the person and save them to the feature library.
If the target has not left, target features are extracted from the target detection frames detected in the detection frame of the current period and compared for similarity with the target features in the feature library. If similar target features are found, the target detection frame in the detection frame of the current period is considered to correspond to the tracked target of the previous period that owns the similar target features; that target is determined to be an incomplete tracking target and is put into the tracking queue for continued tracking, and its target identifier remains the target identifier used in the previous period. If there are no similar target features in the feature library, the target detection frame in the detection frame of the current period can be considered to correspond to a new target, and the new target is put into the tracking queue to be tracked.
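The feature-library comparison described above might look like the following sketch, using cosine similarity over ReID feature vectors; the similarity threshold, the library layout and the choice of cosine similarity are assumptions made for illustration.

    import numpy as np

    def cosine_similarity(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom > 0 else 0.0

    def match_against_library(feature, feature_library, sim_thresh=0.7):
        """Compare a target's ReID feature with the feature library.

        feature_library: dict {target_id: feature_vector} of targets tracked earlier.
        Returns the matching target_id (same target as before) or None (treat as a new target).
        """
        best_id, best_sim = None, sim_thresh
        for tid, stored in feature_library.items():
            sim = cosine_similarity(feature, stored)
            if sim >= best_sim:
                best_id, best_sim = tid, sim
        return best_id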
In an optional implementation manner of this embodiment, the target is a human body or a human face.
In application scenarios of detecting and tracking people, the target in the embodiments of the present disclosure may be a human face or a human body. In conventional target detection or target tracking methods, in scenarios where the target is a person, the target is usually defined as the whole human body, and the human body includes the face. It should be noted that in the face detection and tracking process the target involves only the head region of the person, whereas in the body detection and tracking process the target involves the whole human body, including the head and body regions. In the embodiments of the present disclosure, target detection and tracking are performed on the face and on the body separately, and the final target detection and tracking information is then obtained by associating and binding the face and the body that belong to the same person.
In an optional implementation manner of this embodiment, step S103, that is, the step of performing target detection and tracking processing on the image sequence according to the order of image acquisition time to obtain target detection and tracking information, further includes the following steps:
carrying out human body detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human body detection and tracking information, and carrying out human face detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human face detection and tracking information;
associating face detection and tracking information and human body detection and tracking information belonging to the same person;
and outputting target detection and tracking information in each image frame according to the correlation result.
In this optional implementation manner, in order to obtain more accurate target detection and tracking information in application scenarios of person detection and tracking, the embodiments of the present disclosure treat the human body and the human face as two separate tracking problems. The face and the body in the image sequence are each detected and tracked using the target detection and tracking method provided by the embodiments of the present disclosure, and the body detection and tracking information is then associated with the face detection and tracking information, so that association information is established between the face and the body belonging to the same person. In this way, the two processes of body detection and tracking and of face detection and tracking can assist each other, and more effective detection and tracking information can be obtained while the tracked person is detected stably.
In the actual implementation process, after the sub-thread extracts an image frame from the buffer queue, if the image frame is a detection frame, it is sent to a face detector and to a human body detector respectively. The face detector may use an existing face detection model for face detection, and the human body detector may use an existing human body detection model for human body detection. The detection result of the face detector is sent to the face tracker for face tracking, and the detection result of the human body detector is sent to the human body tracker for body tracking. If a tracking frame is taken out of the buffer queue, it is sent to the face tracker for face tracking and to the human body tracker for body tracking.
For each image frame, after the face detection and tracking information and the human body detection and tracking information are obtained, the face and the body can be associated, that is, the face and the body belonging to the same person are associated, and the face detection and tracking information and the body detection and tracking information corresponding to the associated face and body are output together in associated form. It should be noted that the face detection and tracking information corresponding to an image frame may include the position information of the face in the image frame (which may be output in the form of a face frame), the target identifier corresponding to the face, the identifier of the image frame to which the information corresponds, and the like; the human body detection and tracking information may include the position information of the body in the image frame (which may be output in the form of a body frame), the target identifier corresponding to the body, the identifier of the image frame to which the information corresponds, and the like.
In the practical application process, the face detection and tracking and the human body detection and tracking can be executed in parallel, and after the face detection and tracking information and the human body detection and tracking information of the same image frame are obtained, the face and the human body in the image frame can be associated. It should be noted that, for the detection frame, the face detection and tracking information and the human body detection and tracking information may be obtained from the target detection information output by the target detection algorithm, and for the tracking frame, the face detection and tracking information and the human body detection and tracking information may be obtained from the target detection and tracking information output by the target tracking algorithm.
It should be further noted that the finally output target detection result may include the optimal target detection and tracking information within one period, and the optimal detection and tracking information may include the optimal face detection and tracking information and the optimal human body detection and tracking information (for example, the face frame or body frame that is most complete and has the largest area among the multiple tracking frames), the target identifier, the identifier of the image frame in which the optimal target detection and tracking information is located, and the like. In addition, after the body and the face are associated, the body detection and tracking information associated with the optimal face, or the face detection and tracking information associated with the optimal body, can also be output. For example, suppose that the optimal face detection and tracking information of target A in one period is in tracking frame X and its optimal body detection and tracking information is in tracking frame Y, the body detection and tracking information of target A is associated in tracking frame X, and the face detection and tracking information of target A is associated in tracking frame Y; then the target detection and tracking information output for that period includes the optimal face detection and tracking information and the associated body detection and tracking information of target A in tracking frame X, and the optimal body detection and tracking information and the associated face detection and tracking information in tracking frame Y. Of course, it is understood that for a target for which no associated face or body is found, only the optimal face detection and tracking information or only the optimal body detection and tracking information may be output.
In an optional implementation manner of this embodiment, the step of associating the face detection and tracking information and the human body detection and tracking information belonging to the same person further includes the following steps:
determining an overlapping area of a human face and a human body in the same image frame according to the human face detection and tracking information and the human body detection and tracking information;
establishing an initial association relationship between the human face detection and tracking information and the human body detection and tracking information according to the area size of the overlapping region;
on the basis of the initial association relationship, determining the face detection and tracking information and the human body detection and tracking information that have a matching relationship by using the Hungarian algorithm;
for the face detection and tracking information and the human body detection and tracking information that have a matching relationship, determining the similarity between the face in the face detection and tracking information and the face in the human body detection and tracking information in an image frame;
and establishing a final target association relationship between the human body detection and tracking information and the human face detection and tracking information, wherein the similarity is greater than a similarity threshold value.
In the optional implementation manner, in order to establish an association relationship between the face detection and tracking information and the human body detection and tracking information corresponding to the same person, an optimal matching relationship between the face detection and tracking information and the human body detection and tracking information can be found by using the hungarian algorithm in the embodiment of the present disclosure. In the Hungarian algorithm, an initial incidence relation between two groups of information to be matched is determined firstly, and then the initial incidence relation is optimized by using the thought of the Hungarian algorithm, so that an optimal matching relation is obtained finally.
The initial association relationship may be determined by using the area size of the overlapping region between the human face frame in the human face detection and tracking information and the human body frame in the human body detection and tracking information. In some embodiments, the initial association relationship may be determined using the IOU size between the face and the body. The calculation manner of the IOU may be a ratio of an overlapping area between the face frame and the body frame in the image frame to a sum of areas of the face frame and the body frame. The larger the IOU value is, the more relevant the face detection and tracking information and the human body detection and tracking information are, that is, the more likely the face in the face detection and tracking information and the human body in the human body detection and tracking information belong to the same person. A threshold may be preset, and when the IOU value is greater than the threshold, it is determined that the face detection and tracking information and the human body detection and tracking information have an initial association relationship, otherwise, the face detection and tracking information and the human body detection and tracking information do not have an initial association relationship. The above process may be understood as establishing an initial association between face detection and tracking information and body detection and tracking information that may belong to the same person. It should be noted that there may be a one-to-many or many-to-many association relationship in the initial association relationship.
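The overlap measure and initial association described above can be sketched as follows, using the ratio stated in this embodiment (intersection area divided by the sum of the face-frame and body-frame areas); the threshold value and box format are assumptions made for illustration.

    def face_body_overlap(face_box, body_box):
        """Overlap measure as described above: intersection area / (face area + body area)."""
        ix1, iy1 = max(face_box[0], body_box[0]), max(face_box[1], body_box[1])
        ix2, iy2 = min(face_box[2], body_box[2]), min(face_box[3], body_box[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda b: max(0, b[2] - b[0]) * max(0, b[3] - b[1])
        total = area(face_box) + area(body_box)
        return inter / total if total > 0 else 0.0

    def initial_associations(faces, bodies, overlap_thresh=0.1):
        """Build the (possibly many-to-many) initial association relation as index pairs."""
        return [(i, j) for i, f in enumerate(faces)
                       for j, b in enumerate(bodies)
                       if face_body_overlap(f, b) > overlap_thresh]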
Then, the Hungarian algorithm can be used to search, among the face detection and tracking information and human body detection and tracking information having an initial association relationship, for the optimal matching relationship, that is, the matching relationship between the face detection and tracking information and the human body detection and tracking information that are most likely to belong to the same person. A one-to-one matching relationship can be obtained in this way.
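A hedged sketch of this step is given below, using scipy's linear_sum_assignment as an implementation of the Hungarian algorithm. The cost construction (one minus the overlap ratio for associated pairs, a prohibitively large cost for all other pairs) is an illustrative assumption rather than a requirement of the disclosure.

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_faces_to_bodies(num_faces, num_bodies, pair_overlaps):
    # pair_overlaps maps (face_index, body_index) -> overlap ratio for every
    # pair that survived the initial association step
    BIG = 1e6
    cost = np.full((num_faces, num_bodies), BIG)
    for (i, j), ov in pair_overlaps.items():
        cost[i, j] = 1.0 - ov
    rows, cols = linear_sum_assignment(cost)
    # discard assignments that landed on "impossible" (non-associated) pairs,
    # leaving a one-to-one face-to-body matching
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < BIG]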
In order to further confirm the validity of the matching relationship obtained by the Hungarian algorithm, the embodiment of the present disclosure also verifies the matching relationship by means of similarity calculation. It should be noted that the human body detection and tracking information includes position information of the human body in the image frame, which can be understood as a body frame in the image frame, and the body frame may include image information of the entire human body. The face detection and tracking information includes position information of a face in the image frame, which can be understood as a face frame in the image frame, and the face frame includes only image information of the head. In order to determine whether the face included in the face detection and tracking information and the human body included in the human body detection and tracking information having a matching relationship belong to the same person, a similarity may be calculated between the image region of the face included in the face detection and tracking information in the image frame and the image region of the face within the human body included in the human body detection and tracking information in the image frame. In some embodiments, for the human body detection and tracking information, a partial image region in the upper part of the body frame, where the face is expected to appear, may be selected, and the similarity between this image region and the image region of the face included in the corresponding face detection and tracking information in the image frame may be calculated. When the similarity is greater than or equal to the similarity threshold, the human body detection and tracking information and the face detection and tracking information can be considered to belong to the same person, and a final target association relationship can be established. When the similarity is smaller than the similarity threshold, the human body detection and tracking information and the face detection and tracking information do not belong to the same person, and no target association relationship is established.
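The following sketch illustrates one way such a similarity check could be carried out. The face crop is compared against the upper portion of the body crop with a simple normalized-correlation score; a practical system would more likely compare face-embedding features, which the disclosure does not prescribe. The 0.3 upper-region fraction, the 64x64 resize and the 0.5 threshold are assumptions made only for this example.

import cv2
import numpy as np

def crop(image, box):
    x1, y1, x2, y2 = [int(v) for v in box]
    return image[y1:y2, x1:x2]

def region_similarity(face_img, body_img, upper_fraction=0.3):
    # compare the face crop against the upper part of the body crop,
    # where the head is expected to be
    head = body_img[: max(1, int(body_img.shape[0] * upper_fraction))]
    a = cv2.resize(face_img, (64, 64)).astype(np.float32).ravel()
    b = cv2.resize(head, (64, 64)).astype(np.float32).ravel()
    a = (a - a.mean()) / (a.std() + 1e-6)
    b = (b - b.mean()) / (b.std() + 1e-6)
    return float(np.dot(a, b) / a.size)

def confirm_match(image, face_box, body_box, threshold=0.5):
    # keep the association only if the similarity clears the threshold
    return region_similarity(crop(image, face_box), crop(image, body_box)) >= threshold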
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 4 shows a block diagram of a target detection and tracking apparatus according to an embodiment of the present disclosure. As shown in Fig. 4, the apparatus may be implemented as part or all of an electronic device by software, hardware, or a combination of both. The target detection and tracking apparatus includes:
a receiving module 401 configured to receive an image sequence comprising a plurality of image frames which are consecutive in image acquisition time;
a dividing module 402 configured to divide the image sequence into periods, identify an Nth image frame of each period as a detection frame, and identify a predetermined number of image frames consecutive after the detection frame as tracking frames belonging to the same period as the detection frame; wherein N is a positive integer;
a processing module 403 configured to perform target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information; in the target detection and tracking process, a target is detected from the detection frame, and the detected target is tracked in a predetermined number of tracking frames belonging to the same period as the detection frame.
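A minimal sketch of how these modules could cooperate on an image sequence is given below. detect_targets and track_targets stand in for the detection model and the tracker described in this disclosure; the period length of 10 frames and N = 1 are illustrative assumptions.

def split_into_periods(num_frames, period_len=10, n=1):
    # mark the Nth frame of each period as a detection frame and the
    # remaining frames of that period as tracking frames
    return ["detect" if idx % period_len == n - 1 else "track"
            for idx in range(num_frames)]

def process_sequence(frames, detect_targets, track_targets, period_len=10):
    # frames are assumed to be ordered by image acquisition time
    results, current_targets = [], []
    for frame, role in zip(frames, split_into_periods(len(frames), period_len)):
        if role == "detect":
            current_targets = detect_targets(frame)                  # detection frame
        else:
            current_targets = track_targets(frame, current_targets)  # tracking frame
        results.append(current_targets)
    return results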
The target detection and tracking apparatus in this embodiment corresponds to the target detection and tracking method described above; for specific details, reference may be made to the above description of the target detection and tracking method, which is not repeated here.
Fig. 5 is a schematic structural diagram of an electronic device suitable for implementing the target detection and tracking method according to the embodiment of the present disclosure.
As shown in Fig. 5, the electronic device 500 includes a processing unit 501, which may be implemented as a processing unit such as a CPU, GPU, FPGA, or NPU. The processing unit 501 may perform the various processes in any of the method embodiments of the present disclosure described above according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The RAM 503 also stores various programs and data necessary for the operation of the electronic device 500. The processing unit 501, the ROM 502, and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as necessary, so that a computer program read therefrom is installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, any of the methods described above with reference to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing any of the methods of the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combinations of the above features, and also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (10)

1. A target detection and tracking method, comprising:
receiving an image sequence, wherein the image sequence comprises a plurality of image frames with continuous image acquisition time;
dividing the image sequence into periods, identifying the Nth image frame of each period as a detection frame, and identifying a predetermined number of image frames consecutive after the detection frame as tracking frames belonging to the same period as the detection frame; wherein N is a positive integer;
carrying out target detection and tracking processing on the image sequence according to the sequence of image acquisition time to obtain target detection and tracking information; in the target detection and tracking process, a target is detected from the detection frame, and the detected target is tracked in a predetermined number of tracking frames belonging to the same period as the detection frame.
2. The method according to claim 1, wherein, in the target detection and tracking process, a target that was tracked in a previous period but is not successfully matched in the detection frame of the current period is determined as a completed tracking target, and tracking of uncompleted tracking targets is continued in the tracking frames of the current period, the uncompleted tracking targets including a target detected in the detection frame of the current period that belongs to the same target as a target tracked in the previous period, and a target detected in the detection frame of the current period that was not tracked in the previous period.
3. The method of claim 1, wherein performing target detection and tracking processing on the image sequence according to the sequence of image acquisition time to obtain target detection and tracking information comprises:
acquiring an unprocessed image frame with the earliest image acquisition time from the image sequence;
when the image frame is a detection frame, detecting a target from the detection frame;
when the image frame is a tracking frame, tracking, in the tracking frame, the target detected in the detection frame of the same period.
4. The method of claim 3, wherein said detecting a target from a detection frame when the image frame is the detection frame comprises:
detecting a first target from the detection frame;
matching the first target with a second target tracked in the previous period;
and taking a second target that is not matched with any first target as a completed tracking target, and outputting target detection and tracking information obtained by tracking the completed tracking target in the previous period.
5. The method of claim 3, wherein said detecting a target from a detection frame when the image frame is the detection frame comprises:
acquiring a target detection frame from the detection frame by using a pre-trained target detection model;
acquiring a target category corresponding to the image in the target detection frame by using a pre-trained target recognition model;
screening the target from the target detection frame according to the target category;
preferably, the screening the target from the target detection frame according to the target category includes:
screening candidate detection frames with the target category as a preset category from the target detection frames;
determining a candidate detection frame whose confidence output by the target detection model is greater than a second confidence threshold as the target detection frame corresponding to the target; wherein the second confidence threshold is greater than a first confidence threshold, and the first confidence threshold is the confidence threshold used by the target detection model when detecting target detection frames from a detection frame.
6. The method of any one of claims 1-5, wherein the object is a human body or a human face.
7. The method according to any one of claims 1 to 5, wherein the performing target detection and tracking processing on the image sequence according to the sequence of image acquisition time to obtain target detection and tracking information comprises:
carrying out human body detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human body detection and tracking information, and carrying out human face detection and tracking processing on the image sequence according to the sequence of the image acquisition time to obtain human face detection and tracking information;
associating face detection and tracking information and human body detection and tracking information belonging to the same person;
outputting target detection and tracking information in each image frame according to the correlation result;
preferably, associating the face detection and tracking information and the body detection and tracking information belonging to the same person includes:
determining an overlapping area of a human face and a human body in the same image frame according to the human face detection and tracking information and the human body detection and tracking information;
establishing an initial association relationship between the human face detection and tracking information and the human body detection and tracking information according to the area size of the overlapping region;
on the basis of the initial association relationship, determining face detection and tracking information and human body detection and tracking information that have a matching relationship by using the Hungarian algorithm;
for the face detection and tracking information and the human body detection and tracking information that have a matching relationship, determining a similarity, in the image frame, between the face in the face detection and tracking information and the face region within the human body in the human body detection and tracking information;
and establishing a final target association relationship between the human body detection and tracking information and the face detection and tracking information whose similarity is greater than a similarity threshold.
8. An object detection and tracking apparatus, comprising:
a receiving module configured to receive an image sequence comprising a plurality of image frames that are temporally consecutive in image acquisition;
a dividing module configured to divide the image sequence into periods, identify an Nth image frame of each period as a detection frame, and identify a predetermined number of image frames consecutive after the detection frame as tracking frames belonging to the same period as the detection frame; wherein N is a positive integer;
a processing module configured to perform target detection and tracking processing on the image sequence in the order of image acquisition time to obtain target detection and tracking information; in the target detection and tracking process, a target is detected from the detection frame, and the detected target is tracked in a predetermined number of tracking frames belonging to the same period as the detection frame.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory, wherein the processor executes the computer program to implement the method of any of claims 1-7.
10. A computer readable storage medium having computer instructions stored thereon or a computer program product comprising computer instructions; wherein the computer instructions, when executed by a processor, implement the method of any of claims 1-7.
CN202110405768.7A 2021-04-15 2021-04-15 Target detection and tracking method and device, electronic equipment and storage medium Pending CN112991393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110405768.7A CN112991393A (en) 2021-04-15 2021-04-15 Target detection and tracking method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110405768.7A CN112991393A (en) 2021-04-15 2021-04-15 Target detection and tracking method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112991393A true CN112991393A (en) 2021-06-18

Family

ID=76340600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110405768.7A Pending CN112991393A (en) 2021-04-15 2021-04-15 Target detection and tracking method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112991393A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110249862A1 (en) * 2010-04-09 2011-10-13 Kabushiki Kaisha Toshiba Image display device, image display method, and image display program
CN105354543A (en) * 2015-10-29 2016-02-24 小米科技有限责任公司 Video processing method and apparatus
CN107578428A (en) * 2017-08-31 2018-01-12 成都观界创宇科技有限公司 Method for tracking target and panorama camera applied to panoramic picture
US20190327390A1 (en) * 2018-04-18 2019-10-24 Canon Kabushiki Kaisha Image processing apparatus, image capturing apparatus, image processing method, and storage medium
CN109685797A (en) * 2018-12-25 2019-04-26 北京旷视科技有限公司 Bone point detecting method, device, processing equipment and storage medium
CN110110604A (en) * 2019-04-10 2019-08-09 东软集团股份有限公司 Target object detection method, device, readable storage medium storing program for executing and electronic equipment
CN110111363A (en) * 2019-04-28 2019-08-09 深兰科技(上海)有限公司 A kind of tracking and equipment based on target detection
CN112016353A (en) * 2019-05-30 2020-12-01 普天信息技术有限公司 Method and device for carrying out identity recognition on face image based on video
CN110516620A (en) * 2019-08-29 2019-11-29 腾讯科技(深圳)有限公司 Method for tracking target, device, storage medium and electronic equipment
CN111369590A (en) * 2020-02-27 2020-07-03 北京三快在线科技有限公司 Multi-target tracking method and device, storage medium and electronic equipment
CN112241969A (en) * 2020-04-28 2021-01-19 北京新能源汽车技术创新中心有限公司 Target detection tracking method and device based on traffic monitoring video and storage medium
CN111667501A (en) * 2020-06-10 2020-09-15 杭州海康威视数字技术股份有限公司 Target tracking method and device, computing equipment and storage medium
CN112613225A (en) * 2020-12-06 2021-04-06 北京工业大学 Intersection traffic state prediction method based on neural network cell transmission model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUN WU; QING-JIE KONG; ZHONGHUA LIU; YUNCAI LIU: "Pedestrian and Bicycle Detection and Tracking in Range Images", 2010 International Conference on Optoelectronics and Image Processing *
NIU Dejiao; ZHAN Yongzhao; SONG Shunlin: "Face Detection and Tracking in Real-Time Video Images", Computer Applications, no. 06 *
WANG Biao; WANG Chengru; WANG Fenfen: "Multi-Object Motion Detection and Tracking in Fixed Scenes", Computer Engineering and Design, no. 08 *
LEI Jingsheng; LI Yukun; YANG Zhongguang: "Similar Multi-Object Tracking Fused with Two-Dimensional Pose Information", Computer Engineering and Design, no. 10 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516686A (en) * 2021-07-09 2021-10-19 东软睿驰汽车技术(沈阳)有限公司 Target tracking method, device, equipment and storage medium
WO2023024417A1 (en) * 2021-08-23 2023-03-02 平安科技(深圳)有限公司 Face image quality screening method, apparatus, and device, and storage medium
CN114444895A (en) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 Cleaning quality evaluation method and related equipment
CN116091552A (en) * 2023-04-04 2023-05-09 上海鉴智其迹科技有限公司 Target tracking method, device, equipment and storage medium based on deep SORT

Similar Documents

Publication Publication Date Title
CN112991393A (en) Target detection and tracking method and device, electronic equipment and storage medium
CN109948582B (en) Intelligent vehicle reverse running detection method based on tracking trajectory analysis
US10212397B2 (en) Abandoned object detection apparatus and method and system
US8942913B2 (en) System and method for on-road traffic density analytics using video stream mining and statistical techniques
Cabrera et al. Efficient multi-camera detection, tracking, and identification using a shared set of haar-features
CN111666853A (en) Real-time vehicle violation detection method, device, equipment and storage medium
WO2014193220A2 (en) System and method for multiple license plates identification
CN105144705A (en) Object monitoring system, object monitoring method, and program for extracting object to be monitored
CN110633648B (en) Face recognition method and system in natural walking state
EP3495989A1 (en) Best image crop selection
CN111898485A (en) Parking space vehicle detection processing method and device
CN113887387A (en) Ski field target image generation method, system and server
CN111079621A (en) Method and device for detecting object, electronic equipment and storage medium
CN112949439A (en) Method and system for monitoring invasion of personnel in key area of oil tank truck
KR101492059B1 (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
CN113256683B (en) Target tracking method and related equipment
CN111126112B (en) Candidate region determination method and device
CN113470013A (en) Method and device for detecting moved article
JP2002342762A (en) Object tracing method
CN116012949B (en) People flow statistics and identification method and system under complex scene
CN111428589B (en) Gradual transition identification method and system
CN114819110B (en) Method and device for identifying speaker in video in real time
WO2022228325A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN113158953B (en) Personnel searching method, device, equipment and medium
US20210287051A1 (en) Methods and systems for recognizing object using machine learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination