CN116402844A - Pedestrian tracking method and device - Google Patents

Pedestrian tracking method and device

Info

Publication number
CN116402844A
Authority
CN
China
Prior art keywords
frame
target
tracking
image
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310394649.5A
Other languages
Chinese (zh)
Inventor
陈国荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Original Assignee
Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Micro Motor Research Institute 21st Research Institute Of China Electronics Technology Corp
Priority to CN202310394649.5A
Publication of CN116402844A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/207 - Analysis of motion for motion estimation over a hierarchy of resolutions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a pedestrian tracking method and device, wherein the method includes: determining a target frame of a target pedestrian according to the pedestrians contained in the first frame image of a video; processing the first frame image with a multi-target tracking algorithm to obtain tracking frames respectively corresponding to the pedestrians in the first frame image; determining a target tracking frame according to the intersection-over-union (IoU) between each tracking frame in the first frame image and the target frame; extracting features from the image in the tracking frame corresponding to the target tracking frame in each frame image of the video to obtain target features; fusing the target features corresponding to each frame image to obtain a fused target feature; and determining the position of the target pedestrian in a frame image in which the target pedestrian is lost according to the fused target feature. Because the target pedestrian is tracked based on the fused target feature, the method and device can improve the generality and accuracy of target pedestrian tracking.

Description

Pedestrian tracking method and device
Technical Field
The invention belongs to the technical field of visual tracking, and in particular relates to a pedestrian tracking method and device.
Background
Visual tracking is an important topic in the field of computer vision, with significant research value and broad application prospects in military guidance, video surveillance, robot visual navigation, human-computer interaction, medical diagnosis, and many other areas. With continued in-depth research, visual target tracking has made breakthrough progress in recent decades: visual tracking algorithms are no longer limited to traditional machine learning methods, but in recent years have also incorporated deep learning, correlation filters, and other techniques from the wave of artificial intelligence, achieving robust, accurate, and stable results.
The existing visual single-target tracking task predicts the state of a target in subsequent frames from the target position given in the initial frame (the first frame) of a video sequence. The goal of single-target tracking is stable, category-independent tracking, but single-target tracking models trained on the existing public datasets suffer from poor generalization and a high false-recognition rate, so they cannot be used in real scenes. A pedestrian tracking method based on a single-target algorithm that improves the tracking effect is therefore urgently needed.
Disclosure of Invention
In view of the above, the present invention provides a pedestrian tracking method and apparatus, mainly aimed at improving the tracking effect for target pedestrians.
According to a first aspect of the present invention, there is provided a pedestrian tracking method comprising:
determining a target frame of a target pedestrian according to the pedestrian contained in the first frame image of the video;
processing the first frame image according to a multi-target tracking algorithm to obtain tracking frames respectively corresponding to pedestrians in the first frame image;
determining a target tracking frame according to the intersection-over-union (IoU) between each tracking frame in the first frame image and the target frame;
respectively extracting features of images in tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target features;
fusing all the target features corresponding to each frame of image to obtain fused target features;
and determining the position of the target pedestrian in the frame image of the lost target pedestrian according to the fusion target characteristics.
According to a second aspect of the present invention, there is provided a pedestrian tracking apparatus comprising:
the target frame acquisition module is used for determining a target frame of a target pedestrian according to the pedestrian contained in the first frame image of the video;
The tracking frame acquisition module is used for processing the first frame image according to a multi-target tracking algorithm to obtain tracking frames respectively corresponding to pedestrians in the first frame image;
the target tracking frame determining module is used for determining a target tracking frame according to the intersection-over-union (IoU) between each tracking frame in the first frame image and the target frame;
the target feature acquisition module is used for respectively extracting features of images in the tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target features;
the fusion module is used for fusing all the target features corresponding to each frame of image to obtain fusion target features;
and the tracking module is used for determining the position of the target pedestrian in the frame image of the lost target pedestrian according to the fusion target characteristics.
According to a third aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the method described above.
According to a fourth aspect of the present invention there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterised in that the computer program when executed by the processor implements the steps of the method described above.
By means of the technical scheme, the technical scheme provided by the embodiment of the invention has at least the following advantages:
According to the pedestrian tracking method and device, the target frame of the target pedestrian is determined according to the pedestrians contained in the first frame image of the video; the first frame image is processed with a multi-target tracking algorithm to obtain tracking frames respectively corresponding to the pedestrians in the first frame image; the target tracking frame is determined according to the intersection-over-union between each tracking frame in the first frame image and the target frame; features are extracted from the images in the tracking frames corresponding to the target tracking frame in each frame image of the video to obtain target features; the target features corresponding to each frame image are fused to obtain the fused target feature; and finally, the position of the target pedestrian in the frame image in which the target pedestrian is lost is determined according to the fused target feature. The method and device can not only improve the generality and accuracy of target pedestrian tracking, but can also track target pedestrians in different scenes, greatly improving the tracking effect in actual use.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
fig. 1 shows an application scenario schematic diagram of a pedestrian tracking method provided by an embodiment of the present invention;
FIG. 2 shows a flowchart of a pedestrian tracking method provided by an embodiment of the present invention;
FIG. 3 illustrates a flowchart of another pedestrian tracking method provided by an embodiment of the present invention;
FIG. 4 is a flow chart illustrating yet another pedestrian tracking method provided by an embodiment of the present invention;
FIG. 5 illustrates a flow chart of yet another pedestrian tracking method provided by an embodiment of the invention;
FIG. 6 is a schematic block diagram of a pedestrian tracking apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device for implementing a method of an embodiment of the invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented, in accordance with an embodiment of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In an embodiment of the present disclosure, the server 120 may run one or more services or software applications that enable the pedestrian tracking method to be performed.
In some embodiments, server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In some embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof that are executable by one or more processors. A user operating client devices 101, 102, 103, 104, 105, and/or 106 may in turn utilize one or more client applications to interact with server 120 to utilize the services provided by these components. It should be appreciated that a variety of different system configurations are possible, which may differ from system 100. Accordingly, FIG. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may receive the tracking results using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that the present disclosure may support any number of client devices.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and the like. These computer devices may run various types and versions of software applications and operating systems, such as Microsoft Windows, Apple iOS, UNIX-like operating systems, Linux or Linux-like operating systems (e.g., Google Chrome OS); or include various mobile operating systems, such as Microsoft Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablet computers, personal digital assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include various handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a number of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. For example only, the one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture that involves virtualization (e.g., one or more flexible pools of logical storage devices that may be virtualized to maintain virtual storage devices of the server). In various embodiments, server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above as well as any commercially available server operating systems. Server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, etc.
In some implementations, server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some implementations, the server 120 may be a server of a distributed system or a server that incorporates a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and poor service scalability found in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of databases 130 may be used to store information such as audio files and object files. The data store 130 may reside in a variety of locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In some embodiments, the data store used by server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key value stores, object stores, or conventional stores supported by the file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Referring to fig. 2, a method 200 of pedestrian tracking in accordance with some embodiments of the present disclosure includes the steps of:
201. and determining a target frame of the target pedestrian according to the pedestrian contained in the first frame image of the video.
Here, the first frame image of the video containing the target pedestrian is analyzed based on the single-target tracking algorithm, and the target frame corresponding to the target pedestrian is determined.
In some embodiments, determining a target frame of a target pedestrian from a pedestrian contained in a first frame image of a video, step 201 includes the steps of: acquiring a video containing a target pedestrian; extracting a first frame image in a video, and obtaining the position information of a target pedestrian in the first frame image; and determining a target frame comprising the target pedestrian according to the position information of the target pedestrian in the first frame image.
It should be noted that, in the present application, the cv2.selectROI function of the open-source package OpenCV is used to frame the target pedestrian in the first frame image; the return value obtained is [x0, y0, w, h], which is the position information of the target frame containing the target pedestrian in the first frame image, where x0 is the x-axis coordinate of the upper-left corner of the target frame, y0 is the y-axis coordinate of the upper-left corner of the target frame, w is the width of the target frame, and h is the height of the target frame.
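As an illustrative sketch (not part of the patent text), the target frame can be obtained with OpenCV's selectROI as described above; the video path and window title here are hypothetical:

```python
import cv2

cap = cv2.VideoCapture("pedestrians.mp4")  # hypothetical video path
ok, first_frame = cap.read()
if not ok:
    raise RuntimeError("failed to read the first frame")

# selectROI lets the user drag a box and returns (x0, y0, w, h):
# upper-left corner coordinates, width and height of the target frame.
x0, y0, w, h = cv2.selectROI("select target pedestrian", first_frame, showCrosshair=True)
cv2.destroyAllWindows()
target_box = (x0, y0, w, h)
```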
202. And processing the first frame image according to the multi-target tracking algorithm to obtain tracking frames respectively corresponding to the pedestrians in the first frame image.
It should be noted that the tracking ID and position of each pedestrian contained in every frame image can be obtained by processing the video with a multi-target tracking algorithm. The present application uses the ByteTrack algorithm, whose principle is to use a Kalman filter to perform intersection-over-union (IoU) matching between detected targets and existing tracks; the existing tracking ID is reused when matching succeeds, and a new tracking ID is started for the target frame when matching fails. Thus, in this application the ByteTrack algorithm is run from the first frame image, and it returns the tracking IDs of all tracking frames existing in each frame image after the first frame of the video, together with the positions of the targets corresponding to these tracking IDs in each frame image after the first frame.
203. And determining the target tracking frame according to the intersection ratio result between each tracking frame in the first frame image and the target frame.
Here, each tracking frame in the first frame image is matched against the target frame, and if the matching is successful, the tracking frame successfully matched with the target frame is determined as the target tracking frame. The foreground is extracted from the first frame image based on the multi-target tracking algorithm, so that the tracking frames corresponding to the pedestrians in the first frame image are effectively located and false recognition of pedestrians is reduced.
Further, determining the target tracking frame according to the intersection-over-union between each tracking frame in the first frame image and the target frame may further include: calculating the intersection-over-union between each tracking frame in the first frame image and the target frame, and taking a tracking frame whose intersection-over-union is greater than a preset threshold as the target tracking frame.
Once the intersection-over-union (IoU) between a tracking frame and the target frame is greater than a preset threshold (preferably 0.7), that tracking frame is taken as the target tracking frame, its tracking ID is taken as the ID matched to the target pedestrian, and the tracking frame gives the position of the target pedestrian in the first frame image.
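A minimal sketch of this IoU matching step, assuming boxes in (x0, y0, w, h) form and the preferred 0.7 threshold; the function names are illustrative, not taken from the patent:

```python
def iou(box_a, box_b):
    # Boxes are (x0, y0, w, h); convert to corner coordinates first.
    ax0, ay0, ax1, ay1 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx0, by0, bx1, by1 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def match_target(first_frame_tracks, target_box, thresh=0.7):
    """first_frame_tracks: {tracking_id: (x0, y0, w, h)} returned by the tracker for frame 1.
    Returns the tracking ID whose frame best overlaps the target frame, or None."""
    best_id, best_iou = None, 0.0
    for tid, box in first_frame_tracks.items():
        score = iou(box, target_box)
        if score > thresh and score > best_iou:
            best_id, best_iou = tid, score
    return best_id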
204. And respectively extracting the characteristics of images in the tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target characteristics.
Here, the target features include appearance information features and face information features. By acquiring the appearance information features and the face information features within the target tracking frame, the target pedestrian can be tracked in the subsequent steps based on the target feature obtained by fusing them.
In some embodiments, step 204 of performing feature extraction on the images in the tracking frames corresponding to the target tracking frame in each frame image of the video to obtain the target features may include the following steps: performing feature extraction on the image in the tracking frame corresponding to the target tracking frame in the first frame image to obtain the target feature corresponding to the first frame image; taking the tracking ID corresponding to the target tracking frame in the first frame image as the target tracking ID; taking the tracking frame corresponding to the target tracking ID in each frame image after the first frame as the target tracking frame corresponding to that frame image; and performing feature extraction on the image in the target tracking frame in each frame image after the first frame to obtain the target feature corresponding to each frame image after the first frame.
That is, the position of the tracking frame corresponding to the target tracking frame in the first frame image is determined from the tracking ID of the target tracking frame, the image in that tracking frame is extracted from the first frame image, and feature extraction is then performed to obtain the target feature corresponding to the first frame image. Likewise, the tracking frame corresponding to the target tracking ID in each frame image after the first frame is determined from the tracking ID corresponding to the target tracking frame in the first frame image, the image in the target tracking frame in each frame image after the first frame is extracted from the corresponding frame image, and feature extraction is performed on each extracted image to obtain the target feature corresponding to each frame image after the first frame.
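The per-frame extraction described above can be sketched as follows, where `encoder` stands in for whatever feature extractor is used (the application later describes a re-ID model and a face model); this interface is an assumption for illustration:

```python
import numpy as np

def extract_target_feature(frame, tracks, target_id, encoder):
    """frame: BGR image; tracks: {tracking_id: (x0, y0, w, h)} for this frame."""
    if target_id not in tracks:
        return None  # the target tracking ID is absent, i.e. the target is lost
    x0, y0, w, h = tracks[target_id]
    crop = frame[int(y0):int(y0 + h), int(x0):int(x0 + w)]
    feat = np.asarray(encoder(crop), dtype=np.float32)
    return feat / (np.linalg.norm(feat) + 1e-12)  # L2-normalize before storing
```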
In still other embodiments, the feature extraction is performed on the images in the tracking frames corresponding to the target tracking frame in each frame of image of the video, so as to obtain the target feature, and referring to fig. 3, step 204 includes the following steps:
2041. If the k-th frame image does not contain a tracking frame corresponding to the target tracking frame, the fused target feature corresponding to the (k-1)-th frame image is taken as the fused target feature corresponding to the k-th frame image, where k ≥ 2 and k is less than the number of image frames contained in the video.
Here, for the k-th frame image, all tracking IDs existing in the k-th frame image are obtained by running the ByteTrack algorithm. If the tracking ID of the target tracking frame is not in the k-th frame image, the target pedestrian has been lost and the tracking frame needs to be re-matched. Because the target pedestrian is lost in the k-th frame image, the fused target feature corresponding to the (k-1)-th frame image is carried over to the k-th frame image, so that target features corresponding to different frame images can still be fused later.
2042. And sequentially extracting features of images in the prediction tracking frames corresponding to the pedestrians in the continuous frame images after the kth frame to obtain prediction target features corresponding to the prediction tracking frames.
Here, the frame indices of the consecutive frame images are all smaller than the frame index of the current frame image.
By running the ByteTrack algorithm, the prediction tracking frames corresponding to the pedestrians in the consecutive frame images after the k-th frame and the tracking IDs corresponding to these prediction tracking frames are obtained; according to the tracking IDs, feature extraction is performed in turn on the images in the prediction tracking frames in the consecutive frame images after the k-th frame to obtain the prediction target features corresponding to each prediction tracking frame.
For example, by running the ByteTrack algorithm, 5 prediction tracking frames existing in the (k+1)-th frame image are obtained, with tracking IDs 5, 6, 7, 8 and 9; the consecutive frame images after the (k+1)-th frame may also contain these 5 prediction tracking frames with the same tracking IDs 5, 6, 7, 8 and 9. Feature extraction is then performed in turn on the images in the prediction tracking frames with tracking IDs 5, 6, 7, 8 and 9 in the consecutive frame images after the k-th frame, obtaining the prediction target features of the 5 prediction tracking frames in each of those frame images.
Further, feature extraction is performed on images in the prediction tracking frames corresponding to the pedestrians in the continuous frame images after the kth frame, so as to obtain prediction target features corresponding to the prediction tracking frames, see fig. 4, step 2042 includes the following steps:
20421. and taking the (k+1) th frame image as a predicted frame image, and respectively extracting features of images in predicted tracking frames corresponding to pedestrians in the predicted frame image to obtain target features to be compared corresponding to the predicted tracking frames.
Specifically, by running the ByteTrack algorithm, the prediction tracking frame corresponding to each pedestrian and the tracking ID corresponding to each prediction tracking frame are obtained in the (k+1)-th frame image; according to these tracking IDs, feature extraction is performed in turn on the images in each prediction tracking frame in the (k+1)-th frame image, obtaining the target features to be compared corresponding to each prediction tracking frame.
20422. And judging, according to the comparison result of cosine similarity between each target feature to be compared and the fused target feature corresponding to the (k-1)-th frame image, whether a prediction tracking frame matching the target pedestrian exists in the predicted frame image.
For example, if the cosine similarity between target feature A to be compared in the (k+1)-th frame image and the fused target feature corresponding to the (k-1)-th frame image is greater than a threshold (preferably 0.6 in this application), it is determined that the prediction tracking frame A corresponding to target feature A to be compared in the (k+1)-th frame image matches the target pedestrian.
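A sketch of this cosine-similarity re-matching check, assuming the features are 1-D numpy vectors and using the 0.6 threshold preferred above; function names are illustrative:

```python
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def find_matching_track(features_to_compare, fused_target_feature, thresh=0.6):
    """features_to_compare: {tracking_id: feature} for the prediction tracking frames
    of the predicted frame image; returns the matching tracking ID, or None."""
    for tid, feat in features_to_compare.items():
        if cosine_sim(feat, fused_target_feature) > thresh:
            return tid
    return None
```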
20423. If not, taking the adjacent frame image after the predicted frame image as a new predicted frame image, and jumping to the step of respectively extracting the characteristics of the images in the predicted tracking frames corresponding to the pedestrians in the predicted frame image until the predicted frame image is judged to have the predicted tracking frame matched with the target pedestrian.
For example: if, with the (k+1)-th frame image as the predicted frame image, no prediction tracking frame matching the target pedestrian exists, the adjacent frame image after the predicted frame image, i.e. the (k+2)-th frame image, is taken as the new predicted frame image; the process jumps back to step 20421 to extract features from the images in the prediction tracking frames corresponding to the pedestrians in the predicted frame image, and step 20422 then judges whether a prediction tracking frame matching the target pedestrian exists in the (k+2)-th frame image. If the (k+2)-th frame image still has no prediction tracking frame matching the target pedestrian, the adjacent frame image after the (k+2)-th frame image is taken as the new predicted frame image, and steps 20421 to 20422 are repeated until some frame image after the (k+2)-th frame is judged to contain a prediction tracking frame matching the target pedestrian.
When no prediction tracking frame matching the target pedestrian exists in the predicted frame image, the fused target feature corresponding to the (k-1)-th frame image is used as the fused target feature corresponding to the predicted frame image. For example: if the (k+1)-th frame image has no prediction tracking frame matching the target pedestrian, the fused target feature corresponding to the (k-1)-th frame image is taken as the fused target feature corresponding to the (k+1)-th frame image.
20424. And taking the target feature to be compared corresponding to the predicted tracking frame matched with the target pedestrian as the predicted target feature corresponding to the predicted frame image.
For example: in the k+1st frame image, if the tracking ID of the predicted tracking frame matched with the target pedestrian is 5, taking the target feature to be compared corresponding to the predicted tracking frame with the tracking ID of 5 as the predicted target feature corresponding to the predicted frame image.
20425. And sequentially extracting the features of the images in the predicted tracking frames corresponding to the candidate target tracking frames in the continuous frame images after the predicted frame images to obtain the predicted target features corresponding to the continuous frame images after the predicted frame images.
Here, feature extraction is performed in turn on the images in the prediction tracking frames corresponding to the candidate target tracking frame in the consecutive frame images after the predicted frame image, obtaining the predicted target features corresponding to those frame images. For example, when the (k+1)-th frame image is the predicted frame image, step 20425 may include the following steps: taking the tracking ID corresponding to the candidate target tracking frame in the (k+1)-th frame image as the candidate target tracking ID; taking the prediction tracking frame corresponding to the candidate target tracking ID in each consecutive frame image after the (k+1)-th frame as the candidate target tracking frame of that frame image; and performing feature extraction in turn on the images corresponding to each candidate target tracking frame in the consecutive frame images after the (k+1)-th frame to obtain the predicted target features corresponding to those frame images, i.e. the predicted target features corresponding to the consecutive frame images after the predicted frame image.
For example: in the (k+1)-th frame image, if the tracking ID of the candidate target tracking frame is 5, then 5 is taken as the candidate target tracking ID. The prediction tracking frame with tracking ID 5 in each consecutive frame image after the (k+1)-th frame is then taken as the frame from which features are extracted; that is, the prediction tracking frame corresponding to tracking ID 5 in each consecutive frame image after the (k+1)-th frame is the candidate target tracking frame of that frame image. The images in the prediction tracking frames with tracking ID 5 in the (k+2)-th frame image, the (k+3)-th frame image, ..., up to the current frame image are each treated as a candidate target tracking frame, feature extraction is performed on the image in each of these prediction tracking frames, and the predicted target features corresponding to each consecutive frame image after the (k+1)-th frame are obtained.
2043. And determining the matching times between each prediction tracking frame and the target pedestrian according to the comparison result of cosine similarity between each prediction target feature and the fusion target feature corresponding to the k-1 frame image.
Specifically, the fused target feature corresponding to the (k-1)-th frame image is obtained, which comprises an appearance information feature P_{k-1} and a face information feature F_{k-1}. The appearance information features in the target features corresponding to the (k+2)-th frame image, the (k+3)-th frame image, ..., up to the current frame image are then compared with the fused appearance information feature of the (k-1)-th frame image by cosine similarity. Detection images whose appearance information feature p has a cosine similarity with P_{k-1} greater than a preset threshold (preferably 0.6) are determined to be successfully matched, and after a successful match the appearance match count of the corresponding detection image is increased by 1. For example: in the (k+1)-th frame image, if the cosine similarity comparison between the appearance information feature corresponding to the prediction tracking frame with tracking ID 5 and the fused appearance information feature corresponding to the (k-1)-th frame image is a match, the match count of the prediction tracking frame with tracking ID 5 is recorded as 1; in the (k+2)-th frame image, if the comparison between the appearance information feature corresponding to the prediction tracking frame with tracking ID 5 and the fused appearance information feature corresponding to the (k-1)-th frame image is also a match, the match count of the prediction tracking frame with tracking ID 5 is recorded as 2; in the (k+3)-th frame image, if the comparison is again a match, the match count of the prediction tracking frame with tracking ID 5 is recorded as 3.
The face information feature f in each target feature in the (k+2)-th frame image, the (k+3)-th frame image, ..., up to the current frame image is likewise compared by cosine similarity with the fused face information feature F_{k-1} of the (k-1)-th frame image. Detection images whose face information feature f has a cosine similarity with F_{k-1} greater than a preset threshold (preferably 0.5) are determined to be successfully matched, and after a successful match the face match count of the corresponding detection image is increased by 1. For example: in the (k+1)-th frame image, if the cosine similarity comparison between the face information feature corresponding to the prediction tracking frame with tracking ID 5 and the fused face information feature corresponding to the (k-1)-th frame image is a match, the face match count of the prediction tracking frame with tracking ID 5 is recorded as 1; in the (k+2)-th frame image, if the comparison is also a match, the count is recorded as 2; in the (k+3)-th frame image, if the comparison is again a match, the count is recorded as 3.
2044. Updating the predicted tracking frame with the matching times exceeding a preset threshold value as a target tracking frame, and extracting the characteristics of images in the predicted target frame corresponding to the updated target tracking frame in the continuous frame images after the kth frame.
For example: when the appearance corresponding to the prediction tracking frame with tracking ID 5 has been matched 20 times, or its face has been matched 5 times, the prediction tracking frame is regarded as the target tracking frame, and the tracking ID of the target tracking frame is updated: tracking ID 5 is recorded as the tracking ID of the target tracking frame, the position information of the prediction tracking frame is taken as the position information of the target pedestrian, and the update of the target tracking frame in the consecutive frame images after the k-th frame is completed. Feature extraction is then performed on the images in the predicted target frame with tracking ID 5 in the consecutive frame images after the k-th frame.
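The counting and re-acquisition rule of steps 2043 and 2044 can be sketched as follows, using the similarity thresholds (0.6 appearance, 0.5 face) and match counts (20 appearance matches or 5 face matches) given above; the data layout is an assumption for illustration:

```python
import numpy as np
from collections import defaultdict

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

APPEARANCE_SIM, FACE_SIM = 0.6, 0.5     # similarity thresholds
APPEARANCE_HITS, FACE_HITS = 20, 5      # required match counts

appearance_hits = defaultdict(int)
face_hits = defaultdict(int)

def update_match_counts(candidates, fused_appearance, fused_face):
    """candidates: {tracking_id: (appearance_feat, face_feat or None)} for one frame."""
    for tid, (p, f) in candidates.items():
        if cosine_sim(p, fused_appearance) > APPEARANCE_SIM:
            appearance_hits[tid] += 1
        if f is not None and cosine_sim(f, fused_face) > FACE_SIM:
            face_hits[tid] += 1

def confirmed_target_id():
    """Return the tracking ID promoted to target tracking frame, or None."""
    for tid in set(appearance_hits) | set(face_hits):
        if appearance_hits[tid] >= APPEARANCE_HITS or face_hits[tid] >= FACE_HITS:
            return tid
    return None
```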
In other embodiments, the target features include appearance information features and face information features; the feature extraction is performed on the images in the tracking frames corresponding to the target tracking frame in each frame of image of the video, so as to obtain the target feature, referring to fig. 5, step 204 includes the following steps:
2045. and extracting images in the tracking frames corresponding to the target tracking frames in each frame of image.
Specifically, the tracking ID of the target tracking frame in each frame of image and the corresponding position information thereof are obtained, and the image corresponding to the target tracking frame is segmented from the current frame of image according to the tracking ID and the position information of the target tracking frame, so that the extraction of the image positioned in the target tracking frame is completed.
2046. And obtaining appearance information features corresponding to the target pedestrians according to the extracted images in the tracking frames corresponding to the target tracking frames.
It should be noted that in the present application the extracted image in the target tracking frame is input into a re-ID model. The re-ID model used in the invention is a PP-LCNet model built in the open-source Paddle framework; the extracted image in the target tracking frame is input into the PP-LCNet model to obtain a 128-dimensional appearance information feature of the target pedestrian, and the appearance information feature is normalized and stored.
2047. And detecting the faces in each frame of image to obtain all face frames in each frame of image and the face information characteristics corresponding to the face frames.
It should be noted that, the application acquires face frames corresponding to all faces in each frame of image and face information features corresponding to the face frames by using an open source package.
2048. And detecting the human body in the target tracking frame to obtain a human body frame corresponding to the image in the target tracking frame.
The multi-target tracking algorithm is adopted to track the target pedestrian in the images, so that the foreground and the background in the images can be accurately distinguished and the human body frames can be effectively located, which greatly reduces various kinds of false recognition.
2049. And taking the unique face frame in the human frame as the face frame of the target pedestrian, and taking the face information characteristic corresponding to the face frame as the face information characteristic.
If one and only one face frame is located in the human body frame corresponding to the target pedestrian, that unique face frame is taken as the face frame of the target pedestrian, and the face information feature corresponding to this face frame is normalized to obtain the face information feature corresponding to the target pedestrian.
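A sketch of the face-frame selection rule above, assuming boxes in (x0, y0, w, h) form: a face frame is kept only when it is the unique face frame lying entirely inside the target's human body frame.

```python
def inside(face_box, body_box):
    # Both boxes are (x0, y0, w, h); the face must lie entirely within the body box.
    fx0, fy0, fw, fh = face_box
    bx0, by0, bw, bh = body_box
    return (fx0 >= bx0 and fy0 >= by0 and
            fx0 + fw <= bx0 + bw and fy0 + fh <= by0 + bh)

def target_face_box(face_boxes, body_box):
    candidates = [f for f in face_boxes if inside(f, body_box)]
    return candidates[0] if len(candidates) == 1 else None  # unique face frame or nothing
```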
205. And fusing the target features corresponding to each frame of image to obtain fused target features.
Here, the target features corresponding to all frame images are fused according to the following formulas:
P_1 = p_1
P_k = (0.9 * P_{k-1} + 0.1 * p_k) / || 0.9 * P_{k-1} + 0.1 * p_k ||_2, k = 2, 3, 4, ...
F_1 = f_1
F_k = (0.9 * F_{k-1} + 0.1 * f_k) / || 0.9 * F_{k-1} + 0.1 * f_k ||_2, k = 2, 3, 4, ...
where P_1 and p_1 are both the appearance information feature corresponding to the first frame image; F_1 and f_1 are both the face information feature corresponding to the first frame image; P_k is the fused appearance information feature corresponding to the k-th frame; F_k is the fused face information feature corresponding to the k-th frame; p_k is the appearance information feature corresponding to the k-th frame; f_k is the face information feature corresponding to the k-th frame; P_{k-1} is the fused appearance information feature corresponding to the (k-1)-th frame; F_{k-1} is the fused face information feature corresponding to the (k-1)-th frame; p_{k-1} is the appearance information feature corresponding to the (k-1)-th frame; and f_{k-1} is the face information feature corresponding to the (k-1)-th frame.
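The fusion formulas above amount to an exponentially weighted moving average followed by L2 normalization; a minimal numpy sketch, with the 0.9/0.1 weights taken from the formulas:

```python
import numpy as np

def fuse(prev_fused, current, alpha=0.9):
    """P_k = (0.9 * P_{k-1} + 0.1 * p_k) / || 0.9 * P_{k-1} + 0.1 * p_k ||_2;
    the same rule applies to the face features F_k and f_k."""
    mixed = alpha * prev_fused + (1.0 - alpha) * current
    return mixed / (np.linalg.norm(mixed) + 1e-12)

# For the first frame, P_1 = p_1 and F_1 = f_1; afterwards the fused appearance and
# face features are updated frame by frame with fuse().
```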
206. And determining the position of the target pedestrian in the frame image of the lost target pedestrian according to the fusion target characteristics.
By taking the fused target feature, obtained by fusing the target features in each frame image produced by the multi-target tracking algorithm, as the feature used for tracking, the position of the target pedestrian can be accurately located among the pedestrians in the frame image in which the target pedestrian was lost, and the new tracking ID of the target pedestrian in that frame image can then be obtained.
According to the pedestrian tracking method, the target frame of the target pedestrian is determined according to the pedestrians contained in the first frame image of the video; the first frame image is processed with a multi-target tracking algorithm to obtain tracking frames respectively corresponding to the pedestrians in the first frame image; the target tracking frame is determined according to the intersection-over-union between each tracking frame in the first frame image and the target frame; features are extracted from the images in the tracking frames corresponding to the target tracking frame in each frame image of the video to obtain target features; the target features corresponding to each frame image are fused to obtain the fused target feature; and finally, the position of the target pedestrian in the frame image in which the target pedestrian is lost is determined according to the fused target feature. The method can not only improve the generality and accuracy of target pedestrian tracking, but can also track target pedestrians in different scenes, greatly improving the tracking effect in actual use.
Further, as an implementation of the method shown in fig. 2, an embodiment of the present invention provides a pedestrian tracking apparatus, as shown in fig. 6, including:
the target frame acquisition module is used for determining a target frame of a target pedestrian according to the pedestrian contained in the first frame image of the video;
the tracking frame acquisition module is used for processing the first frame image according to the multi-target tracking algorithm to obtain tracking frames respectively corresponding to pedestrians in the first frame image;
the target tracking frame determining module is used for determining a target tracking frame according to the intersection-over-union (IoU) between each tracking frame in the first frame image and the target frame;
the target feature acquisition module is used for respectively extracting features of images in the tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target features;
the fusion module is used for fusing the target characteristics corresponding to each frame of image to obtain fusion target characteristics;
and the tracking module is used for determining the position of the target pedestrian in the frame image of the lost target pedestrian according to the fusion target characteristics.
Further, the target feature acquisition module includes:
the first target feature acquisition unit is used for extracting features of images in the tracking frames corresponding to the target tracking frames in the first frame images to obtain target features corresponding to the first frame images;
A target tracking ID determining unit, configured to use a tracking ID corresponding to the target tracking frame in the first frame image as a target tracking ID;
the target tracking frame determining unit is used for taking the tracking frame corresponding to the target tracking ID in each frame of image after the first frame as the target tracking frame respectively corresponding to each frame of image after the first frame;
the first feature extraction unit is used for extracting features of images in the target tracking frame in each frame of images after the first frame respectively to obtain target features corresponding to each frame of images after the first frame respectively.
Further, the target feature acquisition module includes:
the second target feature acquisition unit is used for taking the fusion target feature corresponding to the k-1 frame image as the fusion target feature corresponding to the k frame image if the tracking frame corresponding to the target tracking frame does not exist in the k frame image; k is more than or equal to 2, and k is less than the number of frames of images contained in the video;
the prediction target feature acquisition unit is used for sequentially extracting features of images in the prediction tracking frames corresponding to the pedestrians in the continuous frame images after the k-th frame to obtain prediction target features corresponding to the prediction tracking frames, wherein the frame indices of the continuous frame images are smaller than the frame index of the current frame image;
The matching frequency determining unit is used for determining the matching frequency between each prediction tracking frame and the target pedestrian according to the comparison result of cosine similarity between each prediction target feature and the fusion target feature corresponding to the k-1 frame image;
and the second feature extraction unit is used for updating the predicted tracking frame with the matching times exceeding a preset threshold value into a target tracking frame and extracting features of images in the predicted target frame corresponding to the updated target tracking frame in the continuous frame images after the kth frame.
Further, the prediction target feature acquisition unit includes:
the target feature determination subunit to be compared is used for taking the (k+1) th frame image as a predicted frame image, and respectively extracting features of images in predicted tracking frames corresponding to pedestrians in the predicted frame image to obtain target features to be compared corresponding to the predicted tracking frames;
the prediction tracking frame judging subunit is used for judging whether a prediction tracking frame matched with a target pedestrian exists in the prediction frame image according to the comparison result of cosine similarity between each target feature to be compared and the fusion target feature corresponding to the k-1 frame image;
The first prediction target feature determining subunit is used for taking the adjacent frame image after the predicted frame image as a new predicted frame image if the predicted frame image is not available, and jumping to the step of respectively extracting features of images in the predicted tracking frames corresponding to the pedestrians in the predicted frame image until the predicted tracking frames matched with the target pedestrians are judged to exist in the predicted frame image;
a second predicted target feature determining subunit, configured to take the target feature to be compared corresponding to the predicted tracking frame matched with the target pedestrian as the predicted target feature corresponding to the predicted frame image;
And the first prediction target feature determining subunit is used for sequentially extracting features of images in the prediction tracking frames corresponding to the alternative target tracking frames in the continuous frame images after the prediction frame images to obtain the prediction target features corresponding to the continuous frame images after the prediction frame images.
Further, the second prediction target feature determination subunit includes:
an alternative target tracking ID determining subunit, configured to use a tracking ID corresponding to an alternative target tracking frame in the k+1st frame image as an alternative target tracking ID;
an alternative target tracking frame determination subunit, configured to use the predicted tracking frame corresponding to the alternative target tracking ID in each continuous frame image after the k+1 frame as the alternative target tracking frame corresponding to that continuous frame image after the k+1 frame;
And the third prediction target feature determining subunit is used for sequentially extracting features of the images corresponding to each alternative target tracking frame in the continuous frame images after the k+1 frame to obtain the prediction target features respectively corresponding to the continuous frame images after the k+1 frame.
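As a rough illustration of how the alternative target tracking ID might be carried through the continuous frames after the k+1 frame (the dictionary layout here is an assumption, not part of the disclosure):

```python
def collect_candidate_boxes(tracks_per_frame, candidate_track_id):
    """Gather, for each continuous frame after k+1, the prediction tracking
    frame whose tracking ID equals the alternative target tracking ID.

    tracks_per_frame: list of dicts {track_id: (x, y, w, h)}, one per frame."""
    candidate_boxes = []
    for frame_tracks in tracks_per_frame:
        box = frame_tracks.get(candidate_track_id)   # None if the ID is absent in this frame
        candidate_boxes.append(box)
    return candidate_boxes
```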
Further, the target features include appearance information features and face information features; the target feature acquisition module comprises:
an image extraction unit, configured to extract an image in a tracking frame corresponding to a target tracking frame in each frame of image;
the appearance information feature extraction unit is used for obtaining appearance information features corresponding to the target pedestrians according to the extracted images in the tracking frames corresponding to the target tracking frames;
the face information feature extraction unit is used for detecting faces in each frame of image to obtain all face frames in each frame of image and face information features corresponding to the face frames;
the human body frame determining unit is used for detecting the human body in the target tracking frame to obtain a human body frame corresponding to the image in the target tracking frame;
the face information feature determining unit is used for taking the unique face frame positioned inside the human body frame as the face frame of the target pedestrian, and taking the face information feature corresponding to that face frame as the face information feature of the target pedestrian.
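A small sketch of the "unique face frame inside the human body frame" rule is given below, assuming boxes in (x, y, w, h) form; the helper names are illustrative only.

```python
def box_inside(inner, outer):
    """Return True if box `inner` lies entirely inside box `outer`; boxes are (x, y, w, h)."""
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return ix >= ox and iy >= oy and ix + iw <= ox + ow and iy + ih <= oy + oh

def select_target_face(face_boxes, body_box):
    """Keep only the face frames that fall inside the human body frame; the face
    frame of the target pedestrian is accepted only when it is unique."""
    inside = [f for f in face_boxes if box_inside(f, body_box)]
    return inside[0] if len(inside) == 1 else None   # None: no face or ambiguous
```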
Further, the target frame acquisition module includes:
a video acquisition unit for acquiring a video including a target pedestrian;
the position information acquisition unit is used for extracting a first frame image in the video and acquiring the position information of the target pedestrian in the first frame image;
and the target frame determining unit is used for determining a target frame comprising the target pedestrian according to the position information of the target pedestrian in the first frame image.
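One possible way to read the first frame and turn the target pedestrian's position information into a target frame is sketched below; the interactive ROI selection via OpenCV is only an assumed convenience, since the disclosure does not prescribe how the position information is obtained.

```python
import cv2

def get_first_frame_and_target_box(video_path: str):
    """Read the first frame of the video and obtain an (x, y, w, h) target frame
    for the target pedestrian from operator-supplied position information."""
    cap = cv2.VideoCapture(video_path)
    ok, first_frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"cannot read first frame from {video_path}")
    # The position information is supplied interactively here; it could equally
    # come from a detector or an upstream annotation (assumption for illustration).
    x, y, w, h = cv2.selectROI("target pedestrian", first_frame)
    cv2.destroyAllWindows()
    return first_frame, (int(x), int(y), int(w), int(h))
```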
According to the pedestrian tracking device, the target frame of a target pedestrian is determined according to the pedestrians contained in the first frame image of the video; the first frame image is processed according to a multi-target tracking algorithm to obtain tracking frames respectively corresponding to the pedestrians in the first frame image; a target tracking frame is determined according to the intersection ratio (intersection over union, IoU) result between each tracking frame in the first frame image and the target frame; features are respectively extracted from the images in the tracking frames corresponding to the target tracking frame in each frame of image of the video to obtain target features; the target features corresponding to all the frame images are fused to obtain a fusion target feature; and finally, the position of the target pedestrian in the frame image where the target pedestrian is lost is determined according to the fusion target feature. In this way, the device not only improves the universality and accuracy of target pedestrian tracking, but also can track target pedestrians in different scenes, which greatly improves the tracking effect in actual use.
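To make the intersection ratio step and the feature fusion step concrete, a minimal sketch follows. Selecting the tracking frame with the highest IoU against the target frame, and fusing per-frame features by averaging L2-normalised vectors, are illustrative choices assumed for the sketch; this passage of the disclosure does not commit to a particular fusion rule.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def select_target_tracking_frame(tracking_frames, target_frame):
    """tracking_frames: {track_id: (x, y, w, h)} produced by the multi-target
    tracker on the first frame; returns the ID whose box best overlaps the target frame."""
    return max(tracking_frames, key=lambda tid: iou(tracking_frames[tid], target_frame))

def fuse_features(per_frame_features):
    """Fuse per-frame target features; averaging of L2-normalised vectors is
    used here purely as an illustrative fusion rule."""
    feats = np.stack([f / (np.linalg.norm(f) + 1e-12) for f in per_frame_features])
    fused = feats.mean(axis=0)
    return fused / (np.linalg.norm(fused) + 1e-12)
```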
It should be noted that: when the pedestrian tracking apparatus provided in the above embodiment performs pedestrian tracking, the division into the above functional modules is used only for illustration; in practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the pedestrian tracking device and the pedestrian tracking method provided in the foregoing embodiments belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
According to another aspect of the present disclosure, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to the present disclosure.
Referring to fig. 7, a block diagram of an electronic device 700, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the electronic device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the electronic device 700 are connected to the I/O interface 705, including: an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone, and/or a remote control. The output unit 707 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, magnetic disks and optical disks. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through computer networks, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, 802.11 devices, Wi-Fi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples, but only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in an order different from that described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. It should be noted that, as technology evolves, many of the elements described herein may be replaced by equivalent elements appearing after the present disclosure.

Claims (10)

1. A pedestrian tracking method, comprising:
Determining a target frame of a target pedestrian according to the pedestrian contained in the first frame image of the video;
processing the first frame image according to a multi-target tracking algorithm to obtain tracking frames respectively corresponding to pedestrians in the first frame image;
determining a target tracking frame according to the intersection ratio result between each tracking frame in the first frame image and the target frame;
respectively extracting features of images in tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target features;
fusing all the target features corresponding to each frame of image to obtain fused target features;
and determining the position of the target pedestrian in the frame image where the target pedestrian is lost according to the fusion target feature.
2. The method according to claim 1, wherein the step of extracting features of images in a tracking frame corresponding to the target tracking frame in each frame of image of the video to obtain target features includes:
extracting features of images in the tracking frames corresponding to the target tracking frames in the first frame image to obtain target features corresponding to the first frame image;
Taking a tracking ID corresponding to the target tracking frame in the first frame image as a target tracking ID;
taking a tracking frame corresponding to the target tracking ID in each frame of image after the first frame as a target tracking frame corresponding to each frame of image after the first frame;
and respectively extracting the characteristics of the images in the target tracking frames in each frame of images after the first frame to obtain the target characteristics respectively corresponding to each frame of images after the first frame.
3. The method according to claim 1, wherein the step of extracting features of images in a tracking frame corresponding to the target tracking frame in each frame of image of the video to obtain target features includes:
if the kth frame image does not have the tracking frame corresponding to the target tracking frame, taking the fusion target feature corresponding to the kth-1 frame image as the fusion target feature corresponding to the kth frame image; k is more than or equal to 2, and k is less than the number of image frames contained in the video;
sequentially extracting features of images in the prediction tracking frames corresponding to pedestrians in the continuous frame images after the kth frame to obtain prediction target features corresponding to the prediction tracking frames; the frame numbers of the continuous frame images are smaller than the total number of frames contained in the video;
According to the comparison result of cosine similarity between each predicted target feature and the fusion target feature corresponding to the k-1 frame image, determining the matching times between each predicted tracking frame and the target pedestrian;
updating the predicted tracking frame with the matching times exceeding a preset threshold value into the target tracking frame, and extracting features of images in the predicted target frame corresponding to the updated target tracking frame in the continuous frame images after the kth frame.
4. A method according to claim 3, wherein the feature extraction is performed on images in prediction tracking frames corresponding to pedestrians respectively in the successive frame images after the kth frame respectively, so as to obtain prediction target features corresponding to the prediction tracking frames respectively, and the method comprises:
taking the (k+1) th frame image as a predicted frame image, and respectively extracting features of images in predicted tracking frames corresponding to pedestrians in the predicted frame image to obtain target features to be compared corresponding to the predicted tracking frames;
judging whether a predicted tracking frame matched with the target pedestrian exists in the predicted frame image or not according to the comparison result of cosine similarity between each target feature to be compared and the fusion target feature corresponding to the k-1 frame image;
If not, taking the adjacent frame image after the predicted frame image as a new predicted frame image, and jumping to the step of extracting the characteristics of the images in the predicted tracking frames corresponding to the pedestrians respectively in the predicted frame image until judging that the predicted tracking frames matched with the target pedestrians exist in the predicted frame image;
taking the target feature to be compared corresponding to the predicted tracking frame matched with the target pedestrian as a predicted target feature corresponding to the predicted frame image;
and sequentially extracting the features of the images in the prediction tracking frames corresponding to the alternative target tracking frames in the continuous frame images after the prediction frame images to obtain the prediction target features corresponding to the continuous frame images after the prediction frame images.
5. The method according to claim 4, wherein the sequentially performing feature extraction on images in the prediction tracking frames corresponding to the candidate target tracking frames in the continuous frame images after the prediction frame image to obtain the prediction target features corresponding to the continuous frame images after the prediction frame image respectively, includes:
taking the tracking ID corresponding to the alternative target tracking frame in the k+1st frame image as an alternative target tracking ID;
Taking the prediction tracking frame corresponding to the alternative target tracking ID in each continuous frame image after the k+1 frame as the alternative target tracking frame respectively corresponding to the continuous frame images after the k+1 frame;
and sequentially extracting features of the images corresponding to each alternative target tracking frame in the continuous frame images after the k+1 frame to obtain the predicted target features corresponding to the continuous frame images after the k+1 frame.
6. The method of claim 1, wherein the target features include appearance information features and face information features; the step of extracting the features of the images in the tracking frames corresponding to the target tracking frame in each frame of image of the video to obtain the target features includes:
extracting images in a tracking frame corresponding to the target tracking frame in each frame of image;
obtaining appearance information features corresponding to the target pedestrians according to the extracted images in the tracking frames corresponding to the target tracking frames;
detecting the faces in each frame of image to obtain all face frames in each frame of image and the corresponding face information characteristics of the face frames;
Detecting a human body in the target tracking frame to obtain a human body frame corresponding to an image in the target tracking frame;
and taking the unique face frame located inside the human body frame as the face frame of the target pedestrian, and taking the face information feature corresponding to that face frame as the face information feature of the target pedestrian.
7. The method according to claim 1, wherein determining the target frame of the target pedestrian from the pedestrian contained in the first frame image of the video comprises:
acquiring a video containing a target pedestrian;
extracting a first frame image in the video, and obtaining the position information of the target pedestrian in the first frame image;
and determining a target frame comprising the target pedestrian according to the position information of the target pedestrian in the first frame image.
8. A pedestrian tracking device, comprising:
the target frame acquisition module is used for determining a target frame of a target pedestrian according to the pedestrian contained in the first frame image of the video;
the tracking frame acquisition module is used for processing the first frame image according to a multi-target tracking algorithm to obtain tracking frames respectively corresponding to pedestrians in the first frame image;
The target tracking frame determining module is used for determining a target tracking frame according to the intersection ratio result between each tracking frame in the first frame image and the target frame;
the target feature acquisition module is used for respectively extracting features of images in the tracking frames corresponding to the target tracking frames in each frame of image of the video to obtain target features;
the fusion module is used for fusing all the target features corresponding to each frame of image to obtain fusion target features;
and the tracking module is used for determining the position of the target pedestrian in the frame image where the target pedestrian is lost according to the fusion target feature.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method according to any one of claims 1 to 7.
CN202310394649.5A 2023-04-13 2023-04-13 Pedestrian tracking method and device Pending CN116402844A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310394649.5A CN116402844A (en) 2023-04-13 2023-04-13 Pedestrian tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310394649.5A CN116402844A (en) 2023-04-13 2023-04-13 Pedestrian tracking method and device

Publications (1)

Publication Number Publication Date
CN116402844A true CN116402844A (en) 2023-07-07

Family

ID=87007198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310394649.5A Pending CN116402844A (en) 2023-04-13 2023-04-13 Pedestrian tracking method and device

Country Status (1)

Country Link
CN (1) CN116402844A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117456204A (en) * 2023-09-25 2024-01-26 珠海视熙科技有限公司 Target tracking method, device, video processing system, storage medium and terminal

Similar Documents

Publication Publication Date Title
CN114511758A (en) Image recognition method and device, electronic device and medium
CN112857268A (en) Object area measuring method, device, electronic device and storage medium
CN114972958B (en) Key point detection method, neural network training method, device and equipment
CN112749758A (en) Image processing method, neural network training method, device, equipment and medium
CN116228867B (en) Pose determination method, pose determination device, electronic equipment and medium
CN114445667A (en) Image detection method and method for training image detection model
CN115511779B (en) Image detection method, device, electronic equipment and storage medium
CN114743196A (en) Neural network for text recognition, training method thereof and text recognition method
CN114550313B (en) Image processing method, neural network, training method, training device and training medium thereof
CN114723949A (en) Three-dimensional scene segmentation method and method for training segmentation model
CN116402844A (en) Pedestrian tracking method and device
CN115601555A (en) Image processing method and apparatus, device and medium
CN114821581A (en) Image recognition method and method for training image recognition model
CN113723305A (en) Image and video detection method, device, electronic equipment and medium
CN117274370A (en) Three-dimensional pose determining method, three-dimensional pose determining device, electronic equipment and medium
CN114140547B (en) Image generation method and device
CN113139095B (en) Video retrieval method and device, computer equipment and medium
CN113139542B (en) Object detection method, device, equipment and computer readable storage medium
CN115797660A (en) Image detection method, image detection device, electronic equipment and storage medium
CN114998963A (en) Image detection method and method for training image detection model
CN114494797A (en) Method and apparatus for training image detection model
CN114596476A (en) Key point detection model training method, key point detection method and device
CN116881485B (en) Method and device for generating image retrieval index, electronic equipment and medium
CN117218499B (en) Training method of facial expression capturing model, facial expression driving method and device
CN114140851B (en) Image detection method and method for training image detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination