WO2022001925A1 - Pedestrian tracking method and device, and computer-readable storage medium - Google Patents

Pedestrian tracking method and device, and computer-readable storage medium Download PDF

Info

Publication number
WO2022001925A1
WO2022001925A1 (PCT/CN2021/102652; CN2021102652W)
Authority
WO
WIPO (PCT)
Prior art keywords
pedestrian
trajectory
multimodal
tracking
feature
Prior art date
Application number
PCT/CN2021/102652
Other languages
English (en)
French (fr)
Inventor
窦笑
申光
侯春华
李东方
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Priority to EP21833495.1A (published as EP4174716A4)
Priority to US18/013,874 (published as US20230351794A1)
Publication of WO2022001925A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71 - Indexing; Data structures therefor; Storage structures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 - Querying
    • G06F16/738 - Presentation of query results
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/292 - Multi-camera tracking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 - Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10024 - Color image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20036 - Morphological image processing
    • G06T2207/20044 - Skeletonization; Medial axis transform
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20072 - Graph-based image processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30168 - Image quality inspection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G06T2207/30201 - Face
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30232 - Surveillance
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30241 - Trajectory

Definitions

  • The present application relates to the field of communication technology.
  • An aspect of the embodiments of the present application provides a pedestrian tracking method, which includes: performing pedestrian trajectory analysis on video images captured by preset surveillance cameras to generate a pedestrian trajectory picture set; performing multimodal feature extraction on the pedestrian trajectory picture set to form a pedestrian multimodal database; and inputting the pedestrian multimodal database into a trained multimodal recognition system to track the pedestrian and generate the pedestrian's movement trajectory across the preset surveillance cameras.
  • Another aspect of the embodiments of the present application provides a pedestrian tracking device, including: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; the program is executed by the processor to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application.
  • Yet another aspect of the embodiments of the present application provides a computer-readable storage medium on which one or more programs are stored, the one or more programs being executable by one or more processors to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application.
  • FIG. 1 is a flowchart of a pedestrian tracking method provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a pedestrian tracking method provided by an embodiment of the present application.
  • FIG. 3 is a structural block diagram of a pedestrian tracking system provided by an embodiment of the present application.
  • This application proposes a multimodality-based cross-camera tracking and retrieval system that builds on multi-target pedestrian tracking and combines a pedestrian re-identification network, pedestrian quality analysis, pedestrian attribute analysis, face recognition and the temporal and spatial position information of cameras to further improve the accuracy and speed of cross-camera tracking and retrieval.
  • An embodiment of the present application provides a pedestrian tracking method, which includes the following steps S110 to S130.
  • In step S110, pedestrian trajectory analysis is performed on the video images captured by the preset surveillance cameras to generate a pedestrian trajectory picture set.
  • In step S120, multimodal feature extraction is performed on the pedestrian trajectory picture set, and a pedestrian multimodal database is formed.
  • In step S130, the pedestrian multimodal database is input into the trained multimodal recognition system, and pedestrian tracking is performed to generate the pedestrian's movement trajectory across the preset surveillance cameras.
  • In one implementation, the pedestrian tracking method may further include: receiving a target pedestrian trajectory, extracting the multimodal features of the target pedestrian, and searching the pedestrian multimodal database for a first pedestrian trajectory matching the multimodal features of the target pedestrian; merging the target pedestrian trajectory and the first pedestrian trajectory into a second pedestrian trajectory, and querying the pedestrian multimodal database for pedestrian trajectories matching the second pedestrian trajectory; and generating, from the matched pedestrian trajectories, the movement trajectory of the target pedestrian across the preset surveillance cameras.
  • In one implementation, the pedestrian tracking method may further include: selecting, from the pedestrian trajectory picture set, images whose quality parameters are within a preset range, and performing feature extraction on the selected images.
  • In one implementation, the influence factor of each modal parameter in the multimodal recognition system can be adjusted according to a training set to obtain the trained multimodal recognition system.
  • In one implementation, the picture names in the pedestrian trajectory picture set may include: a trajectory identifier (ID), a video frame number, the picture capture time, and/or location information (see the parsing sketch below).
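A minimal sketch of how such a composite picture name can be parsed, assuming the trackID_frameNo_time_location layout and the %Y%m%d%H%M timestamp format of the example 0001_00025_202003210915_NJ used later in this document:

```python
from datetime import datetime

def parse_picture_name(name: str) -> dict:
    """Split a composite picture name (trackID_frameNo_time_location) into fields."""
    track_id, frame_no, timestamp, location = name.split("_")
    return {
        "track_id": track_id,                                # e.g. "0001"
        "frame_no": int(frame_no),                           # e.g. 25
        "time": datetime.strptime(timestamp, "%Y%m%d%H%M"),  # assumed format
        "location": location,                                # e.g. "NJ"
    }

print(parse_picture_name("0001_00025_202003210915_NJ"))
```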
  • In one implementation, generating the pedestrian's movement trajectory across the preset surveillance cameras may include: analyzing the pedestrian's movement patterns according to the graph structure of the surveillance camera distribution topology.
  • Specifically, the spatiotemporal topological relationship of the surveillance cameras can be combined with an appearance-model matching algorithm for the target, and the graph structure of the camera topology can be used to analyze the patterns of pedestrian movement and transfer, so as to impose spatiotemporal constraints on cross-camera pedestrian tracking. If the tracked target disappears at a certain node (camera), target detection is performed at the nodes reachable from it within a few steps, and matching and association are then performed.
  • The spatial relationship defines whether an edge is established between two nodes, and the direction of that edge; when the graph model is built, an edge is established between two nodes that are reachable from each other in one step in physical space, i.e., without passing through any other node.
  • This spatiotemporal constraint can be used to significantly reduce the number of samples to be queried, shortening query time and improving retrieval performance.
  • Combining the spatial coordinates of the cameras with the constraints of walkable routes, the connection relationships between the camera nodes and the initial transit times can be estimated.
  • The edge weights of the camera network topology can then be obtained by continuous correction using the interval times from pedestrian re-identification.
  • In subsequent queries, the neighboring nodes in the camera network topology centered on the node of the trajectory to be queried are determined first, and the edge weights are then used to limit the time range of the query data at those neighboring nodes. Trajectory matching is performed within the corresponding time range of each neighboring node.
  • If the target is not matched within the recommended time range, the query is extended to an expanded time range; if it is still not found, the query proceeds to the next layer of neighboring nodes centered on that node (see the constrained-query sketch below).
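A minimal sketch of the spatiotemporal constraint described above, assuming a simple dictionary representation of the camera topology; the node names, transit times and tolerance are illustrative assumptions, not values from the disclosure:

```python
# Directed edges of the camera topology: (from_camera, to_camera) -> expected
# transit time in seconds (the edge weight learned from re-identification data).
edge_weights = {("C0", "C1"): 60.0, ("C0", "C2"): 150.0, ("C1", "C2"): 90.0}

def neighbors(node: str):
    """Nodes reachable in one step from `node`."""
    return [dst for (src, dst) in edge_weights if src == node]

def query_window(node: str, neighbor: str, t_disappear: float, tolerance: float = 30.0):
    """Recommended time range to search at `neighbor` after the target
    disappeared at `node` at time `t_disappear` (all times in seconds)."""
    expected = edge_weights[(node, neighbor)]
    return (t_disappear + expected - tolerance, t_disappear + expected + tolerance)

# Target vanished at C0 at t=1000 s: search C1 around t=1060 s and C2 around t=1150 s.
for nb in neighbors("C0"):
    print(nb, query_window("C0", nb, 1000.0))
```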
  • In one implementation, the multimodal features may include one or more of the following: pedestrian features, face features, and pedestrian attribute features.
  • The pedestrian features may include one or more of the following: body shape features (tall, short, heavy, slim) and posture features.
  • The face feature information may include one or more of the following: face shape features, facial expression features, and skin color features.
  • The pedestrian attribute information may include one or more of the following: hair length, hair color, clothing style, clothing color, and carried items. A record combining these modalities could look like the sketch below.
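A minimal sketch of a single multimodal record combining the feature types enumerated above with the spatiotemporal metadata; the field names and types are illustrative assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class MultimodalRecord:
    track_id: str                  # trajectory ID, e.g. "0001"
    frame_no: int                  # video frame number
    time: str                      # capture time, e.g. "202003210915"
    location: str                  # camera/location tag, e.g. "NJ"
    body_feature: List[float] = field(default_factory=list)   # re-id embedding
    face_feature: Optional[List[float]] = None                # None if no face detected
    attributes: Dict[str, str] = field(default_factory=dict)  # e.g. {"hair": "short"}
```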
  • Embodiments of the present application further provide a pedestrian tracking device, including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory. When the program is executed by the processor, at least one step of the pedestrian tracking method provided by the embodiments of the present application can be implemented, for example, the steps shown in FIG. 1.
  • Embodiments of the present application further provide a computer-readable storage medium on which one or more programs are stored, the one or more programs being executable by one or more processors to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application, for example, the steps shown in FIG. 1.
  • An embodiment of the present application discloses a system that uses multimodal information fusion to retrieve and track the same pedestrian under different cameras. The pedestrian tracking method implemented by the system may include the following steps S1-S6.
  • In step S1, videos from the different cameras in the monitored area are acquired.
  • In step S2, pedestrian detection is performed on the acquired offline video and pedestrian trajectory extraction is completed; the picture names in the corresponding pedestrian trajectory picture set are composite names formed from the trajectory ID, video frame number, and corresponding time and location (for example, 0001_00025_202003210915_NJ), and the pictures are saved in a subfolder named after the trajectory ID.
  • In step S3, through pedestrian quality analysis, images whose quality parameters are within the preset range are extracted from the pedestrian trajectory picture set; in one implementation, five pictures of good quality whose capture times are widely spread may be selected as the best five (top-5) pictures of the pedestrian trajectory (see the selection sketch below).
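A minimal sketch of the top-5 selection, assuming a per-frame quality score in [0, 1]; the greedy widest-spacing rule and the threshold values are assumptions consistent with "good quality, scattered in time":

```python
def select_top5(frames, lo=0.6, hi=1.0, k=5):
    """frames: list of (timestamp, quality_score, image_path) tuples."""
    good = sorted((f for f in frames if lo <= f[1] <= hi), key=lambda f: f[0])
    if len(good) <= k:
        return good
    chosen = [good[0], good[-1]]          # keep the earliest and latest frames
    rest = good[1:-1]
    while len(chosen) < k and rest:
        # Greedily add the frame farthest in time from everything chosen so far.
        best = max(rest, key=lambda f: min(abs(f[0] - c[0]) for c in chosen))
        chosen.append(best)
        rest.remove(best)
    return sorted(chosen, key=lambda f: f[0])
```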
  • In step S4, the pedestrian re-identification network, face recognition network and pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory. Once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved to the database (a storage sketch follows).
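A minimal storage sketch, assuming sqlite3 with pickled feature vectors; the schema is an illustrative assumption, not the storage format of the disclosure:

```python
import pickle
import sqlite3

conn = sqlite3.connect("pedestrian_multimodal.db")
conn.execute("""CREATE TABLE IF NOT EXISTS features (
    track_id TEXT, frame_no INTEGER, time TEXT, location TEXT,
    body BLOB, face BLOB, attrs BLOB)""")

def save_record(track_id, frame_no, time, location, body, face, attrs):
    """Persist the three feature kinds plus (track ID, frame number, time, location)."""
    conn.execute("INSERT INTO features VALUES (?,?,?,?,?,?,?)",
                 (track_id, frame_no, time, location,
                  pickle.dumps(body), pickle.dumps(face), pickle.dumps(attrs)))
    conn.commit()

save_record("0001", 25, "202003210915", "NJ",
            body=[0.1, 0.7], face=None, attrs={"hair": "short"})
```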
  • In one implementation, the pedestrian re-identification network applies computer vision techniques to determine whether a specific pedestrian is present in an image or video sequence; the pedestrian features are determined through the pedestrian re-identification network.
  • The pedestrian attribute network is used to extract pedestrian attributes. A pedestrian attribute is a semantic description of a pedestrian's appearance, and different parts of the human body have different attributes. For example, attributes related to the head include "short hair" and "long hair"; attributes related to clothing style include "long sleeves", "short sleeves", "dress", and "shorts"; attributes related to carried items include "backpack", "shoulder bag", "handbag" and "nothing carried". Pedestrian attributes can be selected and subdivided for different environments and occasions to facilitate pedestrian re-identification.
  • Pedestrian attribute information is associated with a person's appearance and constitutes more specific semantic information. When performing pedestrian comparison or pedestrian retrieval, irrelevant data can be filtered out according to the similarity of pedestrian attributes (see the filtering sketch below).
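A minimal sketch of attribute-based pre-filtering, assuming attributes are stored as key-value pairs; the overlap measure and the 0.5 cut-off are assumptions:

```python
def attribute_similarity(a: dict, b: dict) -> float:
    """Fraction of shared attribute keys whose values agree."""
    keys = set(a) & set(b)
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)

query_attrs = {"hair": "short", "top": "long-sleeve", "bag": "backpack"}
candidates = [{"hair": "short", "top": "long-sleeve", "bag": "none"},
              {"hair": "long", "top": "dress", "bag": "handbag"}]
# Drop candidates whose attributes disagree too much before comparing features.
kept = [c for c in candidates if attribute_similarity(query_attrs, c) >= 0.5]
print(kept)
```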
  • In step S5, a batch of manually annotated test sets is used to optimize the multimodal weight parameters.
  • In step S6, the final detection result is output.
  • Compared with schemes that locate and retrieve pedestrians based only on pedestrian image features, the solution provided by the embodiments of the present application fuses multimodal information such as face, pedestrian attributes, time and space, making retrieval more robust and better suited to complex real-world scenarios.
  • FIG. 3 is a structural block diagram of a multimodal-retrieval-based cross-camera pedestrian tracking system provided by an embodiment of the present application.
  • The system may include: a data acquisition and trajectory extraction module, a best trajectory extraction module, a feature extraction and multimodal information storage module, a multimodal weight parameter adjustment module, and a retrieval interaction and result display module.
  • The data acquisition and trajectory extraction module obtains offline video images from the monitored video units. Each monitoring unit is responsible only for saving and extracting data in its own area, and pedestrian pictures are automatically labeled with the trajectory ID, picture frame number, time information and location information.
  • The best trajectory extraction module selects, from each pedestrian trajectory, five pedestrian trajectory pictures in which the pedestrian is relatively complete and whose capture times are widely spaced.
  • The feature extraction and multimodal information storage module extracts pedestrian, face and pedestrian attribute features and saves these three kinds of features, together with the trajectory ID and the temporal and spatial information of the pedestrian trajectory pictures, into the pedestrian multimodal database.
  • The multimodal weight parameter adjustment module uses a batch of annotated test sets to optimize the weights of the multimodal parameter values, ultimately obtaining the optimal modal parameters for each data set.
  • The retrieval interaction and result display module provides interface-based trajectory-to-trajectory search and image-to-trajectory search, displays the best trajectory and the best trajectory ranking under each camera, and can locate the trajectory in the video via the picture frame numbers in the trajectory and play it back in real time.
  • The multimodality-based cross-camera tracking and retrieval system can acquire offline videos of the monitored area, perform pedestrian retrieval on the pedestrians in the video, extract pedestrian trajectories with a trajectory tracking algorithm, give each picture a composite name formed from the trajectory ID, video frame number and corresponding time and location, and extract the best five pedestrian pictures in each trajectory through pedestrian quality analysis.
  • Face, pedestrian and pedestrian attribute features are extracted from all the trajectory pictures.
  • After feature extraction, all the multimodal information is stored in the database.
  • The parameters of the multimodal system are adaptively adjusted using the test set; cross-camera pedestrian trajectory search is finally completed, and the results are displayed on the interface.
  • Compared with manual retrieval, this method greatly reduces the workload and achieves high accuracy together with high efficiency. The solution enables cross-camera pedestrian retrieval, providing strong support for intelligent security and safe cities.
  • In step S1, the retrieval area is determined and the offline video monitored in that area is acquired. The area can be a relatively fixed place such as a shopping mall, office building, residential quarter or community, and the offline video should cover a certain time period, at least surveillance video from the same day. The video is saved locally and labeled with the camera ID, position and start time. In this embodiment, three cameras with different viewing angles are selected, with camera IDs C0, C1 and C2.
  • In step S2, pedestrian detection and trajectory tracking are performed on the offline videos under each camera.
  • The pictures in the corresponding pedestrian trajectory picture set are given composite names formed from the trajectory ID, video frame number, and corresponding time and location (for example, 0001_00025_202003210915_NJ), and are saved in the subfolder named after the trajectory ID.
  • The pedestrian detection model uses the SSD (Single Shot MultiBox Detector) deep-learning object detection algorithm to obtain the position box and bounding box of the pedestrians in the current frame, and the Hungarian tracking algorithm is used to obtain the pedestrian trajectories (see the association sketch below).
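A minimal sketch of the data-association step behind such tracking: detections from any per-frame detector (SSD in the text) are matched to existing tracks with the Hungarian algorithm on an IoU cost matrix. scipy's linear_sum_assignment implements the Hungarian method; the 0.3 IoU gate is an assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(track_boxes, det_boxes, iou_gate=0.3):
    """Return (track_index, detection_index) pairs kept by the IoU gate."""
    cost = np.array([[1.0 - iou(t, d) for d in det_boxes] for t in track_boxes])
    rows, cols = linear_sum_assignment(cost)  # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_gate]
```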
  • In step S3, the pedestrian quality analysis model is applied to the pedestrian trajectories obtained in the previous step.
  • A human skeleton keypoint detection algorithm is used, and the completeness of the pedestrian is judged by the number of skeleton keypoints: if the number of skeleton keypoints of a pedestrian in a picture equals the preset value, the picture information of that pedestrian is judged to be complete.
  • In this embodiment, the selected keypoints include: the head, shoulders, palms, and soles of the feet (see the completeness sketch below). For trajectories with many pedestrian pictures, the five pictures of better quality and more scattered capture times are extracted as the top-5 pictures of the pedestrian trajectory.
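A minimal sketch of the completeness test, assuming each keypoint comes with a detection confidence; the keypoint names and the 0.5 confidence threshold are assumptions:

```python
REQUIRED = ["head", "l_shoulder", "r_shoulder", "l_palm", "r_palm", "l_sole", "r_sole"]

def completeness(keypoints: dict, conf_thr: float = 0.5):
    """keypoints: name -> (x, y, confidence). Returns (score, is_complete)."""
    detected = sum(1 for k in REQUIRED
                   if k in keypoints and keypoints[k][2] >= conf_thr)
    # Complete when the detected count equals the preset number of keypoints.
    return detected / len(REQUIRED), detected == len(REQUIRED)
```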
  • In step S4, the pedestrian re-identification network, face recognition network and pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory. Once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved into the pedestrian multimodal database.
  • In step S5, because data sets for cross-camera tracking have very strict scene requirements and no suitable resources are available online, a self-constructed training set is used: offline surveillance videos under three different cameras are extracted.
  • The cameras are named C0, C1, and C2, respectively.
  • Pedestrian multi-target detection and tracking, review and manual annotation are then performed on the offline videos.
  • The query data belongs to camera C0, and the gallery data to be queried belongs to the two cameras C1 and C2.
  • The annotated training set is used to optimize the multimodal recognition system, and through step S4 above a series of multimodal records is stored (this database serves as the multimodal weight optimization database and does not conflict with the information retrieval database generated earlier in step S4).
  • Each trajectory contains five pictures; when comparing face features, pedestrian features and pedestrian attributes, batch feature comparison is used, retrieval is performed C0->C1 and C0->C2, and the retrieval hit rate is computed for each. The multimodal weight parameters are then adjusted dynamically, the C0->C1 and C0->C2 retrievals are repeated, and the hit rate is recomputed. When the retrieval hit rate reaches its maximum, the current multimodal parameters are taken as the optimal multimodal parameters, which completes the optimization and adjustment of the multimodal weight parameters (a sketch of this tuning loop follows).
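A minimal sketch of that tuning loop, assuming linear score fusion over (face, pedestrian, attribute) similarities and a hypothetical per-track .identity label from the manual annotation; the grid search and step size are assumptions:

```python
def fused_score(sims, weights):
    """sims / weights: (face, pedestrian, attribute) similarity scores and weights."""
    return sum(s * w for s, w in zip(sims, weights))

def hit_rate(queries, gallery, weights, similarities):
    """similarities(q, g) -> (face, pedestrian, attribute) scores for a track pair."""
    hits = sum(
        max(gallery, key=lambda g: fused_score(similarities(q, g), weights)).identity
        == q.identity
        for q in queries)
    return hits / len(queries)

def best_weights(queries, gallery, similarities, step=0.1):
    """Exhaustive search over weight triples summing to 1; keep the best hit rate."""
    n = int(round(1 / step))
    grid = [(i * step, j * step, (n - i - j) * step)
            for i in range(n + 1) for j in range(n + 1 - i)]
    return max(grid, key=lambda w: hit_rate(queries, gallery, w, similarities))
```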
  • In step S6, the optimized multimodal weight parameters are finally used to perform cross-camera pedestrian retrieval on the information retrieval database generated in step S4, and the final detection result is output.
  • Interface-based trajectory-to-trajectory search and image-to-trajectory search can ultimately be provided; the best trajectory and the best trajectory ranking under each camera can be displayed; and the trajectory in the video can be located via the picture frame numbers in the trajectory and played back in real time.
  • The multimodality-based cross-camera tracking and retrieval system provided by the embodiments of the present application can be applied to the following two scenarios: pedestrian trajectory search and pedestrian image search.
  • Both scenarios use the trajectory IDs, pedestrian features, face features, pedestrian attributes and camera position information in the database to retrieve trajectories and pictures quickly and accurately, and use the constraints between different features to achieve precise matching.
  • The task of trajectory matching is: given any selected extracted trajectory, retrieve according to its multimodal features and match all related trajectories within the same video and between videos.
  • The specific implementation may include the following steps S11-S15.
  • In step S11, the retrieval area is determined and the offline video monitored in that area is acquired. The area can be a relatively fixed place such as a shopping mall, office building, residential quarter or community, and the offline video should cover a certain time period, at least surveillance video from the same day. The video is saved locally and labeled with the camera ID, position and start time. In this embodiment, three cameras with different viewing angles are selected, with camera IDs C0, C1 and C2.
  • In step S12, pedestrian detection and trajectory tracking are performed on the offline videos under each camera.
  • The corresponding pedestrian trajectory pictures are given composite names formed from the trajectory ID, video frame number, and corresponding time and location (for example, 0001_00025_202003210915_NJ), and are saved in the subfolder named after the trajectory ID.
  • The pedestrian detection model uses the SSD algorithm to obtain the position box and bounding box of the pedestrians in the current frame, and the Hungarian tracking algorithm is used to obtain the pedestrian trajectories.
  • In step S13, the pedestrian quality analysis model is applied to the pedestrian trajectories obtained in the previous step; a human skeleton keypoint detection algorithm is used here, and the completeness of the pedestrian is judged by the number of skeleton keypoints. For trajectories with many pedestrian pictures, the five pictures of better quality and more scattered capture times are extracted as the top-5 pictures of the pedestrian trajectory.
  • In step S14, the pedestrian re-identification network, the face recognition network and the pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory. Once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved into the information retrieval database.
  • In step S15, trajectory matching and image matching are completed.
  • Regarding the priority of intra-video versus inter-video matching: considering that the images within one video are homologous data and matching among them is more reliable, intra-video trajectory matching is handled first.
  • Regarding the priority among features: considering that face features are the most robust pedestrian features, face feature comparison is performed first.
  • Based on the structured features stored in step S14, and according to the order in which the different features act and the priority of intra-video and inter-video trajectory matching, the matching process includes the following 1)-3) (a sketch of these priorities follows this list).
  • 1) First, intra-video trajectory matching is performed: the face features of the target trajectory are compared in batch against the other trajectories containing face features; if the feature matching succeeds and the batch comparison of pedestrian features and pedestrian attributes shows a certain correlation, the trajectories are considered successfully matched. The matched trajectories are then combined with the target trajectory as the second pedestrian trajectory, and the remaining trajectories are queried using batch comparison of pedestrian features and pedestrian attribute features, with a re-ranking algorithm used in this process.
  • 2) Inter-video trajectory matching is then performed: the face features in the query are first matched in batch against the pedestrian trajectories at the neighboring nodes under the spatiotemporal constraints, and the samples matched in the first round are combined into the query for a second search at those nodes, with the comparison thresholds appropriately lowered to account for the change of data source across cameras.
  • 3) For image-to-trajectory search, structured feature extraction is performed on the incoming query picture, and the query is first performed within the video trajectories of the suspected nodes; if the target is not found there, the scope is further expanded and a full query is performed at the neighboring nodes, and if an approximate time range can be determined, the efficiency and accuracy of retrieval further increase.
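A minimal sketch of these priorities, face features first and intra-video candidates before inter-video ones, with slightly relaxed thresholds across cameras as described in 2); all thresholds and the compare_* helpers are hypothetical placeholders:

```python
def match_trajectory(query, intra_pool, inter_pool,
                     compare_face, compare_body, face_thr=0.7, body_thr=0.6):
    """Return candidate tracks; an empty result tells the caller to widen the
    time range or move to the next layer of neighboring nodes."""
    for pool, relax in ((intra_pool, 0.0), (inter_pool, 0.05)):
        # 1) face features are the most robust, so compare them first
        hits = [t for t in pool
                if t.face is not None and query.face is not None
                and compare_face(query.face, t.face) >= face_thr - relax]
        # 2) fall back to pedestrian/body features when faces are missing
        if not hits:
            hits = [t for t in pool
                    if compare_body(query.body, t.body) >= body_thr - relax]
        if hits:
            return hits
    return []
```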
  • In summary, the technical solutions provided in the embodiments of the present application may include the following steps 11) to 16).
  • In step 11), when pedestrian trajectories are extracted, pedestrian pictures are automatically labeled with the trajectory ID, time information and position information; the trajectory ID, picture frame number and spatiotemporal information can be used in subsequent cross-camera retrieval. The system can also extract trajectories from multiple videos under multiple cameras at the same time.
  • In step 12), pedestrian quality analysis adopts human skeleton keypoint detection: several keypoints are selected, the completeness of the pedestrian is judged from the number of detected keypoints, and a completeness score is output. This score is used to filter out pedestrian pictures of poor quality (removing pictures with severe occlusion), after which the time information of the pictures is used to select five pictures in the trajectory with large time intervals (the pedestrian's pose changes little between adjacent frames, trajectory features of the same pedestrian in several different poses are more discriminative, and five pictures also reduce the computation of trajectory matching).
  • In step 13), the application fuses multimodal information such as pedestrian multi-target tracking, keypoint skeleton detection, pedestrian re-identification, pedestrian attribute structuring, face recognition and the topological spatiotemporal constraints of cameras to realize a cross-camera pedestrian tracking and retrieval scheme.
  • In step 14), the features obtained after multimodal information fusion are used to better achieve the goals of cross-camera trajectory-to-trajectory search and image-to-trajectory search, finally realizing cross-camera tracking.
  • In step 15), when performing cross-camera tracking of a target pedestrian, the amount of data may be enormous, and a full search query is almost impossible.
  • The system combines the spatiotemporal topological relationship of the cameras with an appearance-model matching algorithm for the target, and uses the graph structure of the camera network topology to analyze the patterns of pedestrian movement and transfer, thereby imposing spatiotemporal constraints on cross-camera pedestrian tracking.
  • In step 16), a batch of annotated data sets from actual scenes is used to optimize the weight parameters between the modalities to achieve the best cross-camera tracking and retrieval effect.
  • The pedestrian tracking method and device and the computer-readable storage medium provided by the embodiments of the present application can complete cross-camera tracking and retrieval automatically, break the viewing-angle limitation of a single fixed camera, and avoid manually replaying large volumes of surveillance video to search for retrieval targets.
  • Retrieval efficiency is thus greatly improved, and the tracking range is extended.
  • Regarding the use of multimodal information: the cross-camera retrieval feature integrates multiple kinds of modal information, including face, pedestrian, attribute and spatiotemporal information, so that the multimodal features complement one another; the fused features are more discriminative, more robust in cross-camera tracking and retrieval, and improve retrieval accuracy.
  • The system can adaptively adjust the multimodal weight parameters through the test set, which to a large extent solves the cross-domain problem of cameras; parameter adjustment allows the system to adapt better to different surveillance scenarios.
  • The system has a good human-computer interaction interface, through which the camera position information and modal weight parameter information can be configured; interface operations support trajectory-to-trajectory retrieval and image-to-trajectory search, the interface displays the best trajectory and the trajectory search rankings under different cameras, and trajectories can be played back. The database information is visualized and very easy to operate and use.
  • The division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules or other data.
  • Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • As is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

A pedestrian tracking method and device, and a computer-readable storage medium, relating to the field of communication technology. The pedestrian tracking method includes: performing pedestrian trajectory analysis on video images captured by preset surveillance cameras to generate a pedestrian trajectory picture set (S110); performing multimodal feature extraction on the pedestrian trajectory picture set to form a pedestrian multimodal database (S120); and inputting the pedestrian multimodal database into a trained multimodal recognition system to perform pedestrian tracking and generate the movement trajectory of a pedestrian across the preset surveillance cameras (S130).

Description

Pedestrian tracking method and device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 202010603573.9, filed with the Chinese Patent Office on June 29, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of communication technology.
Background
Video surveillance is now present in every corner of daily life, and face recognition technology has matured considerably. In real security scenarios, however, not every camera can capture a clear face: with occlusions such as hair, masks and hats, it is difficult for a face recognition system to determine a pedestrian's identity. Moreover, in practical applications a single camera usually cannot cover an entire area, and multiple cameras generally have no overlapping fields of view, so a cross-camera tracking and retrieval system is essential for locating and searching for persons.
At present, cross-camera tracking receives wide attention in both industry and academia and has made notable progress. On the policy side, the Ministry of Public Security has introduced the concept of the safe city and released a number of pre-research projects, and related industry standards are being drafted intensively.
Summary
One aspect of the embodiments of the present application provides a pedestrian tracking method, including: performing pedestrian trajectory analysis on video images captured by preset surveillance cameras to generate a pedestrian trajectory picture set; performing multimodal feature extraction on the pedestrian trajectory picture set to form a pedestrian multimodal database; and inputting the pedestrian multimodal database into a trained multimodal recognition system to perform pedestrian tracking and generate the movement trajectory of a pedestrian across the preset surveillance cameras.
Another aspect of the embodiments of the present application provides a pedestrian tracking device, including: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; the program is executed by the processor to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application.
Yet another aspect of the embodiments of the present application provides a computer-readable storage medium on which one or more programs are stored, the one or more programs being executable by one or more processors to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application.
Brief Description of the Drawings
FIG. 1 is a flowchart of a pedestrian tracking method provided by an embodiment of the present application.
FIG. 2 is a flowchart of a pedestrian tracking method provided by an embodiment of the present application.
FIG. 3 is a structural block diagram of a pedestrian tracking system provided by an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are intended only to explain the present application and are not intended to limit it.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements are adopted only to facilitate the description of the present application and have no special meaning in themselves. Therefore, "module", "component" and "unit" may be used interchangeably.
Pedestrian re-identification is the technique most widely used in cross-camera tracking and retrieval systems. In this field, most researchers locate and retrieve pedestrians based on features extracted from pedestrian images alone, which places very high demands on the robustness of those features; yet real scenes are often very complex, with no frontal face view, pose changes, clothing changes, occlusion, lighting variation, low camera resolution and indoor/outdoor environment changes, and these factors commonly cause pedestrian retrieval and tracking to fail.
The present application proposes a multimodality-based cross-camera tracking and retrieval system that builds on multi-target pedestrian tracking and combines a pedestrian re-identification network, pedestrian quality analysis, pedestrian attribute analysis, face recognition and the temporal and spatial position information of cameras to further improve the accuracy and speed of cross-camera tracking and retrieval.
As shown in FIG. 1, an embodiment of the present application provides a pedestrian tracking method including the following steps S110 to S130.
In step S110, pedestrian trajectory analysis is performed on the video images captured by preset surveillance cameras to generate a pedestrian trajectory picture set.
In step S120, multimodal feature extraction is performed on the pedestrian trajectory picture set, and a pedestrian multimodal database is formed.
In step S130, the pedestrian multimodal database is input into a trained multimodal recognition system, pedestrian tracking is performed, and the movement trajectory of a pedestrian across the preset surveillance cameras is generated.
In one implementation, the pedestrian tracking method may further include: receiving a target pedestrian trajectory, extracting the multimodal features of the target pedestrian, and searching the pedestrian multimodal database for a first pedestrian trajectory matching the multimodal features of the target pedestrian; merging the target pedestrian trajectory and the first pedestrian trajectory into a second pedestrian trajectory, and querying the pedestrian multimodal database for pedestrian trajectories matching the second pedestrian trajectory; and generating, from the pedestrian trajectories matching the second pedestrian trajectory, the movement trajectory of the target pedestrian across the preset surveillance cameras.
In one implementation, the pedestrian tracking method may further include: selecting, from the pedestrian trajectory picture set, images whose quality parameters fall within a preset range, and performing feature extraction on the selected images.
In one implementation, the influence factor of each modal parameter in the multimodal recognition system may be adjusted according to a training set to obtain the trained multimodal recognition system.
In one implementation, the picture names in the pedestrian trajectory picture set may include: a trajectory identifier (ID), a video frame number, the picture capture time, and/or location information.
In one implementation, generating the movement trajectory of the pedestrian across the preset surveillance cameras may include: analyzing the movement patterns of the pedestrian according to the graph structure of the surveillance camera distribution topology.
Specifically, the spatiotemporal topological relationship of the surveillance cameras may be combined with an appearance-model matching algorithm for the target, and the graph structure of the camera topology may be used to analyze the patterns of pedestrian movement and transfer, thereby imposing spatiotemporal constraints on cross-camera pedestrian tracking. If the tracked target disappears at a certain node (camera), target detection is performed at the nodes reachable from it within a few steps, followed by matching and association.
Furthermore, the spatial relationship defines whether an edge is established between two nodes and the direction of that edge. When the graph model is built, if two nodes are reachable from each other in one step in physical space, i.e., without passing through any other node, an edge is established between them.
In a practical application system, statistical learning methods are used to establish temporal constraints on target movement and thereby define reasonable edge weights between nodes. Obtaining statistical regularities for a set of camera nodes is often difficult, being determined by many factors, including the target's movement patterns, the geographical locations of the cameras, and changes in the traffic environment around the monitored area. In the embodiments of the present application, all observed times are clustered and the variance within each cluster is computed; the weights are initialized from the relative camera coordinates and route conditions and corrected according to pedestrian re-identification comparison results, as sketched below.
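A minimal sketch, assuming per-edge mean/variance statistics stand in for the clustering of observation times described above; each new re-identification match appends another transit time, so the weight is corrected continuously:

```python
from statistics import mean, pvariance

def edge_weight(transit_times):
    """transit_times: observed A->B transit times (seconds) for one topology edge."""
    mu = mean(transit_times)
    return mu, pvariance(transit_times, mu)  # (edge weight, in-cluster variance)

print(edge_weight([58.0, 61.0, 66.0, 55.0]))  # (60.0, 16.5)
```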
Considering that a pedestrian cannot appear in multiple cameras at the same time and that moving from one camera to another follows temporal statistical regularities, this spatiotemporal constraint can be used to significantly reduce the number of samples to be queried, shortening query time and improving retrieval performance.
By combining the spatial latitude/longitude coordinates of the cameras with the spatial constraints of walkable routes, the connection relationships between camera nodes and the initial transit times can be estimated. Continuous correction using the interval times from pedestrian re-identification then yields the edge weights of the camera network topology.
In subsequent queries, the neighboring nodes in the camera network topology centered on the node of the trajectory to be queried are determined first, and the edge weights are then used to limit the time range of the query data at those neighboring nodes. Trajectory matching is performed within the corresponding time range of each neighboring node.
Once the target is matched at a neighboring node A, that node becomes the new network center, the query continues at its neighboring nodes in the camera network topology, and the pedestrian's travel trajectory and the times of appearance are updated. When the query ends, the drawing of the pedestrian's travel trajectory is completed.
If the target is not matched within the recommended time range, the query is extended to an expanded time range; if it is still not found, the query proceeds to the next layer of neighboring nodes centered on that node.
In one implementation, the multimodal features may include one or more of: pedestrian features, face features and pedestrian attribute features. The pedestrian features may include one or more of: body shape features (tall, short, heavy, slim) and posture features. The face feature information may include one or more of: face shape features, facial expression features and skin color features. The pedestrian attribute information may include one or more of: hair length, hair color, clothing style, clothing color and carried items.
Embodiments of the present application further provide a pedestrian tracking device, including a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory. When the program is executed by the processor, at least one step of the pedestrian tracking method provided by the embodiments of the present application can be implemented, for example, the steps shown in FIG. 1.
Embodiments of the present application further provide a computer-readable storage medium on which one or more programs are stored, the one or more programs being executable by one or more processors to implement at least one step of the pedestrian tracking method provided by the embodiments of the present application, for example, the steps shown in FIG. 1.
An embodiment of the present application discloses a system that uses multimodal information fusion to retrieve and track the same pedestrian under different cameras. As shown in FIG. 2, the pedestrian tracking method implemented by the system may include the following steps S1 to S6.
In step S1, videos from the different cameras in the monitored area are acquired.
In step S2, pedestrian detection is performed on the acquired offline video and pedestrian trajectory extraction is completed; the picture names in the corresponding pedestrian trajectory picture set are composite names formed from the trajectory ID, video frame number and corresponding time and location (for example, 0001_00025_202003210915_NJ), and the pictures are saved in a subfolder named after the trajectory ID.
In step S3, through pedestrian quality analysis, images whose quality parameters fall within a preset range are extracted from the pedestrian trajectory picture set; in one implementation, five pictures of good quality whose capture times are widely spread may be selected as the best five (top-5) pictures of the pedestrian trajectory.
In step S4, the pedestrian re-identification network, the face recognition network and the pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory; once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved to the database.
In one implementation, the pedestrian re-identification network is a technique that uses computer vision to determine whether a specific pedestrian is present in an image or video sequence; the pedestrian features are determined through the pedestrian re-identification network.
The pedestrian attribute network is used to extract pedestrian attributes. A pedestrian attribute is a semantic description of a pedestrian's appearance, and different parts of the human body have different attributes: for example, attributes related to the head include "short hair" and "long hair"; attributes related to clothing style include "long sleeves", "short sleeves", "dress" and "shorts"; attributes related to carried items include "backpack", "shoulder bag", "handbag" and "nothing carried". Pedestrian attributes can be selected and subdivided for different environments and occasions to facilitate pedestrian re-identification. Pedestrian attribute information is associated with a person's appearance and constitutes more specific semantic information; when performing pedestrian comparison or pedestrian retrieval, irrelevant data can be filtered out according to the similarity of pedestrian attributes.
In step S5, a batch of manually annotated test sets is used to optimize the multimodal weight parameters.
In step S6, the final detection result is output.
Compared with other schemes that locate and retrieve pedestrians based only on pedestrian image features, the scheme provided by the embodiments of the present application fuses multimodal information such as face, pedestrian attributes, time and space, making retrieval more robust and better suited to complex real-world scenes.
FIG. 3 is a structural block diagram of a multimodal-retrieval-based cross-camera pedestrian tracking system provided by an embodiment of the present application. As shown in FIG. 3, the system may include: a data acquisition and trajectory extraction module, a best trajectory extraction module, a feature extraction and multimodal information storage module, a multimodal weight parameter adjustment module, and a retrieval interaction and result display module.
The data acquisition and trajectory extraction module obtains offline video images from the monitored video units; each monitoring unit is responsible only for saving and extracting data in its own area, saving the data in a designated folder, performing trajectory tracking and extraction on the saved videos, and automatically labeling pedestrian pictures with the trajectory ID, picture frame number, time information and position information.
The best trajectory extraction module selects, from each pedestrian trajectory, five pedestrian trajectory pictures in which the pedestrian is relatively complete and whose capture times are widely spaced.
The feature extraction and multimodal information storage module extracts pedestrian, face and pedestrian attribute features and saves these three kinds of features, together with the trajectory ID and the temporal and spatial information of the pedestrian trajectory pictures, into the pedestrian multimodal database.
The multimodal weight parameter adjustment module uses a batch of annotated test sets to optimize the weights of the multimodal parameter values, ultimately obtaining the optimal modal parameters for each data set.
The retrieval interaction and result display module provides interface-based trajectory-to-trajectory search and image-to-trajectory search, displays the best trajectory and the best trajectory ranking under each camera, and can locate the trajectory in the video via the picture frame numbers in the trajectory and play it back in real time.
The multimodality-based cross-camera tracking and retrieval system provided by the embodiments of the present application can acquire offline video of the monitored area, perform pedestrian retrieval on the pedestrians in the video, extract pedestrian trajectories with a trajectory tracking algorithm, give each picture a composite name formed from the trajectory ID, video frame number and corresponding time and location, and extract the best five pedestrian pictures in each trajectory through pedestrian quality analysis. Face, pedestrian and pedestrian attribute features are extracted from all trajectory pictures, and after feature extraction all the multimodal information is stored in the database. The parameters of the multimodal system are adaptively adjusted using the test set, cross-camera pedestrian trajectory search is finally completed, and the results are displayed on the interface. Compared with manual retrieval, this method greatly reduces the workload and achieves high accuracy together with high efficiency; the scheme enables cross-camera pedestrian retrieval and provides strong support for intelligent security and safe cities.
Referring to FIG. 2, each of the above steps is described in detail below.
In step S1, the retrieval area is determined and the offline video monitored in that area is acquired. The area can be a relatively fixed place such as a shopping mall, office building, residential quarter or community, and the offline video should cover a certain time period, at least surveillance video from the same day. The video is saved locally and labeled with the camera ID, position and start time. In this embodiment, three cameras with different viewing angles are selected, with camera IDs C0, C1 and C2.
In step S2, pedestrian detection and trajectory tracking are performed on the offline video under each camera. The pictures in the corresponding pedestrian trajectory picture set are given composite names formed from the trajectory ID, video frame number and corresponding time and location (for example, 0001_00025_202003210915_NJ), and are saved in the subfolder named after the trajectory ID. Here the pedestrian detection model uses the SSD (Single Shot MultiBox Detector) deep-learning object detection algorithm to obtain the position box and bounding box of pedestrians in the current frame, and the Hungarian tracking algorithm is used to obtain the pedestrian trajectories.
In step S3, the pedestrian quality analysis model is applied to the pedestrian trajectories obtained in the previous step. A human skeleton keypoint detection algorithm is used here, and the completeness of the pedestrian is judged by the number of skeleton keypoints: if the number of skeleton keypoints of a pedestrian in a picture equals the preset value, the picture information of that pedestrian is judged to be complete. In this embodiment, the selected keypoints include: the head, shoulders, palms and soles of the feet. For trajectories with many pedestrian pictures, the five pictures of better quality and more scattered capture times are extracted as the top-5 pictures of the pedestrian trajectory.
In step S4, the pedestrian re-identification network, the face recognition network and the pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory; once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved into the pedestrian multimodal database.
In step S5, because data sets for cross-camera tracking have very strict scene requirements and no suitable resources are available online, a self-constructed training set is used: offline surveillance videos under three different cameras, named C0, C1 and C2, are extracted. Pedestrian multi-target detection and tracking, review and manual annotation are then performed on the offline videos. The query data belongs to camera C0, and the gallery data to be queried belongs to the two cameras C1 and C2. The annotated training set is used to optimize the multimodal recognition system, and through step S4 above a series of multimodal records is stored (this database serves as the multimodal weight optimization database and does not conflict with the information retrieval database generated earlier in step S4). Each trajectory contains five pictures; when comparing face features, pedestrian features and pedestrian attributes, batch feature comparison is used, retrieval is performed C0->C1 and C0->C2, and the retrieval hit rate is computed for each. The multimodal weight parameters are then adjusted dynamically, the C0->C1 and C0->C2 retrievals are repeated, and the hit rate is recomputed. When the retrieval hit rate reaches its maximum, the current multimodal parameters are taken as the optimal multimodal parameters, which completes the optimization and adjustment of the multimodal weight parameters.
In step S6, the optimized multimodal weight parameters are finally used to perform cross-camera pedestrian retrieval on the information retrieval database generated in step S4, and the final detection result is output. According to the embodiments of the present application, interface-based trajectory-to-trajectory search and image-to-trajectory search can ultimately be provided; the best trajectory and the best trajectory ranking under each camera can be displayed; and the trajectory in the video can be located via the picture frame numbers in the trajectory and played back in real time.
The multimodality-based cross-camera tracking and retrieval system provided by the embodiments of the present application can be applied to the following two scenarios: pedestrian trajectory search and pedestrian image search. The trajectory IDs, pedestrian features, face features, pedestrian attributes and camera position information in the database are used to retrieve trajectories and pictures quickly and accurately, and the constraints between different features are used to achieve precise matching.
The purpose of the trajectory matching task is: given any selected extracted trajectory, retrieve according to its multimodal features and match all related trajectories within the same video and between videos. A specific implementation may include the following steps S11 to S15.
In step S11, the retrieval area is determined and the offline video monitored in that area is acquired. The area can be a relatively fixed place such as a shopping mall, office building, residential quarter or community, and the offline video should cover a certain time period, at least surveillance video from the same day. The video is saved locally and labeled with the camera ID, position and start time. In this embodiment, three cameras with different viewing angles are selected, with camera IDs C0, C1 and C2.
In step S12, pedestrian detection and trajectory tracking are performed on the offline video under each camera. The corresponding pedestrian trajectory pictures are given composite names formed from the trajectory ID, video frame number and corresponding time and location (for example, 0001_00025_202003210915_NJ), and are saved in the subfolder named after the trajectory ID. Here the pedestrian detection model uses the SSD algorithm to obtain the position box and bounding box of pedestrians in the current frame, and the Hungarian tracking algorithm is used to obtain the pedestrian trajectories.
In step S13, the pedestrian quality analysis model is applied to the pedestrian trajectories obtained in the previous step; a human skeleton keypoint detection algorithm is used here, and the completeness of the pedestrian is judged by the number of skeleton keypoints. For trajectories with many pedestrian pictures, the five pictures of better quality and more scattered capture times are extracted as the top-5 pictures of the pedestrian trajectory.
In step S14, the pedestrian re-identification network, the face recognition network and the pedestrian attribute network are used to extract, respectively, the pedestrian features, face features (set to empty if no face data is detected) and pedestrian attribute features of the top-5 pedestrian trajectory; once feature extraction is complete, the three kinds of features together with (trajectory ID, video frame number, time, location) are saved into the information retrieval database.
In step S15, trajectory matching and image matching are completed. When choosing the priority of intra-video versus inter-video matching, considering that the images within one video are homologous data and matching among them is more reliable, intra-video trajectory matching is handled first. Likewise, when choosing the priority among features, considering that face features are the most robust pedestrian features, face feature comparison is performed first. Based on the structured features stored in step S14, and according to the order in which the different features act and the priority of intra-video and inter-video trajectory matching, the matching workflow includes the following 1) to 3).
1) First, intra-video trajectory matching is performed. The face features of the target trajectory are first compared in batch against the other trajectories containing face features; if the feature matching succeeds and the batch comparison of pedestrian features and pedestrian attributes shows a certain correlation, the trajectories are considered successfully matched. The matched trajectories are then combined with the target trajectory as the second pedestrian trajectory, and the remaining trajectories are queried using batch comparison of pedestrian features and pedestrian attribute features, with a re-ranking algorithm used for trajectory matching in this process. This process fully exploits the stable trajectories from the initial query, so that the second pedestrian trajectory contains samples with more poses and angles and the query in this stage is more stable. Intra-video trajectory matching is thus completed.
2) Inter-video trajectory matching is then performed. Similar to intra-video matching, the face features in the query are first matched in batch against the pedestrian trajectories at the neighboring nodes under the spatiotemporal constraints. The samples matched in the first round are then combined into the query for a second search at the neighboring nodes. The difference is that, considering the change of data source in the cross-camera process, the feature comparison thresholds are appropriately lowered in this stage.
3) For searching pedestrian images by pedestrian image, structured feature extraction is performed on the incoming query picture: its feature structuring is completed through pedestrian detection, pedestrian feature extraction, pedestrian attribute recognition, face detection and face feature extraction. In image-to-image search, the query is first performed within the video trajectories of the suspected nodes; once the target trajectory is found, the trajectory matching algorithm is used for further queries. (A sketch of the batch feature comparison used throughout these steps follows.)
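A minimal sketch of the batch feature comparison used throughout these matching steps, assuming L2-normalized cosine similarity between one query vector and a matrix of gallery vectors; the dimensions and the 0.7 threshold are assumptions:

```python
import numpy as np

def batch_cosine(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    """Cosine similarity of `query` (d,) against every row of `gallery` (n, d)."""
    q = query / (np.linalg.norm(query) + 1e-9)
    g = gallery / (np.linalg.norm(gallery, axis=1, keepdims=True) + 1e-9)
    return g @ q

gallery = np.random.rand(1000, 256).astype(np.float32)  # stored track features
query = np.random.rand(256).astype(np.float32)          # target track feature
sims = batch_cosine(query, gallery)
candidates = np.where(sims >= 0.7)[0]  # gallery indices that pass the threshold
```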
If the target pedestrian is not found at the suspected node, the scope is further expanded and a full query is performed at the neighboring nodes; if an approximate time range can be determined, the efficiency and accuracy of retrieval can be further increased.
This completes the processes of searching trajectories by pedestrian trajectory matching and searching pedestrian images by pedestrian image.
In summary, the technical solution provided by the embodiments of the present application may include the following steps 11) to 16).
In step 11), when pedestrian trajectories are extracted, pedestrian pictures are automatically labeled with the trajectory ID, time information and position information, and the trajectory ID, picture frame number and spatiotemporal information can be used in subsequent cross-camera retrieval. The system can also extract trajectories from multiple videos under multiple cameras at the same time.
In step 12), pedestrian quality analysis adopts human skeleton keypoint detection: several keypoints are selected, the completeness of the pedestrian is judged from the number of detected keypoints, and a completeness score is output. This score is used to filter out pedestrian pictures of poor quality (removing pictures with severe occlusion), after which the time information of the pictures is used to select five pictures in the trajectory with large time intervals (the pedestrian's pose changes little between adjacent frames, trajectory features of the same pedestrian in several different poses are more discriminative, and five pictures also reduce the computation of trajectory matching).
In step 13), the present application fuses multimodal information such as pedestrian multi-target tracking, keypoint skeleton detection, pedestrian re-identification, pedestrian attribute structuring, face recognition and the topological spatiotemporal constraints of cameras to realize a cross-camera pedestrian tracking and retrieval scheme.
In step 14), the features obtained after multimodal information fusion are used to better achieve the goals of cross-camera trajectory-to-trajectory search and image-to-trajectory search, finally realizing cross-camera tracking.
In step 15), when performing cross-camera tracking of a target pedestrian, the amount of data may be enormous, and a full search query is almost impossible. The system combines the spatiotemporal topological relationship of the cameras with an appearance-model matching algorithm for the target, and uses the graph structure of the camera network topology to analyze the patterns of pedestrian movement and transfer, thereby imposing spatiotemporal constraints on cross-camera pedestrian tracking.
In step 16), a batch of annotated data sets from actual scenes is used to optimize the weight parameters between the modalities to achieve the best cross-camera tracking and retrieval effect.
The pedestrian tracking method and device and the computer-readable storage medium provided by the embodiments of the present application can complete cross-camera tracking and retrieval automatically, break the viewing-angle limitation of a single fixed camera, and avoid manually replaying large volumes of surveillance video to search for retrieval targets, greatly improving retrieval efficiency and extending the tracking range. Regarding the use of multimodal information: the cross-camera retrieval feature integrates multiple kinds of modal information, including face, pedestrian, attribute and spatiotemporal information, so that the multimodal features complement one another; the fused features are more discriminative, more robust in cross-camera tracking and retrieval, and improve retrieval accuracy. The system can adaptively adjust the multimodal weight parameters through the test set, which to a large extent solves the cross-domain problem of cameras, and the parameter adjustment allows the system to adapt better to different surveillance scenarios. The system has a good human-computer interaction interface, through which the camera position information and modal weight parameter information can be configured; button operations support pedestrian tracking, feature extraction and feature storage; interface operations support trajectory-to-trajectory retrieval and image-to-trajectory search; and the interface displays the best trajectory and the trajectory search rankings under different cameras and supports trajectory playback. The database information is visualized and very easy to operate and use.
Those of ordinary skill in the art will understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and devices, can be implemented as software, firmware, hardware and appropriate combinations thereof.
In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules or other data. Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of rights of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and essence of the present application shall fall within the scope of rights of the present application.

Claims (10)

  1. A pedestrian tracking method, comprising:
    performing pedestrian trajectory analysis on video images captured by preset surveillance cameras to generate a pedestrian trajectory picture set;
    performing multimodal feature extraction on the pedestrian trajectory picture set to form a pedestrian multimodal database; and
    inputting the pedestrian multimodal database into a trained multimodal recognition system to perform pedestrian tracking and generate the movement trajectory of a pedestrian across the preset surveillance cameras.
  2. The method according to claim 1, further comprising:
    receiving a target pedestrian trajectory, extracting multimodal features of the target pedestrian, and searching the pedestrian multimodal database for a first pedestrian trajectory matching the multimodal features of the target pedestrian;
    merging the target pedestrian trajectory and the first pedestrian trajectory into a second pedestrian trajectory, and querying the pedestrian multimodal database for pedestrian trajectories matching the second pedestrian trajectory; and
    generating, from the pedestrian trajectories matching the second pedestrian trajectory, the movement trajectory of the target pedestrian across the preset surveillance cameras.
  3. The method according to claim 1, further comprising: selecting, from the pedestrian trajectory picture set, images whose quality parameters fall within a preset range, and performing feature extraction on the selected images.
  4. The method according to claim 1, wherein the trained multimodal recognition system is obtained by adjusting, according to a training set, the influence factor of each modal parameter in the multimodal recognition system.
  5. The method according to claim 1, wherein the picture names in the pedestrian trajectory picture set include: a trajectory identifier (ID), a video frame number, and the capture time and location information of the picture.
  6. The method according to claim 1, wherein generating the movement trajectory of the pedestrian across the preset surveillance cameras comprises:
    analyzing the movement patterns of the pedestrian according to the graph structure of the surveillance camera distribution topology.
  7. The method according to any one of claims 1-6, wherein the multimodal features include one or more of the following: pedestrian features, face features and pedestrian attribute features.
  8. The method according to claim 7, wherein the pedestrian features include one or more of the following: body shape features (tall, short, heavy, slim) and posture features;
    the face feature information includes one or more of the following: face shape features, facial expression features and skin color features; and/or
    the pedestrian attribute information includes one or more of the following: hair length, hair color, clothing style, clothing color and carried items.
  9. A pedestrian tracking device, comprising: a memory, a processor, a program stored in the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; wherein the program, when executed by the processor, implements the pedestrian tracking method according to any one of claims 1-8.
  10. A computer-readable storage medium on which one or more programs are stored, the one or more programs being executable by one or more processors to implement the pedestrian tracking method according to any one of claims 1-8.
PCT/CN2021/102652 2020-06-29 2021-06-28 Pedestrian tracking method and device, and computer-readable storage medium WO2022001925A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21833495.1A (EP4174716A4) 2020-06-29 2021-06-28 Pedestrian tracking method and apparatus and computer readable storage medium
US18/013,874 (US20230351794A1) 2020-06-29 2021-06-28 Pedestrian tracking method and device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010603573.9 2020-06-29
CN202010603573.9A CN113935358A Pedestrian tracking method, device and storage medium

Publications (1)

Publication Number Publication Date
WO2022001925A1 (zh)

Family

ID=79272756

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102652 WO2022001925A1 (zh) Pedestrian tracking method and device, and computer-readable storage medium

Country Status (4)

Country Link
US (1) US20230351794A1
EP (1) EP4174716A4
CN (1) CN113935358A
WO (1) WO2022001925A1


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994925B * 2023-02-14 2023-09-29 成都理工大学工程技术学院 Fast multi-pedestrian tracking method based on keypoint detection
CN117237418B * 2023-11-15 2024-01-23 成都航空职业技术学院 Moving target detection method and system based on deep learning
CN117635402B * 2024-01-25 2024-05-17 中国人民解放军国防科技大学 Intelligent epidemiological investigation system and method, computer device, and storage medium
CN117896626B * 2024-03-15 2024-05-14 深圳市瀚晖威视科技有限公司 Method, apparatus, device and storage medium for detecting motion trajectories with multiple cameras


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100266159A1 (en) * 2009-04-21 2010-10-21 Nec Soft, Ltd. Human tracking apparatus, human tracking method, and human tracking processing program
US10095954B1 (en) * 2012-01-17 2018-10-09 Verint Systems Ltd. Trajectory matching across disjointed video views
CN108629791A * 2017-03-17 2018-10-09 北京旷视科技有限公司 Pedestrian tracking method and device, and cross-camera pedestrian tracking method and device
CN107657232A * 2017-09-28 2018-02-02 南通大学 Intelligent pedestrian recognition method and system
CN108229456A * 2017-11-22 2018-06-29 深圳市商汤科技有限公司 Target tracking method and device, electronic device, and computer storage medium
CN110188691A * 2019-05-30 2019-08-30 银河水滴科技(北京)有限公司 Movement trajectory determination method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4174716A4

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666403A * 2022-02-18 2022-06-24 国政通科技有限公司 Police information push system and method based on target trajectory
CN115830076A * 2023-02-21 2023-03-21 创意信息技术股份有限公司 Intelligent person trajectory video analysis system
CN115830076B * 2023-02-21 2023-05-09 创意信息技术股份有限公司 Intelligent person trajectory video analysis system

Also Published As

Publication number Publication date
US20230351794A1 (en) 2023-11-02
EP4174716A1 (en) 2023-05-03
CN113935358A (zh) 2022-01-14
EP4174716A4 (en) 2023-12-27

Similar Documents

Publication Publication Date Title
WO2022001925A1 (zh) Pedestrian tracking method and device, and computer-readable storage medium
CN111627045B (zh) Online multi-pedestrian tracking method, apparatus, device and storage medium under a single camera
EP3637303B1 (en) Methods for generating a base of training images, for training a CNN and for detecting a POI change in a pair of inputted POI images using said CNN
Kumar et al. The p-destre: A fully annotated dataset for pedestrian detection, tracking, and short/long-term re-identification from aerial devices
CN107093171B (zh) Image processing method, apparatus and system
US10467800B2 (en) Method and apparatus for reconstructing scene, terminal device, and storage medium
CN106845357B (zh) Video face detection and recognition method based on a multi-channel network
Kalra et al. Dronesurf: Benchmark dataset for drone-based face recognition
WO2020135523A1 (zh) Method and apparatus for retrieving and locating a target object
US8254633B1 (en) Method and system for finding correspondence between face camera views and behavior camera views
JP6172551B1 (ja) Image search device, image search system and image search method
TWI745818B (zh) Visual positioning method, electronic device and computer-readable storage medium
WO2020131467A1 (en) Detecting objects in crowds using geometric context
WO2020135392A1 (zh) Abnormal behavior detection method and apparatus
US20200027240A1 (en) Pedestrian Tracking Method and Electronic Device
US8130285B2 (en) Automated searching for probable matches in a video surveillance system
Tulyakov et al. Robust real-time extreme head pose estimation
TW201142752A (en) Tracking method
CN110428449A (zh) Target detection and tracking method, apparatus, device and storage medium
WO2019062347A1 (zh) Face recognition method and related products
CN110516707B (zh) Image annotation method, device and storage medium
CN110175583A (zh) Campus-wide security monitoring and analysis method based on video AI
KR20180015101A (ko) Apparatus and method for extracting a video of interest from a source video
CN111126288A (zh) Method, apparatus, storage medium and server for computing the attention level of a target object
Revaud et al. Did it change? learning to detect point-of-interest changes for proactive map updates

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21833495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021833495

Country of ref document: EP

Effective date: 20230130