WO2021196294A1 - Cross-video person localization and tracking method, system, and device - Google Patents

Cross-video person localization and tracking method, system, and device

Info

Publication number
WO2021196294A1
WO2021196294A1 PCT/CN2020/085081
Authority
WO
WIPO (PCT)
Prior art keywords
video
geographic
person
tracked
surveillance
Prior art date
Application number
PCT/CN2020/085081
Other languages
English (en)
French (fr)
Inventor
胡金星
宋亦然
沈策
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2021196294A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T 7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06F 16/29 — Geographical information databases
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20081 — Training; learning
    • G06T 2207/30196 — Human being; person
    • G06T 2207/30241 — Trajectory

Definitions

  • This application belongs to the technical field of pedestrian positioning and tracking, and in particular relates to a method, system and electronic equipment for cross-video personnel positioning and tracking.
  • Existing video person positioning methods mainly adopt environmental-measurement positioning methods and camera-based positioning methods.
  • the environmental measurement positioning method refers to outdoor personnel positioning or ranging.
  • laser range measurement or ultrasonic range measurement can be used, with an accuracy of centimeters.
  • Although this type of method has high accuracy and gives the surveyed person geographic location information, it is difficult to perform continuous, large-scale, multi-person measurement.
  • Camera-based positioning methods are divided into binocular positioning and monocular positioning.
  • binocular positioning is used in the field of precise positioning of robots.
  • For example, SLAM (simultaneous localization and mapping), which constructs three-dimensional space while performing real-time positioning and map building, is not suitable for public-area camera surveillance systems.
  • Moreover, video-based monocular and binocular ranging methods suffer from the heavy workload of building local databases, the heavy computation of extracting feature information, and the strong influence of external factors on feature information, which still restrict the accuracy and usability of visual positioning.
  • the geographic information construction method is mainly based on the image content to determine the geographic location information of the image.
  • image-based geographic location recognition needs to first extract visual features from the image to compare the similarity between different images.
  • In surveillance scenarios, such prior image data cannot be obtained in advance.
  • Since the camera-based positioning method does not belong to the field of measurement, it is necessary to introduce an image geographic-information construction method to obtain person geographic information.
  • For tracking the same person, the method of introducing SIFT parameters is mainly adopted: as many details of the person to be tracked as possible are extracted and stored in a database, and when the same person reappears, they are re-identified by matching against the database.
  • a common cross-video tracking method is template matching. A given template is used to search for the image area to be matched, and the matching result is obtained according to the calculated matching degree.
  • However, this kind of person matching method requires a template given in advance and a strict search over the area to be matched; because of the repeated matching and searching, it is computationally inefficient and time-consuming on continuous video and cannot be fully applied to a multi-video system.
  • The existing person (pedestrian) tracking systems mainly adopt the approach of first finding and then locating the person.
  • When the target is lost, more person characteristic parameters are introduced for tracking.
  • Because person location information cannot be obtained, it is difficult for existing person tracking systems to obtain a continuous spatio-temporal trajectory of the person.
  • the present application provides a cross-video personnel location tracking method, system, and electronic device, which are intended to solve at least one of the above-mentioned technical problems in the prior art to a certain extent.
  • a cross-video personnel location tracking method including the following steps:
  • Step a: Construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on the surveillance video according to object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Step b: Perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
  • Step c: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use maximum likelihood estimation to perform cross-video person re-identification analysis on the multiple surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
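The three steps above can be sketched as a minimal pipeline. The function names, the dict-based detection records, and the stub geo-registration table below are illustrative assumptions, not part of the patent:

```python
# Hedged sketch of the three-step pipeline (steps a-c). The data model
# (dicts with "t", "geo", "foot_pixel" keys) is an assumption for illustration.

def georegister(video_frames, object_geo_db):
    """Step a: map each ground pixel to geographic coordinates (stubbed)."""
    # In the real system this uses object recognition + reference-point matching;
    # here a tiny lookup table stands in for the registered ground plane.
    return {(x, y): (113.9 + x * 1e-6, 22.5 + y * 1e-6)
            for x in range(4) for y in range(4)}

def detect_person_geolocation(frame, pixel_to_geo):
    """Step b: detect the person's foot pixel and look up its geo-coordinate."""
    foot_pixel = frame["foot_pixel"]  # produced by head + motion detection
    return pixel_to_geo[foot_pixel]

def fuse_trajectory(per_camera_detections):
    """Step c: merge per-camera geo-detections into one time-ordered trajectory."""
    merged = [d for dets in per_camera_detections for d in dets]
    merged.sort(key=lambda d: d["t"])  # continuous spatio-temporal trajectory
    return [(d["t"], d["geo"]) for d in merged]
```

In a full implementation, step c would also apply the maximum-likelihood same-person test before merging detections from different cameras.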
  • the technical solution adopted in the embodiment of the present application further includes: in the step a, the reference point topology matching and the video geographic registration of the surveillance video according to the object recognition specifically include:
  • the world geographic coordinate system conversion method is adopted to perform geographic registration on the control points with the same name in the ground area in the surveillance video, so that the surveillance video has geographic location information.
  • the technical solution adopted by the embodiment of the present application further includes: in the step a, the topological matching of reference points and the video geographic registration of the surveillance video according to the object recognition further include:
  • a certain frame of a two-dimensional image in the pre-processed video image is intercepted, and edge detection and watershed segmentation methods are used to perform edge extraction on the two-dimensional image to obtain a ground area with GIS information in the two-dimensional image.
  • the technical solution adopted in the embodiment of the present application further includes: in the step b, the calculation of the person detection position of the surveillance video is specifically:
  • the frame difference method is used to detect the moving objects in the surveillance video, and the head detector is used to locate the position of the person to be tracked, and the head information of the person to be tracked is obtained.
  • The technical solution adopted in the embodiment of the application further includes: the human head detector adopts a person detection method based on a convolutional neural network (CNN); the convolutional neural network includes an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, in which multiple convolutional and pooling layers are combined to process the input data and the fully connected layer maps to the output target.
  • the technical solution adopted in the embodiment of the present application further includes: in the step b, the obtaining the geographic location of the personnel further includes:
  • the movement detection of the person to be tracked is performed based on the head information, and the pixel points of the feet of the person to be tracked are obtained, and the pixels of the feet are the geographic location information of the person to be tracked.
  • the technical solution adopted in the embodiment of the present application further includes: in the step b, the obtaining the geographic location of the personnel further includes:
  • The geographic location information of the person to be tracked is calibrated by suppressing the error of the camera itself.
  • the technical solution adopted in the embodiment of the present application further includes: in the step c, the use of the maximum likelihood estimation method to perform cross-video re-identification analysis of multi-channel surveillance videos by people also includes:
  • the geographic area overlap determination is triggered; the geographic area overlap determination is specifically:
  • The geographic information of the shooting scenes of the multi-channel surveillance videos is located, and the surveillance area of each camera device is divided according to the overlapping geographic location areas; the geographic-information space coordinates of the person to be tracked in consecutive frames are connected to obtain the continuous trajectory of the person to be tracked.
  • When the geographic-information space coordinate of the person to be tracked exceeds the monitoring area of the current camera and moves into the monitoring area of the next camera, the next camera is triggered to continue tracking the person's trajectory.
  • a cross-video personnel location tracking system including:
  • Video geo-registration module: used to build an object geographic coordinate database, perform reference-point topology matching and video geo-registration on surveillance videos according to object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Cross-video personnel positioning and tracking module: used to perform person detection and position calculation on the surveillance video and obtain the geographic location of the person to be tracked;
  • Multi-video trajectory tracking module: used to combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use maximum likelihood estimation to perform cross-video person re-identification analysis on the multiple surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • an electronic device including:
  • at least one processor;
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the following operations of the cross-video person location tracking method described above:
  • Step a: Construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on the surveillance video according to object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Step b: Perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
  • Step c: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use maximum likelihood estimation to perform cross-video person re-identification analysis on the multiple surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • the beneficial effects produced by the embodiments of the present application are: the cross-video personnel location tracking method, system and electronic equipment of the embodiments of the present application obtain the location information of the reference object through object recognition to perform the geographic registration of the surveillance video.
  • The detection and acquisition of persons' geographic location information and their spatio-temporal movement trajectories are computationally simple and carry geographic location information, which has better application value in multi-channel video system scenarios.
  • This application introduces person geographic location information based on video geographic calibration and performs maximum likelihood estimation during cross-video person re-identification, reducing the difficulty of the visual person re-identification trajectory tracking algorithm and the system complexity, and offering better application value in multi-channel cross-video system scenarios.
  • FIG. 1 is a flowchart of a method for cross-video personnel location tracking according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a surveillance video geographic registration algorithm according to an embodiment of the present application
  • Figure 3(a) is the video image before preprocessing
  • Figure 3(b) is the video image after preprocessing
  • Figure 4 is a schematic diagram of a camera imaging model
  • Figures 5(a) and (b) are schematic diagrams of the spatial relationship between pixel coordinates and world geographic coordinates.
  • Figure 5(a) is pixel coordinates
  • Figure 5(b) is world geographic coordinates;
  • Fig. 6 is a schematic diagram of a person detection algorithm according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a frame difference method according to an embodiment of the present application.
  • Figure 8 is an example diagram of maximum likelihood estimation
  • FIG. 9 is a flowchart of a cross-video person tracking algorithm based on geographic area overlap determination according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a cross-video personnel positioning and tracking system according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the hardware device structure of the cross-video personnel location tracking method provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a cross-video personnel location tracking method according to an embodiment of the present application.
  • the cross-video personnel location tracking method of the embodiment of the present application includes the following steps:
  • Step 100: Obtain the surveillance video of the person to be tracked;
  • Step 200: Construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on the surveillance video according to object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • the object geographic coordinate database includes a third-party geographic information database such as Baidu or existing BIM file data (data for construction projects, including building appearance and geographic location).
  • The standard WGS84 coordinate system is used for reference-point calibration; the BIM (Building Information Modeling) information of the identified object is mapped to WGS84 coordinate points in the public area and serves as reference points to improve the accuracy of the spatial calculation for the tracked person.
  • the object geographic database is used for the correspondence between the world coordinate system and the image coordinate system.
  • The existing object geographic information and images are used to identify objects, the pixel points in the surveillance video are registered with their actual geographic coordinates, and the spatial position coordinates of the surveillance video are determined.
  • FIG. 2 is a schematic diagram of a surveillance video geo-registration algorithm according to an embodiment of the present application.
  • the surveillance video geographic registration algorithm of the embodiment of the application includes:
  • Step 210: Perform object recognition and classification on the surveillance video using an object recognition algorithm to obtain reference points in the surveillance video; at the same time, perform image preprocessing on the surveillance video to obtain a fisheye-calibrated video image;
  • the object recognition algorithm includes methods such as RCNN, YOLO, etc., through object recognition and classification of the surveillance video, a reference point in the surveillance video is obtained, and the reference point is fuzzy matched with the object in the object geographic database.
  • This application preprocesses the surveillance video with the checkerboard correction method: the internal parameters and correction coefficients of the fisheye lens are calculated, then the surveillance video is corrected and trimmed, part of the edge region is removed, and a fisheye-calibrated video image is obtained.
  • The preprocessed video images reduce the errors in world-coordinate conversion caused by nonlinear distortion, as shown in Figures 3(a) and (b), where Figure 3(a) is the video image before preprocessing and Figure 3(b) is the video image after preprocessing.
  • Nonlinear distortion is mainly geometric distortion, which causes an offset between the actual pixel coordinates and the ideal pixel coordinates, expressed as offsets δu and δv.
  • The first term in δu and δv is caused by the camera components (radial distortion), and the second and third terms are caused by inaccuracies of the camera imaging itself.
  • s1 and s2 are nonlinear distortion parameters; by solving for the nonlinear distortion parameters, the image distortion is corrected.
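The patent's own distortion formula is not reproduced in this record, so the sketch below assumes the common Brown-style model: radial terms (k1, k2), decentering terms (p1, p2), and thin-prism terms (s1, s2), followed by a fixed-point inversion of the kind typically used for undistortion:

```python
def distortion_offsets(u, v, k1, k2, p1, p2, s1, s2):
    """Pixel offsets (du, dv) under a standard distortion model (an assumption).

    Term 1: radial distortion (k1, k2); term 2: decentering (p1, p2);
    term 3: thin-prism (s1, s2), with r^2 = u^2 + v^2.
    """
    r2 = u * u + v * v
    du = (u * (k1 * r2 + k2 * r2 * r2)
          + (p1 * (3 * u * u + v * v) + 2 * p2 * u * v)
          + s1 * r2)
    dv = (v * (k1 * r2 + k2 * r2 * r2)
          + (2 * p1 * u * v + p2 * (u * u + 3 * v * v))
          + s2 * r2)
    return du, dv

def undistort(u, v, *coeffs, iters=10):
    """Invert the model by fixed-point iteration: find the ideal point whose
    distorted position equals the observed (u, v)."""
    uu, vv = u, v
    for _ in range(iters):
        du, dv = distortion_offsets(uu, vv, *coeffs)
        uu, vv = u - du, v - dv
    return uu, vv
```

With all coefficients zero the model is the identity; small coefficients give a contraction, so the fixed-point iteration converges quickly for points near the image center.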
  • Step 220: Geographically register the identified reference points against the objects in the object geographic database to obtain the geographic location information of the control points with the same name in the surveillance video;
  • Step 230: Intercept a frame of the two-dimensional image in the preprocessed video image, and use edge detection and watershed segmentation methods to perform edge extraction on the two-dimensional image to obtain the ground area with GIS information in the two-dimensional image;
  • In step 230, the special spatial structure of surveillance video is exploited: the lower area of the frame is the ground part, and vertical objects occlude the ground. Therefore, this application uses edge detection technology and the watershed algorithm to compute the horizontal and vertical structures of the image.
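The edge-detection stage can be illustrated with a plain Sobel gradient threshold; the watershed segmentation step is omitted here, and the threshold value is an arbitrary assumption:

```python
def sobel_edges(img, thresh):
    """Binary edge map from the 3x3 Sobel gradient magnitude.

    A stand-in for the edge-detection stage described in the text; `img` is a
    2D list of gray values, and border pixels are left unmarked.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (img[i-1][j+1] + 2 * img[i][j+1] + img[i+1][j+1]
                  - img[i-1][j-1] - 2 * img[i][j-1] - img[i+1][j-1])
            gy = (img[i+1][j-1] + 2 * img[i+1][j] + img[i+1][j+1]
                  - img[i-1][j-1] - 2 * img[i-1][j] - img[i-1][j+1])
            out[i][j] = 1 if (gx * gx + gy * gy) ** 0.5 > thresh else 0
    return out
```

On a synthetic image with a vertical brightness step, only pixels adjacent to the step are marked as edges.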
  • Step 240: Use the world geographic coordinate system conversion method to geographically register the control points with the same name in the ground area of the surveillance video, so that the surveillance video carries geographic location information;
  • In step 240, the offset matrix is calculated according to the world geographic coordinate system conversion, and each pixel of the ground area in the surveillance video is matched with geographic coordinates through image stretching, filling, and cropping, yielding a geographic-coordinate information plane for the surveillance video. By comparing against the object geographic database, the actual geographic location of the identified object and the relative position of the observed object are obtained. Since the matrix calculation generally uses four point pairs to complete the conversion between pixel coordinates and world coordinates of a plane, the result calculated by the world-coordinate conversion matrix is an estimate rather than an exact value. To control this error, this application performs a constrained calculation on the image when more control points with the same name are available.
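The four-point conversion between pixel coordinates and world coordinates of a plane is a planar homography. A minimal direct-linear-transform sketch follows (plain Python, no external libraries; the Gaussian-elimination solver is only suitable for such small systems):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def homography_from_4pts(pix, geo):
    """DLT: solve an 8x8 system for h11..h32, fixing h33 = 1."""
    A, b = [], []
    for (x, y), (X, Y) in zip(pix, geo):
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y]); b.append(Y)
    h = solve(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def pixel_to_geo(H, x, y):
    """Apply the homography with perspective division."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

With more than four control points with the same name, the same equations would be solved in a least-squares sense, which is the constrained calculation the text describes.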
  • The coordinate transformation between the world coordinate system and the camera coordinate system is P_c = R · P_w + t, where:
  • R is a 3×3 rotation matrix and t is a 3×1 translation vector; R and t represent the relative orientation and position between the world coordinate system and the camera coordinate system.
  • The transformation between the camera coordinate system and the image coordinate system projects a three-dimensional point P(X, Y, Z) in the camera coordinate system onto the imaging plane to obtain the corresponding two-dimensional point p(x, y).
  • The relationship between (x, y) and (X, Y) can be expressed as x = fX/Z, y = fY/Z,
  • where f is the focal length of the camera.
  • the vertical distance of the recognized object or pixel area on the ground is:
  • the height of the identified object can be determined by the Y value.
  • the ground area can be determined and the ground coordinates can be determined.
  • Figures 5(a) and (b) are schematic diagrams of the spatial relationship between pixel coordinates and world geographic coordinates.
  • Figure 5(a) is the pixel coordinates
  • Figure 5(b) is the world geographic coordinates.
  • To calibrate the measured pixel positions of persons, this application proposes several methods to suppress error. After an object is recognized, the center point of the bottom edge of the object recognition box is taken as the object's geographic location and used as a reference point in the calculation. First, the relationship between a two-dimensional image point (x, y) and its corresponding pixel point (u, v) is expressed as u = x/dx + u0, v = y/dy + v0, where:
  • dx and dy represent the physical size of a pixel along the u-axis and v-axis, and (u0, v0) are the coordinates of the camera's principal point in the pixel coordinate system.
  • fu and fv represent the focal length expressed in units of pixel width and pixel height, respectively.
  • The parameters of this matrix are called the camera intrinsic parameters; they are affected only by the camera's internal structure and imaging characteristics.
  • the parameters of the matrix R and the matrix t are called external camera parameters.
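Putting the extrinsic transform (R, t), the pinhole projection, and the intrinsic parameters (fu, fv, u0, v0) together, a world point maps to a pixel as follows. This is a standard pinhole-camera sketch consistent with the formulas above, not code from the patent:

```python
def project_to_pixel(Pw, R, t, fu, fv, u0, v0):
    """World point -> pixel coordinate via the pinhole model.

    1. Extrinsics: Pc = R * Pw + t (world -> camera frame).
    2. Projection: x = X/Z, y = Y/Z (the focal length is folded into fu, fv).
    3. Intrinsics: u = fu * x + u0, v = fv * y + v0.
    """
    X = sum(R[0][j] * Pw[j] for j in range(3)) + t[0]
    Y = sum(R[1][j] * Pw[j] for j in range(3)) + t[1]
    Z = sum(R[2][j] * Pw[j] for j in range(3)) + t[2]
    u = fu * X / Z + u0
    v = fv * Y / Z + v0
    return u, v
```

Inverting this chain for ground-plane points (Z known from the ground constraint) is what allows a foot pixel to be mapped back to a geographic coordinate.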
  • Step 300: Perform person detection and position calculation on the surveillance video, and obtain the geographic location of the person to be tracked;
  • FIG. 6 is a schematic diagram of a person detection algorithm according to an embodiment of the present application.
  • the personnel detection algorithm in this embodiment of the application includes:
  • Step 310: Detect the head of each person entering the surveillance video, and obtain the head information of the person to be tracked;
  • human head detection is a method for quickly identifying a human head model, which is suitable for multi-channel surveillance videos.
  • this application uses the frame difference method to detect moving objects in the surveillance video, and combines the human head detector to locate the human position.
  • the head detector adopts a person detection method based on the convolutional neural network CNN.
  • The convolutional neural network is composed of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer; multiple convolutional and pooling layers are combined to process the input data, and the fully connected layer realizes the mapping to the person output.
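A toy forward pass illustrating the layer combination just described (convolution, ReLU, pooling, fully connected mapping); the kernel, sizes, and weights are placeholders, since a real head detector's weights come from training:

```python
def conv2d(img, k):
    """Valid 2D convolution (cross-correlation, as in most CNN libraries)."""
    h, w, kh, kw = len(img), len(img[0]), len(k), len(k[0])
    return [[sum(img[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)] for i in range(h - kh + 1)]

def relu(x):
    """Element-wise rectified linear unit."""
    return [[v if v > 0 else 0 for v in row] for row in x]

def max_pool2(x):
    """2x2 max pooling with stride 2."""
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, len(x[0]) - 1, 2)] for i in range(0, len(x) - 1, 2)]

def tiny_head_scorer(patch, kernel, fc_w):
    """conv -> ReLU -> pool -> fully-connected score: a toy stand-in for the
    CNN head detector (function name and structure are illustrative)."""
    feat = max_pool2(relu(conv2d(patch, kernel)))
    flat = [v for row in feat for v in row]
    return sum(w * v for w, v in zip(fc_w, flat))
```

A production detector would stack several such conv/pool stages and threshold the fully connected output to decide whether the patch contains a head.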
  • The flow chart of the frame difference method is shown in Figure 7. Specifically: denote the images of the (n+1)th, nth, and (n-1)th frames in the video sequence as fn+1, fn, and fn-1, and the gray values of the corresponding pixels of the three frames as fn+1(x,y), fn(x,y), and fn-1(x,y). The difference images Dn+1 and Dn are computed, an AND operation is performed on Dn+1 and Dn, thresholding and connectivity analysis are then applied, and the moving object is finally detected.
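The three-frame differencing described above can be sketched as follows, without the final connectivity analysis; the threshold value is an assumption:

```python
def three_frame_diff(f_prev, f_cur, f_next, thresh):
    """Three-frame differencing for moving-object detection.

    Dn = |fn - fn-1| and Dn+1 = |fn+1 - fn|; a pixel is marked moving where
    both differences exceed the threshold (the logical AND step). The
    connectivity analysis on the resulting mask is omitted here.
    """
    h, w = len(f_cur), len(f_cur[0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dn = abs(f_cur[i][j] - f_prev[i][j])
            dn1 = abs(f_next[i][j] - f_cur[i][j])
            mask[i][j] = 1 if dn > thresh and dn1 > thresh else 0
    return mask
```

The resulting mask is then intersected with the head-detector output to locate the person, per the combined method described above.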
  • Step 320: Perform movement detection (person position calculation) of the person to be tracked based on the head detection result, obtaining the pixel points at the person's feet, which give the geographic location information of the person to be tracked;
  • In step 320, since the person is a moving object, by the characteristics of the video the bottom area of the moving object is where the feet stand. Therefore, this application combines head detection with movement detection to quickly find the under-foot pixel points corresponding to the movement area of the person to be tracked. Compared with methods such as general person-posture detection or SIFT feature tracking, this approach detects faster, is more robust in complex environments, and also performs well in person-recognition accuracy.
  • Step 330: Calibrate the geographic location information of the person to be tracked.
  • In step 330, the present application calibrates the geographic location information of the person to be tracked by suppressing the error of the camera itself, thereby reducing uncertain errors caused by blurring when the person moves and improving positioning accuracy.
  • Step 400: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use maximum likelihood estimation to perform cross-video re-identification analysis on the multiple surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked;
  • step 400 when a person moves across videos, although the geographic location information of the person to be tracked is available, there is a situation in which it is impossible to determine whether they are the same person.
  • This application applies maximum likelihood estimation to the movement trajectories of the same person to be tracked in multiple videos, judging through probability calculation whether detections belong to the same person. Specifically: given a probability distribution D, assume its probability density function (continuous distribution) or probability mass function (discrete distribution) is f_D, with a distribution parameter θ; a sample of n values x_1, x_2, ..., x_n is drawn from D, and θ is estimated by computing the sample's probability with f_D.
  • Maximum likelihood estimation finds the most likely value of θ, that is, among all possible values of θ, the value that maximizes the "probability" of this sample. To realize maximum likelihood estimation mathematically, the likelihood is first defined as L(θ) = f_D(x_1, x_2, ..., x_n | θ); the value of θ that maximizes this likelihood is the maximum likelihood estimate of θ.
  • The maximum likelihood is computed and compared with a set threshold to judge whether the detections are the same person.
  • Figure 8 is an example diagram of maximum likelihood estimation.
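As a hedged illustration of the likelihood test: assume the per-frame geographic distance between two candidate tracks of the same person follows a zero-mean Gaussian. The sigma and the decision threshold below are illustrative values, not from the patent:

```python
import math

def gaussian_log_likelihood(diffs, sigma):
    """Log-likelihood of observed position differences x_1..x_n under a
    zero-mean Gaussian N(0, sigma^2): sum of log f_D(x_i | sigma)."""
    n = len(diffs)
    return (-n * math.log(sigma * math.sqrt(2 * math.pi))
            - sum(d * d for d in diffs) / (2 * sigma * sigma))

def same_person(track_a, track_b, sigma=1.0, log_lik_threshold=-10.0):
    """Judge identity by the likelihood of the per-frame geographic distance
    between two tracks; sigma and the threshold are illustrative assumptions."""
    diffs = [math.dist(p, q) for p, q in zip(track_a, track_b)]
    return gaussian_log_likelihood(diffs, sigma) > log_lik_threshold
```

Two tracks that nearly coincide in geographic space yield a high likelihood and are merged; distant tracks fall below the threshold and are kept separate.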
  • Geographical area overlap determination is to locate the geographic information of the shooting scenes of multiple surveillance videos, and then divide the respective tracking surveillance areas according to the overlapping geographic locations.
  • the continuous trajectory of the person to be tracked is obtained by connecting the geographic information space coordinates of the person to be tracked in consecutive frames.
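The overlap determination and camera hand-off can be sketched with axis-aligned geographic regions; the rectangle model and the camera names are assumptions for illustration:

```python
def in_region(geo, region):
    """region = (min_x, min_y, max_x, max_y) in geographic coordinates."""
    x, y = geo
    return region[0] <= x <= region[2] and region[1] <= y <= region[3]

def active_cameras(geo, camera_regions):
    """Cameras whose (possibly overlapping) monitored regions contain the
    person's geographic coordinate; when the point enters the next camera's
    region, that camera is triggered to continue the trajectory."""
    return [cam for cam, region in camera_regions.items() if in_region(geo, region)]

def build_trajectory(detections):
    """Connect per-frame (timestamp, geo) records into a continuous trajectory."""
    return [geo for _, geo in sorted(detections)]
```

In the overlap zone both cameras report the person, which is exactly where the maximum-likelihood same-person test is applied before the trajectories are joined.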
  • FIG. 10 is a schematic structural diagram of a cross-video personnel positioning and tracking system according to an embodiment of the present application.
  • the cross-video personnel location tracking system of the embodiment of the present application includes:
  • Video geo-registration module: used to build a database of object geographic coordinates, perform reference-point topology matching and video geo-registration on surveillance video based on object recognition, and determine the geographic coordinates of surveillance-video pixels;
  • Cross-video personnel positioning and tracking module: used to perform person detection and position calculation on surveillance videos and obtain the geographic location of the person to be tracked;
  • Multi-video trajectory tracking module: used to combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use maximum likelihood estimation to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • FIG. 11 is a schematic diagram of the hardware device structure of the cross-video personnel location tracking method provided by an embodiment of the present application.
  • the device includes one or more processors and memory. Taking a processor as an example, the device may also include: an input system and an output system.
  • the processor, the memory, the input system, and the output system may be connected through a bus or in other ways.
  • the connection through a bus is taken as an example.
  • the memory can be used to store non-transitory software programs, non-transitory computer executable programs, and modules.
  • the processor executes various functional applications and data processing of the electronic device by running non-transitory software programs, instructions, and modules stored in the memory, that is, realizing the processing methods of the foregoing method embodiments.
  • the memory may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function; the data storage area can store data and the like.
  • the memory may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid state storage devices.
  • the memory may optionally include a memory remotely provided with respect to the processor, and these remote memories may be connected to the processing system through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input system can receive input digital or character information, and generate signal input.
  • the output system may include display devices such as a display screen.
  • the one or more modules are stored in the memory, and when executed by the one or more processors, the following operations of any of the foregoing method embodiments are performed:
  • Step a: Construct an object geographic coordinate database, perform reference point topology matching and video geographic registration on the surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Step b: Perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
  • Step c: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • the embodiments of the present application provide a non-transitory (non-volatile) computer storage medium.
  • the computer storage medium stores computer-executable instructions, and the computer-executable instructions can perform the following operations:
  • Step a: Build an object geographic coordinate database, perform reference point topology matching and video geographic registration on the surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Step b: Perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
  • Step c: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • the embodiment of the present application provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, cause the computer to perform the following operations:
  • Step a: Construct an object geographic coordinate database, perform reference point topology matching and video geographic registration on the surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
  • Step b: Perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
  • Step c: Combine the geographic locations of the person to be tracked detected by multiple nearby videos, and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  • the cross-video personnel positioning and tracking method, system, and electronic device of the embodiments of the present application obtain the location information of reference objects through object recognition to geographically register the surveillance video, and obtain the geographic location information and movement spatio-temporal trajectory of personnel through person detection. The computation is simple and yields geographic location information, giving the method better application value in multi-channel video system scenarios. In addition, the present application introduces personnel geographic location information based on video geographic calibration and performs maximum likelihood estimation during cross-video person re-identification, reducing the difficulty of visual person re-identification and trajectory tracking algorithms as well as system complexity.


Abstract

A cross-video person positioning and tracking method, system, and electronic device, comprising: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels (S200); performing person detection and position calculation on the surveillance video to obtain the geographic locations of persons (S300); and combining the person location information detected in multiple nearby videos and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person (S400). Person geographic location information is introduced on the basis of video geographic calibration, and maximum likelihood estimation is performed during cross-video person re-identification, reducing the difficulty of visual person re-identification and trajectory tracking algorithms and the system complexity, and offering better application value in multi-channel cross-video system scenarios.

Description

Cross-video person positioning and tracking method, system and device
Technical Field
This application belongs to the technical field of pedestrian positioning and tracking, and in particular relates to a cross-video person positioning and tracking method, system, and electronic device.
Background Art
Current video-based person positioning methods mainly use environment-measurement positioning and camera-based positioning. Environment-measurement positioning refers to positioning or ranging of persons outdoors. In the military field, laser ranging or ultrasonic ranging can achieve centimeter-level accuracy. Although such methods are highly accurate and yield geographic location information for the measured person, they are difficult to apply to continuous, wide-area, multi-person measurement.
Camera-based positioning methods are divided into binocular positioning and monocular positioning. Binocular positioning is applied to precise robot localization through the SLAM (simultaneous localization and mapping) method, which constructs a three-dimensional space and is not suitable for public-area camera surveillance systems. Moreover, video-based monocular and binocular ranging methods suffer from the heavy workload of building local databases, the large computational cost of extracting feature information, and feature information that is strongly affected by external factors, all of which still constrain the accuracy and usability of visual positioning.
Geographic information construction methods mainly infer the geographic location of an image from its content. Generally, a classifier is trained on a pre-existing geotagged image database, or images similar to the query image are retrieved from such a database. Image-based geographic location recognition therefore first requires extracting visual features from images to compare the similarity between different images. In traditional surveillance scenarios, however, such prior image data cannot be obtained in advance.
Since camera-based positioning does not belong to the measurement field, image geographic information construction methods must be introduced to obtain person geographic information. In the cross-video tracking field, tracking the same person mainly relies on introducing SIFT parameters: as many details of the person to be tracked as possible are acquired and stored in a database, and when the same person reappears, the database is matched for re-identification. In the re-identification field, a common cross-video tracking method is template matching, in which a given template is used to search the candidate image region and a matching result is obtained from the computed matching score. However, this person-matching method requires a template to be given in advance and imposes strict requirements on the region to be searched and matched; because of the matching and searching involved, it is computationally inefficient and time-consuming on continuous video, and cannot be fully applied to multi-video systems.
In addition, existing person (pedestrian) tracking systems mainly use methods of finding and locating persons; in cross-video tracking under multi-video linkage, more person feature parameters are introduced for tracking. However, because person location information cannot be obtained, existing person tracking systems have difficulty obtaining continuous spatio-temporal trajectories of persons.
Summary of the Invention
This application provides a cross-video person positioning and tracking method, system, and electronic device, aiming to solve, at least to some extent, one of the above technical problems in the prior art.
To solve the above problems, this application provides the following technical solutions:
A cross-video person positioning and tracking method, comprising the following steps:
Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
The technical solution adopted in the embodiments of this application further includes: in step a, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition specifically comprises:
using an object recognition algorithm to perform object recognition and classification on the surveillance video to obtain reference points in the surveillance video;
matching the reference points with objects in the object geographic database to obtain the geographic location information of corresponding control points in the surveillance video;
using a world geographic coordinate system transformation method to geographically register the corresponding control points in the ground region of the surveillance video, so that the surveillance video carries geographic location information.
The technical solution adopted in the embodiments of this application further includes: in step a, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition further comprises:
performing image preprocessing on the surveillance video to obtain a fisheye-corrected video image;
capturing a two-dimensional image frame from the preprocessed video image, and performing edge extraction on the two-dimensional image using edge detection and watershed segmentation to obtain the ground region with GIS information in the two-dimensional image.
The technical solution adopted in the embodiments of this application further includes: in step b, performing person detection and position calculation on the surveillance video is specifically:
using the frame-difference method to detect moving objects in the surveillance video, and locating the position of the person to be tracked in combination with a head detector to obtain the head information of the person to be tracked.
The technical solution adopted in the embodiments of this application further includes: the head detector uses a person detection method based on a convolutional neural network (CNN); the convolutional neural network includes an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer; multiple convolutional and pooling layers are composed to process the input data, and the mapping to the output target is performed through the fully connected layer.
The technical solution adopted in the embodiments of this application further includes: in step b, obtaining the person's geographic location further comprises:
performing movement detection on the person to be tracked based on the head information to obtain the foot pixel of the person to be tracked, the foot pixel being the geographic location information of the person to be tracked.
The technical solution adopted in the embodiments of this application further includes: in step b, obtaining the person's geographic location further comprises:
calibrating the geographic location information of the person to be tracked by suppressing errors inherent to the camera itself.
The technical solution adopted in the embodiments of this application further includes: in step c, using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos further comprises:
when the captured scene moves within the multi-channel surveillance videos, triggering a geographic region overlap determination; the geographic region overlap determination is specifically:
locating the geographic information of the captured scenes of the multi-channel surveillance videos, and dividing the surveillance region of each camera according to the overlapping geographic location regions; obtaining the continuous trajectory of the person to be tracked by connecting the geographic-information spatial coordinates of the person to be tracked in consecutive frames; and, when the geographic-information spatial coordinates of the person to be tracked leave the surveillance region of the current camera and move into the surveillance region of the next camera, triggering the next camera to perform person trajectory tracking.
Another technical solution adopted in the embodiments of this application is: a cross-video person positioning and tracking system, comprising:
a video geographic registration module, used to construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
a cross-video person positioning and tracking module, used to perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
a multi-video trajectory tracking module, used to combine the geographic locations of the person to be tracked detected in multiple nearby videos and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
Yet another technical solution adopted in the embodiments of this application is: an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the above cross-video person positioning and tracking method:
Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
Compared with the prior art, the beneficial effects produced by the embodiments of this application are as follows: the cross-video person positioning and tracking method, system, and electronic device of the embodiments obtain the location information of reference objects through object recognition to geographically register the surveillance video, and obtain person geographic location information and movement spatio-temporal trajectories through person detection. The computation is simple and yields geographic location information, giving better application value in multi-channel video system scenarios. This application introduces person geographic location information based on video geographic calibration and performs maximum likelihood estimation during cross-video person re-identification, reducing the difficulty of visual person re-identification and trajectory tracking algorithms and the system complexity, and offering better application value in multi-channel cross-video system scenarios.
Brief Description of the Drawings
FIG. 1 is a flowchart of the cross-video person positioning and tracking method of an embodiment of this application;
FIG. 2 is a schematic diagram of the surveillance video geographic registration algorithm of an embodiment of this application;
FIG. 3(a) is the video image before preprocessing, and FIG. 3(b) is the video image after preprocessing;
FIG. 4 is a schematic diagram of the camera imaging model;
FIGS. 5(a) and 5(b) are schematic diagrams of the spatial relationship between pixel coordinates and world geographic coordinates, where FIG. 5(a) shows pixel coordinates and FIG. 5(b) shows world geographic coordinates;
FIG. 6 is a schematic diagram of the person detection algorithm of an embodiment of this application;
FIG. 7 is a flowchart of the frame-difference method of an embodiment of this application;
FIG. 8 is an example diagram of maximum likelihood estimation;
FIG. 9 is a flowchart of the cross-video person tracking algorithm based on geographic region overlap determination of an embodiment of this application;
FIG. 10 is a schematic structural diagram of the cross-video person positioning and tracking system of an embodiment of this application;
FIG. 11 is a schematic diagram of the hardware device structure of the cross-video person positioning and tracking method provided by an embodiment of this application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of this application clearer, this application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain this application and are not intended to limit it.
Please refer to FIG. 1, which is a flowchart of the cross-video person positioning and tracking method of an embodiment of this application. The method includes the following steps:
Step 100: acquiring surveillance video of the person to be tracked;
Step 200: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
In this step, the object geographic coordinate database includes third-party geographic information libraries such as Baidu or existing BIM file data (construction engineering data containing building appearance and geographic location). To obtain more accurate GPS location information, reference points are calibrated in the standard WGS84 coordinate system: the BIM (Building Information Modeling) information of a recognized object is mapped to coordinate points in the public-area WGS84 coordinate system and used as a reference point, improving the accuracy of the spatial calculation for the tracked person. The object geographic database is used for the correspondence after the transformation between the world coordinate system and the image coordinate system: using existing object geographic information and image-recognized objects, the pixels in the surveillance video are registered with real-world coordinates to determine the spatial position coordinates of the surveillance video.
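As an illustrative sketch of the object geographic coordinate database and the fuzzy matching described in this step, the database can be as simple as a mapping from recognized object labels to WGS84 reference points with a tolerant lookup. All object names and coordinates below are hypothetical, not taken from the application:

```python
from difflib import get_close_matches

# Hypothetical reference database: recognized object label -> WGS84 (lat, lon).
# In practice these entries would come from BIM files or a third-party GIS library.
OBJECT_GEO_DB = {
    "fire_hydrant_03": (22.5431, 114.0579),
    "lamp_post_12":    (22.5433, 114.0581),
    "doorway_north":   (22.5435, 114.0578),
}

def match_reference_point(detected_label, cutoff=0.6):
    """Fuzzily match a detector label against the database; return
    (canonical_label, (lat, lon)) or None when nothing is close enough."""
    hits = get_close_matches(detected_label, OBJECT_GEO_DB.keys(), n=1, cutoff=cutoff)
    if not hits:
        return None
    return hits[0], OBJECT_GEO_DB[hits[0]]
```

A matched entry supplies a control point (pixel position of the detection paired with its WGS84 coordinate) for the registration described below.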
Further, please also refer to FIG. 2, which is a schematic diagram of the surveillance video geographic registration algorithm of an embodiment of this application. The algorithm includes:
Step 210: using an object recognition algorithm to perform object recognition and classification on the surveillance video to obtain the reference points in the surveillance video; at the same time, performing image preprocessing on the surveillance video to obtain a fisheye-corrected video image;
In step 210, the object recognition algorithm includes methods such as RCNN and YOLO. Object recognition and classification is performed on the surveillance video to obtain the reference points, which are then fuzzily matched with the objects in the object geographic database.
In practice, due to factors such as the manufacturing and assembly quality of imaging components, camera footage exhibits edge distortion, which becomes more pronounced the wider the lens angle and leads to nonlinear distortion of the surveillance video; the camera's nonlinear distortion must therefore be corrected to improve computational accuracy. This application preprocesses the surveillance video using the chessboard calibration method: the fisheye intrinsic parameters and correction coefficients are computed, the video is corrected and cropped to remove part of the edges, and a fisheye-corrected video image is obtained. The preprocessed video image reduces the error in world coordinate system transformation caused by nonlinear distortion, as shown in FIGS. 3(a) and 3(b), where FIG. 3(a) is the video image before preprocessing and FIG. 3(b) is the video image after preprocessing.
Specifically, nonlinear distortion is generally geometric distortion, which offsets the actual pixel coordinates from the ideal pixel coordinates; it can be expressed as:

$$u' = u + \delta_u(u, v), \qquad v' = v + \delta_v(u, v) \tag{1}$$

In Eq. (1), (u, v) are the ideal pixel coordinates and (u', v') are the pixel coordinates affected by the distortion. The nonlinear distortion terms δ_u, δ_v can be written as:

$$\begin{aligned} \delta_u &= k_1 u (u^2 + v^2) + \left[\, p_1 (3u^2 + v^2) + 2 p_2 u v \,\right] + s_1 (u^2 + v^2) \\ \delta_v &= k_2 v (u^2 + v^2) + \left[\, p_2 (3v^2 + u^2) + 2 p_1 u v \,\right] + s_2 (u^2 + v^2) \end{aligned} \tag{2}$$

In Eq. (2), the first term of δ_u and δ_v is influenced by the camera lens, while the second and third terms arise from inaccuracies of the camera's imaging element; the parameters p_1, p_2, k_1, k_2, s_1, s_2 are the nonlinear distortion parameters. By computing the values of the nonlinear distortion parameters, the image distortion is undone.
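The distortion model of Eqs. (1) and (2) can be sketched numerically as follows. The parameter values in the usage are illustrative only, and the inversion by fixed-point iteration is one common choice rather than a method prescribed by the application:

```python
def distort(u, v, k1, k2, p1, p2, s1, s2):
    """Apply the nonlinear distortion of Eqs. (1)-(2): radial (k),
    decentering (p) and thin-prism (s) terms, in normalized coordinates."""
    r2 = u * u + v * v
    du = k1 * u * r2 + (p1 * (3 * u * u + v * v) + 2 * p2 * u * v) + s1 * r2
    dv = k2 * v * r2 + (p2 * (3 * v * v + u * u) + 2 * p1 * u * v) + s2 * r2
    return u + du, v + dv

def undistort(ud, vd, params, iters=20):
    """Invert the distortion by fixed-point iteration: repeatedly subtract
    the modeled offset until the distorted position is reproduced."""
    u, v = ud, vd
    for _ in range(iters):
        uu, vv = distort(u, v, *params)
        u, v = u - (uu - ud), v - (vv - vd)
    return u, v
```

For the small distortion coefficients typical after chessboard calibration, the iteration converges quickly; a distorted point run through `undistort` returns to its ideal position.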
Step 220: geographically matching the recognized reference points with the objects in the object geographic database to obtain the geographic location information of the corresponding control points in the surveillance video;
Step 230: capturing a two-dimensional image frame from the preprocessed video image, and performing edge extraction on the two-dimensional image using edge detection and watershed segmentation to obtain the ground region with GIS information in the two-dimensional image;
In step 230, the special spatial structure of surveillance video is exploited: the lower region of the video is the ground, and vertical objects occlude the ground. This application therefore computes the horizontal and vertical structures of the image using edge detection techniques and the watershed algorithm.
Step 240: using a world geographic coordinate system transformation method to geographically register the corresponding control points in the ground region of the surveillance video, so that the surveillance video carries geographic location information;
In step 240, the offset matrix is computed from the world geographic coordinate system transformation, and each pixel in the ground region of the surveillance video is matched to a geographic coordinate through image stretching, filling, cutting, and similar methods, yielding a plane with geographic information in the surveillance video. By comparison with the object geographic database, the actual geographic position of a recognized object and the relative position of the observed object are obtained. Since matrix computation generally uses four points to complete the pixel-to-world coordinate conversion of one plane, the result obtained through the world coordinate system transformation matrix is an estimate rather than an exact value. To control the estimation error, this application performs constrained computation on the image when more corresponding control points are available. When multiple three-dimensional points X_w = (X_w, Y_W, Z_W)^T are projected onto the imaging plane to obtain the corresponding two-dimensional points m = (u, v)^T, repeated matrix computation over triangular regions yields multiple three-dimensional points X_w = (X_w, Y_W, Z_W)^T. By substituting multiple corresponding control points, computing repeatedly, and averaging, the computational error caused by excessive deformation can be reduced.
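The plane-to-plane registration with four or more corresponding control points, including the error-reducing effect of extra points, can be sketched as a least-squares homography fit. This is a simplified illustration; the actual registration in the application also involves the image stretching, filling, and cutting described above:

```python
import numpy as np

def fit_homography(px_pts, geo_pts):
    """Estimate H (3x3, h22 = 1) mapping pixel -> geographic plane
    coordinates from >= 4 control-point pairs by linear least squares;
    extra points are averaged in the least-squares sense."""
    A, b = [], []
    for (x, y), (X, Y) in zip(px_pts, geo_pts):
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X]); b.append(X)
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y]); b.append(Y)
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def pixel_to_geo(H, x, y):
    """Map one ground pixel to its geographic coordinate."""
    X, Y, w = H @ np.array([x, y, 1.0])
    return X / w, Y / w
```

Once H is fitted from the control points, every pixel in the segmented ground region can be assigned a geographic coordinate with `pixel_to_geo`.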
The camera imaging model is shown in FIG. 4. Projecting a three-dimensional point X_w = (X_w, Y_W, Z_W)^T of the physical world onto the imaging plane to obtain the corresponding two-dimensional plane point m = (u, v)^T is realized through coordinate transformations between the different coordinate systems. Specifically, the coordinate transformation between the world coordinate system and the camera coordinate system is:

$$X_c = R X_w + t \tag{3}$$

In Eq. (3), X_c = (X_c, Y_c, Z_c)^T denotes the three-dimensional coordinates of the point X_W in the camera coordinate system; R is a 3×3 rotation matrix and t is a 3×1 translation vector, representing respectively the relative attitude and position between the world coordinate system and the camera coordinate system.

The coordinate transformation between the camera coordinate system and the image coordinate system projects a three-dimensional point P(X, Y, Z) in the camera coordinate system onto the imaging plane to obtain the corresponding two-dimensional plane point p(x, y); the relation between x, y and X, Y can be expressed as:

$$x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z} \tag{4}$$

In Eq. (4), f is the camera focal length. Given a known world geographic coordinate plane point P(X, Y, Z) carrying geographic location information and the two-dimensional coordinate point p(x, y) on the imaging plane, the geographic location corresponding to the region captured by the camera can be calculated, as shown in FIG. 4, the schematic diagram of the camera imaging model.
From the coordinate relation between the image coordinate system O′xy and the pixel coordinate system o′xy, we obtain:
Figure PCTCN2020085081-appb-000005
Further, from the spatial geometric relations, we obtain:
Figure PCTCN2020085081-appb-000006
Combining the above formulas, the vertical-direction distance of the recognized object or pixel region on the ground is:
Figure PCTCN2020085081-appb-000007
Therefore, the height of the recognized object can be determined from the Y value: when Y is below the set ground threshold (an exact value can be set according to the actual situation), the region is judged to be ground and the ground coordinates are determined. The offset matrix can be computed from X_c = (X_c, Y_c, Z_c)^T and the pixel point p(x, y); through matrix offsetting and filling, a ground region of the picture can be deformed into the actual ground matrix matching the object geographic database, realizing the correspondence between pixel coordinates and geographic coordinates. FIGS. 5(a) and 5(b) are schematic diagrams of the spatial relationship between pixel coordinates and world geographic coordinates, where FIG. 5(a) shows pixel coordinates and FIG. 5(b) shows world geographic coordinates.
In the world coordinate system transformation, offset errors that arise during imaging lead to even larger errors after the transfer matrix to the world coordinate system. To suppress this error, this application proposes several error-suppression methods to calibrate person pixel positions. After an object is recognized, the bottom-center point of its recognition box is taken as the object's geographic position and used as the datum point for calculation. First, the relation between a two-dimensional plane point (x, y) in the image and its corresponding point (u, v) in the pixel coordinate system is expressed by:

$$u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0 \tag{8}$$

In coordinate-transformation form, the above can be written as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{9}$$

In Eq. (9), d_x, d_y denote the physical size of a pixel along the camera's u and v axes, and (u_0, v_0) are the coordinates of the camera principal point in the pixel coordinate system. Combining the equations yields:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_u & 0 & u_0 & 0 \\ 0 & f_v & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{10}$$

In Eq. (10), f_u and f_v denote the focal length expressed in units of pixel width and pixel height respectively. The parameters in the matrix

$$K = \begin{bmatrix} f_u & 0 & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$$

are called the camera intrinsic parameters; they are affected only by the camera's internal structure and imaging characteristics. The parameters of the matrices R and t are called the camera extrinsic parameters. P = K[R, t] is called the perspective projection matrix. This realizes the conversion between plane points and camera points: the distance between any point (x, y) and a known point (u, v) can be computed, and hence the GPS information of the point (x, y).
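Under the perspective projection P = K[R, t], a ground point can be recovered from a pixel by intersecting its viewing ray with the world ground plane Z_w = 0. The following is a minimal sketch with a synthetic camera; the function name and the flat-ground assumption are illustrative, not taken from the application:

```python
import numpy as np

def pixel_to_ground(K, R, t, u, v):
    """Intersect the viewing ray of pixel (u, v) with the world plane
    Z_w = 0 and return (X_w, Y_w). K: intrinsics; R, t: world -> camera."""
    # Ray direction in world coordinates: d = R^T K^{-1} [u, v, 1]^T
    d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    c = -R.T @ t                      # camera center in world coordinates
    lam = -c[2] / d[2]                # solve c_z + lam * d_z = 0
    return (c + lam * d)[:2]
```

With calibrated intrinsics and extrinsics, the foot pixel found by the person detection below can be converted directly to a ground coordinate this way.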
Step 300: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Please also refer to FIG. 6, which is a schematic diagram of the person detection algorithm of an embodiment of this application. The person detection algorithm includes:
Step 310: performing head detection on the surveillance video to obtain the head information of the person to be tracked;
In step 310, head detection is a fast method for recognizing the head model of a person and is suitable for multi-channel surveillance video. To improve the timeliness of person detection, this application uses the frame-difference method to detect moving objects in the surveillance video and locates person positions in combination with a head detector. The head detector uses a person detection method based on a convolutional neural network (CNN); the convolutional neural network consists of an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer, composing multiple convolutional and pooling layers to process the input data and realizing the mapping to the output at the fully connected layer. The flowchart of the frame-difference method is shown in FIG. 7. Specifically: denote the images of frames n+1, n, and n−1 of the video sequence as fn+1, fn, and fn−1, with the gray values of corresponding pixels denoted fn+1(x, y), fn(x, y), and fn−1(x, y); compute the difference images Dn+1 and Dn, combine the difference images Dn+1 and Dn, and then perform thresholding and connectivity analysis to finally detect the moving object.
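The three-frame difference step can be sketched as follows; the threshold value is illustrative:

```python
import numpy as np

def three_frame_diff(f_prev, f_cur, f_next, thresh=25):
    """Three-frame difference: D_n = |f_n - f_{n-1}|, D_{n+1} = |f_{n+1} - f_n|;
    the two thresholded difference images are combined with a logical AND,
    keeping only pixels that moved in both intervals (the current position)."""
    d1 = np.abs(f_cur.astype(int) - f_prev.astype(int)) > thresh
    d2 = np.abs(f_next.astype(int) - f_cur.astype(int)) > thresh
    return d1 & d2
```

The resulting binary mask is then passed to connectivity analysis, and the head detector is run only on the connected motion regions.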
Step 320: performing movement detection (person position calculation) on the person to be tracked based on the head detection result to obtain the foot pixel of the person to be tracked, which is the geographic location information of the person to be tracked;
In step 320, since a person is a moving object and, given the characteristics of video, the lowest region of a moving object is where the feet stand, the positioning coordinate point of the person to be tracked is the foot region; the head coordinate point cannot be used as the spatial coordinate for geographic location calculation. This application therefore combines head detection and movement detection to quickly find the foot pixel corresponding to the motion region of the person to be tracked. Compared with general person pose detection, SIFT feature tracking, and similar methods, this application detects faster, is more robust in complex environments, and also performs well in person recognition accuracy.
Step 330: calibrating the geographic location information of the person to be tracked;
In step 330, this application calibrates the geographic location information of the person to be tracked by suppressing errors inherent to the camera itself, thereby reducing the uncertainty caused by blur when the person moves and improving positioning accuracy.
Step 400: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked;
In step 400, when a person moves across videos, even though the geographic location information of the person to be tracked is available, it may be impossible to determine whether the observations belong to the same person. This application performs maximum likelihood estimation on the movement trajectories of the same person to be tracked in multiple videos and judges by probability whether they belong to the same person. Specifically: given a probability distribution D with probability density function (continuous case) or probability mass function (discrete case) f_D, and a distribution parameter θ, a sample x_1, x_2, ..., x_n of n values can be drawn from the distribution D, and θ can then be estimated. Using f_D, the probability of the sample is computed as:

$$P(x_1, x_2, \ldots, x_n) = f_D(x_1, x_2, \ldots, x_n \mid \theta) \tag{11}$$

Maximum likelihood estimation seeks the most probable value of θ, i.e., among all possible values of θ, the value that maximizes the "likelihood" of this sample. To realize maximum likelihood estimation mathematically, the likelihood must first be defined:

$$\operatorname{lik}(\theta) = f_D(x_1, x_2, \ldots, x_n \mid \theta) \tag{12}$$

and this function is maximized over all values of θ. The value that maximizes the likelihood is the maximum likelihood estimate of θ. By comparing the trajectories obtained from two videos, the maximum likelihood is judged and, after a threshold is set, whether they belong to the same person is decided, as shown in FIG. 8, an example diagram of maximum likelihood estimation.
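A minimal sketch of this likelihood comparison, assuming independent Gaussian positioning noise on time-aligned geographic tracks. The Gaussian model, sigma, and threshold are illustrative assumptions; the application only prescribes maximum likelihood estimation in general:

```python
import math

def log_likelihood(track_a, track_b, sigma=1.0):
    """Log-likelihood that two time-aligned geographic tracks [(x, y), ...]
    were generated by the same person, assuming i.i.d. Gaussian positioning
    noise with standard deviation sigma (in meters)."""
    ll = 0.0
    for (xa, ya), (xb, yb) in zip(track_a, track_b):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        ll += -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)
    return ll

def best_match(query, candidates, threshold):
    """Pick the candidate track maximizing the likelihood; reject the match
    when even the best log-likelihood falls below the threshold."""
    scored = [(log_likelihood(query, c), i) for i, c in enumerate(candidates)]
    best_ll, best_i = max(scored)
    return best_i if best_ll >= threshold else None
```

A trajectory in a neighboring video whose maximized likelihood exceeds the threshold is merged with the query trajectory; otherwise the person is treated as new.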
When the person to be tracked moves across videos, the captured scene moves as well; in this case, this application triggers the geographic region overlap determination, as shown in FIG. 9, the flowchart of the cross-video person tracking algorithm based on geographic region overlap determination of an embodiment of this application. The geographic region overlap determination locates the geographic information of the captured scenes of the multi-channel surveillance videos and then divides the respective tracked surveillance regions according to the overlapping geographic location regions. The continuous trajectory of the person to be tracked is obtained by connecting the geographic-information spatial coordinates of the person to be tracked in consecutive frames; when these coordinates leave the surveillance region of the current camera and move into the surveillance region of the next camera, the next camera is triggered to continue the person trajectory tracking.
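The overlap-based camera handoff can be sketched as follows, with each surveillance region simplified to an axis-aligned geographic bounding box; all region values and camera names are hypothetical:

```python
# Each camera's surveillance region as a geographic bounding box
# (min_x, min_y, max_x, max_y); the values are illustrative only.
CAMERA_REGIONS = {
    "cam_1": (0.0, 0.0, 10.0, 10.0),
    "cam_2": (8.0, 0.0, 18.0, 10.0),   # overlaps cam_1 on x in [8, 10]
}

def contains(region, pt):
    x0, y0, x1, y1 = region
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

def handoff(active_cam, pt):
    """Keep the active camera while the tracked coordinate stays inside its
    region; otherwise hand off to a camera whose region contains the point."""
    if contains(CAMERA_REGIONS[active_cam], pt):
        return active_cam
    for cam, region in CAMERA_REGIONS.items():
        if contains(region, pt):
            return cam
    return None  # the person has left all monitored regions
```

Inside the overlap both cameras see the person, so the active camera keeps the track; the handoff fires only once the coordinate leaves the active camera's region.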
Please refer to FIG. 10, which is a schematic structural diagram of the cross-video person positioning and tracking system of an embodiment of this application. The system includes:
a video geographic registration module, used to construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
a cross-video person positioning and tracking module, used to perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
a multi-video trajectory tracking module, used to combine the geographic locations of the person to be tracked detected in multiple nearby videos and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
FIG. 11 is a schematic diagram of the hardware device structure of the cross-video person positioning and tracking method provided by an embodiment of this application. As shown in FIG. 11, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.
The processor, memory, input system, and output system may be connected through a bus or in other ways; FIG. 11 takes connection through a bus as an example.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules. By running the non-transitory software programs, instructions, and modules stored in the memory, the processor executes the various functional applications and data processing of the electronic device, i.e., realizes the processing methods of the above method embodiments.
The memory may include a program storage area and a data storage area, where the program storage area can store the operating system and the applications required by at least one function, and the data storage area can store data and the like. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may optionally include memory arranged remotely from the processor, and such remote memory may be connected to the processing system through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The input system can receive input digital or character information and generate signal input. The output system may include display devices such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:
Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
The above product can execute the method provided by the embodiments of this application and has the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of this application.
The embodiments of this application provide a non-transitory (non-volatile) computer storage medium storing computer-executable instructions; the computer-executable instructions can perform the following operations:
Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
The embodiments of this application provide a computer program product; the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, cause the computer to perform the following operations:
Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
The cross-video person positioning and tracking method, system, and electronic device of the embodiments of this application obtain the location information of reference objects through object recognition to geographically register the surveillance video, and obtain person geographic location information and movement spatio-temporal trajectories through person detection. The computation is simple and yields geographic location information, giving better application value in multi-channel video system scenarios. At the same time, this application introduces person geographic location information based on video geographic calibration and performs maximum likelihood estimation during cross-video person re-identification, reducing the difficulty of visual person re-identification and trajectory tracking algorithms and the system complexity, and offering better application value in multi-channel cross-video system scenarios.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A cross-video person positioning and tracking method, characterized by comprising the following steps:
    Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
    Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
    Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  2. The cross-video person positioning and tracking method according to claim 1, characterized in that, in step a, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition specifically comprises:
    using an object recognition algorithm to perform object recognition and classification on the surveillance video to obtain reference points in the surveillance video;
    matching the reference points with objects in the object geographic database to obtain the geographic location information of corresponding control points in the surveillance video;
    using a world geographic coordinate system transformation method to geographically register the corresponding control points in the ground region of the surveillance video, so that the surveillance video carries geographic location information.
  3. The cross-video person positioning and tracking method according to claim 2, characterized in that, in step a, performing reference-point topology matching and video geographic registration on the surveillance video based on object recognition further comprises:
    performing image preprocessing on the surveillance video to obtain a fisheye-corrected video image;
    capturing a two-dimensional image frame from the preprocessed video image, and performing edge extraction on the two-dimensional image using edge detection and watershed segmentation to obtain the ground region with GIS information in the two-dimensional image.
  4. The cross-video person positioning and tracking method according to any one of claims 1 to 3, characterized in that, in step b, performing person detection and position calculation on the surveillance video is specifically:
    using the frame-difference method to detect moving objects in the surveillance video, and locating the position of the person to be tracked in combination with a head detector to obtain the head information of the person to be tracked.
  5. The cross-video person positioning and tracking method according to claim 4, characterized in that the head detector uses a person detection method based on a convolutional neural network (CNN); the convolutional neural network includes an input layer, convolutional layers, pooling layers, a fully connected layer, and an output layer; multiple convolutional and pooling layers are composed to process the input data, and the mapping to the output target is performed through the fully connected layer.
  6. The cross-video person positioning and tracking method according to claim 4, characterized in that, in step b, obtaining the person's geographic location further comprises:
    performing movement detection on the person to be tracked based on the head information to obtain the foot pixel of the person to be tracked, the foot pixel being the geographic location information of the person to be tracked.
  7. The cross-video person positioning and tracking method according to claim 6, characterized in that, in step b, obtaining the person's geographic location further comprises:
    calibrating the geographic location information of the person to be tracked by suppressing errors inherent to the camera itself.
  8. The cross-video person positioning and tracking method according to claim 7, characterized in that, in step c, using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos further comprises:
    when the captured scene moves within the multi-channel surveillance videos, triggering a geographic region overlap determination; the geographic region overlap determination is specifically:
    locating the geographic information of the captured scenes of the multi-channel surveillance videos, and dividing the surveillance region of each camera according to the overlapping geographic location regions; obtaining the continuous trajectory of the person to be tracked by connecting the geographic-information spatial coordinates of the person to be tracked in consecutive frames; and, when the geographic-information spatial coordinates of the person to be tracked leave the surveillance region of the current camera and move into the surveillance region of the next camera, triggering the next camera to perform person trajectory tracking.
  9. A cross-video person positioning and tracking system, characterized by comprising:
    a video geographic registration module, used to construct an object geographic coordinate database, perform reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determine the geographic coordinates of the surveillance video pixels;
    a cross-video person positioning and tracking module, used to perform person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
    a multi-video trajectory tracking module, used to combine the geographic locations of the person to be tracked detected in multiple nearby videos and use the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
  10. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein,
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the following operations of the cross-video person positioning and tracking method according to any one of claims 1 to 8:
    Step a: constructing an object geographic coordinate database, performing reference-point topology matching and video geographic registration on surveillance video based on object recognition, and determining the geographic coordinates of the surveillance video pixels;
    Step b: performing person detection and position calculation on the surveillance video to obtain the geographic location of the person to be tracked;
    Step c: combining the geographic locations of the person to be tracked detected in multiple nearby videos, and using the maximum likelihood estimation method to perform cross-video person re-identification analysis on the multi-channel surveillance videos to obtain the continuous spatio-temporal trajectory of the person to be tracked.
PCT/CN2020/085081 2020-04-03 2020-04-16 Cross-video person positioning and tracking method, system and device WO2021196294A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010259428.3A CN111462200B (zh) 2020-04-03 2020-04-03 Cross-video pedestrian positioning and tracking method, system and device
CN202010259428.3 2020-04-03

Publications (1)

Publication Number Publication Date
WO2021196294A1 true WO2021196294A1 (zh) 2021-10-07

Family

ID=71680274

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085081 WO2021196294A1 (zh) 2020-04-03 2020-04-16 Cross-video person positioning and tracking method, system and device

Country Status (2)

Country Link
CN (1) CN111462200B (zh)
WO (1) WO2021196294A1 (zh)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040168A (zh) * 2021-11-16 2022-02-11 西安热工研究院有限公司 一种用于火力电厂的智慧型电力网络监控机构
CN114240997A (zh) * 2021-11-16 2022-03-25 南京云牛智能科技有限公司 一种智慧楼宇在线跨摄像头多目标追踪方法
CN114550041A (zh) * 2022-02-18 2022-05-27 中国科学技术大学 一种多摄像头拍摄视频的多目标标注方法
CN114862973A (zh) * 2022-07-11 2022-08-05 中铁电气化局集团有限公司 基于固定点位的空间定位方法、装置、设备及存储介质
CN115033960A (zh) * 2022-06-09 2022-09-09 中国公路工程咨询集团有限公司 Bim模型和gis系统的自动融合方法及装置
CN115457449A (zh) * 2022-11-11 2022-12-09 深圳市马博士网络科技有限公司 一种基于ai视频分析和监控安防的预警系统
CN115527162A (zh) * 2022-05-18 2022-12-27 湖北大学 一种基于三维空间的多行人重识别方法、系统
CN115578756A (zh) * 2022-11-08 2023-01-06 杭州昊恒科技有限公司 基于精准定位和视频联动的人员精细化管理方法及系统
CN115731287A (zh) * 2022-09-07 2023-03-03 滁州学院 基于集合与拓扑空间的运动目标检索方法
CN115808170A (zh) * 2023-02-09 2023-03-17 宝略科技(浙江)有限公司 一种融合蓝牙与视频分析的室内实时定位方法
CN115856980A (zh) * 2022-11-21 2023-03-28 中铁科学技术开发有限公司 一种编组站作业人员监控方法和系统
CN115979250A (zh) * 2023-03-20 2023-04-18 山东上水环境科技集团有限公司 基于uwb模块、语义地图与视觉信息的定位方法
CN116189116A (zh) * 2023-04-24 2023-05-30 江西方兴科技股份有限公司 一种交通状态感知方法及系统
CN116631596A (zh) * 2023-07-24 2023-08-22 深圳市微能信息科技有限公司 一种放射人员工作时长的监控管理系统及方法
CN116740878A (zh) * 2023-08-15 2023-09-12 广东威恒输变电工程有限公司 一种多摄像头协同的全局区域双向绘制的定位预警方法
CN117185064A (zh) * 2023-08-18 2023-12-08 山东五棵松电气科技有限公司 一种智慧社区管理系统、方法、计算机设备及存储介质
CN117058331B (zh) * 2023-10-13 2023-12-19 山东建筑大学 基于单个监控摄像机的室内人员三维轨迹重建方法及系统

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070003A (zh) * 2020-09-07 2020-12-11 深延科技(北京)有限公司 基于深度学习的人员追踪方法和系统
CN112184814B (zh) * 2020-09-24 2022-09-02 天津锋物科技有限公司 定位方法和定位系统
CN112163537B (zh) * 2020-09-30 2024-04-26 中国科学院深圳先进技术研究院 一种行人异常行为检测方法、系统、终端以及存储介质
WO2022067606A1 (zh) * 2020-09-30 2022-04-07 中国科学院深圳先进技术研究院 一种行人异常行为检测方法、系统、终端以及存储介质
CN112766210A (zh) * 2021-01-29 2021-05-07 苏州思萃融合基建技术研究所有限公司 建筑施工的安全监控方法、装置及存储介质
CN113190711A (zh) * 2021-03-26 2021-07-30 南京财经大学 地理场景中视频动态对象轨迹时空检索方法及系统
CN113435329B (zh) * 2021-06-25 2022-06-21 湖南大学 一种基于视频轨迹特征关联学习的无监督行人重识别方法
CN113627497B (zh) * 2021-07-27 2024-03-12 武汉大学 一种基于时空约束的跨摄像头行人轨迹匹配方法
CN113837023A (zh) * 2021-09-02 2021-12-24 北京新橙智慧科技发展有限公司 一种跨摄像头行人自动追踪方法
CN117237418B (zh) * 2023-11-15 2024-01-23 成都航空职业技术学院 一种基于深度学习的运动目标检测方法和系统

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472332A (zh) * 2015-12-01 2016-04-06 杨春光 基于定位技术与视频技术的分析方法及其分析系统
CN105913037A (zh) * 2016-04-26 2016-08-31 广东技术师范学院 基于人脸识别与射频识别的监控跟踪系统
CN107547865A (zh) * 2017-07-06 2018-01-05 王连圭 跨区域人体视频目标跟踪智能监控方法
WO2018087545A1 (en) * 2016-11-08 2018-05-17 Staffordshire University Object location technique
CN110147471A (zh) * 2019-04-04 2019-08-20 平安科技(深圳)有限公司 基于视频的轨迹跟踪方法、装置、计算机设备及存储介质
CN110375739A (zh) * 2019-06-26 2019-10-25 中国科学院深圳先进技术研究院 一种移动端视觉融合定位方法、系统及电子设备
CN110414441A (zh) * 2019-07-31 2019-11-05 浙江大学 一种行人行踪分析方法及系统
WO2020055928A1 (en) * 2018-09-10 2020-03-19 Mapbox, Inc. Calibration for vision in navigation systems

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101548639B1 (ko) * 2014-12-10 2015-09-01 한국건설기술연구원 감시 카메라 시스템의 객체 추적장치 및 그 방법
US10372970B2 (en) * 2016-09-15 2019-08-06 Qualcomm Incorporated Automatic scene calibration method for video analytics
CN107153824A (zh) * 2017-05-22 2017-09-12 中国人民解放军国防科学技术大学 基于图聚类的跨视频行人重识别方法
WO2019145018A1 (en) * 2018-01-23 2019-08-01 Siemens Aktiengesellschaft System, device and method for detecting abnormal traffic events in a geographical location
CN109461132B (zh) * 2018-10-31 2021-04-27 中国人民解放军国防科技大学 基于特征点几何拓扑关系的sar图像自动配准方法
CN110717414B (zh) * 2019-09-24 2023-01-03 青岛海信网络科技股份有限公司 一种目标检测追踪方法、装置及设备
CN110765903A (zh) * 2019-10-10 2020-02-07 浙江大华技术股份有限公司 行人重识别方法、装置及存储介质
CN110706259B (zh) * 2019-10-12 2022-11-29 四川航天神坤科技有限公司 一种基于空间约束的可疑人员跨镜头追踪方法及装置

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114240997A (zh) * 2021-11-16 2022-03-25 南京云牛智能科技有限公司 一种智慧楼宇在线跨摄像头多目标追踪方法
CN114240997B (zh) * 2021-11-16 2023-07-28 南京云牛智能科技有限公司 一种智慧楼宇在线跨摄像头多目标追踪方法
CN114040168A (zh) * 2021-11-16 2022-02-11 西安热工研究院有限公司 一种用于火力电厂的智慧型电力网络监控机构
CN114550041A (zh) * 2022-02-18 2022-05-27 中国科学技术大学 一种多摄像头拍摄视频的多目标标注方法
CN114550041B (zh) * 2022-02-18 2024-03-29 中国科学技术大学 一种多摄像头拍摄视频的多目标标注方法
CN115527162A (zh) * 2022-05-18 2022-12-27 湖北大学 一种基于三维空间的多行人重识别方法、系统
CN115033960B (zh) * 2022-06-09 2023-04-07 中国公路工程咨询集团有限公司 Bim模型和gis系统的自动融合方法及装置
CN115033960A (zh) * 2022-06-09 2022-09-09 中国公路工程咨询集团有限公司 Bim模型和gis系统的自动融合方法及装置
CN114862973A (zh) * 2022-07-11 2022-08-05 中铁电气化局集团有限公司 基于固定点位的空间定位方法、装置、设备及存储介质
CN115731287A (zh) * 2022-09-07 2023-03-03 滁州学院 基于集合与拓扑空间的运动目标检索方法
CN115578756A (zh) * 2022-11-08 2023-01-06 杭州昊恒科技有限公司 基于精准定位和视频联动的人员精细化管理方法及系统
CN115457449B (zh) * 2022-11-11 2023-03-24 深圳市马博士网络科技有限公司 一种基于ai视频分析和监控安防的预警系统
CN115457449A (zh) * 2022-11-11 2022-12-09 深圳市马博士网络科技有限公司 一种基于ai视频分析和监控安防的预警系统
CN115856980A (zh) * 2022-11-21 2023-03-28 中铁科学技术开发有限公司 一种编组站作业人员监控方法和系统
CN115808170B (zh) * 2023-02-09 2023-06-06 宝略科技(浙江)有限公司 一种融合蓝牙与视频分析的室内实时定位方法
CN115808170A (zh) * 2023-02-09 2023-03-17 宝略科技(浙江)有限公司 一种融合蓝牙与视频分析的室内实时定位方法
CN115979250A (zh) * 2023-03-20 2023-04-18 山东上水环境科技集团有限公司 基于uwb模块、语义地图与视觉信息的定位方法
CN116189116A (zh) * 2023-04-24 2023-05-30 江西方兴科技股份有限公司 一种交通状态感知方法及系统
CN116189116B (zh) * 2023-04-24 2024-02-23 江西方兴科技股份有限公司 一种交通状态感知方法及系统
CN116631596A (zh) * 2023-07-24 2023-08-22 深圳市微能信息科技有限公司 一种放射人员工作时长的监控管理系统及方法
CN116631596B (zh) * 2023-07-24 2024-01-02 深圳市微能信息科技有限公司 一种放射人员工作时长的监控管理系统及方法
CN116740878A (zh) * 2023-08-15 2023-09-12 广东威恒输变电工程有限公司 一种多摄像头协同的全局区域双向绘制的定位预警方法
CN116740878B (zh) * 2023-08-15 2023-12-26 广东威恒输变电工程有限公司 一种多摄像头协同的全局区域双向绘制的定位预警方法
CN117185064A (zh) * 2023-08-18 2023-12-08 山东五棵松电气科技有限公司 一种智慧社区管理系统、方法、计算机设备及存储介质
CN117185064B (zh) * 2023-08-18 2024-03-05 山东五棵松电气科技有限公司 一种智慧社区管理系统、方法、计算机设备及存储介质
CN117058331B (zh) * 2023-10-13 2023-12-19 山东建筑大学 基于单个监控摄像机的室内人员三维轨迹重建方法及系统

Also Published As

Publication number Publication date
CN111462200A (zh) 2020-07-28
CN111462200B (zh) 2023-09-19

Similar Documents

Publication Publication Date Title
WO2021196294A1 (zh) 一种跨视频人员定位追踪方法、系统及设备
Walch et al. Image-based localization using lstms for structured feature correlation
CN107240124B (zh) 基于时空约束的跨镜头多目标跟踪方法及装置
JP6095018B2 (ja) 移動オブジェクトの検出及び追跡
CN103325112B (zh) 动态场景中运动目标快速检测方法
Alcantarilla et al. On combining visual SLAM and dense scene flow to increase the robustness of localization and mapping in dynamic environments
US10043097B2 (en) Image abstraction system
Gao et al. Robust RGB-D simultaneous localization and mapping using planar point features
US9275472B2 (en) Real-time player detection from a single calibrated camera
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN108470356B (zh) 一种基于双目视觉的目标对象快速测距方法
CN111160291B (zh) 基于深度信息与cnn的人眼检测方法
WO2019075948A1 (zh) 移动机器人的位姿估计方法
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
He et al. Ground and aerial collaborative mapping in urban environments
CN115376034A (zh) 一种基于人体三维姿态时空关联动作识别的运动视频采集剪辑方法及装置
CN116468786A (zh) 一种面向动态环境的基于点线联合的语义slam方法
CN110636248B (zh) 目标跟踪方法与装置
CN111829522B (zh) 即时定位与地图构建方法、计算机设备以及装置
CN108694348B (zh) 一种基于自然特征的跟踪注册方法及装置
CN116128919A (zh) 基于极线约束的多时相图像异动目标检测方法及系统
CN114608522A (zh) 一种基于视觉的障碍物识别与测距方法
Sen et al. SceneCalib: Automatic targetless calibration of cameras and LiDARs in autonomous driving
KR102249380B1 (ko) 기준 영상 정보를 이용한 cctv 장치의 공간 정보 생성 시스템
CN110991383B (zh) 一种多相机联合的周界区域人员定位方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928887

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.02.2023)
