CN111462200A - Cross-video pedestrian positioning and tracking method, system and equipment - Google Patents
Cross-video pedestrian positioning and tracking method, system and equipment

- Publication number: CN111462200A (application CN202010259428.3A)
- Authority: CN (China)
- Prior art keywords: video, pedestrian, geographic, monitoring, tracked
- Legal status: Granted (the legal status is an assumption, not a legal conclusion)
Classifications

- G06T 7/33 — Image analysis: determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
- G06F 16/29 — Information retrieval; database structures therefor: geographical information databases
- G06N 3/045 — Computing arrangements based on biological models: neural network architectures, combinations of networks
- G06T 7/73 — Image analysis: determining position or orientation of objects or cameras using feature-based methods
- G06T 2207/10016 — Image acquisition modality: video; image sequence
- G06T 2207/20081 — Special algorithmic details: training; learning
- G06T 2207/30196 — Subject of image: human being; person
- G06T 2207/30241 — Subject of image: trajectory
Abstract
The application relates to a cross-video pedestrian positioning and tracking method, system and electronic equipment. The method comprises the following steps: constructing an object geographic coordinate database, performing reference-point topological matching and video geographic registration on the surveillance video according to object recognition, and determining the geographic coordinates of the surveillance video's pixels; computing the pedestrian detection position in the surveillance video to obtain the pedestrian's geographic position; and, combining the pedestrian position information detected by multiple adjacent videos, performing cross-video pedestrian re-identification analysis on the multiple surveillance video channels by maximum likelihood estimation to obtain the pedestrian's continuous spatio-temporal trajectory. Pedestrian geographic position information is introduced on the basis of video geographic calibration, and maximum likelihood estimation is performed during cross-video pedestrian re-identification, reducing the difficulty and system complexity of visual person re-identification and trajectory tracking algorithms; the method has good application value in multi-channel cross-video system scenarios.
Description
Technical Field
The application belongs to the technical field of pedestrian positioning and tracking, and particularly relates to a cross-video pedestrian positioning and tracking method, a cross-video pedestrian positioning and tracking system and electronic equipment.
Background
Existing pedestrian positioning methods fall mainly into environment-measurement positioning and camera-based positioning. Environment-measurement positioning refers to positioning or ranging pedestrians outdoors; in the military field, laser or ultrasonic ranging can reach centimetre-level precision. Although such methods are highly accurate and provide pedestrians' geographic location, continuous, wide-range, multi-pedestrian measurement is difficult.
Camera-based positioning includes binocular and monocular positioning. Binocular positioning is applied in the field of precise robot positioning, where SLAM (simultaneous localization and mapping) methods construct a three-dimensional space for real-time localization and map building; this approach is not suitable for camera surveillance systems in public areas.
Geographic information construction methods mainly judge an image's geographic position from its content. They generally require training on a pre-existing database of images with geographic location labels to obtain a classifier, or retrieving images similar to the query image from that database. Image-based geographic location identification therefore first extracts visual features from images to compare the similarity between different images. In conventional surveillance scenarios, however, such prior image data cannot be acquired in advance.
Because camera-based positioning does not belong to the measurement field, pedestrian geographic information can only be acquired by introducing an image geographic information construction method. In the field of cross-video tracking, tracking the same pedestrian mainly relies on introducing SIFT parameters: as many details of the tracked pedestrian as possible are acquired and stored in a database, and when the same pedestrian reappears, they are re-identified by matching against the database. In the re-identification field, a common cross-video tracking method is template matching: a given template is used to search the image area to be matched, and the matching result is obtained from the computed matching degree. However, this matching method requires a template in advance and imposes strict requirements on the search area; because of the matching and searching involved, computation on continuous video is inefficient and time-consuming, so the method cannot be fully applied to multi-video systems.
In addition, existing pedestrian tracking systems mainly rely on pedestrian search and pedestrian positioning, introducing additional pedestrian feature parameters when performing cross-video tracking under multi-video linkage. Because they cannot acquire pedestrians' position information, existing systems struggle to obtain continuous spatio-temporal pedestrian trajectories.
Disclosure of Invention
The application provides a cross-video pedestrian positioning and tracking method, a cross-video pedestrian positioning and tracking system and electronic equipment, and aims to solve at least one of the technical problems in the prior art to a certain extent.
In order to solve the above problems, the present application provides the following technical solutions:
a cross-video pedestrian positioning and tracking method comprises the following steps:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the performing reference point topology matching and video geographic registration on the surveillance video according to object identification specifically includes:
adopting an object recognition algorithm to perform object recognition classification on the monitoring video to obtain reference points in the monitoring video;
matching the reference points with objects in the object geographic database to obtain the geographic position information of the homonymous control points in the surveillance video;
and performing geographic registration on the homonymous control points of the ground area in the monitoring video by adopting a world geographic coordinate system conversion method, so that the monitoring video has geographic position information.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the performing reference point topology matching and video geographic registration on the surveillance video according to object identification further includes:
carrying out image preprocessing on the monitoring video to obtain a video image subjected to fisheye calibration;
and intercepting a certain frame of two-dimensional image in the preprocessed video image, and performing edge extraction on the two-dimensional image by adopting an edge detection and watershed segmentation method to obtain a ground area with GIS information in the two-dimensional image.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the calculating the pedestrian detection position of the surveillance video specifically comprises:
and detecting a moving object in the monitoring video by adopting a frame difference method, positioning the position of the pedestrian to be tracked by combining a human head detector, and acquiring the head information of the pedestrian to be tracked.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the head detector adopts a pedestrian detection method based on a convolutional neural network (CNN); the network comprises an input layer, convolutional layers, pooling layers, fully-connected layers and an output layer, where the stacked convolutional and pooling layers process the input data and the fully-connected layers map their output to the output target.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in step b, the step of obtaining the geographic position of the pedestrian further comprises:
and carrying out movement detection on the pedestrian to be tracked based on the head information to obtain underfoot pixel points of the pedestrian to be tracked, wherein the underfoot pixel points are the geographical position information of the pedestrian to be tracked.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in step b, the step of obtaining the geographic position of the pedestrian further comprises:
and calibrating the geographic position information of the pedestrian to be tracked by a camera error-suppression method.
The technical scheme adopted by the embodiment of the application further comprises the following steps: in step c, the performing pedestrian cross-video re-recognition analysis on the multiple paths of monitoring videos by using the maximum likelihood estimation method further includes:
when shooting scene movement exists in the multi-path monitoring videos, triggering geographic area overlapping judgment; the geographic area overlap determination specifically includes:
positioning the geographic information of the shooting scenes of the multiple paths of monitoring videos, and dividing the monitoring areas of all the camera devices according to the overlapped geographic position areas; obtaining the continuous track of the pedestrian to be tracked by connecting the geographic information space coordinates of the pedestrian to be tracked in the continuous frames, and triggering the next camera to track the pedestrian track when the geographic information space coordinates of the pedestrian to be tracked exceed the monitoring area of the current camera and move to the monitoring area of the next camera.
Another technical scheme adopted by the embodiment of the application is as follows: a cross-video pedestrian location tracking system, comprising:
a video geographic registration module: the system is used for constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
the cross-video pedestrian positioning and tracking module: the pedestrian detection position calculation is carried out on the monitoring video to obtain the geographic position of the pedestrian to be tracked;
the multi-video track tracking module: the method is used for carrying out pedestrian cross-video re-identification analysis on the multi-path monitoring video by combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos and adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
The embodiment of the application adopts another technical scheme that: an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the following operations of the cross-video pedestrian positioning and tracking method described above:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
Compared with the prior art, the embodiments of the application have the following advantages: the cross-video pedestrian positioning and tracking method, system and electronic equipment acquire reference-object position information through object recognition to geographically register the surveillance video, then acquire pedestrian geographic position information through pedestrian detection and derive the pedestrians' moving spatio-temporal trajectories. The approach is simple to operate, carries geographic position information, and offers good application value in multi-channel video system scenarios. By introducing pedestrian geographic position information on the basis of video geographic calibration and performing maximum likelihood estimation during cross-video pedestrian re-identification, the difficulty and system complexity of visual person re-identification and trajectory tracking algorithms are reduced.
Drawings
FIG. 1 is a flow chart of a cross-video pedestrian location tracking method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a surveillance video geographic registration algorithm according to an embodiment of the present application;
FIG. 3(a) is a video image before preprocessing, and FIG. 3(b) is a video image after preprocessing;
FIG. 4 is a schematic view of an imaging model of a camera;
fig. 5(a) and (b) are schematic diagrams of spatial relationship between pixel coordinates and world geographic coordinates, where fig. 5(a) is the pixel coordinates and fig. 5(b) is the world geographic coordinates;
FIG. 6 is a schematic diagram of a pedestrian detection algorithm in accordance with an embodiment of the present application;
FIG. 7 is a flow chart of a frame differencing method according to an embodiment of the application;
FIG. 8 is an exemplary diagram of maximum likelihood estimation;
FIG. 9 is a flowchart of a cross-video pedestrian tracking algorithm based on a geographic area overlap determination according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a cross-video pedestrian location tracking system according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a hardware device of a cross-video pedestrian location and tracking method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Please refer to fig. 1, which is a flowchart illustrating a cross-video pedestrian location tracking method according to an embodiment of the present application. The cross-video pedestrian positioning and tracking method comprises the following steps:
step 100: acquiring a monitoring video of a pedestrian to be tracked;
step 200: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on the monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
in this step, the object geographic coordinate database includes third-party geographic information bases such as Baidu Maps, or existing BIM file data (construction-engineering data covering building appearance and geographic position). To obtain more accurate GPS position information, reference points are calibrated in the standard WGS84 coordinate system: the BIM (Building Information Model) information of a recognized object is put into correspondence with a WGS84 coordinate point of the public area and used as a reference point, which improves the accuracy of the spatial calculations for pedestrian tracking. After conversion, the object geographic database is applied to the correspondence between the world coordinate system and the image coordinate system: existing object geographic information together with image object recognition is used to register the pixels of the surveillance video with actual coordinates and determine the spatial position coordinates of the surveillance video.
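For concreteness, a minimal sketch of what one record in such a reference-point database could look like; the field names and example values below are illustrative assumptions, not defined by the application:

```python
from dataclasses import dataclass

@dataclass
class ReferencePoint:
    object_id: str  # identifier of the recognized object (e.g. from a BIM file)
    label: str      # object class produced by the recognizer
    lat: float      # WGS84 latitude of the object's ground point
    lon: float      # WGS84 longitude of the object's ground point

# Hypothetical entry linking a recognized object class to a WGS84 coordinate
db = [ReferencePoint("bim-0042", "fire_hydrant", 22.5431, 114.0579)]
```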
Further, please refer to fig. 2, which is a schematic diagram of a geographic registration algorithm of a surveillance video according to an embodiment of the present application. The geographic registration algorithm for the surveillance video in the embodiment of the application comprises the following steps:
step 210: adopting an object recognition algorithm to perform object recognition classification on the monitoring video to obtain reference points in the monitoring video, and simultaneously performing image preprocessing on the monitoring video to obtain a video image subjected to fisheye calibration;
in step 210, the object recognition algorithm may be R-CNN, YOLO or a similar method; it recognizes and classifies objects in the surveillance video to obtain reference points, and fuzzily matches these reference points against objects in the object geographic database.
In practice, owing to factors such as the manufacturing and assembly quality of the imaging element, a camera exhibits edge distortion when shooting video, and the distortion is more pronounced for wider-angle lenses; this produces nonlinear distortion in the surveillance video, so the camera's nonlinear distortion must be corrected to improve calculation accuracy. The surveillance video is preprocessed by the chessboard (checkerboard) correction method: the intrinsic parameters and correction coefficients of the fisheye lens are computed, the surveillance video is corrected and cropped, and part of the edge region is removed, yielding a fisheye-corrected video image. The preprocessed video image reduces the error caused by nonlinear distortion during world coordinate system conversion. Specifically, as shown in figs. 3(a) and (b), fig. 3(a) is a video image before preprocessing and fig. 3(b) a video image after preprocessing.
Specifically, the nonlinear distortion is mainly geometric distortion, which offsets the actual pixel coordinates from the ideal pixel coordinates and can be expressed as:

$$u' = u + \delta_u(u, v), \qquad v' = v + \delta_v(u, v) \tag{1}$$

In equation (1), $(u, v)$ are the ideal pixel coordinates and $(u', v')$ the pixel coordinates affected by distortion. The nonlinear distortion offsets $\delta_u$, $\delta_v$ can be written in the standard radial, decentering and thin-prism form:

$$\delta_u = k_1 u \left(u^2 + v^2\right) + \left[p_1 \left(3u^2 + v^2\right) + 2 p_2 u v\right] + s_1 \left(u^2 + v^2\right)$$

$$\delta_v = k_2 v \left(u^2 + v^2\right) + \left[p_2 \left(u^2 + 3v^2\right) + 2 p_1 u v\right] + s_2 \left(u^2 + v^2\right) \tag{2}$$

In equation (2), the second and third terms are due to inaccuracies in the camera's imaging element, and $p_1$, $p_2$, $k_1$, $k_2$, $s_1$, $s_2$ are the nonlinear distortion parameters. Image distortion is restored by calculating the values of these nonlinear distortion parameters.
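As an illustration of this checkerboard preprocessing step, a minimal OpenCV sketch follows; the board size, frame file names and the choice of OpenCV's fisheye model are assumptions for illustration, not prescribed by the application:

```python
import cv2
import numpy as np

BOARD = (9, 6)  # inner corners of the printed checkerboard (assumed size)

# 3-D coordinates of the board corners in the board's own plane (Z = 0)
objp = np.zeros((1, BOARD[0] * BOARD[1], 3), np.float32)
objp[0, :, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in ["calib_01.png", "calib_02.png", "calib_03.png"]:  # sample frames
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(objp)
        img_points.append(corners.reshape(1, -1, 2))

# Estimate intrinsics K and fisheye distortion coefficients D
K, D = np.zeros((3, 3)), np.zeros((4, 1))
cv2.fisheye.calibrate(obj_points, img_points, gray.shape[::-1], K, D,
                      flags=cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC)

def undistort(frame):
    """Undistort a surveillance frame before georegistration."""
    return cv2.fisheye.undistortImage(frame, K, D, Knew=K)
```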
Step 220: the recognized reference points are geographically registered against objects in the object geographic database, obtaining the geographic position information of the homonymous control points in the surveillance video;
step 230: intercepting a certain frame of two-dimensional image in the preprocessed video image, and performing edge extraction on the two-dimensional image by adopting an edge detection and watershed segmentation method to obtain a ground area with GIS information in the two-dimensional image;
in step 230, the special spatial structure of surveillance video is exploited: the lower part of the frame is the ground portion, and vertical objects can occlude the ground. The horizontal and vertical structures of the image are therefore computed with the edge detection technique and the watershed algorithm.
Step 240: geographic registration is carried out on the homonymous control points of the ground area in the monitoring video by adopting a world geographic coordinate system conversion method, so that the monitoring video has geographic position information;
in step 240, according to the world geographic coordinate system conversion and offset matrices, each pixel of the ground area in the surveillance video is matched to geographic coordinates through image stretching, filling and cropping, yielding a plane with geographic information in the surveillance video. The actual geographic position of a recognized object and the relative position of an observed object are obtained by comparison against the object geographic database. Because the matrix calculation generally converts between the pixel coordinates of one plane and world coordinates through four points, the result obtained by the world coordinate system conversion matrix is an estimate rather than an exact value. To control the error of this estimate, the image is subjected to constrained calculation when more homonymous control points are available: a three-dimensional space point $X_w = (X_w, Y_w, Z_w)^T$ is projected onto the imaging plane to obtain the corresponding two-dimensional plane point $m = (u, v)^T$, and repeated matrix calculations over triangular regions recover multiple estimates of $X_w = (X_w, Y_w, Z_w)^T$. By calculating over several homonymous control points several times and averaging, the calculation error caused by excessive deformation can be reduced.
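A minimal sketch of this pixel-to-geographic registration, under the simplifying assumption that the ground is a single plane and at least four homonymous control points are available (all coordinates and names below are illustrative):

```python
import cv2
import numpy as np

# Homonymous control points: pixel positions and their ground coordinates
# (WGS84 projected to a local metric grid); values are illustrative only.
pixel_pts = np.array([[102, 540], [873, 512], [646, 300], [211, 288]], np.float32)
geo_pts   = np.array([[0.0, 0.0], [25.0, 0.0], [25.0, 18.0], [0.0, 18.0]], np.float32)

# With more than four points, findHomography solves a constrained least-squares
# fit, playing the role of the multi-point error suppression described above.
H, _ = cv2.findHomography(pixel_pts, geo_pts, cv2.RANSAC)

def pixel_to_geo(u, v):
    """Map a ground-plane pixel to geographic coordinates via H."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```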
The camera imaging model is shown in fig. 4. Projecting a three-dimensional space point $X_w = (X_w, Y_w, Z_w)^T$ of the objective world onto the imaging plane to obtain the corresponding two-dimensional plane point $m = (u, v)^T$ is realized through coordinate transformations between the different coordinate systems. Specifically, the coordinate transformation between the world coordinate system and the camera coordinate system is:

$$X_c = R\,X_w + t \tag{3}$$

In equation (3), $X_c = (X_c, Y_c, Z_c)^T$ denotes the three-dimensional coordinates of the point $X_w$ in the camera coordinate system, $R$ is a 3 × 3 rotation matrix, $t$ is a 3 × 1 translation matrix, and $R$ and $t$ respectively represent the relative attitude and position between the world coordinate system and the camera coordinate system.
The coordinate transformation between the camera coordinate system and the image coordinate system projects a three-dimensional space point $P(X, Y, Z)$ in the camera coordinate system onto the imaging plane, giving the corresponding two-dimensional plane point $p(x, y)$; the relationship between $X, Y$ and $x, y$ can be expressed as:

$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z} \tag{4}$$

In equation (4), $f$ is the focal length of the camera. From a world geographic coordinate point $P(X, Y, Z)$ with known geographic position information and the corresponding two-dimensional imaging-plane point $p(x, y)$, the geographic position of the area captured by the camera can be calculated. The camera imaging model is shown schematically in fig. 4.
From the image coordinate system $o'$-$xy$ and the pixel coordinate system $o$-$uv$, the coordinate relationship gives:

$$u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0 \tag{5}$$

and, from the spatial geometry:

$$x = (u - u_0)\,d_x, \qquad y = (v - v_0)\,d_y \tag{6}$$

Combining these with equation (4), the distance of a recognized object or pixel region in the vertical direction above the ground is obtained as:

$$Y = \frac{y\,Z_c}{f} = \frac{(v - v_0)\,d_y\,Z_c}{f} \tag{7}$$

The height of a recognized object can therefore be judged from its $Y$ value: when $Y$ is smaller than the configured ground threshold (the exact value can be set according to the actual situation), the region is judged to be ground, and its ground coordinates are determined. By calculating $X_c = (X_c, Y_c, Z_c)^T$ and the offset matrix with the pixel point $p(x, y)$, a given ground region in the frame is deformed, through matrix offsetting and filling, into the actual ground matrix matched against the object geographic database, realizing the correspondence between pixel coordinates and geographic coordinates. Figs. 5(a) and (b) are schematic diagrams of the spatial relationship between pixel coordinates and world geographic coordinates, where fig. 5(a) shows the pixel coordinates and fig. 5(b) the world geographic coordinates.
During world coordinate system conversion, skew at imaging time can enlarge the error of the world coordinates obtained through the transfer matrix; to suppress this error, the application calibrates the pedestrian's pixel position with several error-suppression measures. After an object is recognized, the bottom-centre point of its recognition box is taken as the object's geographic position and used as a reference point for calculation. First, the relationship between a two-dimensional plane point $(x, y)$ in the image and its corresponding point $(u, v)$ in the pixel coordinate system is:

$$u = \frac{x}{d_x} + u_0, \qquad v = \frac{y}{d_y} + v_0 \tag{8}$$

which can be expressed in coordinate-transformation form as:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{9}$$

In equation (9), $d_x$, $d_y$ represent the physical size of a pixel along the $u$ and $v$ axes of the image capture, and $(u_0, v_0)$ are the coordinates of the camera principal point in the pixel coordinate system. Combining the above equations yields:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_u & 0 & u_0 \\ 0 & f_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K \begin{bmatrix} R & t \end{bmatrix} \tilde{X}_w \tag{10}$$

In equation (10), $f_u$ and $f_v$ respectively represent the focal length in units of pixel width and height. The parameters of the matrix $K$ are called the camera intrinsic parameters and are determined only by the camera's internal structure and imaging characteristics; the parameters of the matrices $R$ and $t$ are called the camera extrinsic parameters, and $P = K[R, t]$ is called the perspective projection matrix. This realizes the conversion between plane points and camera points, so the distance between any point $(x, y)$ and a known point $(u, v)$ can be calculated, and in turn the GPS information of the point $(x, y)$.
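To illustrate equation (10), a hedged numpy sketch of back-projecting a pixel onto the world ground plane follows; the assumption $Z_w = 0$ and the calibration values are illustrative, not taken from the application:

```python
import numpy as np

# Intrinsics K and extrinsics [R|t] are assumed known from calibration.
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # illustrative camera pose
t = np.array([[0.0], [0.0], [5.0]])

def pixel_to_ground(u, v):
    """Back-project pixel (u, v) onto the world ground plane Z_w = 0."""
    # For Z_w = 0, equation (10) reduces to a 3x3 homography built from
    # the first two columns of R and the translation t.
    H = K @ np.hstack([R[:, :2], t])     # maps (X_w, Y_w, 1) -> Z_c * (u, v, 1)
    Xw = np.linalg.solve(H, np.array([u, v, 1.0]))
    return Xw[0] / Xw[2], Xw[1] / Xw[2]  # (X_w, Y_w) on the ground
```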
Step 300: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
please refer to fig. 6, which is a schematic diagram of a pedestrian detection algorithm according to an embodiment of the present application. The pedestrian detection algorithm of the embodiment of the application comprises the following steps:
step 310: carrying out human head detection on the monitoring video to acquire head information of a pedestrian to be tracked;
in step 310, human head detection is a method for rapidly recognizing a pedestrian head model and is suitable for multiple surveillance video channels. To improve the timeliness of pedestrian detection, the application detects moving objects in the surveillance video with a frame difference method, combined with a head detector to locate the pedestrian's position. The head detector adopts a pedestrian detection method based on a convolutional neural network (CNN) composed of an input layer, convolutional layers, pooling layers, fully-connected layers and an output layer; the stacked convolutional and pooling layers process the input data, and the fully-connected layers realize the mapping to the output pedestrians. Fig. 7 shows the flow of the frame difference method, specifically: the (n+1)-th, n-th and (n-1)-th frames of the video sequence are denoted $f_{n+1}$, $f_n$ and $f_{n-1}$, and the grey values at corresponding pixels of the three frames are $f_{n+1}(x, y)$, $f_n(x, y)$ and $f_{n-1}(x, y)$; the difference images $D_{n+1}$ and $D_n$ are obtained, the two difference images are combined (a logical AND in the classic three-frame difference), then thresholding and connectivity analysis are applied, and finally the moving object is detected.
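A compact sketch of the three-frame differencing just described (OpenCV; the threshold value and minimum blob area are assumed values, not given by the application):

```python
import cv2

def detect_motion(f_prev, f_cur, f_next, thresh=25, min_area=200):
    """Three-frame difference: return bounding boxes (x, y, w, h) of moving regions."""
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f_prev, f_cur, f_next)]
    d_n  = cv2.absdiff(g[1], g[0])           # D_n   = |f_n - f_{n-1}|
    d_n1 = cv2.absdiff(g[2], g[1])           # D_n+1 = |f_{n+1} - f_n|
    motion = cv2.bitwise_and(d_n, d_n1)      # combine the two difference images
    _, mask = cv2.threshold(motion, thresh, 255, cv2.THRESH_BINARY)
    # Connectivity analysis: keep connected components above a minimum area
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    return [stats[i, :4] for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= min_area]
```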
Step 320: performing movement detection (pedestrian position calculation) on the pedestrian to be tracked based on the pedestrian head detection result to obtain underfoot pixel points of the pedestrian to be tracked, namely the geographic position information of the pedestrian to be tracked;
in step 320, because a pedestrian is a moving object, by the characteristics of the video the lowermost region of the moving object is the position where the feet stand; that is, the positioning coordinate point of the pedestrian to be tracked is the foot region, and the head coordinate point cannot serve as the spatial coordinate for the geographic position calculation. The method therefore combines head detection with movement detection to quickly find the underfoot pixel corresponding to the moving region of the pedestrian to be tracked.
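Following this reasoning, the underfoot pixel can be taken as the bottom-centre of the motion region's bounding box; a hypothetical helper, combinable with the pixel_to_geo sketch above:

```python
def foot_point(box):
    """Bottom-centre of a motion bounding box (x, y, w, h): the underfoot pixel."""
    x, y, w, h = box
    return x + w / 2.0, y + h
```

Feeding foot_point(box) into pixel_to_geo then yields the pedestrian's geographic position.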
Step 330: calibrating the geographical position information of the pedestrian to be tracked;
in step 330, the method calibrates the geographic position information of the pedestrian to be tracked by suppressing camera-induced error, reducing the uncertainty caused by motion blur while the pedestrian moves and improving positioning accuracy.
Step 400: combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-recognition analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain continuous space-time tracks of the pedestrians to be tracked;
in step 400, when a pedestrian moves across videos, the geographic position information of the pedestrian to be tracked alone cannot determine whether two observations are the same pedestrian. The application performs maximum likelihood estimation on the movement tracks of the same pedestrian to be tracked across several videos and judges by probability calculation whether they are the same pedestrian. Specifically: given a probability distribution $D$, assume its probability density function (continuous distribution) or probability mass function (discrete distribution) is $f_D$ with distribution parameter $\theta$; a sample $x_1, x_2, \ldots, x_n$ of $n$ values can be drawn from the distribution, and $\theta$ is then estimated from the sample. Using $f_D$, the probability is calculated as:

$$P(x_1, x_2, x_3, \ldots, x_n) = f_D(x_1, x_2, x_3, \ldots, x_n \mid \theta) \tag{11}$$

Maximum likelihood estimation finds the most likely value of $\theta$, i.e. the value that, among all possible values of $\theta$, maximizes the "likelihood" of this sample. To implement the maximum likelihood estimation method mathematically, the likelihood is first defined:

$$\mathrm{lik}(\theta) = f_D(x_1, x_2, x_3, \ldots, x_n \mid \theta) \tag{12}$$

and this function is maximized over all values of $\theta$; the maximizing value is the maximum likelihood estimate of $\theta$. The tracks obtained from two videos are compared under this maximum-likelihood criterion, and, with a configured threshold, it is judged whether they belong to the same pedestrian. Fig. 8 gives an example of maximum likelihood estimation.
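As a hedged illustration of such a likelihood comparison between two geographic tracks, the Gaussian step-error model and the decision threshold below are assumptions of this sketch, not values given by the application:

```python
import numpy as np

def log_likelihood_same(track_a, track_b, sigma=0.5):
    """Log-likelihood that two time-aligned geo-tracks (arrays of (x, y) in
    metres) come from the same pedestrian, under i.i.d. Gaussian position error."""
    d = np.linalg.norm(np.asarray(track_a) - np.asarray(track_b), axis=1)
    return np.sum(-0.5 * (d / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

def same_pedestrian(track_a, track_b, threshold=-50.0):
    # Threshold chosen empirically in this sketch; higher likelihood => same person
    return log_likelihood_same(track_a, track_b) > threshold
```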
When the pedestrian to be tracked moves across videos, the shooting scene may change; this triggers the geographic-area overlap determination, shown in fig. 9 as the flow of the cross-video pedestrian tracking algorithm based on geographic-area overlap determination according to the embodiment of the application. The determination locates the geographic information of the shooting scenes of the multiple surveillance videos and then divides the respective tracked monitoring areas according to the overlapping geographic position regions. The continuous track of the pedestrian to be tracked is obtained by connecting the pedestrian's geographic-information space coordinates across consecutive frames; when these coordinates leave the monitoring area of the current camera and enter that of the next camera, the next camera is triggered to continue tracking the pedestrian's trajectory.
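A minimal sketch of this handoff logic, assuming each camera's monitoring area has already been registered as a geographic polygon (the third-party shapely library is used for the point-in-polygon test; all coordinates are illustrative):

```python
from shapely.geometry import Point, Polygon

# Geographic monitoring areas of two adjacent cameras (illustrative coordinates)
areas = {
    "cam_1": Polygon([(0, 0), (30, 0), (30, 20), (0, 20)]),
    "cam_2": Polygon([(25, 0), (55, 0), (55, 20), (25, 20)]),  # overlaps cam_1
}

def handoff(current_cam, geo_pos):
    """Return the camera that should track a pedestrian at geo_pos."""
    p = Point(geo_pos)
    if areas[current_cam].contains(p):
        return current_cam           # still inside the current monitoring area
    for cam, poly in areas.items():  # trigger the next overlapping camera
        if cam != current_cam and poly.contains(p):
            return cam
    return None                      # pedestrian left all monitored areas
```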
Please refer to fig. 10, which is a schematic structural diagram of a cross-video pedestrian positioning and tracking system according to an embodiment of the present application. The cross-video pedestrian positioning and tracking system of the embodiment of the application comprises:
a video geographic registration module: the system is used for constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
the cross-video pedestrian positioning and tracking module: the system is used for calculating the pedestrian detection position of the monitoring video and acquiring the geographic position of the pedestrian to be tracked;
the multi-video track tracking module: the method is used for combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
Fig. 11 is a schematic structural diagram of a hardware device of a cross-video pedestrian location and tracking method according to an embodiment of the present application. As shown in fig. 11, the device includes one or more processors and memory. Taking a processor as an example, the apparatus may further include: an input system and an output system.
The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 11.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following for any of the above method embodiments:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
The above product can execute the method provided by the embodiments of the application and possesses the functional modules and beneficial effects corresponding to the executed method. For technical details not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
The cross-video pedestrian positioning and tracking method, system and electronic equipment acquire reference-object position information through object recognition to geographically register the surveillance video, then acquire pedestrian geographic position information through pedestrian detection and derive the pedestrians' moving spatio-temporal trajectories; the approach is simple to operate, carries geographic position information, and offers good application value in multi-channel video system scenarios. Meanwhile, pedestrian geographic position information is introduced on the basis of video geographic calibration, and maximum likelihood estimation is performed during cross-video pedestrian re-identification, reducing the difficulty and system complexity of visual person re-identification and trajectory tracking algorithms, with good application value in multi-channel cross-video system scenarios.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A cross-video pedestrian positioning and tracking method is characterized by comprising the following steps:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
2. The cross-video pedestrian positioning and tracking method according to claim 1, wherein in the step a, the performing reference point topology matching and video geographic registration on the surveillance video according to object recognition specifically comprises:
adopting an object recognition algorithm to perform object recognition classification on the monitoring video to obtain reference points in the monitoring video;
matching the reference points with objects in the object geographic database to obtain the geographic position information of the homonymous control points in the surveillance video;
and performing geographic registration on the homonymous control points of the ground area in the monitoring video by adopting a world geographic coordinate system conversion method, so that the monitoring video has geographic position information.
3. The cross-video pedestrian positioning and tracking method according to claim 2, wherein in the step a, the performing reference point topology matching and video geographic registration on the surveillance video according to object recognition further comprises:
carrying out image preprocessing on the monitoring video to obtain a video image subjected to fisheye calibration;
and intercepting a certain frame of two-dimensional image in the preprocessed video image, and performing edge extraction on the two-dimensional image by adopting an edge detection and watershed segmentation method to obtain a ground area with GIS information in the two-dimensional image.
4. The cross-video pedestrian location tracking method according to any one of claims 1 to 3, wherein in the step b, the calculating the pedestrian detection position of the surveillance video specifically comprises:
and detecting a moving object in the monitoring video by adopting a frame difference method, positioning the position of the pedestrian to be tracked by combining a human head detector, and acquiring the head information of the pedestrian to be tracked.
5. The cross-video pedestrian positioning and tracking method according to claim 4, wherein the head detector adopts a pedestrian detection method based on a convolutional neural network (CNN), the convolutional neural network comprising an input layer, convolutional layers, pooling layers, fully-connected layers and an output layer, wherein the plurality of convolutional and pooling layers are stacked to process the input data, and mapping to the output target is performed through the fully-connected layers.
6. The cross-video pedestrian location tracking method according to claim 4, wherein in the step b, the obtaining the geographic position of the pedestrian further comprises:
and carrying out movement detection on the pedestrian to be tracked based on the head information to obtain underfoot pixel points of the pedestrian to be tracked, wherein the underfoot pixel points are the geographical position information of the pedestrian to be tracked.
7. The cross-video pedestrian location tracking method according to claim 6, wherein in the step b, the obtaining the pedestrian geographic position further comprises:
and calibrating the geographic position information of the pedestrian to be tracked by a camera error-suppression method.
8. The cross-video pedestrian location tracking method according to claim 7, wherein in the step c, the performing pedestrian cross-video re-recognition analysis on the multiple surveillance videos by using the maximum likelihood estimation method further comprises:
when shooting scene movement exists in the multi-path monitoring videos, triggering geographic area overlapping judgment; the geographic area overlap determination specifically includes:
positioning the geographic information of the shooting scenes of the multiple paths of monitoring videos, and dividing the monitoring areas of all the camera devices according to the overlapped geographic position areas; obtaining the continuous track of the pedestrian to be tracked by connecting the geographic information space coordinates of the pedestrian to be tracked in the continuous frames, and triggering the next camera to track the pedestrian track when the geographic information space coordinates of the pedestrian to be tracked exceed the monitoring area of the current camera and move to the monitoring area of the next camera.
9. A cross-video pedestrian location tracking system, comprising:
a video geographic registration module: the system is used for constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
the cross-video pedestrian positioning and tracking module: the pedestrian detection position calculation is carried out on the monitoring video to obtain the geographic position of the pedestrian to be tracked;
the multi-video track tracking module: the method is used for carrying out pedestrian cross-video re-identification analysis on the multi-path monitoring video by combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos and adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the following operations of the cross-video pedestrian location tracking method of any of claims 1 to 8 above:
step a: constructing an object geographic coordinate database, carrying out reference point topological matching and video geographic registration on a monitoring video according to object identification, and determining geographic coordinates of pixel points of the monitoring video;
step b: calculating the pedestrian detection position of the monitoring video to acquire the geographic position of the pedestrian to be tracked;
step c: and combining the geographic positions of the pedestrians to be tracked detected by the adjacent videos, and performing pedestrian cross-video re-identification analysis on the multi-path monitoring videos by adopting a maximum likelihood estimation method to obtain the continuous space-time trajectory of the pedestrians to be tracked.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010259428.3A CN111462200B (en) | 2020-04-03 | 2020-04-03 | Cross-video pedestrian positioning and tracking method, system and equipment |
PCT/CN2020/085081 WO2021196294A1 (en) | 2020-04-03 | 2020-04-16 | Cross-video person location tracking method and system, and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010259428.3A CN111462200B (en) | 2020-04-03 | 2020-04-03 | Cross-video pedestrian positioning and tracking method, system and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111462200A true CN111462200A (en) | 2020-07-28 |
CN111462200B CN111462200B (en) | 2023-09-19 |
Family
ID=71680274
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010259428.3A Active CN111462200B (en) | 2020-04-03 | 2020-04-03 | Cross-video pedestrian positioning and tracking method, system and equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111462200B (en) |
WO (1) | WO2021196294A1 (en) |
Families Citing this family (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113947742B (en) * | 2021-10-19 | 2024-09-06 | 成都佳发安泰教育科技股份有限公司 | Human track tracking method and device based on face recognition |
CN114240997B (en) * | 2021-11-16 | 2023-07-28 | 南京云牛智能科技有限公司 | Intelligent building online trans-camera multi-target tracking method |
CN114040168A (en) * | 2021-11-16 | 2022-02-11 | 西安热工研究院有限公司 | A intelligent electric power network monitoring mechanism for thermal power plant |
CN114550041B (en) * | 2022-02-18 | 2024-03-29 | 中国科学技术大学 | Multi-target labeling method for shooting video by multiple cameras |
CN114785983B (en) * | 2022-03-07 | 2024-10-18 | 复旦大学 | Target cross-domain tracking method based on node communication and equipment cooperation |
CN114972421B (en) * | 2022-04-27 | 2024-10-18 | 中南大学 | Workshop material identification tracking and positioning method and system |
CN115527162B (en) * | 2022-05-18 | 2023-07-18 | 湖北大学 | Multi-pedestrian re-identification method and system based on three-dimensional space |
CN115033960B (en) * | 2022-06-09 | 2023-04-07 | 中国公路工程咨询集团有限公司 | Automatic fusion method and device of BIM (building information modeling) model and GIS (geographic information system) |
CN115035163A (en) * | 2022-07-04 | 2022-09-09 | 上海易同科技股份有限公司 | Target tracking method, device, equipment and storage medium based on Bluetooth positioning |
CN114862973B (en) * | 2022-07-11 | 2022-09-16 | 中铁电气化局集团有限公司 | Space positioning method, device and equipment based on fixed point location and storage medium |
CN115731287B (en) * | 2022-09-07 | 2023-06-23 | 滁州学院 | Moving target retrieval method based on aggregation and topological space |
CN115578756B (en) * | 2022-11-08 | 2023-04-14 | 杭州昊恒科技有限公司 | Personnel fine management method and system based on precise positioning and video linkage |
CN115457449B (en) * | 2022-11-11 | 2023-03-24 | 深圳市马博士网络科技有限公司 | Early warning system based on AI video analysis and monitoring security protection |
CN115856980B (en) * | 2022-11-21 | 2023-08-01 | 中铁科学技术开发有限公司 | Marshalling station operator monitoring method and system |
CN115808170B (en) * | 2023-02-09 | 2023-06-06 | 宝略科技(浙江)有限公司 | Indoor real-time positioning method integrating Bluetooth and video analysis |
CN115979250B (en) * | 2023-03-20 | 2023-06-09 | 山东上水环境科技集团有限公司 | Positioning method based on UWB module, semantic map and visual information |
CN116189116B (en) * | 2023-04-24 | 2024-02-23 | 江西方兴科技股份有限公司 | Traffic state sensing method and system |
CN116631596B (en) * | 2023-07-24 | 2024-01-02 | 深圳市微能信息科技有限公司 | Monitoring management system and method for working time of radiological personnel |
CN116740878B (en) * | 2023-08-15 | 2023-12-26 | 广东威恒输变电工程有限公司 | Positioning early warning method for bidirectional drawing of global area coordinated by multiple cameras |
CN117185064B (en) * | 2023-08-18 | 2024-03-05 | 山东五棵松电气科技有限公司 | Intelligent community management system, method, computer equipment and storage medium |
CN117173215B (en) * | 2023-09-04 | 2024-08-20 | 东南大学 | Inland navigation ship whole-course track identification method and system crossing cameras |
CN117058331B (en) * | 2023-10-13 | 2023-12-19 | 山东建筑大学 | Indoor personnel three-dimensional track reconstruction method and system based on single monitoring camera |
CN118038008B (en) * | 2024-04-15 | 2024-07-12 | 武汉人云智物科技有限公司 | Hydropower plant personnel positioning method and system based on ptz multi-camera linkage |
CN118429947B (en) * | 2024-07-02 | 2024-09-03 | 汶上义桥煤矿有限责任公司 | Intelligent early warning method and system for preventing misoperation of monorail crane |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105472332B (en) * | 2015-12-01 | 2018-07-20 | 广东弘栎电子设备有限公司 | Analysis method based on location technology and video technique and its analysis system |
CN105913037A (en) * | 2016-04-26 | 2016-08-31 | 广东技术师范学院 | Face identification and radio frequency identification based monitoring and tracking system |
WO2018087545A1 (en) * | 2016-11-08 | 2018-05-17 | Staffordshire University | Object location technique |
CN107547865A (en) * | 2017-07-06 | 2018-01-05 | 王连圭 | Trans-regional human body video frequency object tracking intelligent control method |
WO2020055767A1 (en) * | 2018-09-10 | 2020-03-19 | Mapbox, Inc. | Mapping objects detected in images to geographic positions |
CN110414441B (en) * | 2019-07-31 | 2022-05-10 | 浙江大学 | Pedestrian track analysis method and system |
- 2020-04-03: CN application CN202010259428.3A filed; granted as patent CN111462200B (status: Active)
- 2020-04-16: PCT application PCT/CN2020/085081 filed; published as WO2021196294A1 (status: Application Filing)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101548639B1 (en) * | 2014-12-10 | 2015-09-01 | 한국건설기술연구원 | Apparatus for tracking the objects in surveillance camera system and method thereof |
US20180075593A1 (en) * | 2016-09-15 | 2018-03-15 | Qualcomm Incorporated | Automatic scene calibration method for video analytics |
CN107153824A (en) * | 2017-05-22 | 2017-09-12 | 中国人民解放军国防科学技术大学 | Across video pedestrian recognition methods again based on figure cluster |
WO2019145018A1 (en) * | 2018-01-23 | 2019-08-01 | Siemens Aktiengesellschaft | System, device and method for detecting abnormal traffic events in a geographical location |
CN109461132A (en) * | 2018-10-31 | 2019-03-12 | 中国人民解放军国防科技大学 | SAR image automatic registration method based on feature point geometric topological relation |
CN110147471A (en) * | 2019-04-04 | 2019-08-20 | 平安科技(深圳)有限公司 | Trace tracking method, device, computer equipment and storage medium based on video |
CN110375739A (en) * | 2019-06-26 | 2019-10-25 | 中国科学院深圳先进技术研究院 | A kind of mobile terminal vision fusion and positioning method, system and electronic equipment |
CN110717414A (en) * | 2019-09-24 | 2020-01-21 | 青岛海信网络科技股份有限公司 | Target detection tracking method, device and equipment |
CN110765903A (en) * | 2019-10-10 | 2020-02-07 | 浙江大华技术股份有限公司 | Pedestrian re-identification method and device and storage medium |
CN110706259A (en) * | 2019-10-12 | 2020-01-17 | 四川航天神坤科技有限公司 | Space constraint-based cross-shot tracking method and device for suspicious people |
Non-Patent Citations (2)
Title |
---|
ADITYA VORA ET AL: "FCHD: FAST AND ACCURATE HEAD DETECTION IN CROWDED SCENES", pages 1 - 5 * |
LU Wentao et al.: "Research on Map Matching Algorithms Based on Topological Structure" (基于拓扑结构的地图匹配算法研究), vol. 29, no. 6, pages 73-76 *
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112070003A (en) * | 2020-09-07 | 2020-12-11 | 深延科技(北京)有限公司 | Personnel tracking method and system based on deep learning |
CN112184814A (en) * | 2020-09-24 | 2021-01-05 | 天津锋物科技有限公司 | Positioning method and positioning system |
CN112184814B (en) * | 2020-09-24 | 2022-09-02 | 天津锋物科技有限公司 | Positioning method and positioning system |
WO2022067606A1 (en) * | 2020-09-30 | 2022-04-07 | 中国科学院深圳先进技术研究院 | Method and system for detecting abnormal behavior of pedestrian, and terminal and storage medium |
CN112163537A (en) * | 2020-09-30 | 2021-01-01 | 中国科学院深圳先进技术研究院 | Pedestrian abnormal behavior detection method, system, terminal and storage medium |
CN112163537B (en) * | 2020-09-30 | 2024-04-26 | 中国科学院深圳先进技术研究院 | Pedestrian abnormal behavior detection method, system, terminal and storage medium |
CN112733657A (en) * | 2020-12-31 | 2021-04-30 | 罗普特科技集团股份有限公司 | Cross-border tracking detection method and system based on standard address and POI information point |
CN112766210A (en) * | 2021-01-29 | 2021-05-07 | 苏州思萃融合基建技术研究所有限公司 | Safety monitoring method and device for building construction and storage medium |
CN113190711A (en) * | 2021-03-26 | 2021-07-30 | 南京财经大学 | Video dynamic object trajectory space-time retrieval method and system in geographic scene |
CN113435329B (en) * | 2021-06-25 | 2022-06-21 | 湖南大学 | Unsupervised pedestrian re-identification method based on video track feature association learning |
CN113435329A (en) * | 2021-06-25 | 2021-09-24 | 湖南大学 | Unsupervised pedestrian re-identification method based on video track feature association learning |
CN113627497B (en) * | 2021-07-27 | 2024-03-12 | 武汉大学 | Space-time constraint-based cross-camera pedestrian track matching method |
CN113837023A (en) * | 2021-09-02 | 2021-12-24 | 北京新橙智慧科技发展有限公司 | Cross-camera pedestrian automatic tracking method |
CN114842028A (en) * | 2022-05-07 | 2022-08-02 | 深圳先进技术研究院 | Cross-video target tracking method, system, electronic equipment and storage medium |
CN117237418A (en) * | 2023-11-15 | 2023-12-15 | 成都航空职业技术学院 | Moving object detection method and system based on deep learning |
CN117237418B (en) * | 2023-11-15 | 2024-01-23 | 成都航空职业技术学院 | Moving object detection method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN111462200B (en) | 2023-09-19 |
WO2021196294A1 (en) | 2021-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111462200B (en) | Cross-video pedestrian positioning and tracking method, system and equipment | |
US11270148B2 (en) | Visual SLAM method and apparatus based on point and line features | |
JP6095018B2 (en) | Detection and tracking of moving objects | |
CN107240124B (en) | Cross-lens multi-target tracking method and device based on space-time constraint | |
CN109829398B (en) | Target detection method in video based on three-dimensional convolution network | |
US9286678B2 (en) | Camera calibration using feature identification | |
US9275472B2 (en) | Real-time player detection from a single calibrated camera | |
CN113313763B (en) | Monocular camera pose optimization method and device based on neural network | |
JP7147753B2 (en) | Information processing device, information processing method, and program | |
CN112396656A (en) | Outdoor mobile robot pose estimation method based on fusion of vision and laser radar | |
Cvišić et al. | Recalibrating the KITTI dataset camera setup for improved odometry accuracy | |
CN111027462A (en) | Pedestrian track identification method across multiple cameras | |
Momeni-k et al. | Height estimation from a single camera view | |
CN112598709B (en) | Pedestrian movement speed intelligent sensing method based on video stream | |
CN106504274A (en) | A kind of visual tracking method and system based under infrared camera | |
Liao et al. | SE-Calib: Semantic Edge-Based LiDAR–Camera Boresight Online Calibration in Urban Scenes | |
CN111829522B (en) | Instant positioning and map construction method, computer equipment and device | |
CN117593650B (en) | Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation | |
Xiao et al. | Geo-spatial aerial video processing for scene understanding and object tracking | |
CN117036404A (en) | Monocular thermal imaging simultaneous positioning and mapping method and system | |
Sen et al. | SceneCalib: Automatic targetless calibration of cameras and LiDARs in autonomous driving | |
Revaud et al. | Robust automatic monocular vehicle speed estimation for traffic surveillance | |
CN116259001A (en) | Multi-view fusion three-dimensional pedestrian posture estimation and tracking method | |
CN113850864B (en) | GNSS/LIDAR loop detection method for outdoor mobile robot | |
CN116128919A (en) | Multi-temporal image abnormal target detection method and system based on polar constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||