CN110245199B - Method for fusing large-dip-angle video and 2D map - Google Patents

Method for fusing large-dip-angle video and 2D map

Info

Publication number
CN110245199B
CN110245199B (application CN201910350808.5A)
Authority
CN
China
Prior art keywords
video
background image
image
dynamic target
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910350808.5A
Other languages
Chinese (zh)
Other versions
CN110245199A (en)
Inventor
朱雪坚
刘学军
叶远智
刘洋
王晓辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Natural Resources Monitoring Center
Nanjing Normal University
Original Assignee
Zhejiang Natural Resources Monitoring Center
Nanjing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Natural Resources Monitoring Center, Nanjing Normal University filed Critical Zhejiang Natural Resources Monitoring Center
Priority to CN201910350808.5A priority Critical patent/CN110245199B/en
Publication of CN110245199A publication Critical patent/CN110245199A/en
Application granted granted Critical
Publication of CN110245199B publication Critical patent/CN110245199B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00Maps; Plans; Charts; Diagrams, e.g. route diagram
    • G09B29/003Maps
    • G09B29/006Representation of non-cartographic information on maps, e.g. population distribution, wind direction, radiation levels, air and sea routes

Abstract

The invention discloses a method for fusing a large-dip-angle video and a 2D map. A static video background image is obtained through video background modeling; the foreground dynamic target is extracted by combining a background subtraction method with a three-frame difference method; the static video background image obtained in step S1 is segmented to obtain the road-surface area in the image, and the video background image is geometrically corrected with a homography-based geometric correction algorithm for large-dip-angle video images to obtain a corrected video background image; a mutual mapping model between the surveillance video and the 2D geospatial data is established based on the internal and external parameters of the camera; and, based on the mutual mapping model established in S4, the corrected static video background image and the extracted foreground dynamic target are mapped onto the two-dimensional map, completing the integrated expression of the surveillance video and the two-dimensional map.

Description

Method for fusing large-dip-angle video and 2D map
Technical Field
The invention belongs to the technical field of fusing surveillance video with 2D maps, and in particular relates to a method for fusing a large-dip-angle video and a 2D map.
Background
The integration of video and GIS is a new way of representing geographic scenes. In the field of integrating video with 2D GIS, Lippman (Lippman A. Movie maps: an application of the optical videodisc to computer graphics [J]. SIGGRAPH '80, 1980, 14(3):32-42.) first combined video with GIS in 1978 and developed dynamic, user-interactive hypermedia maps. Since then, research on the integration of video and GIS has gradually deepened, received more and more attention from researchers, and produced a large body of related work. Berry (Berry J K. Capture "Where" and "When" on video-based GIS [J]. GEO WORLD, 2000, 13:26-27.) proposed a framework for video maps and designed a corresponding conceptual scheme for field acquisition, processing and application of the data. Lewis et al. (Lewis P, Fotheringham S, Winstanley A. Spatial video and GIS. International Journal of Geographical Information Science, 2011, 25(5):697-716.) defined a pyramid data structure for a geospatial video data model under GIS constraints, applicable to two-dimensional GIS analysis and visualization, and verified its feasibility through experiments. The authors of "Design and implementation of a campus geographic video surveillance WebGIS system" (Science of Surveying and Mapping, 2012, 37(1):195-197.) and Zhang (Design and implementation of a GIS-based public-security video surveillance planning system [D]. Kaifeng: Henan University, 2011.) respectively realized a loose integration of a video surveillance system with GIS, representing each camera as a point, or its viewing range as a sector, on a two-dimensional map and statically calling video files for playback through hyperlinks. Kong Yunfeng et al. (Geographic video data model design and network video GIS implementation [J]. Geomatics and Information Science of Wuhan University, 2010, 35(2):133-) established a mapping relationship among geographic position (XY), road mileage (M), and video time or frame (T/F). Zhang Xing et al. (Zhang Xing, Liu Xuejun, Wang Sining, et al. Mutual mapping between surveillance video and 2D geospatial data [J]. Geomatics and Information Science of Wuhan University, 2015, 40(8):1130-1136.) studied the mutual mapping model between surveillance video and geospatial data and proposed a semi-automatic mutual mapping method based on feature matching.
Geometric correction of images is widely used in the field of remote sensing and is one of the important means of reducing the difference between a remote-sensing image and the true form of the ground. In photogrammetry, usually only the geometric correction of slightly tilted photographs (tilt angle within 2°) is discussed. For small-tilt images, existing algorithms fall mainly into two categories: methods that solve for correction parameters from control points according to a given mathematical model, such as polynomial methods, and collinearity-equation methods based on digital elevation models and imaging equations. Some researchers have also studied the geometric correction of large-tilt images (tilt angles between 2° and 90°). One study performed homography-based geometric correction of large-tilt images (Journal of Shanghai University, Natural Science Edition, 2005, 11(5):481-). Xu Qingyang (Research on geometric correction methods for near-space large-tilt remote-sensing images [D]. Harbin Institute of Technology, 2009.) studied the difficult problems of near-space large-tilt remote-sensing image correction in depth, proposed a piecewise polynomial correction model, optimized the automatic selection of control points by SIFT interest operators through an iterative gross-error control-point removal algorithm and a uniform-distribution algorithm, and achieved fully automatic geometric correction of the images. For large-tilt aerial photographs, a geographic coordinate assignment method that performs geometric correction with respect to a reference image, based on an improved six-parameter affine transformation model, has also been proposed (Hydrographic Surveying and Charting, 2010, 30(3):23-26.).
From the above analysis it can be seen that the integration of surveillance video and two-dimensional maps has drawn increasing attention from researchers, the study of related theories and methods has gradually become a focus in the relevant academic fields, and corresponding results have been obtained; however, the main remaining problems are:
(1) In the integration of video and GIS, existing research either treats video data as an attribute of spatial data and statically calls video files through hyperlinks, losing the spatial resolution of the video, or simply places the video on the map without fully exploiting the rich information the video contains. In addition, existing work only integrates the surveillance video of each camera's coverage area with the map; it pays little attention to inferring what happens in surveillance blind areas and cannot perceive the spatial pattern of dynamic targets in those areas.
(2) In image geometric correction, the field of photogrammetry generally discusses only the correction of small-tilt images, whereas surveillance cameras usually have large tilt angles because of the required coverage, so correction methods suited to small-tilt images cannot simply be applied to large-tilt surveillance video images. In existing geometric correction methods for large-tilt images, dynamic targets in the corrected result suffer large geometric deformation and distortion, making it difficult to meet the correction requirements of real-time surveillance video images.
Disclosure of Invention
The invention provides a method for fusing a large-inclination-angle video and a 2D map, aiming at the problems that, in the corrected image results produced by existing geometric correction methods for large-inclination images, dynamic targets suffer large geometric deformation and distortion, and that the correction algorithms are inefficient.
The invention discloses a method for fusing a large-dip-angle video and a 2D map, which comprises the following steps of:
s1: establishing a mutual mapping model of the monitoring video and the 2D map according to the parameters of the camera;
s2: and mapping the front-view static video background image of the surveillance video and the foreground dynamic target of the surveillance video onto the 2D map according to the mutual mapping model, so as to finish the integrated expression of the surveillance video and the 2D map.
Further, the acquiring of the front-view static video background image of the surveillance video comprises the following steps:
obtaining a static video background image of a monitoring video according to a video background modeling technology;
and performing geometric correction on the static video background image to obtain a corresponding front-view image.
Further, the step of extracting foreground dynamic objects in the surveillance video includes:
performing AND operation on the foreground dynamic target binary image obtained by the three-frame difference method and the foreground dynamic target binary image obtained by the background subtraction method to obtain a final foreground dynamic target;
and obtaining the position of the foreground dynamic target by analyzing the connected domain of the foreground dynamic target.
Further, the step of obtaining a foreground dynamic target binary image includes:
obtaining a static video background image of a monitoring video according to a video background modeling technology;
respectively extracting foreground dynamic targets from a static video background image according to a three-frame difference method and a background subtraction method to respectively obtain preliminary foreground dynamic targets;
each frame of the surveillance video is differenced against the preliminary foreground dynamic targets to obtain the difference values g_1 and g_2 for each pixel;
if g_1 > k_1 or g_2 > k_2, where k_1 and k_2 are adaptive thresholds derived from the average gray value of the static video background image, the pixel point is marked as 1 and all other points are marked as 0, giving the binary image of the video foreground dynamic target.
Further, before the step of geometrically correcting the background image of the still video, the method includes:
performing super-pixel segmentation on a static video background image;
based on the prior knowledge of the ground and the non-ground, constructing a decision tree which takes the image characteristics extracted from the segmented static video background image as a classification basis;
classifying the segmented static video background image into horizontal ground and non-ground using the decision tree, to obtain the ground part and the non-ground part of the static video background image;
the step of geometrically correcting the still video background image comprises: correcting the ground part in the static video background image into an orthographic image by adopting a homography matrix;
if the front-view image has the void point, acquiring a corresponding point of the void point on the static video background image, calculating a gray value of the corresponding point of the void point on the static video background image by using a bilinear interpolation method, and further acquiring the gray value of the void point to obtain the finally corrected static video background image.
Further, the step of mapping the front-view static video background image in the surveillance video onto the 2D map comprises:
acquiring a corresponding view trapezoid of the monitoring video in the geographic space according to the mutual mapping model established in the S1;
according to the coordinates (X_i, Y_i), i ∈ [1,4], of the four corner points of the view trapezoid, calculating the side length L_i, i ∈ [1,4], of each side of the view trapezoid and the corresponding distance l_i, i ∈ [1,4], of each side on the map: l_i = s × L_i, where s is the scale of the 2D map;
according to the length PL_i of each side of the front-view static video background image in the surveillance video and its actual length l_i on the map, calculating the scaling factor ε_i = l_i / PL_i;
based on the scaling factor, scaling the front-view static video background image, then rotating and translating it according to the coordinates of the camera's center point and the camera's rotation and pitch angles, so that the static video background image is mapped to the correct position on the 2D map;
the step of mapping foreground dynamic objects in the surveillance video onto the 2D map comprises:
calculating the center coordinate Centre of each foreground dynamic target in the surveillance video as Centre = (1/M) Σ_{i=1}^{M} (x_i, y_i), where M is the number of pixels of the foreground dynamic target and (x_i, y_i) are the pixel coordinates of the foreground dynamic target;
according to the scaling factor ε_i, scaling the foreground dynamic target in the same proportion;
converting the center coordinates of each foreground dynamic target into 2D geographic coordinates according to a mutual mapping model of the monitoring video and the 2D map;
and mapping the foreground dynamic target of the monitoring video to a 2D map according to the central coordinates of each foreground dynamic target and the movement direction of the foreground dynamic target, and updating the position of the foreground dynamic target in real time.
Further, the moving direction of the foreground dynamic target is determined by the rotation angle of the camera.
Further, the mutual mapping model of the monitoring video and the 2D map comprises a mapping model from a video image space to a geographic space and a model from the geographic space to the video image space;
the mapping model from the video image space to the geographic space is represented as:
(X_G, Y_G, Z_G)^T = (X_C, Y_C, Z_C)^T + λ·P·T·(f, x, y)^T
where (X_G, Y_G, Z_G) are the spatial coordinates of the target, (X_C, Y_C, Z_C) are the coordinates of the camera's optical center, (f, x, y) is the line-of-sight vector, P and T are the rotation matrices of the camera, and λ is the ray-extension parameter;
the mapping model from the geographic space to the video image space is represented as:
λ·P·T·(f, x, y)^T = (X_G - X_C, Y_G - Y_C, Z_G - Z_C)^T.
further, the video background modeling technology adopts a Vibe algorithm.
Further, the super-pixel segmentation of the static video background image is realized by adopting a SLIC super-pixel segmentation algorithm.
Beneficial effects: in view of the shortcomings of current approaches to fusing surveillance video with 2D maps, the invention provides a method for integrating large-dip-angle surveillance video with a 2D map, which mainly solves the problem of deformation and distortion of dynamic targets after geometric correction of large-dip-angle images. Meanwhile, because the background of video shot by a fixed bullet camera remains unchanged, each frame of the video does not need to be geometrically corrected, which greatly reduces the amount of computation and effectively improves the efficiency of geometric correction of surveillance video images; the integration of a single surveillance video with a two-dimensional map is thus completed, enhancing the expression of dynamic targets on the two-dimensional map.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 shows the video foreground/background separation experiment results of the present invention;
FIG. 3 is a result diagram of the segmentation of horizontal road surface areas according to the present invention, wherein 3-1 is an original video background diagram, 3-2 is a color effect diagram after classification, and 3-3 is a segmented horizontal road surface;
FIG. 4 is a diagram of the video background image correction result of the present invention, wherein 4-1 is the video background image to be corrected, and 4-2 is the result after correction;
FIG. 5 is an overall effect diagram of the present invention, wherein 5-1 is the original image of the 359th video frame, and 5-2 is the integrated effect diagram of the 359th video frame and the two-dimensional map.
Detailed Description
The invention is further illustrated below with reference to the figures and examples.
The basic idea of the invention is as follows: firstly, video background modeling is carried out on a monitoring video, automatic segmentation of a flat road surface is carried out on an obtained static video background image by adopting a decision tree, and then a homography matrix is utilized to carry out large-inclination-angle geometric correction processing on the flat road surface; extracting a video foreground dynamic target by adopting a mode of combining a background subtraction method and a three-frame difference method; and finally, mapping the corrected static video background image and the extracted foreground dynamic target to a 2D map respectively, and finally realizing the integration of the monitoring video and the two-dimensional map, thereby enhancing the expression of the two-dimensional map to the dynamic target.
Example 1:
as shown in fig. 1, the method for fusing a large-inclination-angle video and a 2D map according to the embodiment includes the following steps:
firstly, video background modeling:
the video background modeling technology is used to obtain a static video background image, namely a static or very slow moving point in the video. The background modeling technology adopts a Vibe algorithm, the Vibe algorithm is a pixel-level video background modeling algorithm, the algorithm is high in calculation efficiency and has certain robustness on noise, the Vibe algorithm is suitable for complex scenes such as camera shaking and illumination change, good real-time performance can be kept, and video background modeling quality and efficiency can be guaranteed.
Secondly, foreground dynamic target extraction:
The foreground dynamic targets are the points that move noticeably in the surveillance video, typically appearing as moving vehicles, walking pedestrians and the like. The video foreground dynamic target is extracted by combining a background subtraction method with a three-frame difference method; the specific steps are as follows:
(1) Background subtraction is carried out using the static video background image obtained in the first step to extract a preliminary foreground dynamic target, and two adaptive thresholds k_1 and k_2 are then set from the average gray value of the static video background image.
(2) Each frame of the surveillance video is read and differenced against the preliminary foreground dynamic target, giving the difference values g_1 and g_2 for each pixel.
(3) If g_1 > k_1 or g_2 > k_2, the pixel point is marked as 1 in the foreground binary image and all other points are marked as 0, giving a preliminary binary image of the video foreground dynamic target.
(4) And in order to eliminate the noise, performing AND operation on the foreground dynamic target obtained by the three-frame difference method and the foreground dynamic target obtained by the background subtraction method to obtain a final foreground dynamic target result.
(5) The position of the foreground dynamic target is determined by connected-component analysis of the foreground dynamic target, and the center coordinate Centre of the moving target is computed as
Centre = (1/M) Σ_{i=1}^{M} (x_i, y_i),
where M is the number of pixels of the foreground dynamic target and (x_i, y_i) are the pixel coordinates of the foreground dynamic target.
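A minimal sketch of steps (1)-(5) follows, assuming grayscale frames and the static background from the first step are already available; because the patent's adaptive-threshold formulas are given only as images, the thresholds below are simple placeholders derived from the background's mean gray value.

```python
import cv2
import numpy as np

def extract_foreground(prev_frame, curr_frame, next_frame, background,
                       k1=None, k2=None):
    """Combine background subtraction with a three-frame difference (grayscale inputs)."""
    mean_bg = float(background.mean())
    # Placeholder adaptive thresholds based on the background's mean gray value;
    # the exact formulas of the patent are not reproduced here.
    k1 = k1 if k1 is not None else 0.15 * mean_bg
    k2 = k2 if k2 is not None else 0.15 * mean_bg

    # Background-subtraction binary image.
    diff_bg = cv2.absdiff(curr_frame, background)
    mask_bg = (diff_bg > k1).astype(np.uint8)

    # Three-frame-difference binary image.
    d1 = cv2.absdiff(curr_frame, prev_frame)
    d2 = cv2.absdiff(next_frame, curr_frame)
    mask_3f = ((d1 > k2) & (d2 > k2)).astype(np.uint8)

    # AND the two binary images to suppress noise, as in step (4).
    mask = cv2.bitwise_and(mask_bg, mask_3f) * 255

    # Connected-component analysis gives each target's centroid, as in step (5).
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    targets = [tuple(centroids[i]) for i in range(1, n)
               if stats[i, cv2.CC_STAT_AREA] > 50]   # drop tiny blobs
    return mask, targets
```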
Thirdly, correcting the video background geometry:
The geometric correction of the video background is based on a decision tree: the static video background image obtained in the first step is segmented to obtain the flat road-surface area in the image, and the road-surface area of the static video background image is geometrically corrected using a homography-based geometric correction algorithm for large-dip-angle video images. The specific steps are as follows:
(1) Superpixel segmentation is performed on the static video background image obtained in the first step, using the SLIC superpixel segmentation algorithm, which offers fast processing, low algorithmic complexity and good adherence to segmentation boundaries.
(2) Prior knowledge of ground and non-ground is acquired through machine learning to construct a decision tree. The decision-tree classification is based on image features extracted from the segmented static video background image; this embodiment selects 10 types of image features, 55 feature values in total, as the basis for decision-tree classification.
TABLE 1 image characteristics for decision tree classification
(3) And carrying out horizontal ground and non-ground classification on the segmented video background image to obtain a ground part and a non-ground part.
(4) The ground part of the static video background image is corrected into an orthographic image through the homography matrix. Suppose the images before and after correction are I_1 and I_2, respectively. For any point (x_1, y_1) in the video background image I_1 to be corrected, the corresponding point (x_2, y_2) can be found in image I_2. Corresponding points of the two images satisfy a simple homography:
x_2 = H x_1   (6)
where H is a 3 × 3 matrix
H = [ h_11 h_12 h_13 ; h_21 h_22 h_23 ; h_31 h_32 h_33 ],
from which the coordinates of the corrected image points are obtained:
x_2 = (h_11·x_1 + h_12·y_1 + h_13) / (h_31·x_1 + h_32·y_1 + h_33)
y_2 = (h_21·x_1 + h_22·y_1 + h_23) / (h_31·x_1 + h_32·y_1 + h_33)
(5) In general, x_2 and y_2 are not integers and therefore need to be rounded; the gray value at point (x_2, y_2) is then determined by the gray value at point (x_1, y_1). Holes can appear in the image obtained in this way. To eliminate them, each hole point (x_2', y_2') is detected, its corresponding point (x_1', y_1') in image I_1 is found, and the gray value of the hole point is then calculated by bilinear interpolation, giving the finally corrected orthographic static video background image.
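A hedged sketch of this third step is given below: SLIC superpixels, a pre-trained decision tree to separate ground from non-ground, and a homography warp of the ground region. The per-superpixel features used here (mean colour plus normalised centroid) are a deliberate simplification of the 55 features of Table 1, and the homography H and the trained tree are assumed to be supplied by the caller.

```python
import cv2
import numpy as np
from skimage.segmentation import slic
from sklearn.tree import DecisionTreeClassifier

def superpixel_features(image, labels):
    """Per-superpixel mean colour and normalised centroid (a stand-in for Table 1's features)."""
    h, w = labels.shape
    feats = []
    for sp in np.unique(labels):
        mask = labels == sp
        ys, xs = np.nonzero(mask)
        feats.append(np.concatenate([image[mask].mean(axis=0),
                                     [ys.mean() / h, xs.mean() / w]]))
    return np.array(feats)

def correct_ground(background, tree: DecisionTreeClassifier, H, out_size):
    """Segment the ground region and warp it to an orthographic view."""
    labels = slic(background, n_segments=400, compactness=10, start_label=0)
    feats = superpixel_features(background, labels)
    is_ground = tree.predict(feats)            # assumed labels: 1 = ground, 0 = non-ground

    ground_ids = np.unique(labels)[is_ground == 1]
    ground_mask = np.isin(labels, ground_ids).astype(np.uint8)
    ground = cv2.bitwise_and(background, background, mask=ground_mask)

    # warpPerspective performs the inverse mapping and fills each output pixel by
    # bilinear interpolation, which also removes the hole points mentioned in step (5).
    return cv2.warpPerspective(ground, H, out_size, flags=cv2.INTER_LINEAR)
```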
Fourthly, the video moving and static target and the 2D map are mapped with each other:
the mutual mapping model refers to the realization of mutual mapping between an image space and a geographic space, and comprises the following specific steps:
(1) mapping of image space to geographic space.
The video image space is a two-dimensional space, the geographic space is a three-dimensional space, and the monitoring camera can project the ground objects in the geographic space into the image space. The monitoring camera adopts perspective projection for imaging, and the model is as follows:
(X_G, Y_G, Z_G)^T = (X_C, Y_C, Z_C)^T + λ·P·T·(f, x, y)^T   (10)
The meaning of this formula is as follows: in the camera's initial state its position is (0, 0, 0), the horizontal angle and the pitch angle are both 0°, and the line-of-sight vector is CP = (f, x, y). As the camera rotates, the rotation is expressed by the rotation matrices P and T, and multiplying the line-of-sight vector by the rotation matrices gives the rotated viewing direction. Extending the ray, i.e. multiplying by λ, gives the vector from the camera's optical center to the object point. With the optical-center coordinates (X_C, Y_C, Z_C) and the vector from the optical center to the object-space point known, the coordinates (X_G, Y_G, Z_G) of the object-space point can be obtained. On the left of the equals sign is the spatial position of the target, which is unknown; on the right are the position of the image pixel, the attitude and position of the video sensor, and λ, which are known quantities.
The image space is two-dimensional; a point in it may correspond to infinitely many points on a straight line in three-dimensional space, and the formula above is the expression of that straight line. Once λ is determined, a unique point in three-dimensional space is determined.
Let the distance from the camera's optical center to the target surface be f_D and the distance from the optical center to the object-space point be D; a geometric relationship between D, f_D and λ can then be established. The invention assumes a horizontal ground, so Z_G equals the elevation of the ground. When Z_G is known, λ can be deduced from the third (Z) component of formula 10 as
λ = (Z_G - Z_C) / [P·T·(f, x, y)^T]_Z.
Substituting λ into formula 10 gives X_G and Y_G, thereby determining the spatial position of the target appearing in the image.
(2) Mapping of geospatial to image space.
The mapping from geographic space to image space is the inverse of the mapping from image space to geographic space. Inverting and rearranging formula 10 gives:
λ·(f, x, y)^T = (P·T)^(-1) · (X_G - X_C, Y_G - Y_C, Z_G - Z_C)^T
The right-hand side of this equation contains only known quantities: the geospatial coordinates, the camera attitude and the camera position. On the left-hand side, when the focal length f is known, λ can be obtained from the first component of the vector, and substituting λ back into the relation yields the coordinates (x, y) of the spatial point in image space.
Fifthly, integrating the video moving and static targets with the 2D map:
the integration of the video target and the 2D map means that a mutual mapping model of the monitoring video and the 2D map is established, and the corrected front-view static video background image and the extracted foreground dynamic target are mapped onto the two-dimensional map, so that the integrated expression of the monitoring video and the two-dimensional map is realized.
(1) The mapping of the static video background of the monitoring video and the two-dimensional map comprises the following specific steps:
firstly, establishing a mapping model from a video image to 2D geographic space data according to internal and external parameters of a camera;
Secondly, the corresponding view trapezoid of the surveillance video image in geographic space is calculated according to the established mapping model;
Thirdly, from the coordinates (X_i, Y_i), i ∈ [1,4], of the four corner points of the view trapezoid, the side length L_i, i ∈ [1,4], of each side of the view trapezoid is calculated;
Fourthly, according to the scale s of the current 2D map, the distance l_i, i ∈ [1,4], on the map corresponding to each side length of the view trapezoid is calculated:
l_i = s × L_i, i ∈ [1,4]   (16)
Fifthly, knowing the length PL_i, i ∈ [1,4], of each side of the corrected front-view static video background image and its actual length l_i on the map, a scaling relationship exists between the images before and after mapping, with scaling factor
ε_i = l_i / PL_i.
The scaling is performed so that the surveillance-video background image overlays the map at a suitable size. After the scaling transformation, the surveillance-video background image is rotated and translated according to the coordinates of the camera's center point and the camera's rotation and pitch angles, and is thus mapped to the correct position on the two-dimensional map.
(2) Monitoring mapping of a foreground dynamic target of a video and a two-dimensional map, which comprises the following specific steps:
extracting foreground dynamic targets in the monitored video according to the second step, and calculating the center coordinates of each foreground dynamic target;
② according to the scaling factor ε_i, the dynamic target is scaled in the same proportion;
thirdly, according to the internal and external parameters of the monitoring camera, a mapping model of the monitoring video and the 2D geographic space data is constructed, and based on the mapping model, the central coordinates of all the moving objects are converted into 2D geographic coordinates;
mapping the dynamic foreground target of the monitoring video to two-dimensional geographic space data, determining the moving direction of the dynamic target such as the direction of the head of a moving vehicle according to the rotation angle of the camera, and enabling the dynamic target to move on the two-dimensional map by continuously updating the real-time position of the dynamic target.
(3) The mapping results of (1) and (2) are then combined, finally completing the integration of the surveillance video and the two-dimensional map.
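A sketch of this fifth step under the same assumptions follows, reusing the hypothetical image_to_geo helper from the previous sketch: the four image corners give the view trapezoid, the per-side factors ε_i place the corrected background at map scale (formula 16), and each dynamic-target centroid is converted to 2D geographic coordinates.

```python
import numpy as np

def view_trapezoid(width, height, f, P, T, C, ground_z):
    """Geographic footprint of the video frame: map its four corners with formula 10."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [image_to_geo(x, y, f, P, T, C, ground_z)[:2] for x, y in corners]

def scale_factors(trapezoid, side_lengths_px, s):
    """Per-side scaling factors eps_i = l_i / PL_i, with l_i = s * L_i (formula 16)."""
    eps = []
    for i in range(4):
        a = np.asarray(trapezoid[i], dtype=float)
        b = np.asarray(trapezoid[(i + 1) % 4], dtype=float)
        L_i = np.linalg.norm(b - a)               # side length in geographic units
        eps.append(s * L_i / side_lengths_px[i])  # PL_i: image side length in pixels
    return eps

def target_to_map(centre_px, f, P, T, C, ground_z):
    """Convert a dynamic-target centroid from pixel to 2D geographic coordinates."""
    X, Y, _ = image_to_geo(centre_px[0], centre_px[1], f, P, T, C, ground_z)
    return X, Y
```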
Example 2:
first step, preparation of related equipment: a portable notebook computer is prepared, and one high-definition monitoring camera is arranged.
Secondly, separating the background image of the static video from the foreground dynamic target: establishing a video background by using a Vibe algorithm, extracting a foreground dynamic target according to an established video background image, and sequentially obtaining an original image, a video background image and a video dynamic target as a result shown in figure 2; in fig. 2, the first line is the processing result of the 75 th frame of the video, and the second line is the processing result of the 359 th frame of the video.
Thirdly, geometrically correcting the background image of the static video:
(1) A decision tree is constructed using the urban road traffic images provided by the LabelMe database as the training picture set. The flat road surface of the experimental video data is segmented based on the constructed decision tree; the segmentation result is shown in FIG. 3.
(2) And performing geometric correction on the video background image after segmentation. Fig. 3-3 is the image to be corrected, and the homography matrix H of the correction transformation is solved by the internal and external parameters of the monitoring camera, and the result is shown in fig. 4 after the correction is performed on fig. 3-3.
Fourthly, integrating the video moving and static targets with a two-dimensional map:
and mapping the corrected static video background image and the extracted foreground dynamic target to a two-dimensional map.
The method uses OpenLayers to call Google map tiles as the base map, determines the mapping model between the video image and the 2D geographic data from the camera's internal and external parameters, and maps the video's moving and static targets onto the 2D geographic data; the video background image and the dynamic targets can be overlaid on the Google map by calling the ol.layer.Image() and ol.style.Icon() classes of OpenLayers. The result of the experiment integrating the video with the two-dimensional map is shown in FIG. 5.
Example 3:
The embodiment discloses a system for fusing a large-dip-angle video and a 2D map, which comprises a network interface, a memory and a processor, wherein:
the network interface is used for receiving and sending signals in the process of receiving and sending information with other external network elements;
a memory for storing computer program instructions executable on the processor;
a processor for executing the steps of the method for fusing a high-tilt-angle video with a 2D map in embodiment 1 when executing the computer program instructions.
Example 4:
the present embodiment discloses a computer storage medium storing a program of a fusion method of a high-tilt-angle video and a 2D map, which when executed by at least one processor, implements the steps of the fusion method of a high-tilt-angle video and a 2D map of embodiment 1.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person of ordinary skill in the art can make modifications or equivalents to the specific embodiments of the present invention with reference to the above embodiments, and such modifications or equivalents without departing from the spirit and scope of the present invention are within the scope of the claims of the present invention as set forth in the claims.

Claims (9)

1. A method for fusing a large-dip-angle video and a 2D map, characterized by comprising the following steps:
s1: establishing a mutual mapping model of the monitoring video and the 2D geographic space data according to the camera parameters;
s2: according to the mutual mapping model, mapping the front-view static video background image of the surveillance video and the foreground dynamic target of the surveillance video to the 2D map to complete the integrated expression of the surveillance video and the 2D map;
wherein the step of mapping the front-view static video background image in the surveillance video onto the 2D map comprises:
acquiring a corresponding view trapezoid of the surveillance video in the geographic space according to the mutual mapping model of the surveillance video and the 2D geographic space data established in the S1;
according to the coordinates (X_i, Y_i), i ∈ [1,4], of the four corner points of the view trapezoid, calculating the side length L_i, i ∈ [1,4], of each side of the view trapezoid and the corresponding distance l_i, i ∈ [1,4], of each side on the map: l_i = s × L_i, where s is the scale of the 2D map;
according to the length PL_i of each side of the front-view static video background image in the surveillance video and its actual length l_i on the map, calculating the scaling factor ε_i = l_i / PL_i;
based on the scaling factor, scaling the front-view static video background image, then rotating and translating it according to the coordinates of the camera's center point and the camera's rotation and pitch angles, so that the static video background image is mapped to the correct position on the 2D map;
the step of mapping foreground dynamic objects in the surveillance video onto the 2D map comprises:
calculating the center coordinate Centre of each foreground dynamic target in the surveillance video as Centre = (1/M) Σ_{i=1}^{M} (x_i, y_i), where M is the number of pixels of the foreground dynamic target and (x_i, y_i) are the pixel coordinates of the foreground dynamic target;
according to the scaling factor ε_i, scaling the foreground dynamic target in the same proportion;
converting the center coordinates of each foreground dynamic target into 2D geographical coordinates according to a mutual mapping model of the monitoring video and the 2D geographical space data;
and mapping the foreground dynamic target of the monitoring video to a 2D map according to the central coordinates of each foreground dynamic target and the movement direction of the foreground dynamic target, and updating the position of the foreground dynamic target in real time.
2. The method for fusing a large-inclination-angle video and a 2D map according to claim 1, wherein the acquisition of the front-view static video background image of the surveillance video comprises the following steps:
obtaining a static video background image of a monitoring video according to a video background modeling technology;
and performing geometric correction on the static video background image to obtain a corresponding front-view image.
3. The method for fusing a large-inclination-angle video and a 2D map according to claim 1, wherein the step of extracting the foreground dynamic target in the surveillance video comprises:
performing AND operation on the foreground dynamic target binary image obtained by the three-frame difference method and the foreground dynamic target binary image obtained by the background subtraction method to obtain a final foreground dynamic target;
and obtaining the position of the foreground dynamic target by analyzing the connected domain of the foreground dynamic target.
4. The method for fusing a large-inclination-angle video and a 2D map according to claim 3, wherein the foreground dynamic target binary image is obtained through the following steps:
obtaining a static video background image of a monitoring video according to a video background modeling technology;
respectively extracting foreground dynamic targets from a static video background image according to a three-frame difference method and a background subtraction method to respectively obtain preliminary foreground dynamic targets;
each frame of the surveillance video is differenced against the preliminary foreground dynamic targets to obtain the difference values g_1 and g_2 for each pixel;
if g_1 > k_1 or g_2 > k_2, where k_1 and k_2 are adaptive thresholds derived from the average gray value of the static video background image, the pixel point is marked as 1 and all other points are marked as 0, giving the binary image of the video foreground dynamic target.
5. The method for fusing a large-inclination-angle video and a 2D map according to claim 2, wherein, before the step of geometrically correcting the still video background image, the method comprises the following steps:
performing super-pixel segmentation on a static video background image;
based on the prior knowledge of the ground and the non-ground, constructing a decision tree which takes the image characteristics extracted from the segmented static video background image as a classification basis;
classifying the segmented static video background image into horizontal ground and non-ground using the decision tree, to obtain the ground part and the non-ground part of the static video background image;
the step of geometrically correcting the still video background image comprises: correcting the ground part in the static video background image into an orthographic image by adopting a homography matrix;
if the front-view image has the void point, acquiring a corresponding point of the void point on the static video background image, calculating a gray value of the corresponding point of the void point on the static video background image by using a bilinear interpolation method, and further acquiring the gray value of the void point to obtain the finally corrected static video background image.
6. The method for fusing a large-inclination-angle video and a 2D map according to claim 1, wherein the moving direction of the foreground dynamic target is determined by the rotation angle of the camera.
7. The method for fusing a large-inclination-angle video and a 2D map according to claim 1, wherein the mutual mapping model of the surveillance video and the 2D geospatial data comprises a mapping model from the video image space to the geospatial data and a model from the geospatial data to the video image space;
the mapping model of the video image space to the geospatial data is expressed as:
(X_G, Y_G, Z_G)^T = (X_C, Y_C, Z_C)^T + λ·P·T·(f, x, y)^T
where (X_G, Y_G, Z_G) are the spatial coordinates of the target, (X_C, Y_C, Z_C) are the coordinates of the camera's optical center, (f, x, y) is the line-of-sight vector, P and T are the rotation matrices of the camera, and λ is the ray-extension parameter;
the mapping model from the geospatial data to the video image space is represented as:
λ·P·T·(f, x, y)^T = (X_G - X_C, Y_G - Y_C, Z_G - Z_C)^T.
8. The method for fusing a large-inclination-angle video and a 2D map according to claim 2 or 4, wherein the video background modeling technology adopts the Vibe algorithm.
9. The method for fusing a large-inclination-angle video and a 2D map according to claim 5, wherein the superpixel segmentation of the static video background image is performed using the SLIC superpixel segmentation algorithm.
CN201910350808.5A 2019-04-28 2019-04-28 Method for fusing large-dip-angle video and 2D map Expired - Fee Related CN110245199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910350808.5A CN110245199B (en) 2019-04-28 2019-04-28 Method for fusing large-dip-angle video and 2D map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910350808.5A CN110245199B (en) 2019-04-28 2019-04-28 Method for fusing large-dip-angle video and 2D map

Publications (2)

Publication Number Publication Date
CN110245199A CN110245199A (en) 2019-09-17
CN110245199B true CN110245199B (en) 2021-10-08

Family

ID=67883630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910350808.5A Expired - Fee Related CN110245199B (en) 2019-04-28 2019-04-28 Method for fusing large-dip-angle video and 2D map

Country Status (1)

Country Link
CN (1) CN110245199B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110764110B (en) * 2019-11-12 2022-04-08 深圳创维数字技术有限公司 Path navigation method, device and computer readable storage medium
CN112040265B (en) * 2020-09-09 2022-08-09 河南省科学院地理研究所 Multi-camera collaborative geographic video live broadcast stream generation method
CN112967214A (en) * 2021-02-18 2021-06-15 深圳市慧鲤科技有限公司 Image display method, device, equipment and storage medium
CN113033348A (en) * 2021-03-11 2021-06-25 北京文安智能技术股份有限公司 Overlook image correction method for pedestrian re-recognition, storage medium, and electronic device
CN113297950B (en) * 2021-05-20 2023-02-17 首都师范大学 Dynamic target detection method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101277429A (en) * 2007-03-27 2008-10-01 中国科学院自动化研究所 Method and system for amalgamation process and display of multipath video information when monitoring
WO2014170886A1 (en) * 2013-04-17 2014-10-23 Digital Makeup Ltd System and method for online processing of video images in real time
CN104581018A (en) * 2013-10-21 2015-04-29 北京航天长峰科技工业集团有限公司 Video monitoring method for realizing two-dimensional map and satellite image interaction
CN106780541A (en) * 2016-12-28 2017-05-31 南京师范大学 A kind of improved background subtraction method
CN107197200A (en) * 2017-05-22 2017-09-22 北斗羲和城市空间科技(北京)有限公司 It is a kind of to realize the method and device that monitor video is shown
CN108389396A (en) * 2018-02-28 2018-08-10 北京精英智通科技股份有限公司 A kind of vehicle matching process, device and charge system based on video
CN108960566A (en) * 2018-05-29 2018-12-07 高新兴科技集团股份有限公司 A kind of traffic Visualized Monitoring System

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7730406B2 (en) * 2004-10-20 2010-06-01 Hewlett-Packard Development Company, L.P. Image processing system and method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101277429A (en) * 2007-03-27 2008-10-01 中国科学院自动化研究所 Method and system for amalgamation process and display of multipath video information when monitoring
WO2014170886A1 (en) * 2013-04-17 2014-10-23 Digital Makeup Ltd System and method for online processing of video images in real time
CN104581018A (en) * 2013-10-21 2015-04-29 北京航天长峰科技工业集团有限公司 Video monitoring method for realizing two-dimensional map and satellite image interaction
CN106780541A (en) * 2016-12-28 2017-05-31 南京师范大学 A kind of improved background subtraction method
CN107197200A (en) * 2017-05-22 2017-09-22 北斗羲和城市空间科技(北京)有限公司 It is a kind of to realize the method and device that monitor video is shown
CN108389396A (en) * 2018-02-28 2018-08-10 北京精英智通科技股份有限公司 A kind of vehicle matching process, device and charge system based on video
CN108960566A (en) * 2018-05-29 2018-12-07 高新兴科技集团股份有限公司 A kind of traffic Visualized Monitoring System

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A moving target detection algorithm based on background subtraction and three-frame difference; Mo Lin et al.; Microcomputer Information; 2009-04-25; Vol. 25, No. 4-3; pp. 274-276 *
Mutual mapping between surveillance video and 2D geospatial data; Zhang Xing et al.; Geomatics and Information Science of Wuhan University; 2015-07-03; Vol. 40, No. 8; pp. 1130-1136 *
Research on the integration of surveillance video and two-dimensional maps; Liu Yang; China Master's Theses Full-text Database, Basic Sciences; 2020-05-15; No. 05 (2020); pp. A008-84 *

Also Published As

Publication number Publication date
CN110245199A (en) 2019-09-17

Similar Documents

Publication Publication Date Title
CN110245199B (en) Method for fusing large-dip-angle video and 2D map
US11080911B2 (en) Mosaic oblique images and systems and methods of making and using same
US10949978B2 (en) Automatic background replacement for single-image and multi-view captures
CN107292965B (en) Virtual and real shielding processing method based on depth image data stream
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN110288657B (en) Augmented reality three-dimensional registration method based on Kinect
CN110033475B (en) Aerial photograph moving object detection and elimination method based on high-resolution texture generation
CN107843251A (en) The position and orientation estimation method of mobile robot
CN106803275A (en) Estimated based on camera pose and the 2D panoramic videos of spatial sampling are generated
CN109712247B (en) Live-action training system based on mixed reality technology
CN110211169B (en) Reconstruction method of narrow baseline parallax based on multi-scale super-pixel and phase correlation
CN108830925B (en) Three-dimensional digital modeling method based on spherical screen video stream
CN105005964A (en) Video sequence image based method for rapidly generating panorama of geographic scene
CN110941996A (en) Target and track augmented reality method and system based on generation of countermeasure network
CN106530407A (en) Three-dimensional panoramic splicing method, device and system for virtual reality
Kuschk Large scale urban reconstruction from remote sensing imagery
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
JP2023502793A (en) Method, device and storage medium for generating panoramic image with depth information
CN115272494B (en) Calibration method and device for camera and inertial measurement unit and computer equipment
Wang et al. Terrainfusion: Real-time digital surface model reconstruction based on monocular slam
CN103617631A (en) Tracking method based on center detection
CN114241372A (en) Target identification method applied to sector-scan splicing
CN107767393B (en) Scene flow estimation method for mobile hardware
CN110738696B (en) Driving blind area perspective video generation method and driving blind area view perspective system
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211008