CN113642463B

CN113642463B - Heaven and earth multi-view alignment method for video monitoring and remote sensing images

Info

Publication number: CN113642463B
Application number: CN202110930060.3A
Authority: CN
Inventors: 梁华; 李晓威
Original assignee: Guangzhou Fu'an Digital Technology Co ltd
Current assignee: Guangzhou Fu'an Digital Technology Co ltd
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2023-03-10
Anticipated expiration: 2041-08-13
Also published as: CN113642463A

Abstract

The invention discloses a method for aligning heaven and earth multiple views of video monitoring and remote sensing images, which comprises the steps of establishing a low-precision mapping relation between a monitoring camera and longitude and latitude coordinates by aligning the picture coordinates and the longitude and latitude coordinates of the monitoring camera, and obtaining a corresponding target area on the remote sensing images by the longitude and latitude coordinates obtained through conversion; then, example segmentation is respectively carried out on the monitoring camera picture and the remote sensing image target area to obtain an image block information set phi of all objects in the monitoring camera picture and an image block information set mu of all objects in the remote sensing image target area, and the characteristics of each image block are extracted; according to the characteristics, the categories and the mapping position relation of the image blocks, the coarse matching between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area is realized by using a Hungarian algorithm, and a corresponding set of image block pairs is obtained; and selecting data by utilizing the corresponding set of image block pairs, and establishing an accurate mapping relation between a monitoring camera picture and a remote sensing image.

Description

Heaven and earth multi-view alignment method for video monitoring and remote sensing images

Technical Field

The invention relates to the field of target detection, in particular to a method for aligning heaven and earth multiple views of video monitoring and remote sensing images.

Background

Image registration is the process of geometrically matching two images taken at the same location, at different times, or with different sensors. Image registration is a precondition and basis for applications such as image fusion and change detection, and the precision of image registration has an important influence on subsequent applications.

The feature-based image registration method is one of the most common methods for image registration at present, and has the greatest advantages that various analyses performed on the whole image can be converted into analyses on image feature information, namely feature points, feature curves, edges and smaller areas, so that the operation amount in the image processing process is greatly reduced, the method has good adaptability to gray scale change, image deformation, occlusion and the like, and quick and accurate registration of the image under complex imaging conditions can be realized. Due to the influence of factors such as noise, shooting conditions, seasonal changes, visual angle changes, platform shaking and the like, the feature-based registration method is more suitable for remote sensing image registration.

If a large number of repeated features exist in the matched images, for example, identical windows or patterns exist on the surface of a building, the traditional image matching method cannot effectively distinguish the features, and finally, the two images cannot be effectively matched.

In recent years, some identification and detection methods combining surveillance video and remote sensing satellites are proposed, for example:

the invention patent with publication number CN 108132979A discloses a port ground feature monitoring method and system based on remote sensing images, the invention includes that an original ground feature database is established, a three-dimensional scanner is used for scanning all ground features to be monitored of a port at the same spatial height to obtain a three-dimensional remote sensing image set, and corresponding real photos are recorded and stored; establishing a ground object coordinate system, selecting an origin of the three-dimensional coordinate system in a relatively open area and near a port edge, simulating an X axis at the port edge, and establishing the three-dimensional coordinate system; classifying the ground objects, namely classifying the ground objects by combining the actual sizes and shapes of the ground objects; extracting surface feature boundary points, and carrying out 360-degree panoramic scanning on the port by using a three-dimensional scanner to obtain a discrete point set of the surface feature boundary; and (3) monitoring the ground features in real time, transmitting the discrete point set obtained by scanning to a data processing unit for three-dimensional remote sensing graphic simulation, extracting a three-dimensional remote sensing image with the highest approximation degree from an original ground feature database, and further matching the three-dimensional remote sensing image with a corresponding real image. However, if a large number of repeated features exist in the matched images, the features cannot be effectively distinguished.

Disclosure of Invention

In order to solve the technical problems, the invention aims to provide a method for aligning the heaven and earth multiple views of a video monitoring and remote sensing image, which comprises the steps of constructing an example segmentation model to perform example segmentation on a monitoring camera picture and a remote sensing image target region to obtain an image block information set phi of all objects in the monitoring camera picture and an image block information set mu of all objects in the remote sensing image target region, and separating the objects from a background by utilizing semantic segmentation, thereby reducing the interference of the image background and improving the image matching efficiency. And then the Hungarian algorithm is utilized to obtain the optimal matching, the corresponding matching between the image blocks in the picture of the monitoring camera and the image blocks in the target area of the remote sensing image is realized, the accuracy of the image matching is improved, the corresponding relation is established between the image blocks of the monitoring camera and the remote sensing image, the accurate mapping relation between the monitoring picture and the remote sensing image is realized, and the rapid and accurate registration of the image under the complex imaging condition can be realized.

In order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows:

a method for aligning the multiple views of the sky and the ground of a video monitoring and remote sensing image comprises the following steps:

step S1: aligning the picture coordinates and the longitude and latitude coordinates of the monitoring camera, establishing a low-precision mapping relation between the picture coordinates and the longitude and latitude coordinates of the monitoring camera, converting the four picture coordinates of the left upper part, the left lower part, the right upper part and the right lower part of the monitoring camera into the longitude and latitude coordinates, and obtaining a corresponding target area on a remote sensing image according to the longitude and latitude coordinates obtained through conversion;

step S2: respectively carrying out example segmentation on a monitoring camera picture and a remote sensing image target area to obtain an image block information set phi of all objects in the monitoring camera picture and an image block information set mu of all objects in the remote sensing image target area, and extracting the characteristics of each image block;

and step S3: according to the characteristics, the categories and the mapping position relation of the image blocks in the step S2, the Rochoy algorithm is utilized to realize the rough matching between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area, and a corresponding set of image block pairs is obtained;

and step S4: and (4) selecting data by utilizing the corresponding set of the image block pairs in the step (S3), and establishing an accurate mapping relation between the picture of the monitoring camera and the remote sensing image according to the corresponding relation of the plurality of groups of image block pairs.

Preferably, in the step S1, the process of establishing the low-precision mapping relationship between the picture coordinates and the longitude and latitude coordinates of the monitoring camera specifically includes:

step 1.1: according to a Haverine formula, calculating the vertical projection position O' of the position of the monitoring camera on the horizontal plane and the optional position A of the monitoring camera on the horizontal plane within the visible range _i Straight horizontal distance d _i In units of m, O' and A _i Horizontal distance s of longitude of _i The unit is m:

wherein: a. b are all intermediate variable values, O' (λ) ₀ ，ψ ₀ ) In order to monitor the vertical projection position of the camera position on the horizontal plane, A _i (λ _i ，ψ _i ) Is any position on a horizontal plane within the visual range of the monitoring camera, and r is the radius of the earth and the unit is m;

step 1.2: from step 1.1, calculate O' and A _i Angle beta between the connecting line of (A) and the true north direction of geography _i ：

Step 1.3: from step 1.1, calculate O and A _i Angle theta between the connecting line of (a) and the vertical line _i ：

H is the height of the monitoring camera from the horizontal plane, and the unit is m;

step 1.4: calculation of A _i In the picture coordinate (x) of the monitoring camera _i ，y _i )：

Wherein, X is the pixel width of the image, Y is the pixel height, and the parameter values of X and Y can be obtained according to the image resolution of the monitoring camera of X multiplied by Y;

theta is the included angle between the central line of the monitoring camera and the vertical line, beta is the included angle between the projection of the central line of the monitoring camera on the horizontal plane and the true north direction of geography, omega _x For monitoring the horizontal field angle, omega, of the camera _y The vertical field angle of the monitoring camera is adopted.

Preferably, in the step S2, an example segmentation model is constructed to perform example segmentation on the monitoring camera picture and the remote sensing image target region, and all objects in the monitoring camera picture and the remote sensing image target region are segmented through the example segmentation model to obtain an image block information set Φ of all objects in the monitoring camera picture and an image block information set μ of all objects in the remote sensing image target region; extracting the global features of each image block comprises: color features, texture features and shape features to obtain a feature set of each image block.

Preferably, the example segmentation model is Mask R-CNN, and a Mask R-CNN main network is a characteristic pyramid network FPN; the Mask R-CNN also comprises an area generation network RPN, a RolAlign layer and a deconvolution network Deconv, wherein the RolAlign layer is a target detection special layer; the feature pyramid network FPN carries out feature extraction on an input monitoring camera picture and a remote sensing image target area image to generate a feature map, the feature map is input into an area generation network RPN, a RolAlign layer processes the feature map and the output of the area generation network, and a processing result is input into a deconvolution network Deconv to generate a predicted mask.

Preferably, matching of the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area is realized according to the categories of the image blocks and the similarity and the characteristics of the mapping positions, and a category label of each image block is obtained according to the category of the image block, wherein the matching process between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area in the step S3 comprises the following steps:

step S3.1: constructing a bipartite graph G = (X, Y, E), wherein X represents an image block set in a picture of a monitoring camera, Y represents an image block set in a remote sensing picture, and E represents an edge set between all nodes in X and all nodes in Y; the edge set E is constructed according to the following rules: if Sim (x, y)>0,x ∈ X, Y ∈ Y, and then an edge (X, Y) is connected between the two corresponding vertices X and Y in the bipartite graph G, and the weight w of the edge is set _xy Sif (X, Y), where sif (X, Y) represents the similarity of feature X to feature Y (X ∈ X, Y ∈ Y);

step S3.2: connecting the image blocks of the same category in the monitoring camera picture and the remote sensing image through the category label of each image block to obtain one-to-one and one-to-many conditions;

step S3.3: converting the coordinates of the image blocks in the image of the monitoring camera into longitude and latitude coordinates by using the low-precision mapping relation between the coordinates of the image of the monitoring camera and the longitude and latitude coordinates in the step S1 and taking the central point of the image block as the coordinates of the image block to obtainThe converted longitude and latitude coordinate set A = { < lambda { _i ,ψ _i > -, where < lambda _i ,ψ _i The longitude and latitude coordinate, lambda, of the ith image block in the picture of the monitoring camera after conversion is more than _i Is longitude,. Psi _i Is the latitude; image block longitude and latitude coordinate set obtained by remote sensing image map

Wherein

Representing longitude and latitude coordinates of the jth image block in the remote sensing image target area,

is longitude,. Epsilon _j If the latitude is the latitude, setting a threshold (lon, lat), if the following conditions are met, keeping the connection line between the latitude and the threshold, and if the following conditions are not met, deleting the connection line:

|ψ _i -ε _j |＜lat

obtaining a matching set between image blocks with smaller errors by screening the position similarity;

step S3.4: and (4) obtaining the optimal matching by using the Hungarian algorithm according to the matching set and the corresponding characteristic set among the image blocks obtained in the step (S3.3), and realizing the corresponding matching between the image blocks in the picture of the monitoring camera and the image blocks in the target area of the remote sensing image.

Preferably, the specific process of step S3.4 is as follows:

in a subgraph P of the bipartite graph G, any two edges in the edge set of P do not depend on the same vertex, and M is called matching; calculating the optimal matching of the bipartite graph by using a Hungarian algorithm, matching the image block features in the monitoring camera picture with the image block features in the remote sensing image target area to obtain matched feature pairs, and then calculating the similarity between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area according to the similarity between the matched feature pairs to realize the correspondence between the image blocks in the monitoring picture and the image blocks in the remote sensing image target area to obtain a corresponding set of the image block pairs.

Preferably, in the step S4, the process of establishing the accurate mapping relationship between the monitoring camera picture and the remote sensing image is as follows:

selecting three groups of data from the corresponding set of the image block pairs in the step S3 each time, establishing a coordinate mapping relation according to the corresponding relation among the plurality of groups of image block pairs, and obtaining a plurality of transformation matrixes M through inverse matrix calculation _i ：

Wherein (lon) _i1 ,lat _i1 )、(lon _i2 ,lat _i2 )、(lon _i3 ,lat _i3 ) Three groups of longitude and latitude coordinates of image blocks in a remote sensing image target area (x) respectively _i1 ,y _i1 )、(x _i2 ,y _i2 )、(x _i3 ,y _i3 ) Three groups of coordinates of image blocks in the picture of the monitoring camera are respectively;

taking a plurality of transformation matrices M _i Average value of (a):

the accurate conversion relation between the picture coordinate of the monitoring camera and the longitude and latitude coordinate of the remote sensing image is as follows:

wherein (lon, lat) is the longitude and latitude coordinates of the remote sensing image, (x, y) is the picture coordinates of the monitoring camera, and M is a transformation matrix.

Compared with the prior art, the invention has the beneficial technical effects that:

the method and the device perform example segmentation on the monitoring camera picture and the remote sensing image target area by constructing an example segmentation model to obtain the image block information set phi of all objects in the monitoring camera picture and the image block information set mu of all objects in the remote sensing image target area, and separate the objects from the background by semantic segmentation, thereby reducing the interference of the image background and improving the image matching efficiency. And then the Hungarian algorithm is utilized to obtain the optimal matching, the corresponding matching between the image blocks in the picture of the monitoring camera and the image blocks in the target area of the remote sensing image is realized, the accuracy of the image matching is improved, the corresponding relation is established between the image blocks of the monitoring camera and the remote sensing image, the accurate mapping relation between the monitoring picture and the remote sensing image is realized, and the rapid and accurate registration of the image under the complex imaging condition can be realized.

Drawings

FIG. 1 is a flow chart of a method for aligning multiple views of a video surveillance and remote sensing image in accordance with an embodiment of the present invention;

FIG. 2 is a first schematic diagram of a first method for calculating a mapping relationship between longitude and latitude coordinates and frame coordinates of a monitoring camera in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a second method for calculating a mapping relationship between longitude and latitude coordinates and coordinates of a monitoring camera frame in an embodiment of the present invention;

FIG. 4 is a schematic diagram of a coordinate position of a monitor camera in an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments, but the scope of the present invention is not limited to the following embodiments.

Examples

Referring to fig. 1, the present embodiment discloses a method for aligning a plurality of views of a video surveillance and a remote sensing image, which includes the following steps:

In the step S1, the process of establishing the low-precision mapping relationship between the picture coordinates and the longitude and latitude coordinates of the monitoring camera is specifically as follows:

step S1.0: parameter acquisition preparation:

as shown in fig. 2 to 4, N in fig. 3 indicates the geographical true north direction, the height of the monitoring camera from the horizontal plane is measured as H, the included angle between the center line of the monitoring camera and the vertical line is measured as θ, the included angle between the projection of the center line of the monitoring camera on the horizontal plane and the geographical true north direction is measured as β, and the horizontal field angle of the monitoring camera is measured as ω _x The vertical field angle of the monitoring camera is omega _y Acquiring the resolution parameter information of the image of the monitoring camera as X multiplied by Y (X is the pixel width of the image, and Y is the pixel height);

suppose that: the coordinate of the center of the monitoring camera is (0,0), the vertical projection position of the position O where the monitoring camera is located on the horizontal plane is O', and the longitude and latitude are (lambda) ₀ ，ψ ₀ ) Aiming at any position A on the horizontal plane within the visual range of the monitoring camera _i Latitude and longitude coordinates (λ) _i ，ψ _i ) Can be as followsConversion to surveillance camera frame coordinates (x) _i ，y _i )；

Step 1.1: according to a Haverine formula, calculating the vertical projection position O' of the position of the monitoring camera on the horizontal plane and the optional position A of the monitoring camera on the horizontal plane within the visible range _i Straight horizontal distance d _i In units of m, O' and A _i Longitude horizontal distance s of _i The unit is m:

wherein: a. b are all intermediate variable values, O' (λ) ₀ ，ψ ₀ ) For monitoring the vertical projection position of the camera on the horizontal plane, A _i (λ _i ，ψ _i ) Is any position on a horizontal plane within the visible range of a monitoring camera, and r is the radius of the earth and the unit is m;

step 1.2: from step 1.1, O' and A are calculated _i Angle beta between connecting line of (a) and true north direction of geography _i ：

Step 1.3: from step 1.1, calculate O and A _i Angle theta between the connecting line and the vertical line _i ：

According to the low-precision mapping relation between the picture coordinates and the longitude and latitude coordinates of the monitoring camera established by the method, the four picture coordinates of the left upper part, the left lower part, the right upper part and the right lower part of the monitoring camera are converted into the longitude and latitude coordinates, a corresponding target area is obtained on a remote sensing image through the longitude and latitude coordinates obtained through conversion, and further the position information of each image block is obtained;

in the step S2, instance segmentation is carried out on the monitoring camera picture and the remote sensing image target area by constructing an instance segmentation model, and all objects in the monitoring camera picture and the remote sensing image target area are segmented through the instance segmentation model to obtain an image block information set phi of all objects in the monitoring camera picture and an image block information set mu of all objects in the remote sensing image target area; extracting the global features of each image block comprises: color features, texture features and shape features to obtain a feature set of each image block.

The example segmentation model is Mask R-CNN, and a Mask R-CNN main network is a feature pyramid network FPN; the Mask R-CNN also comprises an area generation network RPN, a RolAlign layer and a deconvolution network Deconv, wherein the RolAlign layer is a target detection special layer; the feature pyramid network FPN carries out feature extraction on an input monitoring camera picture and a remote sensing image target area image to generate a feature map, the feature map is input into an area generation network RPN, a RolAlign layer processes the feature map and the output of the area generation network, and a processing result is input into a deconvolution network Deconv to generate a predicted mask.

According to the category of the image blocks and the similarity and the characteristics of the mapping positions, the matching between the image blocks in the image of the monitoring camera and the image blocks in the target area of the remote sensing image is realized, the category label of each image block is obtained according to the category of the image blocks, and the matching process between the image blocks in the image of the monitoring camera and the image blocks in the target area of the remote sensing image in the step S3 comprises the following steps:

step S3.3: converting the image block coordinates in the monitoring camera picture into longitude and latitude coordinates by using the low-precision mapping relation between the picture coordinates and the longitude and latitude coordinates of the monitoring camera in the step S1 and taking the central point of the image block as the coordinates of the image block to obtain a converted longitude and latitude coordinate set A = { < lambda { (λ) } _i ,ψ _i > -, where < lambda _i ,ψ _i The longitude and latitude coordinate, lambda, of the ith image block in the picture of the monitoring camera after conversion is more than _i Is longitude,. Psi _i Is the latitude; image block longitude and latitude coordinate set obtained by remote sensing image map

Wherein

is longitude,. Epsilon _j If the latitude is the latitude, setting a threshold (lon, lat), and if the following conditions are met, keeping the connection between the latitude and the threshold, otherwise, deleting the connection:

|ψ _i -ε _j |＜lat

Preferably, the specific process of step S3.4 is as follows:

In step S4, the process of establishing an accurate mapping relationship between the monitoring camera picture and the remote sensing image is as follows:

Wherein (lon) _i1 ,lat _i1 )、(lon _i2 ,lat _i2 )、(lon _i3 ,lat _i3 ) Three groups of longitude and latitude coordinates of image blocks in a remote sensing image target area (x) respectively _i1 ,y _i1 )、(x _i2 ,y _i2 )、(x _i3 ,y _i3 ) Three groups of coordinates of image blocks in a picture of the monitoring camera are respectively;

taking a plurality of transformation matrices M _i Average value of (a):

the accurate conversion relationship between the picture coordinate of the monitoring camera and the latitude and longitude coordinates of the remote sensing image is as follows:

And finally, establishing an accurate mapping relation between the monitoring camera picture and the remote sensing image through the step S4, and improving the accuracy of image matching so as to realize rapid and accurate registration of the image under the complex imaging condition.

Variations and modifications to the above-described embodiments may occur to those skilled in the art, which fall within the scope and spirit of the above description. Therefore, the present invention is not limited to the specific embodiments disclosed and described above, and some modifications and variations of the present invention should fall within the scope of the claims of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

1. A method for aligning the multiple views of the sky and the ground of a video monitoring and remote sensing image is characterized by comprising the following steps:

and step S4: selecting data by utilizing the corresponding set of the image block pairs in the step S3, and establishing an accurate mapping relation between a monitoring camera picture and a remote sensing image according to the corresponding relation of the plurality of groups of image block pairs;

the matching process between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area in the step S3 comprises the following steps:

step S3.1: construction bipartite graph G =(X, Y, E), wherein X represents an image block set in a picture of the monitoring camera, Y represents an image block set in a remote sensing picture, and E represents an edge set between all nodes in X and all nodes in Y; the edge set E is constructed according to the following rules: if Sim (x, y)>0,x ∈ X, Y ∈ Y, and then an edge (X, Y) is connected between the two corresponding vertices X and Y in the bipartite graph G, and the weight w of the edge is set _xy Sif (X, Y), where sif (X, Y) represents the similarity of feature X to feature Y (X ∈ X, Y ∈ Y);

step S3.3: converting the image block coordinates in the monitoring camera picture into longitude and latitude coordinates by using the low-precision mapping relation between the picture coordinates and the longitude and latitude coordinates of the monitoring camera in the step S1 and taking the central point of the image block as the coordinates of the image block to obtain a converted longitude and latitude coordinate set A = { < lambda { (λ) } _i ,ψ _i > } where < lambda _i ,ψ _i The longitude and latitude coordinate, lambda, of the ith image block in the picture of the monitoring camera after conversion is more than _i Is longitude,. Psi _i Is the latitude; image block longitude and latitude coordinate set obtained by remote sensing image map

Wherein

|ψ _i -ε _j |＜lat

2. The method according to claim 1, wherein in step S1, the process of establishing the low-precision mapping relationship between the coordinates of the monitor camera and the coordinates of latitude and longitude is as follows:

wherein: a. b are all intermediate variable values, O' (λ) ₀ ，ψ ₀ ) For monitoring the vertical projection position of the camera on the horizontal plane, A _i (λ _i ，ψ _i ) Is any position on a horizontal plane within the visual range of the monitoring camera, and r is the radius of the earth and the unit is m;

3. The method for multi-view alignment of the sky and the earth of the video surveillance and remote sensing images according to claim 1, wherein in the step S2, instance segmentation is performed on the surveillance camera picture and the remote sensing image target area by constructing an instance segmentation model, and all objects in the surveillance camera picture and the remote sensing image target area are segmented by the instance segmentation model to obtain an image block information set Φ of all the objects in the surveillance camera picture and an image block information set μ of all the objects in the remote sensing image target area; extracting the global features of each image block comprises: color features, texture features and shape features to obtain a feature set of each image block.

4. The method according to claim 3, wherein the instance segmentation model is Mask R-CNN, and a Mask R-CNN backbone network is a feature pyramid network FPN; the Mask R-CNN also comprises an area generation network RPN, a RolAlign layer and a deconvolution network Deconv, wherein the RolAlign layer is a target detection special layer; the FPN carries out feature extraction on the input monitoring camera picture and the remote sensing image target area image to generate a feature map, the feature map is input into an area generation network RPN, the RolAlign layer processes the feature map and the output of the area generation network, and the processing result is input into a deconvolution network Deconv to generate a predicted mask.

5. The method for multi-view alignment of video surveillance and remote sensing images according to claim 1, wherein the specific process of step S3.4 is as follows:

in one sub-graph P of the bipartite graph G, any two edges in an edge set of the P do not depend on the same vertex, the best matching of the bipartite graph is calculated by using a Hungarian algorithm, the image block features in a monitoring camera picture are matched with the image block features in a remote sensing image target area to obtain matched feature pairs, then the similarity between the image blocks in the monitoring camera picture and the image blocks in the remote sensing image target area is calculated through the similarity between the matched feature pairs, the correspondence between the image blocks in the monitoring picture and the image blocks in the remote sensing image target area is realized, and the corresponding set of the image block pairs is obtained.

6. The method according to claim 1, wherein in step S4, the process of establishing the precise mapping relationship between the monitoring camera picture and the remote sensing image is as follows:

taking a plurality of transformation matrices M _i Average value of (d):

wherein (lon, lat) is longitude and latitude coordinates of the remote sensing image, (x, y) are picture coordinates of the monitoring camera, and M is a transformation matrix.