CN111369495B - Panoramic image change detection method based on video - Google Patents
- Publication number: CN111369495B
- Application number: CN202010095879A
- Authority
- CN
- China
- Prior art keywords
- image
- panoramic image
- points
- images
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06T3/4038 — Geometric image transformations; image mosaicing, e.g. composing plane images from plane sub-images
- G06T7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
- G06V10/462 — Extraction of image or video features; salient features, e.g. scale invariant feature transform [SIFT]
- G06V10/50 — Extraction of image or video features by using histograms, e.g. histogram of oriented gradients [HoG]
Abstract
The invention relates to a video-based panoramic image change detection method, which comprises: continuously changing the camera view angle to shoot video data covering the area to be detected, extracting continuous images by a frame extraction method, and stitching them into a panoramic image of that area; taking the panoramic image as base-map data, shooting further video of the area, obtaining a real-time image by the same frame extraction method, registering it against the panoramic image, and locating the real-time image's corresponding region on the panorama; and comparing the real-time image with that corresponding region to judge whether a changed region exists. The method is highly automated, stable, and reliable, supports continuous long-term monitoring of a typical area, and can be applied in fields such as illegal-building monitoring and urban dynamic monitoring.
Description
Technical Field
The invention relates to the field of image change detection, and in particular to a video-based panoramic image change detection method.
Background
Image change detection technology is widely applied in illegal-building monitoring, urban dynamic monitoring, geographic-information updating, and related fields. Taking urban illegal-building detection as an example: with the continued development of China's economy and society, urban construction keeps accelerating, and the number and scale of illegal buildings keep growing. This phenomenon not only damages urban planning and the urban landscape, but also affects the city's image and residents' lives, and has become one of the negative factors affecting social harmony. At present, "low cost of violation, high cost of enforcement" is one of the main reasons illegal buildings occur so frequently. Beyond gaps in the relevant legal framework, detection itself is weak: lacking automated monitoring means, manual inspection has many shortcomings, with long discovery cycles and high costs for large-area monitoring. The market urgently needs a highly automated, robust, and reliable method for detecting urban illegal buildings, so as to advance their remediation.
Disclosure of Invention
To solve these problems, the invention provides a video-based panoramic image change detection method that is highly automated, stable, and reliable, for use in fields such as illegal-building monitoring and urban dynamic monitoring, thereby advancing the remediation of urban illegal buildings.
The technical scheme of the invention is as follows:
The video-based panoramic image change detection method of the invention comprises the following steps:
continuously changing the view angle of a camera, shooting video data covering an area to be detected, acquiring continuous images by adopting a frame extraction method, and splicing to form a panoramic image of the area to be detected;
taking the panoramic image as base map data, shooting video data of an area to be detected, acquiring a real-time image of the area to be detected by a frame extraction method, registering with the panoramic image, and positioning a corresponding area of the real-time image on the panoramic image;
and comparing the real-time image with the corresponding area on the panoramic image, and judging whether a change area exists or not.
Further, stitching to form a panoramic image of the region to be detected includes:
extracting feature points of the continuous images by adopting a SIFT algorithm, wherein the feature point extraction comprises the steps of constructing a scale space, accurately positioning key points, determining the direction of the key points and generating feature descriptors of the key points;
matching feature points between adjacent continuous images to realize image registration: coarse matching is performed using the key-point feature descriptors, one-to-many and many-to-one matches are deleted from the matched point pairs, and remaining mismatched points are then eliminated;
after the images are registered, establishing a globally optimized error equation based on the homography matrices and building an overall transformation model between adjacent images to stitch them into the panoramic image.
Further, the feature point extraction of the continuous images by using the SIFT algorithm comprises:
constructing the scale space using a Gaussian image pyramid and a difference-of-Gaussians (DoG) pyramid, with the Gaussian pyramid implemented on the GPU; the DoG pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid;
detecting extreme points across the whole DoG scale space as feature points, computing each feature point's principal curvature via the Hessian matrix, and eliminating feature points whose principal curvature exceeds a given threshold;
taking the whole DoG pyramid as input and computing each feature point's gradient magnitude and orientation; the magnitude and orientation are computed on the GPU, while the search for the principal and auxiliary orientations runs on the CPU;
and generating a feature descriptor of each feature point by using the GPU.
Further, the feature point matching of the adjacent continuous images includes:
coarse matching of the feature points of adjacent continuous images to obtain feature point pairs: for a selected feature point, find in the adjacent image the feature points whose descriptors are at the nearest and second-nearest Euclidean distances, compute the ratio of the nearest to the second-nearest distance, and form a feature point pair if the ratio falls within a set threshold range;
deleting one-to-many and many-to-one feature point pairs using the uniqueness of the pairs' pixel coordinates;
eliminating feature point pairs that fail the slope and epipolar constraints;
using the RANSAC algorithm with a perspective transformation model to eliminate feature point pairs whose points fall outside the model;
the remaining pairs of feature points serve as matched feature points.
Further, establishing a globally optimized error equation based on the homography matrices and building an overall transformation model between adjacent images to stitch them comprises initial registration and accurate registration;
initial registration obtains an initial estimate of the image alignment by the least-squares method;
accurate registration comprises: computing the homography matrix of one of two adjacent images and, using the feature point pairs, projecting its feature points into the other image to obtain the deformed feature points; then globally optimizing all the homography matrices of the continuous images with the Levenberg-Marquardt algorithm, obtaining the value of each homography matrix, and stitching the images according to the globally optimized parameters.
Further, after a real-time image of the region to be detected is obtained, feature points of the real-time image are extracted with the SIFT algorithm, comprising constructing a scale space, accurately positioning key points, determining key-point orientations, and generating key-point feature descriptors; the feature points are then matched against the panoramic image to locate the image's corresponding region on the panorama.
Further, the feature point extraction of the real-time image by using the SIFT algorithm includes:
constructing the scale space using a Gaussian image pyramid and a difference-of-Gaussians (DoG) pyramid, both implemented on the GPU; the DoG pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid;
detecting extreme points across the whole DoG scale space as feature points, computing each feature point's principal curvature via the Hessian matrix, and eliminating feature points whose principal curvature exceeds a given threshold;
taking the whole DoG pyramid as input and computing each feature point's gradient magnitude and orientation; the magnitude and orientation are computed on the GPU, while the search for the principal and auxiliary orientations runs on the CPU;
and generating a feature descriptor of each feature point by using the GPU.
Further, performing feature point matching with the panoramic image, and locating the corresponding region of the image on the panoramic image comprises:
coarse matching between the feature points of the frame-extracted image and those of the panoramic image to obtain feature point pairs: compute the ratio of a feature point's distance to its nearest-neighbor feature point to its distance to the second-nearest neighbor, and form a feature point pair if the ratio lies within a set range;
deleting one-to-many and many-to-one feature points in the feature point pairs by utilizing the feature point pair pixel coordinate uniqueness;
eliminating feature points that fail the slope and epipolar constraints;
eliminating feature points which do not meet the requirements in the feature point pairs based on a perspective transformation model by utilizing a RANSAC algorithm;
the rest characteristic point pairs are used as matched characteristic points, and the corresponding areas of the image on the panoramic image are positioned according to the positions of the matched characteristic points.
Further, determining whether a change region exists includes:
preprocessing the real-time image and the corresponding area image on the panoramic image, and respectively normalizing;
respectively calculating the gradient images: the difference between the pixels to the left and right of a pixel is taken as its x-direction gradient and the difference between the pixels above and below as its y-direction gradient, then converted to magnitude and direction;
respectively dividing the real-time image and the corresponding region image on the panoramic image obtained by the frame extraction method into grids with the same size, and calculating a gradient histogram for each grid;
respectively computing gradient histograms of the real-time image and the corresponding panorama region: the counted grids are combined into blocks, the HOG features of the grids in a block are concatenated to form the block's HOG feature, and the combined blocks are normalized to obtain the HOG descriptors;
and comparing HOG descriptors of the combined blocks of the real-time image and the corresponding region image on the panoramic image, and judging the region as a change region if the HOG descriptor difference value of a certain combined block exceeds a set threshold value.
Further, if it is determined that the change area is a change area, outputting the change area; and continuously acquiring continuous real-time images of the region to be detected by adopting a frame extraction method, splicing to form a panoramic image of the region to be detected, and updating the panoramic image serving as base map data.
Compared with the prior art, the invention offers the following notable advantages:
1. Using the panoramic image as the change-detection base map enables multi-temporal comparison and improves the robustness of detection results.
2. GPU computation is introduced, accelerating the calculations.
3. The histogram of oriented gradients is applied to image change detection, making better use of image features.
4. The accuracy and reliability of change detection are improved, which is significant for illegal-building monitoring, urban dynamic monitoring, and similar applications.
Drawings
FIG. 1 is a flow chart of panoramic image based change detection in accordance with the present invention;
FIG. 2 is a flow chart of the present invention for generating panoramic images;
FIG. 3 is a flow chart of video-based panoramic image change detection;
fig. 4 is a flowchart of how feature change detection is performed in accordance with the present invention.
Detailed Description
The objects, technical solutions and advantages of the present invention will become more apparent by the following detailed description of the present invention with reference to the accompanying drawings. It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The flow of the panoramic-image-based change detection of the invention is shown in fig. 1 and comprises the following steps:
s100, continuously changing the view angle of a camera, shooting video data covering an area to be detected, acquiring continuous images by adopting a frame extraction method, and splicing to form a panoramic image of the area to be detected;
fig. 2 is a flowchart of the panoramic image generation method, which specifically comprises the following steps:
s110, acquiring video data through a wide-format camera, acquiring continuous images by adopting a frame extraction method, acquiring the attribute of an acquired video structure body, outputting pictures at intervals of a certain frame number, wherein the value range of the frame number is 3-10, and generating a panoramic image.
The video data covering the area to be detected are captured by rotating or moving the camera: the camera is placed on a fixed support and rotated about a vertical axis or moved along a track, keeping the zoom ratio unchanged. In one embodiment, the video frame rate is 30 frames/second, the field of view is 30 degrees, and the rotation speed is 30 degrees/second. The angular spacing between adjacent frames is then 1 degree and the overlap reaches about 97%. Because the overlap of adjacent frames is so large, a key frame can be extracted every 15 frames to reduce the computational load, giving an overlap of 50%.
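The overlap arithmetic of this embodiment can be sketched as follows (an illustrative helper, not part of the claimed method; the function name is our own):

```python
def keyframe_overlap(fov_deg, rot_speed_deg_s, fps, frame_step=1):
    """Fraction of the field of view shared by two consecutive keyframes,
    assuming the camera rotates at constant speed about a vertical axis."""
    step_deg = rot_speed_deg_s / fps * frame_step  # angular spacing between keyframes
    return max(0.0, (fov_deg - step_deg) / fov_deg)

# Embodiment figures: 30 fps, 30-degree field of view, 30 deg/s rotation.
print(round(keyframe_overlap(30, 30, 30) * 100, 1))      # adjacent frames -> 96.7
print(round(keyframe_overlap(30, 30, 30, 15) * 100, 1))  # every 15th frame -> 50.0
```

This reproduces the embodiment's numbers: roughly 97% overlap between adjacent frames and 50% when one keyframe is kept per 15 frames.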
S120, feature points are extracted from the continuous images with the SIFT algorithm; for large amounts of data, the CPU and GPU cooperate through CUDA (Compute Unified Device Architecture) to reduce processing time. Feature point extraction comprises four main steps: constructing the scale space, accurately positioning key points, determining key-point orientations, and generating key-point descriptors.
S130, feature point matching. First, coarse matching is achieved with the SIFT feature descriptors; one-to-many and many-to-one matches are then deleted from the matched pairs; a slope constraint removes obviously wrong matches; epipolar geometry removes further mismatches; and finally RANSAC removes the remaining outliers.
S140, after the images are registered, a globally optimized error equation is established based on the homography matrices, and an overall transformation model between the images is built to stitch them into the panoramic image.
Specifically, step S120 includes:
S121, the scale space is constructed with a Gaussian image pyramid and a difference-of-Gaussians pyramid. The Gaussian pyramid has N octaves of S layers each, and for each layer the corresponding Gaussian kernel is computed. Convolving the data of layer i within an octave yields the image data of layer i+1; repeating the convolution produces all the Gaussian-pyramid images. The convolution is implemented on the GPU: one kernel-function launch generates one layer of one octave, so N x S launches generate the whole pyramid. The difference pyramid is likewise built on the GPU: subtracting two vertically adjacent images within an octave yields one layer of the difference pyramid, so an octave with i Gaussian layers yields i-1 difference layers.
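As a minimal sketch of the difference-of-Gaussians construction (in 1-D for brevity; the patent builds 2-D pyramids on the GPU, and all names here are illustrative):

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel; blurring preserves total mass."""
    radius = int(3 * sigma + 0.5) + 1
    k = [math.exp(-i * i / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian (zero padding at the borders)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(k):
            idx = i + j - r
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out

def dog_layer(signal, sigma1, sigma2):
    """One difference-of-Gaussians layer: adjacent scales subtracted."""
    a, b = blur(signal, sigma1), blur(signal, sigma2)
    return [x - y for x, y in zip(b, a)]
```

Since both kernels are normalized, a DoG layer of any signal sums to (approximately) zero away from the borders, which is why DoG acts as a band-pass detector of blob-like structure.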
S122, extreme-point detection is performed over the whole difference-of-Gaussians scale space and implemented on the GPU. The input is the image data of the entire difference pyramid; the output is the positions and scales of the detected feature points. Each pixel in the DoG scale space is compared with its 3x3 neighbors in its own scale and in the adjacent scales above and below; it is a candidate only if it is larger than all of them or smaller than all of them. Low-contrast points are then removed: sub-pixel interpolation is attempted in a loop of up to 5 iterations, and if the interpolation succeeds, the interpolated contrast is checked, the point being discarded if it falls below a threshold and kept as a feature point otherwise. The principal curvature of each feature point is computed at the same time via the Hessian matrix, and the point is discarded if the curvature exceeds a given threshold.
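The 26-neighbor extremum test and the Hessian-based edge rejection of S122 can be sketched as follows (the (r+1)^2/r curvature-ratio form is the common SIFT formulation; the patent only states "a given threshold"):

```python
def is_extremum(dog, y, x):
    """dog: three adjacent DoG layers (2-D lists); tests the middle layer at
    (y, x) against its 26 neighbors in the 3x3x3 cube."""
    v = dog[1][y][x]
    neigh = [dog[s][y + dy][x + dx]
             for s in range(3)
             for dy in (-1, 0, 1)
             for dx in (-1, 0, 1)
             if not (s == 1 and dy == 0 and dx == 0)]
    return all(v > n for n in neigh) or all(v < n for n in neigh)

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Reject edge-like points whose principal-curvature ratio exceeds r,
    using the 2x2 Hessian: keep the point if tr^2/det < (r+1)^2 / r."""
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    if det <= 0:
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

A point that survives both tests is a well-localized, corner-like extremum rather than a flat or edge response.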
S123, the data required to compute the feature points' gradient magnitudes comprise the feature-point array output by extreme-point detection and the pixels around each feature point; since the positions and scales of the feature points are not fixed, the whole difference pyramid is taken as input. The orientation and magnitude are computed on the GPU, while the search for the principal and auxiliary orientations runs on the CPU.
S124, feature descriptors are generated on the GPU with the same strategy as for the orientation and magnitude computation. The whole CUDA grid computes, for the pixels around each feature point, the contribution of their gradient directions and magnitudes to the histograms describing that point, finally yielding each feature point's 128-dimensional feature vector. Each block in the grid is responsible for the 128-dimensional vector of one feature point; each thread in a block accumulates, for one pixel within the Gaussian radius, its magnitude contribution to the 8 orientation bins of the seed point formed by the 4x4 cell containing that pixel. In total 4x4 = 16 seed points are computed, forming the 128-dimensional feature vector.
The step S130 specifically includes:
S131, coarse matching of feature points judges similarity by Euclidean distance. For a feature point selected in one image, search the associated image for the feature points whose 128-dimensional descriptors are at the nearest and second-nearest Euclidean distances; compare the nearest distance with the second-nearest distance, and if their ratio meets the threshold (between 0.6 and 0.8) the match is considered correct, otherwise it is discarded.
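The nearest/second-nearest ratio test of S131 can be sketched as follows (a pure-Python illustration with our own function name; the descriptor dimensionality is irrelevant to the logic):

```python
def ratio_test_match(query_desc, candidate_descs, ratio=0.8):
    """Return the index of the best match in candidate_descs, or None if the
    nearest/second-nearest distance ratio fails the test."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    order = sorted(range(len(candidate_descs)),
                   key=lambda i: dist(query_desc, candidate_descs[i]))
    if len(order) < 2:
        return None
    best, second = order[0], order[1]
    if dist(query_desc, candidate_descs[best]) < ratio * dist(query_desc, candidate_descs[second]):
        return best
    return None
```

A match is accepted only when the best candidate is clearly closer than the runner-up, which is what suppresses ambiguous correspondences before the geometric checks.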
S132, one-to-many and many-to-one matching points are deleted using the uniqueness of the feature point pairs' pixel coordinates.
S133, slope and epipolar constraints further eliminate some wrong matches. Thresholds are set for each constraint, and matching points exceeding them are deleted. The slope constraint removes the obviously wrong matches; the epipolar constraint then deletes a small number of less obvious mismatched pairs.
S134, the RANSAC algorithm with a perspective transformation model removes the last remaining mismatches. A minimal sample set is drawn at random from the whole data set, initial model parameters are computed from it, the remaining inliers in the data set are then found with the fitted model, and the outliers are removed.
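The hypothesize-and-verify loop of S134 can be illustrated with a deliberately simplified model (a 2-D translation instead of the perspective transformation the patent uses; the RANSAC structure of random minimal samples, inlier counting, and keeping the best model is the same):

```python
import random

def ransac_translation(pairs, iters=200, tol=1.0, seed=0):
    """pairs: [((x1, y1), (x2, y2)), ...].  Hypothesize a translation from one
    randomly chosen pair, count inliers within tol, keep the best model."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.choice(pairs)
        dx, dy = x2 - x1, y2 - y1          # minimal-sample model
        inliers = [p for p in pairs
                   if abs(p[1][0] - p[0][0] - dx) <= tol
                   and abs(p[1][1] - p[0][1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

A hypothesis seeded by an outlier pair explains only itself, so the consensus model dominated by correct matches wins, and the pairs outside it are the "outliers" the step removes.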
The step S140 specifically includes:
The stitching process comprises two steps, initial registration and accurate registration. Initial registration obtains the initial alignment of the images by least squares; however, least squares is a direct solution and is easily affected by gross errors. Accurate registration computes the deformed image feature points from the current feature points and the homography matrix H (a 3x3 matrix); the translation between each pair of images is then computed from the new feature points; the center-point coordinates of each image are updated according to the translations; control-point coordinates are computed from the updated centers and feature points (with the upper-left corner of the first picture as the coordinate origin); a global equation is constructed and every homography matrix is solved jointly. All the homography matrices of the image sequence are then globally optimized with the Levenberg-Marquardt algorithm to obtain their accurate values, and all images are stitched according to the globally optimized parameters. This Levenberg-Marquardt refinement of the initial match yields a more accurate registration result.
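Projecting a feature point through a homography H, and the reprojection residual that the global optimization drives down, can be sketched as follows (illustrative helpers; the Levenberg-Marquardt solver itself is omitted):

```python
def apply_homography(H, x, y):
    """Project pixel (x, y) through a 3x3 homography H (nested lists),
    including the projective division by the third coordinate."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def reprojection_error(H, pairs):
    """Sum of squared distances between projected and observed points --
    the per-image-pair residual a global optimizer would minimize."""
    err = 0.0
    for (x1, y1), (x2, y2) in pairs:
        px, py = apply_homography(H, x1, y1)
        err += (px - x2) ** 2 + (py - y2) ** 2
    return err
```

The global error equation stacks one such residual per adjacent image pair, so optimizing all homographies jointly keeps alignment errors from accumulating across the panorama.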
S200, taking the panoramic image as base map data, shooting video data of a region to be detected, acquiring a real-time image of the region to be detected by a frame extraction method, registering with the panoramic image, and positioning a corresponding region of the real-time image on the panoramic image;
as shown in fig. 3, the method specifically includes:
s210, taking the panoramic image as base map data, and shooting video data of a region to be detected
S220, acquiring a real-time image of the region to be detected by a frame extraction method;
S230, extracting feature points of the real-time image with the SIFT algorithm;
S240, matching the feature points against the panoramic image and locating the image's corresponding region on the panorama, which comprises:
S241, coarse matching between the frame-extracted image's feature points and those of the panoramic image to obtain feature point pairs: compute the ratio of a feature point's distance to its nearest-neighbor feature point to its distance to the second-nearest neighbor, and form a feature point pair if the ratio lies within a set range;
S242, deleting one-to-many and many-to-one feature points from the pairs using the uniqueness of the pairs' pixel coordinates;
S243, eliminating feature points that fail the slope and epipolar constraints;
S244, using the RANSAC algorithm with a perspective transformation model to eliminate feature points in the pairs that do not satisfy the model; the remaining pairs serve as matched feature points, and the image's corresponding region on the panorama is located from their positions.
S300, the real-time image is compared with its corresponding region on the panoramic image to judge whether a changed region exists. Features are constructed by the HOG computation, which accumulates histograms of gradient directions over local image regions; change detection is then performed and the changed region extracted.
Fig. 4 is a flowchart of how the HOG feature change detection of the present invention is implemented as follows:
S310, preprocessing the image: the image is normalized to reduce local shadows and illumination changes, using Gamma correction as the normalization method.
S320, computing the gradient image: the difference between the pixels to the left and right of a pixel is taken as the x-direction gradient and the difference between the pixels above and below as the y-direction gradient, and the gradients are converted into magnitude and direction.
S330, computing grid gradient histograms: the image is divided into grid cells of 32 x 32 pixels and a gradient histogram is computed for each cell.
S340, computing the image gradient histogram: the counted grid cells are combined into blocks and the HOG features of the cells are concatenated into the HOG feature of each block; the combined blocks are then normalized, and the result is the HOG descriptor.
S350, comparing the HOG descriptors of the corresponding regions of the image to be detected and the panoramic image, and judging whether a region is a change region by means of a set threshold.
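Steps S310 to S350 describe a standard cell/block HOG pipeline. The NumPy sketch below illustrates the gradient and per-cell histogram computation (S320/S330) and the thresholded descriptor comparison (S350); the cell size of 32, the 9 orientation bins, and the 0.5 change threshold are assumptions for illustration, not values fixed by the patent.

```python
import numpy as np

def cell_hog(gray, cell=32, bins=9):
    """Per-cell gradient-orientation histograms as in S320/S330.
    Central differences give gx, gy; magnitude-weighted votes fall into
    `bins` orientation bins over [0, 180) degrees (unsigned gradients)."""
    gx = np.zeros_like(gray, dtype=float)
    gy = np.zeros_like(gray, dtype=float)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]      # right pixel minus left pixel
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]      # lower pixel minus upper pixel
    mag = np.hypot(gx, gy)                        # gradient magnitude
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    h, w = gray.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for r in range(ch):
        for c in range(cw):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            a = ang[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            idx = (a / (180.0 / bins)).astype(int) % bins
            for b in range(bins):
                hist[r, c, b] = m[idx == b].sum()  # magnitude-weighted vote
    return hist

def changed(block_a, block_b, thresh=0.5):
    """S350 analogue: L2-normalise two block descriptors and flag a change
    when their Euclidean distance exceeds `thresh` (assumed threshold)."""
    na = block_a / (np.linalg.norm(block_a) + 1e-6)
    nb = block_b / (np.linalg.norm(block_b) + 1e-6)
    return bool(np.linalg.norm(na - nb) > thresh)
```

A flat region compared against itself yields zero distance, while a region that gains a strong edge produces a large descriptor distance and is flagged as changed.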
If a change region is determined, the change region is output. Continuous real-time images of the region to be detected are then acquired by the frame extraction method and stitched into a new panoramic image of the region to be detected, which updates the panoramic image used as base map data.
In summary, the present invention provides a video-based panoramic image change detection method, comprising: video acquisition, in which a wide-angle camera captures video of a specified region and continuous images for generating the panoramic image are obtained by a frame extraction method; panoramic image stitching, in which the continuous images are registered and stitched to obtain the panoramic image; and change detection, in which, with the panoramic image as a base map, images extracted from the camera video are registered against the panoramic image, located on it, and compared to extract change regions. Using the panoramic image as the change detection base map enables multi-temporal comparison and improves the robustness of the detection results; GPU computation is introduced to accelerate processing, and applying the histogram of oriented gradients to image change detection makes better use of the image features.
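Once a real-time frame is registered, locating its corresponding region on the panorama reduces to projecting the frame's corners through the estimated homography. A minimal sketch of that projection step, assuming a 3x3 homography matrix `H` has already been estimated from the matched feature points (the function name is illustrative):

```python
import numpy as np

def project_corners(H, w, h):
    """Locate the corresponding region on the panorama: apply homography H
    to the four corners of a w x h real-time frame and return their
    positions in panorama coordinates as a 4 x 2 array."""
    corners = np.array([[0, 0, 1],
                        [w, 0, 1],
                        [w, h, 1],
                        [0, h, 1]], dtype=float).T  # homogeneous corner coords
    p = H @ corners
    p = p / p[2]          # perspective divide back to inhomogeneous coords
    return p[:2].T
```

The bounding box of the four projected corners gives the panorama region against which the frame's HOG descriptors are compared.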
It is to be understood that the above-described embodiments are merely illustrative of the principles of the present invention and in no way limit it. Accordingly, any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present invention shall fall within its scope, and the appended claims are intended to cover all such changes and modifications that fall within their scope and boundary, or equivalents thereof.
Claims (3)
1. A method for detecting a change in a panoramic image based on video, comprising:
continuously changing the view angle of a camera, shooting video data covering an area to be detected, acquiring continuous images by adopting a frame extraction method, and splicing to form a panoramic image of the area to be detected, wherein the method comprises the following steps:
extracting feature points of the continuous images by adopting a SIFT algorithm, wherein the feature point extraction comprises the steps of constructing a scale space, accurately positioning key points, determining the direction of the key points and generating feature descriptors of the key points;
performing feature point matching on adjacent continuous images to realize image registration: coarse matching of the feature points is achieved using the key-point feature descriptors, one-to-many and many-to-one cases in the matching point pairs are deleted, and mismatched points are then removed;
after registering the images, establishing an overall optimized error equation based on a homography matrix, and establishing an overall transformation model between adjacent images to stitch the images and obtain the panoramic image;
taking the panoramic image as base map data, shooting video data of an area to be detected, acquiring a real-time image of the area to be detected by a frame extraction method, registering with the panoramic image, and positioning a corresponding area of the real-time image on the panoramic image;
comparing the real-time image with the corresponding area on the panoramic image, and judging whether a change area exists or not;
the feature point extraction of the continuous images by adopting the SIFT algorithm comprises the following steps:
constructing a scale space by using a Gaussian image pyramid and a difference-of-Gaussian pyramid, wherein the Gaussian pyramid is implemented on a GPU and the difference-of-Gaussian pyramid is obtained by subtracting adjacent layers of the Gaussian pyramid;
detecting extreme points in the whole difference-of-Gaussian scale space as feature points, calculating the principal curvatures of the feature points through the Hessian matrix, and eliminating feature points whose principal curvatures exceed a given threshold;
taking the whole difference-of-Gaussian pyramid as input and calculating the gradient magnitude and direction of the feature points, wherein the computation of the gradient magnitude and direction is performed on the GPU and the search for the principal and auxiliary directions is performed on the CPU;
generating a feature descriptor of each feature point by adopting the GPU;
the feature point matching of adjacent continuous images comprises the following steps:
coarsely matching the feature points of adjacent continuous images to obtain feature point pairs: for a given feature point, the feature points whose feature descriptors have the nearest and second-nearest Euclidean distances to its own are found in the adjacent image, the ratio of the nearest distance to the second-nearest distance is calculated, and a feature point pair is formed if the ratio lies within a set threshold range;
deleting one-to-many and many-to-one feature point pairs by using the uniqueness of feature point pixel coordinates;
applying slope-constraint and epipolar-constraint elimination to remove feature point pairs that do not meet the requirements;
using the RANSAC algorithm to remove, based on the perspective transformation model, feature point pairs whose feature points do not fit the model;
the rest characteristic point pairs are used as matched characteristic points;
the determining whether a change region exists includes:
preprocessing the real-time image and the corresponding region image on the panoramic image, and normalizing each of them;
respectively calculating gradient images: the difference between the pixels to the left and right of a pixel is taken as the x-direction gradient and the difference between the pixels above and below as the y-direction gradient, and the gradients are converted into magnitude and direction;
respectively dividing the real-time image obtained by the frame extraction method and the corresponding region image on the panoramic image into grid cells of the same size, and calculating a gradient histogram for each cell;
for both the real-time image and the corresponding region image on the panoramic image, combining the counted grid cells into blocks, concatenating the HOG features of the cells into the HOG feature of each block, and normalizing the combined blocks to obtain the HOG descriptors;
and comparing HOG descriptors of the combined blocks of the real-time image and the corresponding region image on the panoramic image, and judging the region as a change region if the HOG descriptor difference value of a certain combined block exceeds a set threshold value.
2. The method for detecting changes in a panoramic image based on video according to claim 1, wherein establishing an overall optimized error equation based on a homography matrix and establishing an overall transformation model between adjacent images for stitching of the images comprises an initial registration and an accurate registration;
the initial registration initializes the image registration by using a least squares method;
the accurate registration comprises: calculating the homography matrix of one image of an adjacent pair, projecting its feature points onto the other image of the pair according to the feature point pairs, and calculating the deformed feature points; performing overall optimization of the homography matrices corresponding to the continuous images by using the Levenberg-Marquardt algorithm to obtain the value of each homography matrix, and stitching the images according to the parameters after overall optimization.
3. The method for detecting a change in a panoramic image based on video according to claim 1, further comprising outputting a change area if it is determined as the change area; and continuously acquiring continuous real-time images of the region to be detected by adopting a frame extraction method, splicing to form a panoramic image of the region to be detected, and updating the panoramic image serving as base map data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010095879.8A CN111369495B (en) | 2020-02-17 | 2020-02-17 | Panoramic image change detection method based on video |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111369495A CN111369495A (en) | 2020-07-03 |
CN111369495B true CN111369495B (en) | 2024-02-02 |
Family
ID=71207988
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||