CN114554158A - Panoramic video stitching method and system based on road traffic scene - Google Patents

Panoramic video stitching method and system based on road traffic scene

Info

Publication number
CN114554158A
Authority
CN
China
Prior art keywords
image
matching
panoramic video
images
road traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210188383.4A
Other languages
Chinese (zh)
Inventor
杨秋红
董楠
赵晓龙
金涛
倪守城
孙炼杰
王学方
任凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd
Priority to CN202210188383.4A
Publication of CN114554158A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00: Television systems
    • H04N 7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N 7/181: CCTV systems for receiving images from a plurality of remote sources
    • H04N 23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60: Control of cameras or camera modules
    • H04N 23/698: Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265: Mixing
    • H04N 5/272: Means for inserting a foreground image in a background image, i.e. inlay, outlay

Abstract

The invention discloses a panoramic video stitching method and system based on a road traffic scene, comprising image acquisition, image distortion correction and preprocessing, image feature point extraction, image matching, filtering of matched feature point pairs, acquisition of the homography matrix of the projection transformation, and image fusion. Feature point extraction and feature point matching are realized through deep learning networks; when image feature points are extracted, the feature points of the dynamic foreground and the static background are extracted separately. By extracting and matching background and foreground feature points in the overlapping area between different cameras, the relative position relation between the cameras is obtained dynamically, realizing real-time fusion and stitching of the images from the cameras in a district road traffic system. This solves the problem of traditional single-camera monitoring systems, in which staff must manually call up the images of several cameras and compare and piece them together before useful information can be obtained, and at the same time it solves the problem of tracking targets across cameras.

Description

Panoramic video stitching method and system based on road traffic scene
Technical Field
The invention relates to the technical field of traffic management and video monitoring, in particular to a panoramic video stitching method and a panoramic video stitching system based on a road traffic scene.
Background
With the growth of the economy and the continuous development of science and technology, comprehensive real-time monitoring of urban road traffic has become increasingly important, and building a comprehensive road traffic image management platform by technical means has become a goal of traffic management departments in many regions. At present, road traffic monitoring systems have basically realized digital network systems covering the main roads, auxiliary roads, and intersections of cities and towns; they are equipped with corresponding image monitoring devices and operating software and can transmit road traffic conditions to the corresponding dispatching center in real time for monitoring, so that problems can be reported as soon as they occur. However, the existing monitoring equipment consists mainly of ordinary security cameras with no intuitive positional relation between them. Once a target crosses cameras, staff must manually call up the images of the corresponding cameras and compare and piece them together one by one before useful information can be obtained, which is time-consuming and labor-intensive.
CN113055613A discloses a "panoramic video stitching method and apparatus based on a mine scene", which uses the traditional SURF algorithm to extract feature points and generate feature descriptors; in practice SURF cannot process video frames in real time, which essentially rules out engineering application, and its feature matching uses a K-nearest-neighbor classification algorithm with a large matching error. As another example, CN103516995A, "a real-time panoramic video stitching method and apparatus based on ORB features", proceeds as follows: collect multiple synchronized video streams; extract feature points from each image at the same moment with the ORB feature extraction algorithm and compute an ORB feature vector for each feature point; solve the homography matrix between corresponding frames of the synchronized videos with a nearest-neighbor matching method and the RANSAC (random sample consensus) algorithm; stitch the video frames according to the homography matrix; and finally output the stitched video. However, both cited references extract and match feature points with traditional methods such as SIFT, SURF, or ORB, whose extraction precision is low and whose extraction time is too long to realize real-time stitching.
Disclosure of Invention
Aiming at the deficiencies of the prior art, the invention aims to provide a panoramic video stitching method and system based on a road traffic scene, solving the problems that the prior art has low extraction precision, that searching is time-consuming and labor-intensive, and that real-time stitching cannot be realized.
In order to achieve the purpose, the invention adopts the following technical scheme:
a panoramic video splicing method based on a road traffic scene comprises the following steps:
s1, image acquisition: synchronously acquiring images of each camera in a test road section by a plurality of cameras, and requiring that the time stamps of the cameras are strictly consistent;
s2, image distortion correction and preprocessing: correcting and preprocessing a plurality of collected images by adopting a Thykes bright-dark channel defogging algorithm;
s3, extracting image feature points: extracting characteristic points of the image by adopting a deep learning network super; extracting characteristic points of an overlapping area between cameras, including angular points and points with obvious gradient change, and generating a characteristic descriptor;
s4, image matching: performing feature matching on the images by using a feature matching algorithm supervalue based on a graph convolution neural network, wherein feature points and descriptors extracted by a superpoint network in the two images are input, and the output is a matching relation between image features;
s5, filtering the matched feature point pairs: filtering the matched feature point pairs by using a K nearest neighbor algorithm;
s6, acquiring a homography matrix of projection transformation, and completing the projection transformation: and circularly traversing the residual matching point pairs after filtering by using a Randac algorithm until the point pairs required by the projection between the images are obtained through calculationOptimization ofThe homography matrix is used for completing the splicing among the images;
s7, image fusion: and searching the optimal splicing line by adopting dynamic programming to perform image fusion.
Further, the overlapping area in S3 is divided into a static background and a dynamic foreground, and the background and foreground feature points are extracted respectively.
Further, in S4, the image feature matching includes static matching and dynamic matching: static matching refers to the matching of static objects in the road scene, such as roads, traffic signs, and trees; dynamic matching refers to the matching of pedestrians, vehicles, and the like. Static matching essentially corresponds to the background of the image, and the feature points of the background can be matched independently; dynamic matching corresponds to the foreground, and the feature points of the foreground can be matched independently.
Further, the image fusion searches for the optimal stitching line using a fast color interpolation model combined with the adjacent shortest-distance method.
The invention also provides a panoramic video stitching system based on the road traffic scene, which comprises a camera and a processor, wherein the processor executes the panoramic video stitching method.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention realizes feature point extraction and matching through deep learning networks. When image feature points are extracted, the feature points of the dynamic foreground and the static background are extracted separately; by extracting and matching background and foreground feature points in the overlapping area between different cameras, the relative position relation between the cameras, i.e. the homography matrix, is obtained dynamically, realizing real-time fusion and stitching of the images from the cameras in a district road traffic system.
2. The invention solves the problem of traditional single-camera monitoring systems, in which staff must manually call up the images of several cameras and compare and combine them one by one before useful information can be obtained, which is time-consuming and labor-intensive; at the same time it solves the problem of tracking targets across cameras.
3. According to the invention, the images of cameras with overlapping areas in the same district are panoramically stitched and fused according to their relative position relation and then projected directly onto the corresponding large monitoring screen, so that the traffic condition of the whole district can be monitored more intuitively and clearly.
Drawings
FIG. 1 is a flow chart of a panoramic video stitching method based on a road traffic scene according to the present invention;
FIG. 2 is an example of an original image acquired with a wide-angle camera;
FIG. 3 is an example of the image effect after rectification;
FIG. 4 is an example of the image stitching effect;
FIG. 5 is an example of the image fusion effect.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the following detailed description of the present invention is made with reference to the accompanying drawings, but the embodiments of the present invention are not limited thereto.
Referring to fig. 1, a flowchart of a panoramic video stitching method based on a road traffic scene is shown.
Firstly, image acquisition:
the method has the advantages that the multiple cameras are synchronized to simultaneously acquire images of each camera in a test road section, timestamps of each camera are required to be strictly consistent, each camera is subjected to image acquisition by adopting an independent process in the actual test process, the time difference between the fastest process and the slowest process is 0.5ms basically, and 25 frames per second synchronous acquisition can be met. Referring to fig. 2, an example of acquiring a cross-camera image using a Haikang Wide-Angle Camera is shown.
Secondly, correcting and preprocessing image distortion:
(2a) Since road cameras mostly use wide-angle lenses, the images exhibit severe radial distortion, which appears as barrel distortion (as shown in fig. 2, original images selected from the three cameras). This distortion deforms image features, so the camera parameters must be calibrated and used to correct the images and eliminate the influence of the distortion. An example of the rectified image effect is shown in fig. 3.
(2b) The images are preprocessed with He Kaiming's dark channel defogging algorithm. The cameras stay outdoors for long periods, dust settles on the lenses, and rain and fog make the images unclear, so corresponding preprocessing is needed to improve image quality.
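As an illustrative sketch only, the following Python/OpenCV code shows one way to implement this step: cv2.undistort with offline-calibrated intrinsics K and distortion coefficients dist, followed by a standard dark channel prior dehazing routine. The patch size and the constants omega and t0 are typical values assumed here, not values disclosed by the invention.

```python
import cv2
import numpy as np

def undistort(img, K, dist):
    # K and dist come from an offline calibration of the wide-angle camera.
    return cv2.undistort(img, K, dist)

def dark_channel_dehaze(img, patch=15, omega=0.95, t0=0.1):
    """Dark channel prior dehazing: estimate the dark channel, the
    atmospheric light A, and the transmission t, then recover the scene
    radiance J = (I - A) / max(t, t0) + A."""
    I = img.astype(np.float64) / 255.0
    kernel = np.ones((patch, patch), np.uint8)
    dark = cv2.erode(I.min(axis=2), kernel)          # per-patch dark channel
    # Atmospheric light: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = I[idx].mean(axis=0)
    t = 1.0 - omega * cv2.erode((I / A).min(axis=2), kernel)
    J = (I - A) / np.maximum(t, t0)[..., None] + A
    return np.clip(J * 255, 0, 255).astype(np.uint8)
```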
Thirdly, extracting image characteristic points:
the method adopts the deep learning network superpoint to extract the characteristic points of the image. The output of SuperPoint is not only the feature points but also descriptors of the feature points. And extracting characteristic points of an overlapping area between the cameras, including the angular points and the points with obvious gradient change, and generating a characteristic descriptor. Because the image information outside the overlapping region easily causes the mismatching of the feature points, the feature points can be extracted only in the overlapping region, so that the extraction speed of the feature points can be increased, and the matching accuracy can be improved. The overlapped area can be divided into a static background and a dynamic foreground, and characteristic points can be respectively extracted from the background and the foreground.
Fourthly, image matching:
the image matching includes static matching and dynamic matching, the static matching mainly refers to matching of static objects such as roads, traffic signs and trees in a road scene, and the dynamic matching refers to matching of pedestrians and vehicles. And respectively carrying out feature matching on the static and dynamic objects of the image according to the extracted feature descriptors. The invention adopts a character matching algorithm superglue based on a graph convolution neural network to carry out character matching on images, wherein the input is character points and descriptors extracted by the superpoint network in two images, and the output is the matching relation between image characters.
(4a) Static matching essentially corresponds to the background of the image; the feature points of the background can be matched independently.
(4b) Dynamic matching corresponds to the foreground of the image; the feature points of the foreground can be matched independently, as in the sketch below.
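A sketch of this separated matching follows. Here superglue_match is a hypothetical wrapper around a pretrained SuperGlue matcher, assumed to return, for each keypoint of the first image, the index of its match in the second image or -1 when unmatched; the *_feats_cam* variables are the (keypoints, scores, descriptors) triples from the extraction step.

```python
import numpy as np

def match_pair(feats0, feats1):
    """Match one class of features (foreground or background) between two
    cameras and return the coordinates of the matched point pairs."""
    kpts0, scores0, descs0 = feats0
    kpts1, scores1, descs1 = feats1
    matches = superglue_match(kpts0, scores0, descs0,
                              kpts1, scores1, descs1)  # hypothetical wrapper
    valid = matches > -1
    return kpts0[valid], kpts1[matches[valid]]

# Background (static) and foreground (dynamic) features are matched
# independently, so similar-looking points can never be matched across the
# two classes.
pts0_bg, pts1_bg = match_pair(bg_feats_cam0, bg_feats_cam1)
pts0_fg, pts1_fg = match_pair(fg_feats_cam0, fg_feats_cam1)
```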
Fifthly, filtering the matched feature point pairs:
Feature points extracted in regions with ambiguous features may become interference points among the correct matches and cause mismatches, so the matched feature points need to be filtered according to certain rules. A K-nearest-neighbor algorithm is used to filter the matched feature point pairs.
Sixthly, acquiring the homography matrix of the projection transformation and completing the projection transformation:
The RANSAC algorithm cyclically traverses the matching point pairs remaining after filtering until the optimal homography matrix required for projection between the images is calculated, completing the stitching between the images. Fig. 4 is an example of the image stitching effect.
Seventhly, fusing images:
because of reasons such as installation angle between the different cameras for the colour and the luminance that the image demonstrates are not unified, cause the concatenation department very hard, and the effect is not good, still need treat the image of concatenation under this kind of condition and carry out image fusion operation. The invention adopts dynamic programming, namely, a fast color interpolation model is combined with an adjacent shortest distance method to search the optimal splicing line for image fusion. Fig. 5 is an image fusion effect illustration. The fast color interpolation model is mainly used for smoothing color difference values at the joint of two images by using the color gradient difference of the images in two adjacent cameras. The smoothing rule is to use the adjacent shortest distance method, that is, the color value of each pixel is gaussian filtered according to the color value of its eight neighborhoods, so as to achieve the purpose of smoothing color difference.
Therefore, the method has high extraction precision and realizes real-time panoramic stitching and fusion, whose result is then projected directly onto the corresponding large monitoring screen, so that the condition of the whole district can be monitored more intuitively and clearly. The innovations of the method are that SuperPoint is adopted for feature point extraction and SuperGlue for feature matching, and that the features of the dynamic foreground and the static background are extracted separately, so that the extracted features are more precise and similar points in the foreground and background cannot be wrongly matched against each other. This also solves the problem that the feature point extraction of traditional SIFT or SURF takes too long to realize real-time stitching. The static background occupies a very high proportion of each frame, but the change rate between consecutive frames is very low; SIFT extracts information from the whole image in every frame, which wastes computation time, whereas here the background information of the previous frame is stored, each subsequent frame only needs to be differenced against the previous frame, and feature extraction is performed only on the changed difference image, i.e. only on the dynamic foreground, which greatly reduces the time needed for feature extraction.
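The frame-differencing idea in the last paragraph can be illustrated in a few lines of OpenCV; the threshold and kernel size below are assumed values.

```python
import cv2
import numpy as np

def changed_region_mask(prev_gray, cur_gray, thresh=25):
    """Difference the current frame against the previous one and keep only
    the changed (dynamic foreground) region, so that per-frame feature
    extraction can skip the unchanged static background."""
    diff = cv2.absdiff(cur_gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8))  # close small gaps
    return mask
```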
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them; those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications should be covered by the claims of the present invention.

Claims (6)

1. A panoramic video stitching method based on a road traffic scene is characterized by comprising the following steps:
s1, image acquisition: synchronously acquiring images of each camera in a test road section by a plurality of cameras, and requiring that the time stamps of each camera are strictly consistent;
s2, image distortion correction and preprocessing: correcting and preprocessing a plurality of collected images by adopting a Thykes bright-dark channel defogging algorithm;
s3, extracting image feature points: extracting characteristic points of the image by adopting a deep learning network super; extracting characteristic points of an overlapping area between cameras, including angular points and points with obvious gradient change, and generating a characteristic descriptor;
s4, image matching: performing feature matching on the images by using a feature matching algorithm supervalue based on a graph convolution neural network, wherein feature points and descriptors extracted by a superpoint network in the two images are input, and the output is a matching relation between image features;
s5, filtering the matched feature point pairs: filtering the matched feature point pairs by using a K nearest neighbor algorithm;
s6, acquiring a homography matrix of projection transformation, and completing the projection transformation: and circularly traversing the residual matching point pairs after filtering by using a Randac algorithm until the point pairs required by the projection between the images are obtained through calculationOptimization ofThe homography matrix is used for completing the splicing among the images;
s7, image fusion: and searching the optimal splicing line by adopting dynamic programming to perform image fusion.
2. The method for stitching the panoramic video based on the road traffic scene as claimed in claim 1, wherein in the step S3, the overlapped area is divided into a static background and a dynamic foreground, and the background and foreground feature points are extracted respectively.
3. The panoramic video stitching method based on a road traffic scene as claimed in claim 1, wherein in step S4 the image feature matching comprises static matching and dynamic matching; static matching comprises the matching of static objects in the road scene, such as roads, traffic signs, and trees; dynamic matching comprises the matching of pedestrians, vehicles, and the like.
4. The panoramic video stitching method based on a road traffic scene as claimed in claim 1, wherein static matching essentially corresponds to the background of the image, and the feature points of the background can be matched independently; dynamic matching corresponds to the foreground of the image, and the feature points of the foreground can be matched independently.
5. The panoramic video stitching method based on the road traffic scene as claimed in claim 1, wherein the image fusion uses a fast color interpolation model in combination with a neighboring shortest distance method to find an optimal stitching line for image fusion.
6. A panoramic video stitching system based on a road traffic scene is characterized by comprising a camera and a processor, wherein the processor executes the panoramic video stitching method of any one of claims 1 to 5.
CN202210188383.4A 2022-02-28 2022-02-28 Panoramic video stitching method and system based on road traffic scene Pending CN114554158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210188383.4A CN114554158A (en) 2022-02-28 2022-02-28 Panoramic video stitching method and system based on road traffic scene


Publications (1)

Publication Number Publication Date
CN114554158A (en) 2022-05-27

Family

ID=81680185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210188383.4A Pending CN114554158A (en) 2022-02-28 2022-02-28 Panoramic video stitching method and system based on road traffic scene

Country Status (1)

Country Link
CN (1) CN114554158A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572450A (en) * 2012-01-10 2012-07-11 中国传媒大学 Three-dimensional video color calibration method based on scale invariant feature transform (SIFT) characteristics and generalized regression neural networks (GRNN)
CN108769578A (en) * 2018-05-17 2018-11-06 南京理工大学 A kind of real-time omnidirectional imaging system and method based on multi-path camera
CN110660023A (en) * 2019-09-12 2020-01-07 中国测绘科学研究院 Video stitching method based on image semantic segmentation
CN111599007A (en) * 2020-05-26 2020-08-28 张仲靖 Smart city CIM road mapping method based on unmanned aerial vehicle aerial photography
CN113159043A (en) * 2021-04-01 2021-07-23 北京大学 Feature point matching method and system based on semantic information
CN113793362A (en) * 2021-09-22 2021-12-14 清华大学 Pedestrian track extraction method and device based on multi-lens video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘嗣超; 武鹏达; 赵占杰; 李成名: "Semantic segmentation and stitching of traffic surveillance video images" (交通监控视频图像语义分割及其拼接方法), Acta Geodaetica et Cartographica Sinica (测绘学报), no. 04, 15 April 2020 *
贾迪; 朱宁丹; 杨宁华; 吴思; 李玉秀; 赵明远: "A survey of image matching methods" (图像匹配方法研究综述), Journal of Image and Graphics (中国图象图形学报), no. 05, 16 May 2019 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011553A (en) * 2023-07-19 2023-11-07 苏州旭智设计营造有限公司 Multi-scene conference free conversion control system

Similar Documents

Publication Publication Date Title
CN107808389B (en) Unsupervised video segmentation method based on deep learning
CN111047510B (en) Large-field-angle image real-time splicing method based on calibration
CN108769578B (en) Real-time panoramic imaging system and method based on multiple cameras
CN111968129A (en) Instant positioning and map construction system and method with semantic perception
CN105184253B (en) Face recognition method and face recognition system
CN107274346A (en) Real-time panoramic video splicing system
CN112085659B (en) Panorama splicing and fusing method and system based on dome camera and storage medium
CN100542303C (en) A kind of method for correcting multi-viewpoint vedio color
CN106791623A (en) A kind of panoramic video joining method and device
CN104408701A (en) Large-scale scene video image stitching method
CN107154022A (en) A kind of dynamic panorama mosaic method suitable for trailer
CN105488777A (en) System and method for generating panoramic picture in real time based on moving foreground
CN107085842A (en) The real-time antidote and system of self study multiway images fusion
CN113221665A (en) Video fusion algorithm based on dynamic optimal suture line and improved gradual-in and gradual-out method
CN105894443A (en) Method for splicing videos in real time based on SURF (Speeded UP Robust Features) algorithm
CN107992837A (en) Road full-view modeling and vehicle detecting and tracking method based on single PTZ monitor cameras
CN112784834A (en) Automatic license plate identification method in natural scene
CN112712487A (en) Scene video fusion method and system, electronic equipment and storage medium
CN114554158A (en) Panoramic video stitching method and system based on road traffic scene
CN108737743B (en) Video splicing device and video splicing method based on image splicing
CN114898353A (en) License plate identification method based on video sequence image characteristics and information
CN101945299B (en) Camera-equipment-array based dynamic scene depth restoring method
CN117114997B (en) Image stitching method and device based on suture line search algorithm
CN107330856B (en) Panoramic imaging method based on projective transformation and thin plate spline
Li et al. Gyroflow+: Gyroscope-guided unsupervised deep homography and optical flow learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination