CN114022562A - Panoramic video stitching method and device capable of keeping integrity of pedestrians - Google Patents

Panoramic video stitching method and device capable of keeping integrity of pedestrians

Info

Publication number
CN114022562A
CN114022562A (application CN202111238422.9A)
Authority
CN
China
Prior art keywords
camera
image
cameras
panoramic
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111238422.9A
Other languages
Chinese (zh)
Inventor
张林
郭超政
朱安琪
沈莹
赵生捷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202111238422.9A priority Critical patent/CN114022562A/en
Publication of CN114022562A publication Critical patent/CN114022562A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F 17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
                    • G06F 17/10: Complex mathematical operations
                        • G06F 17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 3/00: Geometric image transformations in the plane of the image
                    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
                        • G06T 3/4038: Image mosaicing, e.g. composing plane images from plane sub-images
                • G06T 7/00: Image analysis
                    • G06T 7/70: Determining position or orientation of objects or cameras
                    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                • G06T 2200/00: Indexing scheme for image data processing or generation, in general
                    • G06T 2200/32: Indexing scheme for image data processing or generation, in general, involving image mosaicing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

The invention relates to a panoramic video stitching method and device that keep pedestrians intact, the method comprising the following steps: collecting multiple video streams with a structured panoramic camera array; jointly calibrating the camera poses with bundle adjustment; geometrically aligning the video images collected by the different cameras and mapping them onto a unified cylindrical reference surface; photometrically aligning the images captured by the different cameras with a two-step least-squares method to eliminate brightness differences; locating the pedestrian targets in the images with semantic segmentation to obtain a semantic mask; and, based on the semantic mask, constructing a seam cost function for the video images, solving for the optimal seam with dynamic programming, and fusing the overlapping parts of the aligned images. Compared with the prior art, the panoramic stitching result obtained by the method has better visual consistency; taking semantic information into account greatly enhances the practicality of the system in the surveillance field; and the method is computationally efficient enough to process video images in real time.

Description

Panoramic video stitching method and device capable of keeping integrity of pedestrians
Technical Field
The invention relates to the technical field of video stitching, and in particular to a panoramic video stitching method and device capable of keeping the integrity of pedestrians.
Background
A panoramic stitching system is an indispensable module in surveillance and space exploration: with a structured camera array and a panoramic stitching system, a horizontal view covering all surrounding viewing angles can be obtained, letting a viewer grasp the surrounding environment at a glance. With the rise of video conferencing, distance education, robot navigation and similar fields, a single camera, with its small field of view, cannot record all targets in a large scene and conveys only limited information about the scene, while high-definition wide-angle cameras are too expensive to be widely adopted. Panoramic stitching technology meets the demand of these fields for a large field of view and therefore has important practical value. At present, panoramic image stitching is widely applied in security surveillance, military operations, virtual reality, remote-sensing image processing, driver assistance and other fields.
To present a natural panoramic view to the viewer, a panoramic stitching system must align and stitch images taken from different viewing angles while achieving a smooth transition between the images at their seams, so that the viewer cannot perceive stitching traces. Moreover, unlike still-image panoramic stitching, video stitching places higher demands on the real-time performance and robustness of the algorithm.
A panoramic video image is stitched from a group of pictures captured around the camera position and provides more comprehensive information about the surrounding environment. Because of the camera's rotation, the acquired images are two-dimensional projections of the physical scene under different camera coordinate systems; stitching them directly produces severe distortion and breaks visual consistency, so the images to be stitched must first be mapped onto a common reference surface. Depending on the form of this mapping surface, panoramas can be divided into spherical, cubic and cylindrical panoramas. The cylindrical panorama model offers a 360-degree horizontal viewing angle; panoramas built on it have uniform quality, high detail and realism, and can be processed directly with conventional image processing algorithms, so the cylindrical model is widely used.
Panoramic stitching mainly comprises two steps. The first step maps the images to be stitched into a unified coordinate system for alignment, which requires estimating the transformation relationship between the images, usually with a method based on feature point matching or on extrinsic camera calibration, and then projecting the images into the unified coordinate system according to that relationship. The second step fuses the projection-aligned images to achieve a natural transition between adjacent images and finally obtain a panorama that satisfies the human visual system. Because of parallax, foreground objects in adjacent images often cannot be aligned completely, and how to handle a dynamic foreground intelligently so that it remains consistent in the panorama is still an open research question. In addition, a practical panoramic video stitching system must also deal with photometric alignment across multiple cameras, the real-time performance of the stitching algorithm, and poor video quality in dim light, which existing panoramic stitching systems find difficult to address simultaneously.
Disclosure of Invention
The invention aims to provide a panoramic video stitching method and device that keep pedestrians intact, overcoming the photometric differences, image breakage and ghosting found in the prior art.
The purpose of the invention can be realized by the following technical scheme:
A panoramic video stitching method for keeping the integrity of pedestrians comprises the following steps:
S1: constructing a structured panoramic camera array from a plurality of cameras and collecting multiple video streams;
S2: placing a calibration board in the common viewing area of adjacent cameras to form feature point matching pairs, constructing the relative poses of adjacent cameras and the loop-closure pose of the forward-view camera, and further optimizing the poses with bundle adjustment;
S3: based on the pose optimization result, geometrically aligning the video images acquired by the different cameras and mapping the pixel coordinates onto a unified cylindrical reference surface;
S4: after geometric alignment, constructing a photometric alignment equation for each camera's video image together with mean and variance alignment equations for the overlap regions of adjacent images, and solving them with a two-step least-squares method to eliminate the brightness difference between adjacent images;
S5: locating the pedestrian targets in each camera's video image with semantic segmentation to obtain a semantic mask;
S6: constructing a seam cost function from the photometrically aligned camera video images and the semantic masks, solving for the optimal seam with dynamic programming, and fusing the overlapping parts of the photometrically aligned camera video images.
Further, the structured panoramic camera array comprises four cameras facing front, rear, left and right respectively; the cameras are fixed by a bracket, and the horizontal viewing angle of each camera is within the range of 100 to 200 degrees.
Further, the constructed relative poses of adjacent cameras include the relative pose T_LF between the forward-view and left-view cameras, T_BL between the left-view and rear-view cameras, T_RB between the rear-view and right-view cameras, T_FR between the right-view and forward-view cameras, and the pose T_FF of the forward-view camera.
The loop-closure forward-view camera pose T'_FF is computed as:
T'_FF = T_FR · T_RB · T_BL · T_LF · T_FF
Further, the pose optimization with bundle adjustment specifically comprises:
constructing a pose loss function with the relative poses of adjacent cameras, the loop-closure forward-view camera pose and the feature point matching pairs as the variables to be optimized, and solving it by graph optimization to obtain the optimized relative poses between adjacent cameras;
the pose loss function is:
ξ* = arg min_ξ Σ_i Σ_j || u_ij - (1/s_ij) · K_i · exp(ξ_i^) · P_ij ||²
where u_ij is the pixel coordinate of the j-th feature point observable by the i-th camera, s_ij is the depth of that feature point, K_i is the intrinsic matrix of the i-th camera, exp(ξ_i^) is the Lie-algebra (exponential map) form of the i-th camera pose with ^ denoting the hat operator, and P_ij is the three-dimensional coordinate of the feature point; the loss function implicitly includes the conversion from homogeneous to non-homogeneous coordinates.
Further, step S3 specifically comprises the following steps:
S301: taking the center of the four cameras in the panoramic camera array as the center coordinate P_center, obtained as the mean of the four camera coordinates in the forward-view camera coordinate system:
P_center = (P_F + P_L + P_B + P_R) / 4
S302: computing the transformation matrix T_FW of the forward-view camera with respect to the center coordinate as the 4 × 4 homogeneous transform with identity rotation and translation -P_center:
T_FW = [ I  -P_center ; 0^T  1 ]
S303: based on the relative poses between adjacent cameras and the transformation matrix T_FW of the forward-view camera with respect to the center coordinate, obtaining the pose transformation of each camera with respect to the center coordinate:
T_LW = T_LF · T_FW
T_BW = T_BL · T_LF · T_FW
T_RW = T_RB · T_BL · T_LF · T_FW
where T_LW is the transformation matrix of the left-view camera with respect to the center coordinate, T_BW that of the rear-view camera, and T_RW that of the right-view camera;
S304: establishing a cylindrical projection surface of radius r centered at P_center, defining a z-axis scale factor h_scale for the mapping, and mapping the pixel coordinates onto the unified cylindrical reference surface, the mapping being:
x_w = r · sin(2π·u / w)
y_w = r · cos(2π·u / w)
z_w = h_scale · (h/2 - v)
where x_w, y_w and z_w are the x, y and z coordinates of the mapped pixel point, h is the height of the stitched panorama, w is the width of the stitched panorama, and (u, v) are the original pixel coordinates.
Further, the construction of the photometric alignment equation for each camera's video image comprises:
for each camera's video image, constructing a photometric alignment equation according to a photometric adjustment model containing a bias term, the model being:
I' = g·I + b
where I' is the image after photometric adjustment, I is the image before adjustment, g is a gain and b is a bias term;
the mean and variance alignment equations of the overlap regions of adjacent images are:
g_i·I_ij + b_i = g_j·I_ji + b_j
g_i·σ_ij = g_j·σ_ji
ij ∈ {FR, RB, BL, LF}
where I_ij is the mean pixel intensity of image i in the region where images i and j overlap, σ_ij is the standard deviation of the pixel intensity of image i in that region, and I_ji and σ_ji are the mean and standard deviation of image j in the same region;
the two-step least-squares solution is specifically:
first, solving the system formed by the four equations g_i·σ_ij = g_j·σ_ji by least squares and normalizing to obtain an approximate solution for the gain g of each image;
then substituting the adjusted gains into the equations g_i·I_ij + b_i = g_j·I_ji + b_j to obtain the bias-term system:
[  1  -1   0   0 ] [ b_F ]   [ g_R·I_RF - g_F·I_FR ]
[  0   1  -1   0 ] [ b_R ] = [ g_B·I_BR - g_R·I_RB ]
[  0   0   1  -1 ] [ b_B ]   [ g_L·I_LB - g_B·I_BL ]
[ -1   0   0   1 ] [ b_L ]   [ g_F·I_FL - g_L·I_LF ]
performing SVD on the bias-term system to obtain an approximate solution for the bias term b of each image;
and adjusting each image with its obtained gain g and bias term b to obtain an image set whose overlap regions have consistent brightness.
Further, in step S5, a neural network is used to segment each camera's video image to extract the pedestrian targets and obtain a semantic mask, in which two pixel values distinguish the human body from the background.
Further, in step S6, the seam cost function is:
M(i, j) = min_{j-6<k<j+6} ( M(i-1, k) + λ·Sem(i, j) + Spa(i, j, k) )
Spa(i, j, k) = e(i-1, k) + e(i, j)
e(i, j) = || I_1(i, j) - I_2(i, j) ||
where M(i, j) is the value of the seam cost function, Sem(i, j) is the semantic cost of choosing pixel (i, j) as a boundary point, Spa(i, j, k) is the seam path cost from pixel (i-1, k) in the previous row to the current pixel (i, j), e(i, j) is the pixel-value difference between the two overlapping aligned images I_1 and I_2 at (i, j), and λ is a parameter balancing the semantic cost against the path cost.
Furthermore, the parameter k ranges over j-6 < k < j+6;
during the dynamic-programming solution, before each seam update the pixel difference between adjacent image frames is computed; if the difference is smaller than a preset threshold the current seam is kept, otherwise a new seam is computed.
The invention also provides a panoramic video stitching device capable of maintaining the integrity of pedestrians, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the steps of the method.
Compared with the prior art, the invention has the following advantages:
(1) Image photometric alignment and semantic detection achieve a natural panoramic video stitching effect. Taking semantic information into account greatly enhances the practicality of the system in the surveillance field and yields higher visual consistency and photometric alignment accuracy; at the same time the method is efficient and can stitch video in real time.
(2) In calibrating the camera poses of the surround-view panoramic camera array, the poses are optimized with bundle adjustment. Experiments show that the reprojection errors of the cameras at all viewing angles decrease, with the average reprojection error over all cameras falling from 0.1881 to 0.1825, which verifies the effectiveness of the joint camera pose optimization in the invention.
(3) In photometrically aligning the video images, the invention adds mean and variance alignment equations for the overlap regions of adjacent images and solves them with a two-step least-squares method. Experiments show that the photometric alignment model used by the method yields smooth overall brightness transitions, no obvious seam, and better elimination of the sharp bright-dark boundary in the sky region, so the images fuse more naturally near the seam.
(4) In finding the optimal seam, the semantic mask of the pedestrian targets is used to construct a seam cost function that accounts for both semantic cost and path cost. Experiments show that the optimal seam algorithm preserves the pedestrians' bodies to the greatest possible extent and produces stitching results with better visual quality; the method achieves the lowest proportion of broken frames in most scenes, and its results best match human visual habits. Moreover, by selectively updating the seam and accelerating through scaling, the invention stitches panoramic video at 12-26 frames per second, the exact speed depending on the number of objects in the scene and how often they move.
Drawings
Fig. 1 is a schematic main flow chart of a panoramic video stitching method for maintaining pedestrian integrity according to an embodiment of the present invention;
FIG. 2 is a schematic view of a panoramic camera system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a panoramic camera system and cylindrical coordinate mapping according to an embodiment of the present invention;
fig. 4 is a schematic diagram of estimating an initial pose of a panoramic camera system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating a comparison between the effects of a photometric alignment algorithm provided in an embodiment of the present invention;
FIG. 6 is a comparison of the effects of the optimal seam algorithms provided in the embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Example 1
This embodiment provides a panoramic video stitching method for keeping the integrity of pedestrians, which aims to solve the photometric differences, image breakage, ghosting and similar problems of conventional panoramic video stitching systems and to provide a panoramic stitched video of high quality and high visual consistency. The method comprises the following steps:
S1: construct a structured panoramic camera array from a plurality of cameras and collect multiple video streams;
S2: place a calibration board in the common viewing area of adjacent cameras to form feature point matching pairs, construct the relative poses of adjacent cameras and the loop-closure pose of the forward-view camera, and further optimize the poses with bundle adjustment;
S3: based on the pose optimization result, geometrically align the video images acquired by the different cameras and map the pixel coordinates onto a unified cylindrical reference surface;
S4: after geometric alignment, construct a photometric alignment equation for each camera's video image together with mean and variance alignment equations for the overlap regions of adjacent images, and solve them with a two-step least-squares method to eliminate the brightness difference between adjacent images;
S5: locate the pedestrian targets in each camera's video image with semantic segmentation to obtain a semantic mask;
S6: construct a seam cost function from the photometrically aligned camera video images and the semantic masks, solve for the optimal seam with dynamic programming, and fuse the overlapping parts of the photometrically aligned camera video images.
The steps are described in detail below.
1. Collect four video streams with the structured surround-view panoramic camera array
The structured panoramic camera array designed for the method is used to collect the video; the camera system is shown schematically in Fig. 2. The system consists of four fisheye cameras shooting in the front, rear, left and right directions; the position of each camera is fixed by a support structure, so the pose of each pair of adjacent cameras can conveniently be obtained by extrinsic calibration.
The surround-view camera system can be used in two ways. The floor-standing support structure suits outdoor scenes: the system is placed on the ground to capture the surroundings. The desktop structure suits indoor scenes: it is obtained by detaching and adapting the camera head of the floor-standing support, after which the system can shoot from a desktop or similar platform. The two usage modes greatly facilitate data collection in different scenarios.
A checkerboard calibration board is used to calibrate the intrinsics of the fisheye cameras, as follows: each fisheye camera to be calibrated photographs the checkerboard so that, across the shots, the board covers all regions of the fisheye image at various angles, which safeguards the accuracy of the intrinsic estimation. After each camera has captured a few dozen fisheye checkerboard images, its intrinsic matrix and distortion parameters are estimated with a fisheye intrinsic calibration function for use in the subsequent extrinsic calibration.
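By way of illustration only (the following sketches are not part of the original filing), the intrinsic calibration step can be outlined in Python with OpenCV's fisheye module; the board geometry, folder layout and flags here are assumptions:

```python
import glob

import cv2
import numpy as np

# Hypothetical checkerboard geometry and image folder; adjust to the actual rig.
PATTERN = (9, 6)   # inner corners per row and column
SQUARE = 0.03      # square edge length in meters

objp = np.zeros((1, PATTERN[0] * PATTERN[1], 3), np.float64)
objp[0, :, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib/front/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners.reshape(1, -1, 2).astype(np.float64))

K, D = np.zeros((3, 3)), np.zeros((4, 1))
flags = cv2.fisheye.CALIB_RECOMPUTE_EXTRINSIC | cv2.fisheye.CALIB_FIX_SKEW
rms, K, D, _, _ = cv2.fisheye.calibrate(obj_pts, img_pts, size, K, D, flags=flags)
print(f"front camera RMS reprojection error: {rms:.4f}")
```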
2. Jointly calibrate the poses of the cameras based on bundle adjustment
S201: place a calibration board in the common viewing area of each pair of adjacent cameras; the feature points on the board are imaged in both cameras and form a group of feature point matching pairs, from which the relative pose of the adjacent cameras is solved with PnP (Perspective-n-Point). The relative poses between the forward-view and left-view, left-view and rear-view, rear-view and right-view, and right-view and forward-view cameras are denoted T_LF, T_BL, T_RB and T_FR respectively, and the pose of the forward-view camera is denoted T_FF.
S202: using the loop structure of the four cameras in the surround-view camera system, chain the relative pose transformations of adjacent cameras to obtain an estimate T'_FF of the loop-closure forward-view camera pose:
T'_FF = T_FR · T_RB · T_BL · T_LF · T_FF
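A minimal sketch of S201 and S202 (not from the filing), assuming the checkerboard corner pixels have already been undistorted to the pinhole model; the helper name board_pose is hypothetical:

```python
import cv2
import numpy as np

def board_pose(obj_pts, img_pts, K):
    """4x4 homogeneous pose of the calibration board in one camera's frame,
    solved with PnP; img_pts are assumed to be undistorted pixel coordinates."""
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, None)
    assert ok, "PnP failed"
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    T[:3, 3] = tvec.ravel()
    return T

# Relative pose of camera a w.r.t. camera b from a board seen by both:
#   T_ab = board_pose(P, p_a, K_a) @ np.linalg.inv(board_pose(P, p_b, K_b))
# Loop-closure estimate around the ring of four cameras:
#   T_FF_loop = T_FR @ T_RB @ T_BL @ T_LF @ T_FF
```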
S203: take the five camera poses (forward-view, left-view, rear-view, right-view and loop-closure forward-view) together with the three-dimensional coordinates of all checkerboard feature points as the variables to be optimized, and use bundle adjustment to minimize the discrepancy between the loop-closure forward-view pose T'_FF and the original forward-view pose T_FF. The corresponding loss function is:
ξ* = arg min_ξ Σ_i Σ_j || u_ij - (1/s_ij) · K_i · exp(ξ_i^) · P_ij ||²
where u_ij is the pixel coordinate of the j-th feature point observable by the i-th camera, s_ij is the depth of that feature point, K_i is the intrinsic matrix of the i-th camera, exp(ξ_i^) is the Lie-algebra (exponential map) form of the i-th camera pose with ^ denoting the hat operator, and P_ij is the three-dimensional coordinate of the feature point; the loss function implicitly includes the conversion from homogeneous to non-homogeneous coordinates.
S204: this nonlinear optimization problem is solved with g2o via graph optimization, yielding the relative poses between the four cameras of the camera system.
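The filing names g2o for the graph optimization; purely as an illustration of the reprojection residual being minimized, here is a sketch with scipy.optimize.least_squares, with the checkerboard points held fixed for brevity (the data layout is an assumption):

```python
import cv2
import numpy as np
from scipy.optimize import least_squares

def reproj_residuals(x, K_list, observations):
    """x stacks one 6-DoF pose (rvec, tvec) per camera; observations is a list
    of tuples (camera index i, 3-D point P_ij, observed pixel u_ij)."""
    res = []
    for i, P, uv in observations:
        rvec, tvec = x[6 * i:6 * i + 3], x[6 * i + 3:6 * i + 6]
        R, _ = cv2.Rodrigues(rvec)
        p_cam = R @ P + tvec                    # point in camera i's frame
        p_img = K_list[i] @ p_cam
        res.extend(p_img[:2] / p_img[2] - uv)   # divide by the depth s_ij
    return np.asarray(res)

# x0 would stack the initial PnP poses (including the loop-closure pose):
# sol = least_squares(reproj_residuals, x0, args=(K_list, observations))
```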
3. Geometrically align the images captured by the different cameras and map the pixels onto a unified cylindrical reference surface
S301: take the center of the four cameras in the panoramic camera array as the center coordinate P_center, obtained as the mean of the four camera coordinates in the forward-view camera coordinate system:
P_center = (P_F + P_L + P_B + P_R) / 4
S302: compute the transformation matrix T_FW of the forward-view camera with respect to the center coordinate as the 4 × 4 homogeneous transform with identity rotation (the center frame keeps the forward-view camera's orientation) and translation -P_center:
T_FW = [ I  -P_center ; 0^T  1 ]
S303: based on the relative poses T_LF, T_BL, T_RB, T_FR between adjacent cameras and the transformation matrix T_FW of the forward-view camera with respect to the center coordinate, obtain the pose transformations of the front, rear, left and right cameras with respect to the center coordinate P_center:
T_LW = T_LF · T_FW
T_BW = T_BL · T_LF · T_FW
T_RW = T_RB · T_BL · T_LF · T_FW
where T_LW is the transformation matrix of the left-view camera with respect to the center coordinate, T_BW that of the rear-view camera, and T_RW that of the right-view camera;
S304: establish a cylindrical projection surface of radius r centered at P_center, define a z-axis scale factor h_scale for the mapping, and map the pixel coordinates onto the unified cylindrical reference surface:
x_w = r · sin(2π·u / w)
y_w = r · cos(2π·u / w)
z_w = h_scale · (h/2 - v)
where x_w, y_w and z_w are the x, y and z coordinates of the mapped pixel point, h is the height of the stitched panorama, w is the width of the stitched panorama, and (u, v) are the original pixel coordinates.
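An illustrative sketch of S304 (not from the filing): for every panorama pixel, the corresponding point on the cylinder is transformed into one camera's frame and projected through the fisheye model, yielding backward maps for cv2.remap. The cylinder parameterization mirrors the reconstruction above and is therefore an assumption:

```python
import cv2
import numpy as np

def cylinder_maps(K, D, T_cw, pano_w, pano_h, r=1.0, h_scale=1.0):
    """Backward maps for one camera: for each panorama pixel (u, v), the
    source pixel in that camera's fisheye image (or -1 if not visible)."""
    u, v = np.meshgrid(np.arange(pano_w), np.arange(pano_h))
    theta = 2.0 * np.pi * u / pano_w                       # angle around the cylinder
    pts = np.stack([r * np.sin(theta),
                    r * np.cos(theta),
                    h_scale * (pano_h / 2.0 - v)], axis=-1)
    R, t = T_cw[:3, :3], T_cw[:3, 3]                       # cylinder center -> camera
    pc = pts.reshape(-1, 3) @ R.T + t
    front = pc[:, 2] > 1e-6                                # keep points in front of the camera
    px = np.full((pc.shape[0], 2), -1.0, np.float32)
    if front.any():
        proj, _ = cv2.fisheye.projectPoints(
            pc[front].reshape(1, -1, 3), np.zeros(3), np.zeros(3), K, D)
        px[front] = proj.reshape(-1, 2)
    maps = px.reshape(pano_h, pano_w, 2)
    return maps[..., 0], maps[..., 1]                      # map_x, map_y for cv2.remap
```

The warped view of each camera is then obtained with cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR).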
4. Photometrically align the images captured by the different cameras using a two-step least-squares method to eliminate brightness differences between adjacent images
A photometric adjustment model containing a bias term is established for the input images of the four cameras: starting from an image adjustment model based on a gain g, a bias term b is introduced, giving
I' = g·I + b
where I' is the image after photometric adjustment, I is the image before adjustment, g is the gain and b is the bias term.
Photometric alignment equations for the four images are constructed from this model, and the adjustment parameters optimizing photometric consistency are solved for. The variances of the overlapping portions of adjacent images are computed and aligned under the adjustment model, variance alignment serving as an additional constraint on the photometric adjustment. Let m_ij denote the mean intensity of the pixels of image i in the region where images i and j overlap, and σ_ij the standard deviation of those intensities; m_ji and σ_ji denote the mean intensity and standard deviation of image j in the same region. The mean and variance alignment equations of the overlap regions of adjacent images are:
g_i·m_ij + b_i = g_j·m_ji + b_j
g_i·σ_ij = g_j·σ_ji
ij ∈ {FR, RB, BL, LF}
the above photometric alignment equation is solved using a two-step least squares. The above equation set contains 8 equations in total, and firstly, four equations only contain unknown giThe system of equations is solved, a non-zero approximate solution is solved by using a least square method, and normalization is carried out by dividing the solution by a mean value to obtain the adjustment gain of each image.
Substituting the solution result of the previous step into an equation set, and arranging to obtain:
Figure BDA0003318347010000094
the above equation does not have an exact solution since the left-hand coefficients are not full rank matrices. Through SVD decomposition, the adjusted deviation term b of each image can be obtainediThe approximate solution of (c).
Once the parameters of the intensity adjustment model have been solved, the four projected images are adjusted accordingly, giving projection results whose overlap regions are consistent in brightness and color.
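The two-step solution can be written compactly with numpy; a sketch (not from the filing), assuming the overlap statistics m and s have been precomputed for the ring F-R-B-L:

```python
import numpy as np

def photometric_params(m, s):
    """m[i][j], s[i][j]: mean / standard deviation of image i's intensity in
    its overlap with image j; indices 0..3 stand for F, R, B, L."""
    pairs = [(0, 1), (1, 2), (2, 3), (3, 0)]       # FR, RB, BL, LF
    # Step 1: gains. g_i*sigma_ij - g_j*sigma_ji = 0 is homogeneous, so take
    # the right singular vector of the smallest singular value, then normalize.
    A = np.zeros((4, 4))
    for row, (i, j) in enumerate(pairs):
        A[row, i], A[row, j] = s[i][j], -s[j][i]
    g = np.abs(np.linalg.svd(A)[2][-1])            # gains are positive
    g /= g.mean()                                  # normalize by the mean
    # Step 2: biases. b_i - b_j = g_j*m_ji - g_i*m_ij is rank-deficient, so
    # use the SVD-based pseudo-inverse for a least-squares solution.
    B, c = np.zeros((4, 4)), np.zeros(4)
    for row, (i, j) in enumerate(pairs):
        B[row, i], B[row, j] = 1.0, -1.0
        c[row] = g[j] * m[j][i] - g[i] * m[i][j]
    b = np.linalg.pinv(B) @ c
    return g, b                                    # apply as I' = g * I + b
```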
5. Locate the pedestrian targets in the images with semantic segmentation to obtain semantic masks
Instance segmentation is performed on the input images with Mask R-CNN, and the human-body class of the segmentation result yields the semantic segmentation mask Sem(i, j): the value is 1 for a pixel (i, j) belonging to a human body and 0 otherwise. The semantic mask is used in the optimal seam search of step 6 to achieve visual consistency of the stitched image.
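A sketch of the mask extraction with torchvision's off-the-shelf Mask R-CNN (not from the filing); the score threshold is an assumption, and the input is an OpenCV-style BGR frame:

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

@torch.no_grad()
def person_mask(bgr, score_thr=0.5):
    """Binary mask Sem: 1 where a detected person lies, 0 elsewhere."""
    rgb = torch.from_numpy(bgr[..., ::-1].copy()).permute(2, 0, 1).float() / 255.0
    out = model([rgb])[0]
    keep = (out["labels"] == 1) & (out["scores"] > score_thr)  # COCO class 1 = person
    if not keep.any():
        return torch.zeros(bgr.shape[:2], dtype=torch.uint8).numpy()
    masks = out["masks"][keep, 0] > 0.5          # (N, H, W) boolean instance masks
    return masks.any(dim=0).to(torch.uint8).numpy()
```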
6. Combine the semantic mask from step 5, minimize the seam cost function with dynamic programming, search for the optimal seam, and fuse the overlapping parts of the aligned images (a code sketch of this search follows below)
S601: define the seam path cost from pixel (i-1, k) in the previous row to the current pixel (i, j), representing the difference between the pixel values of the two images at the positions on either side of the seam:
e(i, j) = || I_1(i, j) - I_2(i, j) ||
Spa(i, j, k) = e(i-1, k) + e(i, j)
where I_1 and I_2 are the two aligned images in the overlap region.
Preferably, j-6 < k < j+6 is imposed in the expression to keep the search efficient.
S602: define the seam cost function:
M(i, j) = min_{j-6<k<j+6} ( M(i-1, k) + λ·Sem(i, j) + Spa(i, j, k) )
where M(i, j) is the value of the seam cost function, Sem(i, j) is the semantic cost of choosing pixel (i, j) as a boundary point, Spa(i, j, k) is the seam path cost from pixel (i-1, k) in the previous row to the current pixel (i, j), and λ is a parameter balancing the semantic cost against the path cost.
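A compact sketch of the dynamic-programming seam search defined by S601 and S602 and solved in S603 below (not from the filing); the weight lam and the window radius are assumptions:

```python
import numpy as np

def optimal_seam(img_a, img_b, sem, lam=10.0, radius=5):
    """Column index of the seam in every row of the overlap of img_a / img_b.
    sem: binary pedestrian mask of the overlap; lam weights the semantic cost."""
    e = np.linalg.norm(img_a.astype(np.float32)
                       - img_b.astype(np.float32), axis=2)   # e(i, j)
    h, w = e.shape
    M = np.full((h, w), np.inf, np.float32)
    back = np.zeros((h, w), np.int32)
    M[0] = e[0] + lam * sem[0]
    for i in range(1, h):
        for j in range(w):
            k0, k1 = max(0, j - radius), min(w, j + radius + 1)  # j-6 < k < j+6
            k = k0 + int(np.argmin(M[i - 1, k0:k1] + e[i - 1, k0:k1]))
            back[i, j] = k
            M[i, j] = M[i - 1, k] + e[i - 1, k] + e[i, j] + lam * sem[i, j]
    seam = np.zeros(h, np.int32)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 1, 0, -1):                 # backtrack the cheapest path
        seam[i - 1] = back[i, seam[i]]
    return seam
```

Per the selective-update rule below, this search would only be re-run when the inter-frame pixel difference exceeds the preset threshold.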
Preferably, before each seam update, the pixel difference between adjacent image frames is computed; if the difference is below a threshold, the current seam is kept, otherwise a new seam is computed.
S603: define the state-transition function from the seam cost function and solve for the optimal seam with dynamic programming, minimizing the seam cost function.
S604: fuse the overlap regions of the images along the optimal seam to obtain the stitched panoramic image.
7. The beneficial effects of the invention are illustrated through specific experiments
Experimental conditions and scoring criteria:
The data set used in the experiments contains pedestrian videos of 5 scenes: two indoor and three outdoor. The indoor scenes include several people walking, with at most four present at once; in the outdoor scenes up to ten pedestrians appear in the video simultaneously. Each scene's sample contains fisheye video in 4 directions, 200 frames per direction, 4000 (5 × 4 × 200) images in total, at a resolution of 1920 × 1080.
In the calibration experiment on the surround-view camera system, the calibration performance of the structured camera array is evaluated by the reprojection error, i.e. the difference between an observed pixel position and the projected two-dimensional position of the corresponding three-dimensional point. In the photometric alignment experiment, performance is evaluated by the intensity difference between the overlap regions of adjacent images: if the brightness levels and colors of adjacent images are well unified, the intensity difference of the overlap regions should be as small as possible. In the optimal seam experiment, stitching quality is evaluated by counting the frames in the panoramic stitching results that show breakage; 4 volunteers were invited to count the broken frames of each method's stitching results. To avoid subjective preference, all result frames were shuffled before counting; the lower the proportion of broken frames in a video, the better the method's performance.
Experiment 1
A calibration experiment was performed on the surround-view camera system. Table 1 shows the results of the joint pose optimization, listing the reprojection errors of the surround-view camera system before and after joint optimization. The table shows that the reprojection errors of the cameras at all viewing angles decrease, with the average reprojection error over all cameras falling from 0.1881 to 0.1825, which verifies the effectiveness of the joint optimization in the method.
TABLE 1. Reprojection errors of the surround-view camera system calibration
[Table 1 appears as an image in the original filing; it lists the per-camera reprojection errors before and after joint optimization.]
Experiment 2
A photometric alignment experiment was performed. To demonstrate the effectiveness of the robust photometric adjustment model in the method, the experiment compares the method's results with a reference model from "Automatic panoramic image stitching using invariant features" (M. Brown and D. G. Lowe, International Journal of Computer Vision, vol. 74, no. 1, pp. 59-73, 2007), and with the histogram-matching-based scheme of "Panoramic video stitching of dual cameras based on spatio-temporal seam optimization" (Q. Liu, X. Su, L. Zhang, and H. Huang, Multimedia Tools and Applications, vol. 79, no. 5, pp. 3107-3124, 2020), extended from two to four images.
Fig. 5 shows the photometric alignment results of the different methods. (a) Without photometric alignment, the seams are very noticeable and the overall brightness distribution is uneven, unlike an image captured under natural conditions. (b) With the method of Brown et al., the brightness difference between adjacent images is reduced, but the seam can still be perceived in the sky region. (c) With the method of Liu et al., the histogram-matching-based approach severely distorts the ground area of the image, and sharp bright-dark boundaries remain on the pillars at the right of the image. (d) The photometric alignment model used by the present method yields smooth overall brightness transitions, no obvious seam, and better elimination of the sharp bright-dark boundary in the sky region, so the images fuse more naturally near the seam.
Experiment 3
An optimal seam algorithm experiment was performed, comparing the method against four state-of-the-art optimal seam algorithms (GraphCut, DP, Perception and Iterative), with the result of AutoStitch without any seam algorithm as a control. To ensure fairness, the same cylindrical projection and photometric alignment preprocessing was applied first, and the different optimal seam algorithms were then used in the image fusion stage.
Fig. 6 shows the qualitative comparison of the different seam algorithms on two groups of test images. In the fusion results of AutoStitch, which uses no seam algorithm, obvious ghosting is observed. The results of GraphCut and Perception show dislocation and loss of pedestrians' bodies on both groups of test images, and DP and Iterative also show stitching dislocation on one group. Such misalignments and losses of faces and bodies are visually conspicuous and unacceptable to human perception. In contrast, the pedestrian-aware optimal seam algorithm proposed by the method preserves pedestrians' bodies to the greatest possible extent and obtains stitching results with better visual quality.
Table 2 gives the quantitative results of the optimal seam experiment, listing each method's percentage of frames with broken pedestrians or objects in the stitching results of each group of videos. The method achieves the lowest broken-frame ratio in most scenes, and its results best match human visual habits. By contrast, DP, which considers only pixel cost, shows a very high broken-frame ratio on Indoor Video-1, which contains especially many pedestrians, because it takes no semantic information into account. Averaged over all videos, the method's broken-frame ratio is the lowest, showing that it outperforms the comparison methods and has a clear advantage in handling pedestrians.
TABLE 2. Quantitative results of the optimal seam algorithms
[Table 2 appears as an image in the original filing; it lists each method's percentage of broken frames per video group.]
Experiment 4
An experiment on the time performance of the optimal seam algorithms was performed. Table 3 reports the average stitching time per frame with the different optimal seam algorithms, where resolution 1× is 1500 × 1200 and resolutions 0.5× and 0.25× scale both width and height accordingly.
TABLE 3. Time performance of the optimal seam algorithms
[Table 3 appears as an image in the original filing; it lists the average per-frame stitching time of each algorithm at the three resolutions.]
The results show that the time costs of the GraphCut variants Perception and Iterative are extremely high, owing to a series of additional time-consuming steps such as image saliency prediction and multiple rounds of iterative optimization. By selectively updating the seam and accelerating through scaling, the method stitches panoramic video at 12-26 frames per second, the exact speed depending on the number of objects in the captured scene and how often they move.
The embodiment also provides a panoramic video stitching device for maintaining the integrity of pedestrians, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the steps of the panoramic video stitching method for maintaining the integrity of pedestrians.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A panoramic video stitching method for keeping the integrity of pedestrians, characterized by comprising the following steps:
S1: constructing a structured panoramic camera array from a plurality of cameras and collecting multiple video streams;
S2: placing a calibration board in the common viewing area of adjacent cameras to form feature point matching pairs, constructing the relative poses of adjacent cameras and the loop-closure pose of the forward-view camera, and further optimizing the poses with bundle adjustment;
S3: based on the pose optimization result, geometrically aligning the video images acquired by the different cameras and mapping the pixel coordinates onto a unified cylindrical reference surface;
S4: after geometric alignment, constructing a photometric alignment equation for each camera's video image together with mean and variance alignment equations for the overlap regions of adjacent images, and solving them with a two-step least-squares method to eliminate the brightness difference between adjacent images;
S5: locating the pedestrian targets in each camera's video image with semantic segmentation to obtain a semantic mask;
S6: constructing a seam cost function from the photometrically aligned camera video images and the semantic masks, solving for the optimal seam with dynamic programming, and fusing the overlapping parts of the photometrically aligned camera video images.
2. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 1, wherein the structured panoramic camera array comprises four cameras facing front, rear, left and right respectively, each camera being fixed by a bracket, and the horizontal viewing angle of each camera being within the range of 100 to 200 degrees.
3. The method of claim 2, wherein the constructed relative poses of adjacent cameras comprise the relative pose T_LF between the forward-view and left-view cameras, T_BL between the left-view and rear-view cameras, T_RB between the rear-view and right-view cameras, T_FR between the right-view and forward-view cameras, and the pose T_FF of the forward-view camera;
the loop-closure forward-view camera pose T'_FF being computed as:
T'_FF = T_FR · T_RB · T_BL · T_LF · T_FF
4. the panoramic video stitching method for maintaining the integrity of pedestrians according to claim 3, wherein the pose optimization by using the beam adjustment method specifically comprises:
constructing a pose loss function by taking the relative pose of adjacent cameras, the pose of the loop forward-looking camera and the feature point matching pairs as variables to be optimized, and solving based on graph optimization to obtain the relative pose between the adjacent cameras after optimization;
the calculation expression of the pose loss function is as follows:
Figure FDA0003318347000000021
in the formula uijIs the pixel coordinate of the jth feature point that can be observed by the ith camera, sijIs the depth of the feature point, KiIs an internal reference of the i-th camera,
Figure FDA0003318347000000022
is the lie algebra form of the ith camera pose, PijThe loss function implies the conversion of homogeneous coordinates to non-homogeneous coordinates.
5. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 3, wherein step S3 specifically comprises the following steps:
S301: taking the center of the four cameras in the panoramic camera array as the center coordinate P_center, obtained as the mean of the four camera coordinates in the forward-view camera coordinate system:
P_center = (P_F + P_L + P_B + P_R) / 4
S302: computing the transformation matrix T_FW of the forward-view camera with respect to the center coordinate as the 4 × 4 homogeneous transform with identity rotation and translation -P_center:
T_FW = [ I  -P_center ; 0^T  1 ]
S303: based on the relative poses between adjacent cameras and the transformation matrix T_FW of the forward-view camera with respect to the center coordinate, obtaining the pose transformation of each camera with respect to the center coordinate:
T_LW = T_LF · T_FW
T_BW = T_BL · T_LF · T_FW
T_RW = T_RB · T_BL · T_LF · T_FW
where T_LW is the transformation matrix of the left-view camera with respect to the center coordinate, T_BW that of the rear-view camera, and T_RW that of the right-view camera;
S304: establishing a cylindrical projection surface of radius r centered at P_center, defining a z-axis scale factor h_scale for the mapping, and mapping the pixel coordinates onto the unified cylindrical reference surface, the mapping being:
x_w = r · sin(2π·u / w)
y_w = r · cos(2π·u / w)
z_w = h_scale · (h/2 - v)
where x_w, y_w and z_w are the x, y and z coordinates of the mapped pixel point, h is the height of the stitched panorama, w is the width of the stitched panorama, and (u, v) are the original pixel coordinates.
6. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 3, wherein the construction of the photometric alignment equation for each camera's video image comprises:
for each camera's video image, constructing a photometric alignment equation according to a photometric adjustment model containing a bias term, the model being:
I' = g·I + b
where I' is the image after photometric adjustment, I is the image before adjustment, g is a gain and b is a bias term;
the mean and variance alignment equations of the overlap regions of adjacent images being:
g_i·I_ij + b_i = g_j·I_ji + b_j
g_i·σ_ij = g_j·σ_ji
ij ∈ {FR, RB, BL, LF}
where I_ij is the mean pixel intensity of image i in the region where images i and j overlap, σ_ij is the standard deviation of the pixel intensity of image i in that region, and I_ji and σ_ji are the mean and standard deviation of image j in the same region;
the two-step least-squares solution being specifically:
first, solving the system formed by the four equations g_i·σ_ij = g_j·σ_ji by least squares and normalizing to obtain an approximate solution for the gain g of each image;
then substituting the adjusted gains into the equations g_i·I_ij + b_i = g_j·I_ji + b_j to obtain the bias-term system:
[  1  -1   0   0 ] [ b_F ]   [ g_R·I_RF - g_F·I_FR ]
[  0   1  -1   0 ] [ b_R ] = [ g_B·I_BR - g_R·I_RB ]
[  0   0   1  -1 ] [ b_B ]   [ g_L·I_LB - g_B·I_BL ]
[ -1   0   0   1 ] [ b_L ]   [ g_F·I_FL - g_L·I_LF ]
performing SVD on the bias-term system to obtain an approximate solution for the bias term b of each image;
and adjusting each image with its obtained gain g and bias term b to obtain an image set whose overlap regions have consistent brightness.
7. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 1, wherein in step S5 a neural network is used to segment each camera's video image to extract the pedestrian targets and obtain a semantic mask, in which two pixel values respectively distinguish the human body from the background.
8. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 1, wherein in step S6 the seam cost function is:
M(i, j) = min_{j-6<k<j+6} ( M(i-1, k) + λ·Sem(i, j) + Spa(i, j, k) )
Spa(i, j, k) = e(i-1, k) + e(i, j)
e(i, j) = || I_1(i, j) - I_2(i, j) ||
where M(i, j) is the value of the seam cost function, Sem(i, j) is the semantic cost of choosing pixel (i, j) as a boundary point, Spa(i, j, k) is the seam path cost from pixel (i-1, k) in the previous row to the current pixel (i, j), e(i, j) is the pixel-value difference between the two overlapping aligned images I_1 and I_2 at (i, j), and λ is a parameter balancing the semantic cost against the path cost.
9. The panoramic video stitching method for keeping the integrity of pedestrians according to claim 8, wherein the parameter k ranges over j-6 < k < j+6;
during the dynamic-programming solution, before each seam update the pixel difference between adjacent image frames is computed; if the difference is smaller than a preset threshold the current seam is kept, otherwise a new seam is computed.
10. A panoramic video stitching apparatus for maintaining pedestrian integrity, comprising a memory and a processor, wherein the memory stores a computer program, and the processor calls the computer program to execute the steps of the method according to any one of claims 1 to 9.
CN202111238422.9A 2021-10-25 2021-10-25 Panoramic video stitching method and device capable of keeping integrity of pedestrians Pending CN114022562A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111238422.9A CN114022562A (en) 2021-10-25 2021-10-25 Panoramic video stitching method and device capable of keeping integrity of pedestrians

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111238422.9A CN114022562A (en) 2021-10-25 2021-10-25 Panoramic video stitching method and device capable of keeping integrity of pedestrians

Publications (1)

Publication Number Publication Date
CN114022562A true CN114022562A (en) 2022-02-08

Family

ID=80057258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111238422.9A Pending CN114022562A (en) 2021-10-25 2021-10-25 Panoramic video stitching method and device capable of keeping integrity of pedestrians

Country Status (1)

Country Link
CN (1) CN114022562A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116485645A (en) * 2023-04-13 2023-07-25 北京百度网讯科技有限公司 Image stitching method, device, equipment and storage medium
CN116645496A (en) * 2023-05-23 2023-08-25 北京理工大学 Dynamic look-around splicing and stabilizing method for trailer based on grid deformation


Similar Documents

Publication Publication Date Title
US9811946B1 (en) High resolution (HR) panorama generation without ghosting artifacts using multiple HR images mapped to a low resolution 360-degree image
CN111062873B (en) Parallax image splicing and visualization method based on multiple pairs of binocular cameras
US10609282B2 (en) Wide-area image acquiring method and apparatus
US10257501B2 (en) Efficient canvas view generation from intermediate views
CN109151439B (en) Automatic tracking shooting system and method based on vision
CN111028155B (en) Parallax image splicing method based on multiple pairs of binocular cameras
CN106157304A (en) A kind of Panoramagram montage method based on multiple cameras and system
CN112085659B (en) Panorama splicing and fusing method and system based on dome camera and storage medium
CN114022562A (en) Panoramic video stitching method and device capable of keeping integrity of pedestrians
CN103177432B (en) A kind of by coded aperture camera acquisition panorama sketch method
CN111866523B (en) Panoramic video synthesis method and device, electronic equipment and computer storage medium
CN110689476A (en) Panoramic image splicing method and device, readable storage medium and electronic equipment
CN110717936A (en) Image stitching method based on camera attitude estimation
CN113160048A (en) Suture line guided image splicing method
CN110278366B (en) Panoramic image blurring method, terminal and computer readable storage medium
CN114926612A (en) Aerial panoramic image processing and immersive display system
CN111640065A (en) Image stitching method and imaging device based on camera array
CN108564654B (en) Picture entering mode of three-dimensional large scene
US20090059018A1 (en) Navigation assisted mosaic photography
Fu et al. Image stitching techniques applied to plane or 3-D models: a review
CN117853329A (en) Image stitching method and system based on multi-view fusion of track cameras
EP3229106A1 (en) Efficient determination of optical flow between images
EP3229470A1 (en) Efficient canvas view generation from intermediate views
CN108805804B (en) Method for processing panoramic picture in liquid crystal display television
CN118247142A (en) Multi-view splicing method and system applied to large-view-field monitoring scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination