CN112422848B - Video stitching method based on depth map and color map - Google Patents

Video stitching method based on depth map and color map

Info

Publication number
CN112422848B
Authority
CN
China
Prior art keywords
depth
color
map
cameras
cloud data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011288939.4A
Other languages
Chinese (zh)
Other versions
CN112422848A (en)
Inventor
王丹华
顾秋生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Gehua Intelligent Technology Co ltd
Original Assignee
Shenzhen Gehua Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Gehua Intelligent Technology Co ltd filed Critical Shenzhen Gehua Intelligent Technology Co ltd
Priority to CN202011288939.4A priority Critical patent/CN112422848B/en
Publication of CN112422848A publication Critical patent/CN112422848A/en
Application granted granted Critical
Publication of CN112422848B publication Critical patent/CN112422848B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/10Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/95Computational photography systems, e.g. light-field imaging systems
    • H04N23/951Computational photography systems, e.g. light-field imaging systems by using two or more images to influence resolution, frame rate or aspect ratio

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video stitching method based on a depth map and a color map, which comprises the following steps: preparing a plurality of groups of color cameras and depth cameras; extracting frames from the captured color video stream and depth video stream to obtain a color map and a depth map corresponding to a certain moment; calibrating with a checkerboard method to obtain the poses of the plurality of groups of color cameras and depth cameras; obtaining, from the depth map, the three-dimensional coordinates of the current scene relative to the depth camera coordinate system to form point cloud data, associating the point cloud data with the corresponding pixels of the color map, and deleting point cloud data that are not within the depth range; performing image stitching using the filtered point cloud data; and repeating the above operations to obtain the stitched video stream. The video stitching method based on the depth map and the color map has the advantages of high stitching precision and efficiency.

Description

Video stitching method based on depth map and color map
Technical Field
The invention relates to the technical field of video stitching, in particular to a video stitching method based on a depth map and a color map.
Background
Existing video stitching methods first extract pictures from the videos, extract feature points from each picture, perform feature matching between corresponding pictures of different videos according to the feature points, and then solve the camera extrinsic parameters or compute homography matrices from the matching results to stitch adjacent pictures. This approach is slow, and on complex scenes it frequently produces mismatched pictures, so the computed extrinsic parameters or homography matrices are wrong and the stitching result is incorrect. In addition, a picture often consists largely of scenes such as sky or sea surface that are of little interest to the user; because such regions look nearly identical everywhere, they often cause matching to fail or to be inaccurate.
Disclosure of Invention
The invention provides a video stitching method based on a depth map and a color map, aiming to solve the inaccurate-matching problem of existing video stitching techniques.
According to an embodiment of the present application, there is provided a video stitching method based on a depth map and a color map, including the following steps:
preparing a plurality of groups of color cameras and depth cameras;
extracting frames from the captured color video stream and depth video stream, and obtaining a color map and a depth map corresponding to a certain moment;
calibrating by using a checkerboard method to obtain the pose of a plurality of groups of color cameras and depth cameras;
obtaining, from the depth map, the three-dimensional coordinates of the current scene relative to the depth camera coordinate system to form point cloud data, associating the point cloud data with the corresponding pixels of the color map, and deleting point cloud data that are not within the depth range;
performing image stitching according to the filtered point cloud data;
repeating the above operations to obtain the stitched video stream.
Preferably, a color map M and a depth map D at a certain moment are acquired from the plurality of groups of color cameras and depth cameras;
obtaining sets of color-map/depth-map matching pairs (M1, D1), (M2, D2), …, (Mn, Dn);
preferably, the method comprises the following steps:
setting a plane at the average depth and perpendicular to the Z axis as the projection plane;
projecting the point cloud data obtained from the depth map onto the projection plane to obtain a plurality of two-dimensional points;
filling the two-dimensional points on the plane with the pixel values obtained by projecting each three-dimensional point onto the color map;
performing hole-filling optimization on the color map of the projection plane;
and obtaining a complete stitched image frame.
Preferably, calibration is performed by a checkerboard method to obtain the poses of the plurality of groups of color cameras and depth cameras, comprising the following steps:
the multiple groups of color cameras and depth cameras observe the same checkerboard or multiple checkerboards;
taking one of the color cameras as a reference, obtaining the poses of all the color cameras;
calculating the relative poses of the plurality of groups of color cameras;
since the pose between the depth camera and the color camera on each device is fixed, the poses of all the depth cameras are thereby obtained.
Preferably, the three-dimensional coordinates of the current scene relative to the depth camera coordinate system are obtained from the depth map, comprising the following steps:
converting the local three-dimensional coordinates into three-dimensional point clouds according to the calculated relative pose;
fusing all three-dimensional point cloud data;
processing the three-dimensional point cloud data, including denoising and reprojection error optimization;
deleting the point cloud data which are not in the depth range;
and obtaining information of the observation scene.
The technical scheme provided by the embodiments of the application can have the following beneficial effects: compared with traditional schemes, the video stitching method based on the depth map and the color map stitches only the scene content within a specified depth range. The user therefore obtains an image of exactly the observation range of interest, interference from other scene content is removed, and the stitching precision and efficiency can be greatly improved. The method is particularly advantageous for fast real-time stitching and allows attention to be focused on specific scenes. Using the depth map brings accurate matching, high speed, and convenient specification of the scene of interest.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a video stitching method based on a depth map and a color map according to the present invention;
FIG. 2 is a schematic flow chart of step S2 in a video stitching method based on a depth map and a color map according to the present invention;
FIG. 3 is a schematic flow chart of step S4 in a video stitching method based on a depth map and a color map according to the present invention;
FIG. 4 is a schematic flow chart of step S3 in a video stitching method based on a depth map and a color map according to the present invention;
fig. 5 is a schematic flow chart of step S4 in a video stitching method based on a depth map and a color map according to the present invention.
Description of the reference numerals:
10. a video stitching method based on depth map and color map.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, the invention discloses a video stitching method 10 based on a depth map and a color map, comprising the following steps:
step S1: preparing a plurality of groups of color cameras and depth cameras;
step S2: extracting frames from the shot color video stream and depth video stream, and obtaining a color image and a depth image corresponding to a certain moment;
step S3: calibrating by using a checkerboard method to obtain the pose of a plurality of groups of color cameras and depth cameras;
step S4: obtaining, from the depth map, the three-dimensional coordinates of the current scene relative to the depth camera coordinate system to form point cloud data, associating the point cloud data with the corresponding pixels of the color map, and deleting point cloud data that are not within the depth range;
step S5: performing image stitching according to the filtered point cloud data;
step S6: repeating the above operations to obtain the stitched video stream.
Compared with traditional schemes, the video stitching method based on the depth map and the color map stitches only the scene content within a specified depth range. The user therefore obtains an image of exactly the observation range of interest, interference from other scene content is removed, and the stitching precision and efficiency can be greatly improved. The method is particularly advantageous for fast real-time stitching and allows attention to be focused on specific scenes. Using the depth map brings accurate matching, high speed, and convenient specification of the scene of interest.
There are many devices that can capture depth images, such as Microsoft's Kinect and Orbbec 3D cameras. These devices can conveniently collect a color map and the corresponding depth map at the same time. Since the color camera and the depth camera are fixed on the device, the relative pose between each color map and depth map is fixed.
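Because that relative pose is fixed, a depth pixel can be registered to its corresponding color pixel once and for all. Below is a minimal Python sketch under assumed pinhole intrinsics K_d and K_c and device-calibrated extrinsics (R_dc, t_dc); these names are illustrative and do not refer to any particular vendor's SDK.

```python
import numpy as np

def register_depth_pixel_to_color(u, v, z, K_d, K_c, R_dc, t_dc):
    """Map one depth pixel (u, v) with depth z (in metres) into the color image.

    K_d, K_c   : 3x3 intrinsic matrices of the depth and color cameras.
    R_dc, t_dc : fixed 3x3 rotation and length-3 translation taking points
                 from the depth camera frame to the color camera frame
                 (assumed known from the device calibration).
    """
    # Back-project the depth pixel to a 3D point in the depth camera frame.
    p_d = z * (np.linalg.inv(K_d) @ np.array([u, v, 1.0]))
    # Move the point into the color camera frame using the fixed extrinsics.
    p_c = R_dc @ p_d + t_dc
    # Project into the color image to find the matching color pixel.
    uvw = K_c @ p_c
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```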
Referring to fig. 2, the step S2 includes the following steps:
step S21: acquiring a color map M and a depth map D at a certain moment from the plurality of groups of color cameras and depth cameras;
step S22: obtaining sets of color-map/depth-map matching pairs (M1, D1), (M2, D2), …, (Mn, Dn);
Each time a 3D camera captures, a color map M and a depth map D are obtained simultaneously. Stitching requires multiple pictures taken at different positions, so that a picture covering a larger scene can be assembled. In practice the color pictures M1 … Mn are what is actually stitched; the depth maps assist the stitching, making it more accurate, and the depth information also allows unwanted ranges to be removed.
Referring to fig. 3, the step S4 includes the following steps:
step S41: setting a plane at the average depth and perpendicular to the Z axis as the projection plane;
step S42: projecting the point cloud data obtained from the depth map onto the projection plane to obtain a plurality of two-dimensional points;
step S43: filling the two-dimensional points on the plane with the pixel values obtained by projecting each three-dimensional point onto the color map;
step S44: performing hole-filling optimization on the color map of the projection plane;
step S45: obtaining a complete stitched image frame.
Each point in the point cloud has a three-dimensional coordinate, and in general its Z-axis coordinate can be regarded as the depth value. A depth range is therefore set in advance, every point is traversed, and points whose Z values are not within the range are deleted, as in the sketch below.
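A minimal Python sketch of the depth-range filter and of the projection in steps S41-S43 follows. The depth units, the virtual intrinsics K_out, the output resolution, and the per-point colors array are illustrative assumptions; hole filling (step S44) is handled separately further below.

```python
import numpy as np

def depth_to_points(depth, K, z_min, z_max):
    """Back-project a depth map (HxW, metres) to 3D points in the camera
    frame and keep only points whose Z value lies in [z_min, z_max]."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    uv1 = np.stack([u.reshape(-1), v.reshape(-1), np.ones(h * w)], axis=0)
    pts = (np.linalg.inv(K) @ uv1) * z                # 3 x N, rows are X, Y, Z
    keep = (z > 0) & (z >= z_min) & (z <= z_max)      # the depth-range filter
    return pts[:, keep].T, keep                       # kept points + pixel mask

def splat_to_plane(points, colors, K_out, out_shape):
    """Drop the filtered points onto a plane at the mean depth, perpendicular
    to the Z axis (steps S41-S43): every point keeps its X/Y position, maps to
    an output pixel through the assumed virtual intrinsics K_out, and is
    filled with the color sampled for that point (colors is N x 3, uint8)."""
    h, w = out_shape
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    z_plane = points[:, 2].mean()                     # projection plane depth
    xy1 = np.stack([points[:, 0], points[:, 1],
                    np.full(len(points), z_plane)], axis=0)
    proj = (K_out @ xy1) / z_plane
    xs, ys = np.round(proj[0]).astype(int), np.round(proj[1]).astype(int)
    ok = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    canvas[ys[ok], xs[ok]] = colors[ok]
    return canvas
```

Here colors would typically be obtained by sampling, for each kept point, the registered color-map pixel, e.g. M.reshape(-1, 3)[keep].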
Referring to fig. 4, the step S3 includes the following steps:
step S31: the multiple groups of color cameras and depth cameras observe the same checkerboard or multiple checkerboards;
step S32: taking one of the color cameras as a reference, obtaining the poses of all the color cameras;
step S33: calculating the relative poses of the plurality of groups of color cameras;
step S34: since the pose between the depth camera and the color camera on each device is fixed, the poses of all the depth cameras are thereby obtained.
For example: taking the pose of the first 3D sensor as the reference, the pre-calibrated pose relations of the other 3D sensors are (R2, T2), …, (Rn, Tn), where R is a 3×3 rotation matrix and T is a 3×1 translation vector; a three-dimensional point p observed by the i-th sensor is converted to the coordinate Ri·p + Ti in the reference coordinate system, as in the sketch below.
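A minimal OpenCV sketch of this calibration step is given below. The checkerboard pattern size, square size, and the assumption that each camera's intrinsics K and distortion dist are already known are illustrative; the relative pose is obtained by composing the two board-to-camera poses.

```python
import cv2
import numpy as np

def camera_pose_from_board(image, K, dist, pattern=(9, 6), square=0.025):
    """Estimate one color camera's pose relative to a shared checkerboard
    (steps S31-S32).  Pattern and square size are illustrative assumptions."""
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    assert found, "checkerboard not visible in this view"
    _, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec.reshape(3)            # board-to-camera rotation/translation

def relative_pose(R_ref, t_ref, R_i, t_i):
    """Pose of camera i in the reference camera's frame (step S33): a point
    p_i seen by camera i maps to R @ p_i + T in the reference frame."""
    R = R_ref @ R_i.T                    # p_ref = R_ref p_board + t_ref
    T = t_ref - R @ t_i                  # p_i   = R_i  p_board + t_i
    return R, T
```

With (Ri, Ti) computed this way, an Ni×3 point cloud pts_i from sensor i is brought into the reference frame as pts_i @ R.T + T, matching the Ri·p + Ti relation above.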
Referring to fig. 5, the step S4 further includes the following steps (a code sketch of the fusion follows the list):
step S46: converting the local three-dimensional coordinates into three-dimensional point clouds according to the calculated relative pose;
step S47: fusing all three-dimensional point cloud data;
step S48: processing the three-dimensional point cloud data, including denoising and reprojection error optimization;
step S49: deleting the point cloud data which are not in the depth range;
step S50: and obtaining information of the observation scene.
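A minimal sketch of the fusion and of a simple denoising pass follows; the statistical Z-outlier rule stands in for whatever denoising is actually used, and the reprojection-error optimization of step S48 is not shown.

```python
import numpy as np

def fuse_point_clouds(clouds, poses):
    """Fuse per-sensor point clouds into the reference frame (steps S46-S47).
    clouds[i] is an Ni x 3 array in sensor i's frame and poses[i] = (Ri, Ti)
    is its relative pose from calibration, with (I, 0) for the reference."""
    fused = [pts @ R.T + T for pts, (R, T) in zip(clouds, poses)]
    return np.vstack(fused)

def drop_depth_outliers(points, k=2.5):
    """One simple denoising scheme (an assumption, part of step S48): discard
    points whose Z value is more than k standard deviations from the mean."""
    z = points[:, 2]
    keep = np.abs(z - z.mean()) <= k * z.std()
    return points[keep]
```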
Missing regions in the stitched picture generally arise because the content captured in that portion is not within the specified depth range, and they can simply be ignored. For aesthetic purposes, they can instead be filled by interpolating the surrounding color pixel values, as in the sketch below.
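A minimal sketch of such hole filling is shown below; treating all-zero pixels as holes and using OpenCV inpainting are both illustrative assumptions, since the description only requires interpolation from the surrounding color values.

```python
import cv2
import numpy as np

def fill_holes(stitched_bgr):
    """Fill empty pixels left after projection (step S44) by interpolating
    from the surrounding colors."""
    # Treat a pixel as a hole if all three channels are exactly zero.
    mask = np.all(stitched_bgr == 0, axis=2).astype(np.uint8) * 255
    return cv2.inpaint(stitched_bgr, mask, 3, cv2.INPAINT_TELEA)
```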
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is defined by the appended claims.

Claims (3)

1. A video stitching method based on a depth map and a color map, characterized by comprising the following steps: preparing a plurality of groups of color cameras and depth cameras; extracting frames from the captured color video stream and depth video stream, and obtaining a color map and a depth map corresponding to a certain moment; calibrating by a checkerboard method to obtain the poses of the plurality of groups of color cameras and depth cameras; obtaining, from the depth map, the three-dimensional coordinates of the current scene relative to the depth camera coordinate system to form point cloud data, associating the point cloud data with the corresponding pixels of the color map, and deleting point cloud data that are not within the depth range; performing image stitching according to the filtered point cloud data; repeating the above operations to obtain a stitched video stream;
the method for calibrating by adopting the checkerboard method to obtain the pose of a plurality of groups of color cameras and depth cameras further comprises the following steps: multiple groups of color cameras and depth cameras observe the same Zhang Qipan grid or multiple chessboards; taking one of the color cameras as a reference, and obtaining the positions of all the color cameras; calculating the relative positions of a plurality of groups of color cameras; the pose of the depth camera and the pose of the color camera on each device are fixed, so that the poses of all the depth cameras are obtained;
the method comprises the steps of obtaining three-dimensional coordinates of a relative depth camera coordinate system of a current shooting scene from a depth map, obtaining point cloud data from the depth map, corresponding pixels of the color map to the point cloud data, deleting the point cloud data which are not in a depth range, and further comprises the following steps: setting a plane with the average depth vertical to the Z axis as a projection plane; projecting the point cloud data obtained by the depth map onto the projection surface to obtain a plurality of two-dimensional points; filling the two-dimensional points on the plane according to the pixel values obtained by projecting each three-dimensional point to the color map; hole filling optimization is carried out on the color map of the projection surface; obtaining a complete spliced image frame; each point cloud data has a three-dimensional coordinate, a Z-axis coordinate is used as a depth value, a depth range is set, each point is traversed, and point clouds with the Z-axis value not in the range are deleted.
2. The video stitching method based on a depth map and a color map according to claim 1, comprising the following steps: acquiring a color map M and a depth map D at a certain moment from the plurality of groups of color cameras and depth cameras; obtaining sets of color-map/depth-map matching pairs (M1, D1), (M2, D2), …, (Mn, Dn).
3. The video stitching method based on a depth map and a color map according to claim 2, wherein the three-dimensional coordinates of the current scene relative to the depth camera coordinate system are obtained from the depth map, comprising the following steps: converting the local three-dimensional coordinates into three-dimensional point clouds according to the calculated relative poses; fusing all the three-dimensional point cloud data; processing the three-dimensional point cloud data, including denoising and reprojection-error optimization; deleting the point cloud data that are not within the depth range; and obtaining the information of the observed scene.
CN202011288939.4A 2020-11-17 2020-11-17 Video stitching method based on depth map and color map Active CN112422848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011288939.4A CN112422848B (en) 2020-11-17 2020-11-17 Video stitching method based on depth map and color map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011288939.4A CN112422848B (en) 2020-11-17 2020-11-17 Video stitching method based on depth map and color map

Publications (2)

Publication Number Publication Date
CN112422848A CN112422848A (en) 2021-02-26
CN112422848B (en) 2024-03-29

Family

ID=74831990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011288939.4A Active CN112422848B (en) 2020-11-17 2020-11-17 Video stitching method based on depth map and color map

Country Status (1)

Country Link
CN (1) CN112422848B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972019B (en) * 2021-04-14 2024-05-24 华东师范大学 Depth image stitching method and device based on TOF camera and computer equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20110133677A (en) * 2010-06-07 2011-12-14 삼성전자주식회사 Method and apparatus for processing 3d image
CN104008569A (en) * 2014-02-24 2014-08-27 惠州学院 3D scene generation method based on depth video
CN105678683A (en) * 2016-01-29 2016-06-15 杭州电子科技大学 Two-dimensional storage method of three-dimensional model
KR20170005312A (en) * 2015-07-03 2017-01-12 전자부품연구원 System and method for concurrent calibration of camera and depth sensor
CN107154014A (en) * 2017-04-27 2017-09-12 上海大学 A kind of real-time color and depth Panorama Mosaic method
WO2018113082A1 (en) * 2016-12-21 2018-06-28 深圳市掌网科技股份有限公司 3d panoramic photographing system and method
WO2018125369A1 (en) * 2016-12-30 2018-07-05 Google Llc Multi-view scene flow stitching
CN108389157A (en) * 2018-01-11 2018-08-10 江苏四点灵机器人有限公司 A kind of quick joining method of three-dimensional panoramic image
CN109087388A (en) * 2018-07-12 2018-12-25 南京邮电大学 Object dimensional modeling method based on depth transducer
CN109961506A (en) * 2019-03-13 2019-07-02 东南大学 A kind of fusion improves the local scene three-dimensional reconstruction method of Census figure
CN110243390A (en) * 2019-07-10 2019-09-17 北京华捷艾米科技有限公司 The determination method, apparatus and odometer of pose
WO2019229293A1 (en) * 2018-05-31 2019-12-05 Nokia Technologies Oy An apparatus, a method and a computer program for volumetric video

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198523B (en) * 2013-04-26 2016-09-21 清华大学 A kind of three-dimensional non-rigid body reconstruction method based on many depth maps and system
EP3040941B1 (en) * 2014-12-29 2017-08-02 Dassault Systèmes Method for calibrating a depth camera
CN109064506B (en) * 2018-07-04 2020-03-13 百度在线网络技术(北京)有限公司 High-precision map generation method and device and storage medium

Also Published As

Publication number Publication date
CN112422848A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
CN101630406B (en) Camera calibration method and camera calibration device
Karpenko et al. Digital video stabilization and rolling shutter correction using gyroscopes
JP6363863B2 (en) Information processing apparatus and information processing method
KR101121034B1 (en) System and method for obtaining camera parameters from multiple images and computer program products thereof
US8208048B2 (en) Method for high dynamic range imaging
US20150091900A1 (en) Systems and Methods for Depth-Assisted Perspective Distortion Correction
JP5725953B2 (en) Imaging apparatus, control method therefor, and information processing apparatus
CN110827392B (en) Monocular image three-dimensional reconstruction method, system and device
CN202075794U (en) Unmanned plane aerial photography three-dimensional imaging processing device
CN111009030A (en) Multi-view high-resolution texture image and binocular three-dimensional point cloud mapping method
Ding et al. Fusing structure from motion and lidar for dense accurate depth map estimation
JP2015073185A (en) Image processing device, image processing method and program
JP2024537798A (en) Photographing and measuring method, device, equipment and storage medium
JP2023546739A (en) Methods, apparatus, and systems for generating three-dimensional models of scenes
CN111105347A (en) Method, device and storage medium for generating panoramic image with depth information
TW201225658A (en) Imaging device, image-processing device, image-processing method, and image-processing program
CN110276714B (en) Method and device for synthesizing rapid scanning panoramic image
CN112422848B (en) Video stitching method based on depth map and color map
KR20210087511A (en) Disparity estimation from wide-angle images
JP2020194454A (en) Image processing device and image processing method, program, and storage medium
CN113763544A (en) Image determination method, image determination device, electronic equipment and computer-readable storage medium
JP2005063041A (en) Three-dimensional modeling apparatus, method, and program
JP2005141655A (en) Three-dimensional modeling apparatus and three-dimensional modeling method
KR20110133677A (en) Method and apparatus for processing 3d image
JP5925109B2 (en) Image processing apparatus, control method thereof, and control program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant