CN116266356A - Panoramic video transition rendering method and device and computer equipment

Panoramic video transition rendering method and device and computer equipment

Info

Publication number
CN116266356A
Authority
CN
China
Prior art keywords
image
images
image group
adjacent frames
original video
Prior art date
Legal status
Pending
Application number
CN202111550384.0A
Other languages
Chinese (zh)
Inventor
袁文亮
陈聪
谢亮
姜文杰
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202111550384.0A
Publication of CN116266356A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The application relates to a panoramic video transition rendering method and apparatus, a computer device, a storage medium, and a computer program product. The method includes: performing feature matching on the non-occluded image areas of each two adjacent frames of images of at least one image group in an image group set to obtain matching point pairs between the non-occluded image areas of each two adjacent frames of images; optimizing the translation amount between every two adjacent frames of images according to these matching point pairs; and determining the advancing direction of the virtual camera at the moment corresponding to each original video frame according to the optimized translation amounts. Because feature points in occluded areas are excluded, mismatches caused by occlusion during feature matching are avoided. In addition, histogram statistical analysis of the optimized translation amounts between adjacent frames effectively overcomes the poor estimation of the advancing direction in weak-texture and nearly static scenes.

Description

Panoramic video transition rendering method and device and computer equipment
Technical Field
The present disclosure relates to the field of panoramic video processing technologies, and in particular, to a panoramic video transition rendering method, apparatus, computer device, storage medium, and computer program product.
Background
In the related art, when a panoramic video is rendered, an absolute view angle of a fixed panoramic camera or a view angle of a smoothly rotated panoramic camera is generally used as a view angle of a virtual camera. However, when the absolute view angle of the panoramic camera is fixed as the view angle of the virtual camera, this fixed view angle may not be a view angle of direct interest to the user; when the view angle of the panoramic camera is smoothly rotated as the view angle of the virtual camera, it is possible that most of the view angles are not view angles of interest to the user. In practice, the user will generally pay more attention to the front of the panoramic camera. However, the method for rendering panoramic video in the related art cannot meet the requirements of users.
In order to solve the above-mentioned problems, the patent entitled "Panoramic video rendering method with automatically adjusted viewing angle" discloses a method that includes: "obtaining the rotation amount of a panoramic camera relative to a world coordinate system when shooting a current video frame, and multiple fisheye images corresponding to the current video frame and a previous video frame of the panoramic video; respectively extracting corner points of the multi-path fisheye images corresponding to the previous video frame of the panoramic video to obtain corner point sequences to be tracked; respectively tracking the corner point sequences to be tracked to obtain matching point pairs in the fisheye images corresponding to the current video frame and the previous video frame; optimizing the displacement of the current video frame of the panoramic camera relative to the previous video frame according to the matching point pairs to obtain the optimized displacement; and taking the optimized displacement as the advancing direction of the virtual camera, calculating a rotation matrix of the current virtual camera, and performing transition rendering on the current video frame of the panoramic video by using the rotation amount of the panoramic camera relative to the world coordinate system when shooting the current video frame and the rotation matrix of the current virtual camera."
Although this meets the user's needs, in practice the panoramic camera is usually held in both hands or mounted on a motorcycle or car. Because the panoramic camera uses fisheye lenses and its shooting range is 360 degrees, part of the lens's field of view is inevitably blocked by the user's hands or the vehicle mount during shooting, which causes mismatches in the feature matching process and makes the advancing direction of the virtual camera inaccurate.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a panoramic video transition rendering method, apparatus, computer device, storage medium, and computer program product that can improve the accuracy of the heading direction of a virtual camera.
In a first aspect, the present application provides a panoramic video transition rendering method. The method comprises the following steps:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In one embodiment, determining a plurality of shielding area masks according to at least one image group in the image group set includes:
dividing the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set into blocks to obtain a set of block areas corresponding to the at least one image group;
determining a maximum gray average value according to the gray average value of each block area in the set of block areas, and calculating the difference between the gray average value of each block area and the maximum gray average value;
taking the block areas whose differences are larger than a preset threshold as target block areas, wherein each target block area is an occluded area in its corresponding fisheye image;
and determining a plurality of shielding area masks according to the occluded areas in the fisheye images corresponding to each frame of image.
In one embodiment, feature matching is performed on each non-occluded image area of each two adjacent frames of images of at least one image group in the image group set, so as to obtain a matching point pair between each non-occluded image area of each two adjacent frames of images, including:
extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
and carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
In one embodiment, feature extraction is performed on an image area which is not blocked in each frame of image of at least one image group in the image group set, including:
for any non-occluded image area, taking any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
In one embodiment, before performing feature point matching on feature points in an image area where each two adjacent frames of images of at least one image group in the image group set are not blocked, the method further includes:
and screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
In one embodiment, determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images includes:
determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization between each two adjacent frames of images and the translation after optimization between each two adjacent frames of images;
determining a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images under a camera coordinate system;
and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
In one embodiment, determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to a translation amount before optimization between each two adjacent frames of images and a translation amount after optimization between each two adjacent frames of images includes:
weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system;
and calculating an included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector of each two adjacent frames of images, in which the direction vector of each two adjacent frames of images under the camera coordinate system falls, according to the included angle corresponding to each two adjacent frames of images.
In one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under a camera coordinate system; according to the preset direction vector of each two adjacent frames of images under the camera coordinate system, determining the real main direction corresponding to each sub-direction sequence comprises the following steps:
segmenting the direction vector sequence based on the time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences;
determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence;
if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
In one embodiment, determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence includes:
smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
In a second aspect, the application also provides a panoramic video transition rendering device. The device comprises:
the acquisition module is used for acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
the first determining module is used for determining a plurality of shielding area masks according to at least one image group in the image group set and determining an image area where each frame of image in the image group set is not shielded according to the shielding area masks;
the second determining module is used for respectively carrying out feature matching on the image areas which are not blocked by each two adjacent frames of images of at least one image group in the image group set to obtain matching point pairs between the image areas which are not blocked by each two adjacent frames of images;
the third determining module is used for optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
the calculation module is used for calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame according to the advancing direction of the virtual camera at the corresponding moment of each original video frame;
and the rendering module is used for performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
According to the panoramic video transition rendering method and apparatus, computer device, storage medium, and computer program product described above, feature points in occluded areas do not need to be considered during feature matching, so mismatches caused by occlusion are avoided and the accuracy of the optimized translation amount between every two adjacent frames of images is improved; that is, the accuracy of the advancing direction of the virtual camera at the moment corresponding to each frame image of the original video is improved. In addition, when determining the shielding area masks or performing feature matching, it is not necessary to process every original video frame; instead, an image group set is obtained by extracting frames from the original video and at least one image group is processed, which reduces the amount of computation. Furthermore, histogram statistical analysis of the optimized translation amounts between adjacent frames effectively overcomes the poor estimation of the advancing direction in weak-texture and nearly static scenes.
Drawings
FIG. 1 is an application environment diagram of a panoramic video transition rendering method in one embodiment;
FIG. 2 is a flow chart of a panoramic video transition rendering method in one embodiment;
FIG. 3 is a block diagram of a panoramic video transition rendering device in one embodiment;
fig. 4 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but the elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another.
In the related art, when a panoramic video is rendered, an absolute view angle of a fixed panoramic camera or a view angle of a smoothly rotated panoramic camera is generally used as a view angle of a virtual camera. However, when the absolute view angle of the panoramic camera is fixed as the view angle of the virtual camera, this fixed view angle may not be a view angle of direct interest to the user; when the view angle of the panoramic camera is smoothly rotated as the view angle of the virtual camera, it is possible that most of the view angles are not view angles of interest to the user. In practice, the user will generally pay more attention to the front of the panoramic camera. However, the method for rendering panoramic video in the related art cannot meet the requirements of users.
In order to solve the above-mentioned problems, the patent entitled "Panoramic video rendering method with automatically adjusted viewing angle" discloses a method that includes: "obtaining the rotation amount of a panoramic camera relative to a world coordinate system when shooting a current video frame, and multiple fisheye images corresponding to the current video frame and a previous video frame of the panoramic video; respectively extracting corner points of the multi-path fisheye images corresponding to the previous video frame of the panoramic video to obtain corner point sequences to be tracked; respectively tracking the corner point sequences to be tracked to obtain matching point pairs in the fisheye images corresponding to the current video frame and the previous video frame; optimizing the displacement of the current video frame of the panoramic camera relative to the previous video frame according to the matching point pairs to obtain the optimized displacement; and taking the optimized displacement as the advancing direction of the virtual camera, calculating a rotation matrix of the current virtual camera, and performing transition rendering on the current video frame of the panoramic video by using the rotation amount of the panoramic camera relative to the world coordinate system when shooting the current video frame and the rotation matrix of the current virtual camera."
Although this meets the user's needs, in practice the panoramic camera is usually held in both hands or mounted on a motorcycle or car. Because the panoramic camera uses fisheye lenses and its shooting range is 360 degrees, part of the area captured by the panoramic camera is inevitably blocked by the user's hands or the vehicle mount during shooting, so every frame image of the captured video contains an occluded area. The occluded area causes mismatches between the current video frame and the previous video frame during feature matching; consequently, the displacement of the current video frame of the panoramic camera relative to the previous video frame, optimized according to the matching point pairs, is inaccurate, and the calculated advancing direction of the virtual camera is inaccurate.
In view of the problems in the related art described above, an embodiment of the present application provides a panoramic video transition rendering method, which may be applied to the application scenario in fig. 1. Fig. 1 includes a panoramic camera 101 and a server 102. The panoramic camera 101 is a panoramic camera on which a plurality of fisheye lenses are mounted; the number of fisheye lenses is not limited. When shooting a panoramic video, a person can hold the panoramic camera 101 in hand or fix it to a vehicle with a bracket. The panoramic video is captured by the panoramic camera 101 and transmitted to the server 102. The server 102 mainly processes the panoramic video captured by the panoramic camera 101 to obtain a rendered panoramic video. Of course, in an actual implementation, the processing function of the server 102 may also be integrated directly into the panoramic camera 101; that is, the panoramic camera 101 captures the panoramic video and renders it, and when the rendered panoramic video is needed later, the panoramic camera 101 only needs to output it.
In addition, the processing device for processing the panoramic video does not have to be a server; it may also be a dedicated processing device, such as a personal computer or a notebook computer, which is not particularly limited in the embodiments of the present application. It should be noted that terms such as "plural" mentioned in the embodiments of the present application mean "at least two".
Based on this, referring to fig. 2, a panoramic video transition rendering method is provided. The method is described as applied to a server, that is, with the server as the execution subject, and includes the following steps:
202. Acquire an image group set, where each image group in the image group set is obtained by extracting frames from an original video, and each original video frame in the original video is synthesized from multiple paths of fisheye images.
The original video may be a panoramic video, and the panoramic video may be a panoramic video after anti-shake processing. In addition, the extraction interval corresponding to each image group in the image group set may be different.
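As an illustration of this step, the sketch below (an assumption, not taken from the patent) shows how an image group set might be built by extracting frames from the original panoramic video at several intervals with OpenCV; the function name extract_image_groups and the interval values are hypothetical.

```python
# A minimal sketch (not from the patent) of building an image group set by frame extraction.
# Assumes the original video's frames are already synthesized from the multi-path fisheye
# images; the interval values are illustrative only.
import cv2

def extract_image_groups(video_path, intervals=(5, 15)):
    groups = []
    for interval in intervals:
        cap = cv2.VideoCapture(video_path)
        group, idx = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % interval == 0:          # keep every `interval`-th original video frame
                group.append(frame)
            idx += 1
        cap.release()
        groups.append(group)                 # one image group per extraction interval
    return groups
```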
204. Determine a plurality of shielding area masks according to at least one image group in the image group set, and determine the image area which is not blocked in each frame of image in the image group set according to the shielding area masks.
It can be understood that, since each original video frame in the original video is synthesized from multiple paths of fisheye images and the images in the image groups are extracted from the original video, any occlusion of the camera lenses that capture the multiple fisheye images is reflected in these images; in particular, it may result in several occlusion regions in each image. It will also be appreciated that, since the position at which the camera lenses are blocked is typically fixed, the position and shape of the occluded areas are the same in every image. Thus, the plurality of shielding area masks determined in this step from at least one image group can be applied to every image of every image group in the image group set; that is, different images all share the same shielding area masks.
206. Respectively perform feature matching on the image areas which are not blocked in each two adjacent frames of images of at least one image group in the image group set to obtain matching point pairs between the image areas which are not blocked in each two adjacent frames of images.
The "at least one image group" mentioned in this step may be identical to the "at least one image group" mentioned in the above step 204, may be partially identical to the "at least one image group" or may be completely different from the "at least one image group", which is not specifically limited in this embodiment of the present application. It will be appreciated that if there is more scene overlap per two adjacent frames of images, the more matching point pairs are formed. Therefore, in order to form more matching point pairs between the image areas where each two adjacent frames of images are not blocked, the "at least one image group" mentioned in this step may be obtained by extracting frames from the original video based on the frame extraction interval as small as possible, so as to ensure that more scenery overlap exists between each two adjacent frames of images. It will also be appreciated that the greater the number of matching point pairs, the more advantageous it is to obtain a more accurate amount of translation.
It should be noted that, in an actual implementation, for the "at least one image group" mentioned in this step and the "at least one image group" mentioned in step 204, the corresponding frame extraction intervals may be determined according to the motion speed at which the multi-path fisheye images are captured. Specifically, if the panoramic camera moves faster, the captured scene changes faster, and a smaller frame extraction interval may be selected; conversely, if the panoramic camera moves more slowly, the captured scene changes more slowly, and a larger frame extraction interval may be selected.
For ease of understanding, take any two adjacent frames of images as an example and call them the previous frame image and the subsequent frame image. In an actual implementation, the matching point pairs between the previous frame image and the subsequent frame image can be obtained by means of feature matching or optical flow tracking.
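For the optical flow variant mentioned above, a sketch of obtaining matching point pairs between the previous and subsequent frame images with OpenCV's pyramidal Lucas-Kanade tracker is given below; restricting corner detection to the non-occluded area via a mask follows the spirit of step 204, but the parameter values are assumptions.

```python
# Sketch (assumed parameters): match points between two adjacent frames by optical flow,
# detecting corners only inside the non-occluded image area given by `mask`.
import cv2
import numpy as np

def match_by_optical_flow(prev_img, next_img, mask):
    prev_gray = cv2.cvtColor(prev_img, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_img, cv2.COLOR_BGR2GRAY)
    # detect corners only in the non-occluded region (mask > 0)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01,
                                  minDistance=7, mask=mask)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    return pts.reshape(-1, 2)[ok], nxt.reshape(-1, 2)[ok]   # matching point pairs
```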
208. Optimize the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas which are not blocked in the two adjacent frames of images, and determine the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images.
It should be noted that, in an actual implementation, the multiple fisheye images are captured by multiple fisheye lenses, each fisheye lens captures one path of fisheye image, and the multiple fisheye lenses may be integrated on one panoramic camera. The "translation amount between every two adjacent frame images" mentioned in this step may be taken as the translation amount of the subsequent frame image of the two adjacent frame images, and thus represents the motion vector of the panoramic camera when capturing the subsequent frame image relative to when capturing the previous frame image.
For ease of understanding, the process of optimizing the translation amount between every two adjacent frames of images in an image group is described taking any image group of the "at least one image group" mentioned in step 206 as an example. The translation amount of the first frame image in the image group is initialized to (0, 1). Taking the (t-2)-th, (t-1)-th and t-th frame images as an example, the translation amount between the (t-2)-th and (t-1)-th frame images can be obtained first and then used as the initial value of the translation amount between the (t-1)-th and t-th frame images.
Likewise, taking the (t-1)-th frame image and the t-th frame image as an example, the translation amount between them can be optimized with an objective function, given as formula (1), in which the term F_i(T_t) is given as formula (2) (formulas (1) to (4) appear as images in the original publication and are not reproduced here).
In formulas (1) and (2), T_t is the translation amount between the (t-1)-th and t-th frame images, T_{t-1} is the translation amount between the (t-2)-th and (t-1)-th frame images, F is the cost function to be optimized, i is the index of a matching point pair, ρ(s) is a loss function, N is the number of optimized feature points, and a is a smoothing parameter, which may be 0.1. d_{t,i} and d_{t-1,i} are the depths corresponding to the two points of the i-th matching point pair between the non-blocked image areas of the t-th and (t-1)-th frame images in the world coordinate system. The loss function ρ(s) is given as formula (3).
Specifically, the iterative optimization of F_i(T_t) may proceed as follows: first, the depths of all matching point pairs between the non-blocked image areas of the (t-1)-th and t-th frame images are solved by the least squares method; then the residual and the Jacobian of F_i(T_t) are computed, where k is the iteration number, T_t^k is T_t after k iterations, d_{t,i}^k and d_{t-1,i}^k are d_{t,i} and d_{t-1,i} after k iterations, and the reciprocals of d_{t,i}^k and d_{t-1,i}^k are the corresponding inverse depths.
The depth is solved as shown in formula (4); the result of formula (4) is then assigned according to the two cases distinguished in the original formulas.
It will be appreciated that the panoramic camera may be occluded when capturing multiple fisheye images, as the panoramic camera is typically fixed to a movable carrier, and the fixation process may occlude the fisheye lens. The movable carrier may be a person or a movable device, which is not particularly limited in the embodiments of the present application. The concept of the application is mainly to abstract the panoramic camera and the movable carrier into a movable virtual camera, and perform transition rendering by calculating a rotation matrix of the virtual camera at a corresponding moment of each original video frame. In this step, the moving direction of the virtual camera at the corresponding time of each original video frame may be determined first. Since the movable carrier is usually moved in the forward direction of its front face, this step is also mainly to determine the forward direction of the virtual camera at the corresponding moment of each original video frame. In the embodiment of the application, interpolation processing can be performed through the translation amount between every two adjacent frames of images of the image group, so that the translation amount of the virtual camera at the corresponding moment of each original video frame is obtained and used as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
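A sketch of the interpolation described above is given below; it linearly interpolates the per-pair translation amounts of the image group, known at the extracted-frame times, to every original video frame time. The component-wise linear interpolation and the variable names are assumptions rather than the patent's exact procedure.

```python
# Sketch (assumed linear interpolation): expand translation amounts known at the
# extracted-frame times to every original video frame time, to use as the advancing
# direction of the virtual camera at each original frame.
import numpy as np

def interpolate_forward_directions(sample_times, sample_translations, frame_times):
    sample_translations = np.asarray(sample_translations, dtype=float)   # shape (K, 3)
    out = np.stack([np.interp(frame_times, sample_times, sample_translations[:, c])
                    for c in range(sample_translations.shape[1])], axis=1)
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.maximum(norms, 1e-12)    # unit forward direction per original frame
```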
210. Calculate a rotation matrix of the virtual camera at the corresponding moment of each original video frame according to the advancing direction of the virtual camera at the corresponding moment of each original video frame.
Specifically, for any original video frame, the corresponding time of the original video frame is denoted as t, and the advancing direction of the virtual camera at that time is denoted as a direction vector v_t. According to v_t, the rotation matrix Rv of the virtual camera at the time corresponding to the original video frame is calculated. Rv may be denoted as [e_0, e_1, e_2], where e_2 = e_0 × e_1 and e_0 and e_1 are constructed from v_t (their expressions appear as formula images in the original publication and are not reproduced here).
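Since the expressions for e_0 and e_1 are not reproduced above, the following sketch uses one common construction, which is an assumption rather than the patent's exact formula: take e_0 along the advancing direction, build e_1 orthogonal to it with the help of a world up vector, and set e_2 = e_0 × e_1.

```python
# Sketch (assumed construction) of a virtual-camera rotation matrix Rv = [e0, e1, e2]
# from the advancing direction v_t; only e2 = e0 x e1 is stated in the text above,
# the rest is a conventional orthonormal-basis completion.
import numpy as np

def rotation_from_forward(v_t, up=(0.0, 0.0, 1.0)):
    e0 = np.asarray(v_t, dtype=float)
    e0 /= np.linalg.norm(e0)
    e1 = np.cross(np.asarray(up, dtype=float), e0)   # orthogonal to the forward direction
    if np.linalg.norm(e1) < 1e-8:                     # forward nearly parallel to `up`
        e1 = np.cross((1.0, 0.0, 0.0), e0)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(e0, e1)                             # e2 = e0 x e1, as in the description
    return np.column_stack([e0, e1, e2])              # columns are e0, e1, e2
```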
212. Perform transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
Specifically, the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame, denoted R_c2w, can be calculated from the gyroscope values of the panoramic camera when shooting each original video frame, specifically as R_c2w = R_g2w · R_c2g, where R_g2w is the rotation amount from the gyroscope to the world coordinate system at the time corresponding to each original video frame, R_c2g is the rotation amount from the panoramic camera to the gyroscope at the time corresponding to each original video frame, and R_g2w is obtained from the gyroscope values of the panoramic camera at the time corresponding to each original video frame.
According to the method provided by the embodiments of the application, feature points in occluded areas do not need to be considered during feature matching, so mismatches caused by occlusion are avoided and the accuracy of the optimized translation amount between every two adjacent frames of images is improved; that is, the accuracy of the advancing direction of the virtual camera at the moment corresponding to each frame image of the original video is improved. In addition, when determining the shielding area masks or performing feature matching, it is not necessary to process every original video frame; instead, an image group set is obtained by extracting frames from the original video and at least one image group is processed, which reduces the amount of computation.
In combination with the foregoing embodiments, in one embodiment, determining a plurality of shielding area masks according to at least one image group in the image group set includes: dividing the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set into blocks to obtain a set of block areas corresponding to the at least one image group; determining a maximum gray average value according to the gray average value of each block area in the set of block areas, and calculating the difference between the gray average value of each block area and the maximum gray average value; taking the block areas whose differences are larger than a preset threshold as target block areas, where each target block area is an occluded area in its corresponding fisheye image; and determining a plurality of shielding area masks according to the occluded areas in the fisheye images corresponding to each frame of image.
Specifically, suppose an image group in the image group set contains two frames of images, A and B, each synthesized from two paths of fisheye images a1 and a2, and each path of fisheye image is divided into 4 blocks. The gray average values of the 4 block areas of the a1 fisheye image of image A are 113, 123, 178 and 146; those of the a2 fisheye image of image A are 120, 172, 166 and 159; those of the a1 fisheye image of image B are 115, 125, 174 and 145; and those of the a2 fisheye image of image B are 124, 166, 160 and 148. The preset threshold is 40.
Thus, the image group corresponds to 16 block areas in total. From the gray average values listed above, the maximum gray average value is determined to be 178. The differences between the gray average value of each block area and the maximum gray average value are then 65, 55, 0, 32, 58, 6, 12, 19, 63, 53, 4, 33, 54, 12, 18 and 30, respectively. Since the preset threshold is 40, the block areas with gray average values 113, 123, 120, 115, 125 and 124 are taken as the target block areas: the block areas with gray average values 113 and 123 are occluded areas in the a1 fisheye image of image A, and the block area with gray average value 120 is an occluded area in the a2 fisheye image of image A; the block areas with gray average values 115 and 125 are occluded areas in the a1 fisheye image of image B, and the block area with gray average value 124 is an occluded area in the a2 fisheye image of image B.
In this way, the occluded areas in the a1 and a2 fisheye images of image A and in the a1 and a2 fisheye images of image B can be combined to determine the positions of the shielding area masks in the images. As noted in the foregoing embodiment, the shielding area masks determined according to at least one image group can be applied to every image of every image group in the image group set, i.e., different images share the same shielding area masks. Therefore, the positions of the shielding area masks determined in this step are the positions of the shielding area masks for every image in the image group set.
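The worked example above can be expressed compactly in code. The sketch below is illustrative: the 2x2 blocking, the threshold of 40 and the function name are assumptions; it divides each fisheye image into blocks, compares each block's gray average with the maximum gray average over all block areas, and marks blocks whose difference exceeds the threshold as occluded.

```python
# Sketch (assumptions: 2x2 blocking, threshold 40 as in the example above) of building
# occlusion masks by comparing block gray averages against the maximum gray average.
import cv2
import numpy as np

def occlusion_masks(fisheye_images, blocks=2, threshold=40):
    grays = [cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) for img in fisheye_images]
    means = []                                # (image index, block slices, gray average)
    for idx, g in enumerate(grays):
        h, w = g.shape
        for by in range(blocks):
            for bx in range(blocks):
                sl = (slice(by * h // blocks, (by + 1) * h // blocks),
                      slice(bx * w // blocks, (bx + 1) * w // blocks))
                means.append((idx, sl, float(g[sl].mean())))
    max_mean = max(m for _, _, m in means)    # maximum gray average over all block areas
    masks = [np.zeros_like(g, dtype=np.uint8) for g in grays]
    for idx, sl, m in means:
        if max_mean - m > threshold:          # target block area -> occluded area
            masks[idx][sl] = 255
    return masks                              # one occlusion mask per fisheye image
```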
According to the method provided by this embodiment, because the gray value of an occluded area differs markedly from that of the unoccluded areas, the position of the occluded area in the fisheye image can be accurately determined based on the gray value differences, and thus the positions of the shielding area masks in the images can be accurately determined.
In combination with the foregoing embodiments, in one embodiment, feature matching is performed on each of non-occluded image areas of each two adjacent frames of images of at least one image group in the image group set, so as to obtain a matching point pair between each of non-occluded image areas of each two adjacent frames of images, where the matching point pair includes: extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set; and carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
The feature extraction and matching may use one or more of SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), ORB (Oriented FAST and Rotated BRIEF), Harris corner detection, goodFeaturesToTrack, and the like, which is not particularly limited in the embodiments of the present application.
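As one possible realization of this embodiment (an assumption, since the patent does not fix the detector), the sketch below extracts ORB features only inside the non-occluded image area of each frame and matches them with a brute-force Hamming matcher.

```python
# Sketch (assumed ORB + brute-force matching): feature matching between the
# non-occluded image areas of two adjacent frames of an image group.
import cv2

def match_non_occluded(img_a, img_b, non_occluded_mask):
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, non_occluded_mask)   # mask keeps only
    kp_b, des_b = orb.detectAndCompute(img_b, non_occluded_mask)   # unoccluded regions
    if des_a is None or des_b is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_a, des_b)
    # return matching point pairs as pixel coordinates
    return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches]
```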
According to the method provided by the embodiment of the application, the image groups are obtained by extracting frames from the original video, and the translation amount between every two adjacent original video frames in the original video can be determined based on the matching point pairs between the image areas where every two adjacent frames of images of at least one image group are not blocked, so that the overall calculation amount can be reduced.
In combination with the foregoing embodiments, in one embodiment, performing feature extraction on an image area that is not occluded in each frame image of at least one image group in the image group set includes: for any non-occluded image area, taking any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
The preset condition is mainly set so that the extracted feature points are not concentrated in one or a few local areas when feature points are extracted in the current image area. Setting the preset condition that every two adjacent feature points are equidistant makes the extracted feature points cover the whole current image area and be uniformly distributed, which facilitates subsequent processing of the feature points. Setting the preset condition that the ratio between the area enclosed by all the extracted feature points and the area of the current image area is larger than a preset threshold prevents the extracted feature points from being concentrated in one or a few local areas.
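One way to enforce such a spread of feature points is sketched below, under the assumption that a regular grid approximates the equidistant condition; this is not the patent's exact procedure. At most one strong corner is kept per grid cell of the current image area.

```python
# Sketch (assumption: grid-based selection approximates the evenly-spread condition):
# keep the strongest corner in each grid cell of the non-occluded current image area.
import cv2
import numpy as np

def spread_keypoints(gray, mask, cell=64):
    kps = cv2.goodFeaturesToTrack(gray, maxCorners=5000, qualityLevel=0.01,
                                  minDistance=3, mask=mask)
    if kps is None:
        return np.empty((0, 2))
    kps = kps.reshape(-1, 2)
    kept = {}
    for x, y in kps:                            # corners are returned strongest-first,
        key = (int(x) // cell, int(y) // cell)  # so the first corner landing in a cell
        if key not in kept:                     # is the one that is kept
            kept[key] = (x, y)
    return np.array(list(kept.values()))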
According to the method provided by this embodiment, the extracted feature points are prevented from being confined to one or a few local areas of the image, so the coverage of the matching point pairs is improved, which in turn improves the accuracy of the subsequently optimized translation amount between every two adjacent frames of images in the image group.
In combination with the foregoing embodiments, in one embodiment, before performing feature point matching on feature points in an image area where each two adjacent frames of images of at least one image group in the image group set are not blocked, the method further includes: and screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
According to the method provided by this embodiment, screening the extracted feature points with a random sampling algorithm can eliminate feature points falsely extracted due to factors such as illumination, imaging angle, geometric deformation, and changes of ground features, thereby improving the accuracy of feature point extraction and hence the accuracy of the optimized translation amount between every two adjacent frames of images.
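The patent does not spell out the random sampling algorithm; a common choice is RANSAC. The sketch below illustrates RANSAC-style screening applied to tentative correspondences with a fundamental-matrix model, which is an assumption: the patent screens feature points before matching, and its exact procedure may differ.

```python
# Sketch (assumed RANSAC with a fundamental-matrix model) of rejecting outlier
# feature correspondences; the patent's own random-sampling screening may differ.
import cv2
import numpy as np

def ransac_filter(pts_a, pts_b, thresh_px=1.0):
    pts_a = np.asarray(pts_a, dtype=np.float32)
    pts_b = np.asarray(pts_b, dtype=np.float32)
    if len(pts_a) < 8:
        return pts_a, pts_b                    # too few points to fit a model
    _, inliers = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, thresh_px, 0.99)
    if inliers is None:
        return pts_a, pts_b                    # model estimation failed, keep everything
    keep = inliers.ravel().astype(bool)
    return pts_a[keep], pts_b[keep]
```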
In combination with the foregoing embodiments, in one embodiment, determining, according to the optimized translation amount between each two adjacent frames of images, a forward direction of the virtual camera at a time corresponding to each original video frame includes: determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization between each two adjacent frames of images and the translation after optimization between each two adjacent frames of images; determining a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images under a camera coordinate system; and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
The advancing direction of the virtual camera at the corresponding moment of each original video frame can be determined by carrying out histogram statistical analysis on the optimized translation amount between every two adjacent frames of images.
According to the method provided by this embodiment, weak textures in some images of an image group easily cause mismatches during feature matching, so the formed matching point pairs contain many mismatches and the translation amounts optimized from them are inaccurate. In addition, a nearly static scene has no well-defined advancing direction, which easily makes the calculated advancing direction erratic. Determining the advancing direction of the virtual camera at the moment corresponding to each frame image of the original video according to the real main direction corresponding to each sub-direction sequence removes the errors caused by weak textures in some images and by nearly static periods in parts of the original video, thereby improving the accuracy of the advancing direction of the virtual camera at the moment corresponding to each original video frame.
In combination with the foregoing embodiments, in one embodiment, determining a preset direction vector of each two adjacent frames of images in a camera coordinate system according to the translation amount before optimization between each two adjacent frames of images and the translation amount after optimization between each two adjacent frames of images includes: weighting the translation amount before optimization and the translation amount after optimization between every two adjacent frames of images to obtain an integrated translation amount between every two adjacent frames of images; converting the integrated translation amount between every two adjacent frames of images into the camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system; and calculating an included angle between the direction vector of each two adjacent frames of images in the camera coordinate system and each preset direction vector, and determining, according to the included angle corresponding to each two adjacent frames of images, the preset direction vector into which the direction vector of each two adjacent frames of images falls in the camera coordinate system.
Specifically, for any one of the at least one image group, taking any two adjacent frames of images in the image group as an example, the direction vector of the two adjacent frames of images in the camera coordinate system may be as shown in the following formula (5):
Tc = Rc_w1 * Tw; (5)
where Tw represents the integrated translation amount between the two adjacent frames of images, and Rc_w1 represents the rotation amount of the panoramic camera relative to the world coordinate system when capturing the later frame of the two adjacent frames of images.
For example, take an image group including 3 frames of images T1, T2 and T3, and denote the preset direction vectors used for calculating the included angles as C, D, E and F. Suppose the included angles between the direction vector of the adjacent frames T1 and T2 in the camera coordinate system and the preset direction vectors C, D, E and F are 30 degrees, 25 degrees, 20 degrees and 5 degrees, respectively, and the included angles between the direction vector of the adjacent frames T2 and T3 in the camera coordinate system and the preset direction vectors C, D, E and F are 30 degrees, 25 degrees, 5 degrees and 3 degrees, respectively. When judging which preset direction vector the direction vector of each two adjacent frames of images falls into in the camera coordinate system, if the condition is that the corresponding included angle is smaller than a preset threshold value of 7 degrees, it is determined that the preset direction vector into which T1 and T2 fall in the camera coordinate system is F, and the preset direction vector into which T2 and T3 fall in the camera coordinate system is F.
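A minimal NumPy sketch of the weighting and binning steps described above follows; the weight alpha, the helper names and the 7-degree threshold are assumptions for illustration. It forms the integrated translation, applies formula (5), and returns the preset direction vector the result falls into, if any.

```python
import numpy as np

def integrate_translation(t_before, t_after, alpha=0.5):
    """Weighted combination of the pre- and post-optimization translations;
    the weight alpha is an assumption of this sketch."""
    return alpha * np.asarray(t_before, float) + (1.0 - alpha) * np.asarray(t_after, float)

def assign_to_preset_direction(Tw, Rc_w1, preset_dirs, angle_threshold_deg=7.0):
    """Apply formula (5), Tc = Rc_w1 * Tw, then return the index of the preset
    direction vector with the smallest included angle, provided that angle is
    below the threshold; otherwise return None."""
    Tc = np.asarray(Rc_w1, float) @ np.asarray(Tw, float)
    Tc = Tc / np.linalg.norm(Tc)
    angles = []
    for d in preset_dirs:
        d = np.asarray(d, float)
        d = d / np.linalg.norm(d)
        angles.append(np.degrees(np.arccos(np.clip(Tc @ d, -1.0, 1.0))))
    best = int(np.argmin(angles))
    return best if angles[best] < angle_threshold_deg else None
```

With the angles of the example above and a 7-degree threshold, both the T1/T2 and the T2/T3 pairs would be assigned to the preset direction vector F, matching the result stated in the example.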
In combination with the above embodiments, in one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under the camera coordinate system; according to the preset direction vector of each two adjacent frames of images under the camera coordinate system, determining the real main direction corresponding to each sub-direction sequence comprises the following steps: segmenting the direction vector sequence based on the time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences; determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence; if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
Specifically, take an example in which one image group includes 10 frames of images T1, T2, T3, T4, T5, T6, T7, T8, T9 and T10, the preset direction vectors used for calculating the included angles are denoted as C, D, E and F, and the direction vector sequence is segmented to obtain three sub-direction sequences Y1, Y2 and Y3. The sub-direction sequence Y1 is composed of the direction vectors, in the camera coordinate system, of the three adjacent-frame pairs T1 and T2, T2 and T3, and T3 and T4. The sub-direction sequence Y2 is composed of the direction vectors, in the camera coordinate system, of the three adjacent-frame pairs T4 and T5, T5 and T6, and T6 and T7. The sub-direction sequence Y3 is composed of the direction vectors, in the camera coordinate system, of the three adjacent-frame pairs T7 and T8, T8 and T9, and T9 and T10.
Counting the number of times each sub-direction sequence falls into each preset direction vector: the sub-direction sequence Y1 falls into the four preset direction vectors C, D, E and F 0, 0, 1 and 3 times respectively; the sub-direction sequence Y2 falls into C, D, E and F 2, 1, 0 and 0 times respectively; and the sub-direction sequence Y3 falls into C, D, E and F 2, 3, 0 and 0 times respectively. From this, it can be determined that the main direction corresponding to the sub-direction sequence Y1 is the preset direction vector F, the main direction corresponding to the sub-direction sequence Y2 is the preset direction vector C, and the main direction corresponding to the sub-direction sequence Y3 is the preset direction vector D. Assuming, for example, that the preset number-of-times range requires the main direction to be hit at least 3 times, the main direction of Y2, which is hit only twice, is discarded; the main direction corresponding to Y1 is taken as the real main direction corresponding to Y1, that is, the preset direction vector F is taken as the real main direction corresponding to Y1, and the main direction corresponding to Y3 is taken as the real main direction corresponding to Y3, that is, the preset direction vector D is taken as the real main direction corresponding to Y3.
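The per-sub-sequence voting in this example can be sketched as follows; the function name and the minimum hit count are assumptions. Each adjacent-frame pair contributes the index of the preset direction vector it fell into, the most frequent index becomes the main direction, and it is kept as the real main direction only if it was hit often enough.

```python
from collections import Counter

def real_main_direction(assigned_indices, min_hits=3):
    """assigned_indices: for each adjacent-frame pair in one sub-direction
    sequence, the index of the preset direction vector it fell into
    (None if it fell into no preset direction within the angle threshold)."""
    counts = Counter(i for i in assigned_indices if i is not None)
    if not counts:
        return None                                  # no pair fell into any preset direction
    main_dir, hits = counts.most_common(1)[0]
    return main_dir if hits >= min_hits else None    # discard weakly supported main directions
```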
In combination with the foregoing embodiments, in one embodiment, determining, according to the real main direction corresponding to each sub-direction sequence, the advancing direction of the virtual camera at the time corresponding to each original video frame includes: smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame; converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame; and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
The smoothing may use an n-times weighted smoothing algorithm or a smoothing function, which is not specifically limited in the embodiments of the present application. The interpolation may use quaternion-based spherical linear interpolation or three-dimensional-rotation spherical linear interpolation, which is likewise not specifically limited in the embodiments of the present application.
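As an illustration of the smoothing-and-interpolation step, the sketch below uses plain spherical linear interpolation between the (already smoothed) real main directions attached to key frames; the helper names, the key-frame bookkeeping and the choice of slerp over the other options named above are assumptions of this sketch.

```python
import numpy as np

def slerp(v0, v1, t):
    """Spherical linear interpolation between two unit direction vectors."""
    v0, v1 = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(v0 @ v1, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return v0                                   # directions already coincide
    return (np.sin((1 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

def interpolate_frame_directions(key_dirs, key_frames, num_frames):
    """Interpolate a per-frame direction from the real main directions
    attached to a few key frames (one per sub-direction sequence)."""
    dirs = np.zeros((num_frames, 3))
    for f in range(num_frames):
        # locate the surrounding key frames and slerp between their directions
        k = int(np.searchsorted(key_frames, f, side="right")) - 1
        k = min(max(k, 0), len(key_frames) - 2)
        t = (f - key_frames[k]) / max(key_frames[k + 1] - key_frames[k], 1)
        dirs[f] = slerp(np.asarray(key_dirs[k], float),
                        np.asarray(key_dirs[k + 1], float),
                        float(np.clip(t, 0.0, 1.0)))
    return dirs
```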
According to the method provided by the embodiment of the application, since it is not necessary to process every two adjacent frames of images in the original video, and only the real main direction corresponding to each sub-direction sequence needs to be smoothed and interpolated to obtain the direction vector at the moment corresponding to each original video frame, the amount of calculation can be reduced.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least a part of the steps in the flowcharts of the above embodiments may include multiple steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and the execution order of these steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a part of the steps or stages in other steps.
Based on the same inventive concept, the embodiment of the application also provides a panoramic video transition rendering apparatus for realizing the panoramic video transition rendering method described above. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so for the specific limitations in the one or more panoramic video transition rendering apparatus embodiments provided below, reference may be made to the limitations of the panoramic video transition rendering method above; details are not repeated here.
In one embodiment, as shown in fig. 3, there is provided a panoramic video transition rendering apparatus, comprising: an acquisition module 301, a first determination module 302, a second determination module 303, a third determination module 304, a calculation module 305, and a rendering module 306, wherein:
the acquiring module 301 is configured to acquire an image group set, where each image group in the image group set is obtained by extracting frames from an original video, and each original video frame in the original video is synthesized by multiple fisheye images;
a first determining module 302, configured to determine a plurality of shielding area masks according to at least one image group in the image group set, and determine an image area where each frame of image in the image group set is not shielded according to the plurality of shielding area masks;
A second determining module 303, configured to perform feature matching on each non-occluded image area of each two adjacent frames of images of at least one image group in the image group set, so as to obtain a matching point pair between each non-occluded image area of each two adjacent frames of images;
the third determining module 304 is configured to optimize a translation amount between each two adjacent frames of images according to a matching point pair between image areas where each two adjacent frames of images are not blocked, and determine an advancing direction of the virtual camera at a corresponding moment of each original video frame according to the optimized translation amount between each two adjacent frames of images;
the calculating module 305 is configured to calculate a rotation matrix of the virtual camera at a time corresponding to each original video frame according to a direction of advance of the virtual camera at the time corresponding to each original video frame;
the rendering module 306 is configured to perform transition rendering on each original video frame according to the rotation matrix of the virtual camera at the time corresponding to each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
According to the device provided by the embodiment of the application, since the feature points of the shielding area do not need to be considered in the feature matching process, mismatching caused by the presence of the shielding area can be avoided, and the accuracy of the optimized translation amount between every two adjacent frames of images can be improved; in turn, the accuracy of the advancing direction of the virtual camera at the moment corresponding to each frame image in the original video can be improved. In addition, when determining the shielding area masks or performing feature matching, it is not necessary to process every original video frame; instead, the image group set is obtained by extracting frames from the original video and at least one image group is processed, so that the amount of calculation for determining the shielding area masks or performing feature matching can be reduced.
In one embodiment, the first determining module 302 includes:
the blocking unit is used for blocking the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set to obtain a blocking area set corresponding to at least one image group in the image group set;
the first determining unit is used for determining a maximum gray average value according to the gray average value of each block area in the block area set, and calculating a difference value between the gray average value of each block area and the maximum gray average value;
a unit configured to take a blocking area whose difference value is larger than a preset threshold value among all the difference values as a target blocking area, wherein the target blocking area is a shielding area in the fisheye image corresponding to the target blocking area;
and the second determining unit is used for determining a plurality of shielding area masks according to shielding areas in the fisheye image corresponding to each frame of image.
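The block-based occlusion determination carried out by these units can be sketched as follows; the block size, the gray-difference threshold and the per-image scope are assumptions of this sketch (the embodiment works over the multi-path fisheye images of an image group), not the embodiment's prescribed parameters.

```python
import numpy as np

def occlusion_mask_from_blocks(gray_image, block_size=32, diff_threshold=60.0):
    """Divide a single-channel fisheye image into blocks, compare each block's
    mean gray value with the maximum block mean, and mark blocks whose
    difference exceeds the threshold as occluded (mask value 255)."""
    h, w = gray_image.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    means = {}
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            block = gray_image[y:y + block_size, x:x + block_size]
            means[(y, x)] = float(block.mean())
    max_mean = max(means.values())
    for (y, x), m in means.items():
        if max_mean - m > diff_threshold:            # markedly darker block: treated as occluded
            mask[y:y + block_size, x:x + block_size] = 255
    return mask
```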
In one embodiment, the second determining module 303 includes:
the characteristic extraction unit is used for extracting the characteristics of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain the characteristic points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
And the matching unit is used for carrying out characteristic point matching on the characteristic points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
In one embodiment, the feature extraction unit is further configured to, for any non-occluded image area, take that image area as the current image area and extract feature points in it, where the extraction result meets a preset condition; the preset condition includes that every two adjacent feature points are equidistant, or that the ratio of the area of the region enclosed by all the extracted feature points to the area of the current image area is larger than a preset threshold value.
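The coverage condition above can be approximated, for illustration, by repeatedly relaxing a corner detector until the convex hull of the extracted points covers enough of the non-occluded area. The detector (OpenCV's goodFeaturesToTrack), the coverage ratio and the relaxation schedule are assumptions of this sketch, not the embodiment's prescribed extractor.

```python
import cv2
import numpy as np

def extract_covering_features(gray, valid_mask, coverage_ratio=0.5, max_corners=500):
    """Extract feature points inside the non-occluded mask and relax the
    detector until the convex hull of the points covers enough of that area."""
    region_area = float(np.count_nonzero(valid_mask))
    if region_area == 0:
        return np.empty((0, 2))
    pts = None
    quality = 0.05
    while quality >= 0.001:
        pts = cv2.goodFeaturesToTrack(gray, max_corners, quality, 10, mask=valid_mask)
        if pts is not None and len(pts) >= 3:
            hull = cv2.convexHull(pts.reshape(-1, 2).astype(np.float32))
            if cv2.contourArea(hull) / region_area > coverage_ratio:
                return pts.reshape(-1, 2)            # coverage condition satisfied
        quality *= 0.5                               # relax the quality threshold and retry
    return pts.reshape(-1, 2) if pts is not None else np.empty((0, 2))
```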
In one embodiment, the second determining module 303 further includes:
and the screening unit is used for screening the characteristic points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
In one embodiment, the third determination module 304 includes:
the third determining unit is used for determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation amount before optimization between each two adjacent frames of images and the translation amount after optimization between each two adjacent frames of images;
A fourth determining unit, configured to determine a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images in a camera coordinate system;
and the fifth determining unit is used for determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
In one embodiment, the third determining unit includes:
the weighting subunit is used for weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
the first conversion subunit is used for converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system;
and the calculating subunit is used for calculating the included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector in which the direction vector of each two adjacent frames of images under the camera coordinate system falls according to the included angle corresponding to each two adjacent frames of images.
In one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under a camera coordinate system; a fourth determination unit including:
A segmentation subunit, configured to segment the direction vector sequence based on a timing sequence in the direction vector sequence, to obtain a plurality of sub-direction sequences;
the determining subunit is used for determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector that the direction vector of each adjacent two frames of images falls on the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence; if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
In one embodiment, the fifth determining unit includes:
the smoothing subunit is used for carrying out smoothing and interpolation processing on the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
the second converting subunit is used for converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and a subunit configured to take the direction vector at the moment corresponding to each original video frame as the advancing direction of the virtual camera at the moment corresponding to each original video frame.
Each of the above modules in the panoramic video transition rendering apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in a processor in the computer device or be independent of it, or may be stored, in software form, in a memory in the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer equipment is used for storing data such as the original video and the rendered original video. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a panoramic video transition rendering method.
Those skilled in the art will appreciate that the structure shown in fig. 4 is only a block diagram of part of the structure related to the present solution and does not constitute a limitation on the computer device to which the present solution is applied; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
Optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
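For the rotation-matrix step in the flow above, one illustrative construction builds an orthonormal basis whose forward axis is the virtual camera's advancing direction; the camera-axis convention and the world up vector are assumptions of this sketch, not the formula prescribed elsewhere in the description.

```python
import numpy as np

def virtual_camera_rotation(forward, world_up=np.array([0.0, 0.0, 1.0])):
    """Build a rotation matrix whose third column is the (unit) forward
    direction of the virtual camera; the up vector is an assumed convention."""
    f = np.asarray(forward, float)
    f = f / np.linalg.norm(f)
    right = np.cross(world_up, f)
    if np.linalg.norm(right) < 1e-8:                 # forward parallel to up: pick another axis
        right = np.cross(np.array([0.0, 1.0, 0.0]), f)
    right = right / np.linalg.norm(right)
    up = np.cross(f, right)
    return np.stack([right, up, f], axis=1)          # columns: right, up, forward
```

In the subsequent transition rendering, such a matrix would be combined with the rotation amount of the panoramic camera relative to the world coordinate system at each shooting moment, as stated in the steps above.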
In one embodiment, the processor when executing the computer program further performs the steps of:
blocking the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set to obtain a block region set corresponding to at least one image group in the image group set;
determining a maximum gray average value according to the gray average value of each block area in the block area set, and calculating the difference value between the gray average value of each block area and the maximum gray average value;
Taking a blocking area corresponding to a difference value larger than a preset threshold value in all the difference values as a target blocking area, wherein the target blocking area is a shielding area in the fisheye image corresponding to the target blocking area;
and determining a plurality of shielding area masks according to shielding areas in the fisheye image corresponding to each frame of image.
In one embodiment, the processor when executing the computer program further performs the steps of:
extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
and carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
In one embodiment, the processor when executing the computer program further performs the steps of:
for any non-occluded image area, taking any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
In one embodiment, the processor when executing the computer program further performs the steps of:
and screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization between each two adjacent frames of images and the translation after optimization between each two adjacent frames of images;
determining a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images under a camera coordinate system;
and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
In one embodiment, the processor when executing the computer program further performs the steps of:
weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system;
And calculating an included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector of each two adjacent frames of images, in which the direction vector of each two adjacent frames of images under the camera coordinate system falls, according to the included angle corresponding to each two adjacent frames of images.
In one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under a camera coordinate system; the processor when executing the computer program also implements the steps of:
segmenting the direction vector sequence based on the time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences;
determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence;
if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
In one embodiment, the processor when executing the computer program further performs the steps of:
Smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
Optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In one embodiment, the computer program when executed by the processor further performs the steps of:
blocking the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set to obtain a block region set corresponding to at least one image group in the image group set;
determining a maximum gray average value according to the gray average value of each block area in the block area set, and calculating the difference value between the gray average value of each block area and the maximum gray average value;
Taking a blocking area corresponding to a difference value larger than a preset threshold value in all the difference values as a target blocking area, wherein the target blocking area is a shielding area in the fisheye image corresponding to the target blocking area;
and determining a plurality of shielding area masks according to shielding areas in the fisheye image corresponding to each frame of image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
and carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
for any non-occluded image area, taking any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization between each two adjacent frames of images and the translation after optimization between each two adjacent frames of images;
determining a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images under a camera coordinate system;
and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system;
And calculating an included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector of each two adjacent frames of images, in which the direction vector of each two adjacent frames of images under the camera coordinate system falls, according to the included angle corresponding to each two adjacent frames of images.
In one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under a camera coordinate system; the computer program when executed by the processor also performs the steps of:
segmenting the direction vector sequence based on the time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences;
determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence;
if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
Smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from original video, and each original video frame in the original video is synthesized by multiple paths of fisheye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area which is not shielded by each frame of image in the image group set according to the shielding area masks;
respectively carrying out feature matching on each non-shielded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-shielded image area of each two adjacent frames of images;
Optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas where the two adjacent frames of images are not blocked, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
according to the advancing direction of the virtual camera at the corresponding moment of each original video frame, calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
In one embodiment, the computer program when executed by the processor further performs the steps of:
blocking the multi-path fisheye images corresponding to each frame of image of at least one image group in the image group set to obtain a block region set corresponding to at least one image group in the image group set;
determining a maximum gray average value according to the gray average value of each block area in the block area set, and calculating the difference value between the gray average value of each block area and the maximum gray average value;
Taking a blocking area corresponding to a difference value larger than a preset threshold value in all the difference values as a target blocking area, wherein the target blocking area is a shielding area in the fisheye image corresponding to the target blocking area;
and determining a plurality of shielding area masks according to shielding areas in the fisheye image corresponding to each frame of image.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
and carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
for any non-occluded image area, taking any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization between each two adjacent frames of images and the translation after optimization between each two adjacent frames of images;
determining a real main direction corresponding to each sub-direction sequence according to a preset direction vector of each two adjacent frames of images under a camera coordinate system;
and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of every two adjacent frames of images in the camera coordinate system;
And calculating an included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector of each two adjacent frames of images, in which the direction vector of each two adjacent frames of images under the camera coordinate system falls, according to the included angle corresponding to each two adjacent frames of images.
In one embodiment, a direction vector sequence is formed by the direction vectors of every two adjacent frames of images under a camera coordinate system; the computer program when executed by the processor also performs the steps of:
segmenting the direction vector sequence based on the time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences;
determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under the camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence;
if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
In one embodiment, the computer program when executed by the processor further performs the steps of:
Smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may include the procedures of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. The volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM can take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this specification.
The above embodiments merely represent a few implementations of the present application, and their description is relatively specific and detailed, but they are not therefore to be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (13)

1. A panoramic video transition rendering method, the method comprising:
acquiring an image group set, wherein each image group in the image group set is obtained by extracting frames from an original video, and each original video frame in the original video is synthesized by a plurality of fish-eye images;
determining a plurality of shielding area masks according to at least one image group in the image group set, and determining an image area where each frame of image in the image group set is not shielded according to the shielding area masks;
Respectively carrying out feature matching on each non-occluded image area of each two adjacent frames of images of at least one image group in the image group set to obtain a matching point pair between each non-occluded image area of each two adjacent frames of images;
optimizing the translation amount between every two adjacent frames of images according to the matching point pairs between the image areas which are not blocked by the two adjacent frames of images, and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the optimized translation amount between every two adjacent frames of images;
calculating a rotation matrix of the virtual camera at the corresponding moment of each original video frame according to the advancing direction of the virtual camera at the corresponding moment of each original video frame;
and performing transition rendering on each original video frame according to the rotation matrix of the virtual camera at the corresponding moment of each original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when shooting each original video frame.
2. The method of claim 1, wherein determining a number of occlusion region masks from at least one of the set of image sets comprises:
Partitioning a plurality of paths of fisheye images corresponding to each frame of image of at least one image group in the image group set to obtain a partitioned area set corresponding to at least one image group in the image group set;
determining a maximum gray average value according to the gray average value of each segmented region in the segmented region set, and calculating the difference value between the gray average value of each segmented region and the maximum gray average value;
taking a blocking area corresponding to a difference value larger than a preset threshold value in all difference values as a target blocking area, wherein the target blocking area is a shielding area in a fisheye image corresponding to the target blocking area;
and determining a plurality of shielding area masks according to shielding areas in the fisheye image corresponding to each frame of image.
3. The method according to claim 1, wherein the performing feature matching on the image areas that are not occluded in each of the two adjacent frames of images of at least one image group in the image group set to obtain matching point pairs between the image areas that are not occluded in each of the two adjacent frames of images respectively includes:
extracting features of the non-shielded image areas in each frame of image of at least one image group in the image group set to obtain feature points in the non-shielded image areas in each frame of image of at least one image group in the image group set;
And carrying out feature point matching on feature points in the image areas where each two adjacent images of at least one image group in the image group set are not blocked, so as to obtain matching point pairs corresponding to each two adjacent images of at least one image group in the image group set.
4. A method according to claim 3, wherein the feature extraction of the non-occluded image area in each frame of image of at least one image group in the set of image groups comprises:
for any non-occluded image area, taking the any non-occluded image area as a current image area, and extracting feature points in the current image area; the extraction result meets the preset condition, wherein the preset condition comprises that every two adjacent characteristic points are equidistant or the ratio between the area of the area formed by the surrounding of all the extracted characteristic points and the area of the current image area is larger than a preset threshold value.
5. A method according to claim 3, wherein before the feature point matching is performed on the feature points in the image area where each two adjacent images of at least one image group in the image group set are not blocked, the method further comprises:
And screening the feature points in the non-occluded image area in each frame of image of at least one image group in the image group set based on a random sampling algorithm.
6. The method according to claim 1, wherein determining the advancing direction of the virtual camera at the corresponding time of each original video frame according to the optimized translation amount between every two adjacent frames of images comprises:
determining a preset direction vector of each two adjacent frames of images under a camera coordinate system according to the translation before optimization and the translation after optimization;
determining a real main direction corresponding to each sub-direction sequence according to the preset direction vector of each two adjacent frames of images under a camera coordinate system;
and determining the advancing direction of the virtual camera at the corresponding moment of each original video frame according to the real main direction corresponding to each sub-direction sequence.
7. The method according to claim 6, wherein determining the preset direction vector of each two adjacent frames of images in the camera coordinate system according to the translation before the optimization between each two adjacent frames of images and the translation after the optimization between each two adjacent frames of images comprises:
Weighting according to the translation amount before optimization between every two adjacent frames of images and the translation amount after optimization between every two adjacent frames of images to obtain the integrated translation amount between every two adjacent frames of images;
converting the integrated translation quantity between every two adjacent frames of images into a camera coordinate system to obtain a direction vector of the every two adjacent frames of images in the camera coordinate system;
and calculating an included angle between the direction vector of each two adjacent frames of images under the camera coordinate system and each preset direction vector, and determining the preset direction vector in which the direction vector of each two adjacent frames of images under the camera coordinate system falls according to the included angle corresponding to each two adjacent frames of images.
8. The method according to claim 7, wherein the sequence of direction vectors is composed of direction vectors of every two adjacent frames of images in the camera coordinate system; the determining the real main direction corresponding to each sub-direction sequence according to the preset direction vector of each two adjacent frames of images under the camera coordinate system comprises the following steps:
segmenting the direction vector sequence based on a time sequence in the direction vector sequence to obtain a plurality of sub-direction sequences;
Determining the times that each sub-direction sequence falls on each preset direction vector according to the preset direction vector in which the direction vector of each adjacent two frames of images falls under a camera coordinate system, and taking the preset direction vector with the largest falling times of each sub-direction sequence as the main direction corresponding to each sub-direction sequence;
if the total number of times that the main direction corresponding to each sub-direction sequence falls into is within the preset number of times range, the main direction corresponding to each sub-direction sequence is taken as the real main direction corresponding to each sub-direction sequence.
9. The method according to claim 6, wherein determining the advancing direction of the virtual camera at the corresponding time of each original video frame according to the real main direction corresponding to each sub-direction sequence comprises:
smoothing and interpolating the real main direction corresponding to each sub-direction sequence to obtain a direction vector corrected at the corresponding moment of each original video frame;
converting the direction vector corrected at the corresponding moment of each original video frame into a world coordinate system to obtain the direction vector at the corresponding moment of each original video frame;
and taking the direction vector of the corresponding moment of each original video frame as the advancing direction of the virtual camera at the corresponding moment of each original video frame.
10. A panoramic video transition rendering device, the device comprising:
an acquisition module, configured to acquire an image group set, wherein each image group in the image group set is obtained by extracting frames from an original video, and each original video frame in the original video is synthesized from multiple fisheye images;
a first determining module, configured to determine a plurality of occlusion area masks according to at least one image group in the image group set, and to determine, according to the occlusion area masks, the image area of each frame of image in the image group set that is not occluded;
a second determining module, configured to perform feature matching on the non-occluded image areas of each two adjacent frames of images of at least one image group in the image group set, to obtain matching point pairs between the non-occluded image areas of each two adjacent frames of images;
a third determining module, configured to optimize the translation amount between each two adjacent frames of images according to the matching point pairs between their non-occluded image areas, and to determine, according to the optimized translation amount between each two adjacent frames of images, the advancing direction of the virtual camera at the moment corresponding to each original video frame;
a calculation module, configured to calculate a rotation matrix of the virtual camera at the moment corresponding to each original video frame according to the advancing direction of the virtual camera at that moment;
and a rendering module, configured to perform transition rendering on each original video frame according to the rotation matrix of the virtual camera at the moment corresponding to the original video frame and the rotation amount of the panoramic camera relative to the world coordinate system when the original video frame was captured.
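To make the calculation and rendering modules of claim 10 concrete, the sketch below builds a virtual-camera rotation from the advancing direction as a look-at frame and combines it with the panoramic camera's rotation relative to the world coordinate system. The look-at construction, the world-up vector, and the reproject resampling routine are assumptions introduced for the example rather than details disclosed here.

```python
import numpy as np

def virtual_camera_rotation(advance_dir, world_up=np.array([0.0, 0.0, 1.0])):
    """One plausible rotation matrix for the virtual camera: an orthonormal
    look-at frame whose forward axis is the advancing direction."""
    forward = advance_dir / (np.linalg.norm(advance_dir) + 1e-12)
    right = np.cross(forward, world_up)        # degenerate if forward is parallel to world_up
    right /= np.linalg.norm(right) + 1e-12
    up = np.cross(right, forward)
    return np.stack([right, up, forward], axis=1)  # columns: right, up, forward

def render_transition(frame, R_virtual, R_camera_world, reproject):
    """Rendering-module sketch: combine the virtual-camera rotation with the
    panoramic camera's rotation relative to the world coordinate system and
    hand the resulting view rotation to a caller-supplied reprojection routine."""
    R_view = R_virtual.T @ R_camera_world
    return reproject(frame, R_view)
```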
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 9 when executing the computer program.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
CN202111550384.0A 2021-12-17 2021-12-17 Panoramic video transition rendering method and device and computer equipment Pending CN116266356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111550384.0A CN116266356A (en) 2021-12-17 2021-12-17 Panoramic video transition rendering method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111550384.0A CN116266356A (en) 2021-12-17 2021-12-17 Panoramic video transition rendering method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN116266356A true CN116266356A (en) 2023-06-20

Family

ID=86743562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111550384.0A Pending CN116266356A (en) 2021-12-17 2021-12-17 Panoramic video transition rendering method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN116266356A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116824009A (en) * 2023-06-29 2023-09-29 广州市大神文化传播有限公司 Animation rendering method, system, equipment and storage medium
CN116824009B (en) * 2023-06-29 2024-03-26 广州市大神文化传播有限公司 Animation rendering method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109410130B (en) Image processing method and image processing apparatus
US9615039B2 (en) Systems and methods for reducing noise in video streams
KR101722803B1 (en) Method, computer program, and device for hybrid tracking of real-time representations of objects in image sequence
KR101643607B1 (en) Method and apparatus for generating of image data
US9338437B2 (en) Apparatus and method for reconstructing high density three-dimensional image
US8290212B2 (en) Super-resolving moving vehicles in an unregistered set of video frames
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
CN110660090B (en) Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN107749987B (en) Digital video image stabilization method based on block motion estimation
WO2017091927A1 (en) Image processing method and dual-camera system
CN110796041B (en) Principal identification method and apparatus, electronic device, and computer-readable storage medium
GB2536430B (en) Image noise reduction
TWI738196B (en) Method and electronic device for image depth estimation and storage medium thereof
GB2536429A (en) Image noise reduction
CN112017215B (en) Image processing method, device, computer readable storage medium and computer equipment
CN109902675B (en) Object pose acquisition method and scene reconstruction method and device
EP3216006A1 (en) An image processing apparatus and method
CN116266356A (en) Panoramic video transition rendering method and device and computer equipment
CN115705651A (en) Video motion estimation method, device, equipment and computer readable storage medium
WO2021223127A1 (en) Global motion estimation-based time-domain filtering method and device, and storage medium
CN109271854B (en) Video processing method and device, video equipment and storage medium
CN114757984A (en) Scene depth estimation method and device of light field camera
CN113034345B (en) Face recognition method and system based on SFM reconstruction
WO2020118565A1 (en) Keyframe selection for texture mapping wien generating 3d model
CN114916239A (en) Estimating depth of images and relative camera pose between images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination