CN111340695A - Super-resolution reconstruction method of dome screen video - Google Patents

Super-resolution reconstruction method of dome screen video

Info

Publication number
CN111340695A
Authority
CN
China
Prior art keywords
spherical screen
resolution reconstruction
screen video
image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010083678.6A
Other languages
Chinese (zh)
Inventor
Qin Yongjin (秦永进)
Li Yanling (李艳玲)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhihuan Software Technology Co ltd
Original Assignee
Shanghai Zhihuan Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhihuan Software Technology Co ltd filed Critical Shanghai Zhihuan Software Technology Co ltd
Priority to CN202010083678.6A
Publication of CN111340695A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention relates to a super-resolution reconstruction method for dome screen video, comprising the following steps: introducing distortion correction parameters into the dome screen video image and performing 3D correction; extracting features from the dome screen video image; and performing super-resolution reconstruction on the extracted features through a wide-activation WDSR-B neural network. Skip connections and concatenation between shallow and deep layers are introduced; based on a residual SISR network, simply expanding the features before the ReLU activation yields a marked improvement without additional parameters or computation, letting more information pass through while preserving the high nonlinearity of the deep network. Low-level SISR features from the shallow layers can be passed more easily to the last layer for better dense pixel-value prediction. Applied in dome cinemas, the technique improves image definition and gives the audience a better viewing experience.

Description

Super-resolution reconstruction method of dome screen video
Technical Field
The invention relates to super-resolution reconstruction of dome screen video images, and in particular to a super-resolution reconstruction method for dome screen video.
Background
With the continuous development of pattern recognition and artificial intelligence, super-resolution reconstruction has high application value in many fields such as video surveillance, medical imaging, and satellite remote sensing. Image super-resolution reconstruction restores a given low-resolution image to a corresponding high-resolution image through a specific algorithm. In the dome screen video images used in dome cinemas, only the scene at the picture center is preserved unchanged; scenes elsewhere that should be horizontal or vertical are warped accordingly, so the resolution differs across regions of the imaging surface of the dome lens: the closer to the picture center, the higher the resolution and the richer the detail; the farther from the center, the lower the resolution, the less the detail, and the more severe the deformation.
Earlier image super-resolution networks used relatively shallow convolutional neural networks with poor accuracy. Increasing the depth improves resolution more effectively, but without exploiting feature information from the shallow layers it also increases time complexity.
At present, image processing is applied directly to the distorted dome screen video image; but because dome screen video data are stored in a nonlinear form and cannot be processed directly, this approach cannot achieve a good image deblurring effect.
In current super-resolution reconstruction algorithms, shallow-layer information is not used effectively in the deep layers, training networks are very complex, and there may even be many redundant convolutional layers. For super-resolving only a single image this matters little, but when applied to video, a complex network is very slow, time-consuming, and poor in real-time performance.
Disclosure of Invention
In order to solve the problem of dome screen video image definition, the invention provides a super-resolution reconstruction method for dome screen video.
To solve the above problems, the invention adopts the following technical scheme:
A super-resolution reconstruction method for dome screen video comprises the following steps:
S1, extracting images frame by frame from the dome screen video, and introducing distortion correction parameters into the images to perform 3D correction;
S2, extracting features from the corrected images;
S3, performing super-resolution reconstruction on the extracted features through a wide-activation WDSR-B neural network.
Preferably, in the 3D correction of S1, a projection plane is set, and a mapping relationship is established between the distance r from a dome screen pixel point to the optical axis center and the light incidence angle θ′ on the projection plane; through this mapping, the incidence angle is calculated from the distance r from the dome pixel point to the optical axis center, and back-projection onto the plane along that angle yields the corrected point.
Preferably, the 3D correction is calculated with different formulas for different incidence angles:
when the light incidence angle is less than or equal to a first threshold, an orthogonal model is used:
r = f·θ′;
when the incidence angle is greater than the first threshold and less than or equal to a second threshold, an equiangular model is used:
r = 2f·sin(θ′/2);
when the incidence angle is greater than the second threshold, an equidistant model is used:
r = f·sin(θ′);
where f is the distance from the optical axis center to the dome screen plane.
Preferably, the feature extraction on the corrected image in S2 includes extracting brightness, edge, texture and/or color features.
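By way of illustration, a minimal Python sketch of such feature extraction over a corrected frame; the operators, window size, and function names here are assumptions for the sketch, not prescribed by the invention:

```python
import cv2
import numpy as np

def extract_features(corrected_bgr):
    """Illustrative per-frame feature extraction: brightness, edges,
    texture, and color, as listed in the description."""
    # Brightness: luma channel of the corrected frame
    ycrcb = cv2.cvtColor(corrected_bgr, cv2.COLOR_BGR2YCrCb)
    luma = ycrcb[:, :, 0].astype(np.float32) / 255.0

    # Edges: Sobel gradient magnitude
    gx = cv2.Sobel(luma, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(luma, cv2.CV_32F, 0, 1, ksize=3)
    edges = np.sqrt(gx * gx + gy * gy)

    # Texture: local variance over a small window (one common proxy)
    mean = cv2.blur(luma, (7, 7))
    texture = np.maximum(cv2.blur(luma * luma, (7, 7)) - mean * mean, 0.0)

    # Color: the two chroma channels
    chroma = ycrcb[:, :, 1:].astype(np.float32) / 255.0

    return {"brightness": luma, "edges": edges,
            "texture": texture, "color": chroma}
```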
Preferably, the wide-activation WDSR-B neural network comprises a feature extraction module, a residual module, and an output module; the feature extraction module extracts image features with convolutional layers; the residual module extracts features of potential regions; and the output module convolves the extracted features together with the residual output of the residual module through a convolutional layer and outputs the result.
Preferably, the residual module comprises, arranged in sequence, a 1×1 convolutional layer for expanding the number of channels, a convolutional layer for feature extraction, a ReLU activation link for adding nonlinear features, a 1×1 convolutional layer for shallow feature extraction, and two 3×3 convolutional layers.
Preferably, S3 further includes weight-normalizing the WDSR-B neural network, the weight-normalized parameterized weight vector being:
w = (g / ||v||) · v
where v is a k-dimensional vector, k being the dimension of the extracted feature; g is a scalar with ||w|| = g; ||v|| is the Euclidean norm of v, so that the norm of w is fixed to g independently of the parameter v.
The technical scheme of the invention has the following beneficial technical effects:
(1) Skip connections and concatenation between shallow and deep layers are introduced. Based on a residual SISR network, simply expanding the features before the ReLU activation yields a marked improvement without additional parameters or computation; the expansion lets more information pass through while preserving the high nonlinearity of the deep network. Low-level SISR features from the shallow layers can be passed more easily to the last layer for better dense pixel-value prediction. Applied in dome cinemas, the technique improves image definition and gives the audience a better viewing experience.
(2) 3D correction with introduced distortion correction parameters is applied to the fisheye image of the dome screen video before feature extraction, yielding a better image deblurring effect. Each point of the dome screen is mapped to the plane, so no pixel holes or discontinuities are formed. Introducing this correction into dome screen video solves the difference in spatial distribution between dome screen video and ordinary video.
Drawings
FIG. 1 is a schematic diagram of the super-resolution reconstruction method for dome screen video;
FIG. 2 is a schematic diagram of the 3D correction model of the dome screen video image;
FIG. 3 is a flowchart of the super-resolution reconstruction of the dome screen video.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below with reference to the accompanying drawings. It should be understood that the description is exemplary only and is not intended to limit the scope of the invention. In the following description, descriptions of well-known structures and techniques are omitted so as not to obscure the concepts of the invention unnecessarily.
Referring to FIG. 1, the super-resolution reconstruction method for dome screen video includes 3D correction of the dome screen video image and wide-activation WDSR-B processing.
Referring to FIG. 2, the 3D correction of the dome screen video image proceeds as follows.
The dome screen video image comes from a planar video image; each frame of the video is obtained (images of selected frames may also be used). 3D correction is applied to the image to obtain the corresponding plane-to-dome projection.
Distortion correction parameters are introduced into the dome screen video image for 3D correction, and a 3D correction model is established: a mapping is built between the distance from a dome pixel point P to the optical axis center O and the incidence angle of the object ray; a projection plane O′ is then given, and back-projection onto the plane along the solved incidence angle yields the corrected point. A mapping between each point on the dome screen and each point on the plane is thereby established; the dome projection point corresponding to each image point is computed from this mapping to complete the 3D correction. Feature extraction is then performed on the corrected image over brightness, edges, textures, colors, and other feature-bearing data.
The 3D correction model establishes the mapping between the distance from a dome screen video image point to the optical axis center and the incidence angle of the object ray.
D is the correction model;
r is the distance from a dome screen pixel point to the optical axis center;
θ is the angle between the incident ray and the plumb line; θ′ is the angle between the projection-plane incident ray and the plumb line;
ψ, ψ′ are the azimuth angles (in radians) of the incident ray before and after correction;
P is the pixel point of the object ray on the dome screen;
P′ is the point where the incident ray is back-projected onto the correction plane.
In one embodiment, with reference to FIG. 2, the 3D correction is calculated with different formulas for different incidence angles.
When the light incidence angle is less than or equal to a first threshold, an orthogonal model is used:
r = f·θ′;
when the incidence angle is greater than the first threshold and less than or equal to a second threshold, an equiangular model is used:
r = 2f·sin(θ′/2);
when the incidence angle is greater than the second threshold, an equidistant model is used:
r = f·sin(θ′);
where r is the distance from the dome pixel point to the optical axis center; θ′ is the angle between the incident ray and the principal axis; f is the focal length of the dome screen. The 3D correction thus uses different calculation modes for different incidence angles. The first threshold is, for example, 60°, and the second threshold, for example, 65°, adjusted according to the angle of the actual planar video lens.
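A minimal sketch of the piecewise model D, assuming the example thresholds of 60° and 65° given above; f is supplied by the caller:

```python
import math

THETA_1 = math.radians(60.0)  # first threshold (example value from the text)
THETA_2 = math.radians(65.0)  # second threshold (example value from the text)

def radial_distance(theta_p, f):
    """Map the incidence angle theta' (radians) to the radius r from the
    dome pixel point to the optical axis center, per the three formulas."""
    if theta_p <= THETA_1:            # orthogonal model
        return f * theta_p
    elif theta_p <= THETA_2:          # equiangular model
        return 2.0 * f * math.sin(theta_p / 2.0)
    else:                             # equidistant model
        return f * math.sin(theta_p)
```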
In an actual dome screen video image the azimuth angle of the incident ray is essentially undistorted, or its distortion is negligible; that is, ψ = ψ′ always holds, so the model can be expressed as θ′ = D(r). The formula θ′ = D(r) contains only one parameter, the distance r from a point on the dome screen video image to the optical axis; after 3D correction through this mapping between r and θ′, the points lie on the projection plane and form a planar image.
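Because ψ = ψ′, the correction is a purely radial remap. The following hedged sketch builds the corrected planar image by inverse mapping (every output pixel is sampled from the dome frame, so no pixel holes appear); it reuses radial_distance from the sketch above, and focal_plane, the assumed plane-projection focal length, is an illustrative parameter:

```python
import numpy as np

def correct_frame(dome_img, f, focal_plane):
    """Inverse-mapping 3D correction sketch: for each pixel of the output
    plane image, find its source pixel on the dome frame. The azimuth is
    unchanged (psi == psi'); only the radius is remapped through D."""
    h, w = dome_img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    rho = np.hypot(dx, dy)                         # radius on the plane
    psi = np.arctan2(dy, dx)                       # azimuth (kept as-is)
    theta_p = np.arctan2(rho, focal_plane)         # incidence angle theta'
    r = np.vectorize(radial_distance)(theta_p, f)  # dome radius via model D
    src_x = np.clip(np.round(cx + r * np.cos(psi)), 0, w - 1).astype(int)
    src_y = np.clip(np.round(cy + r * np.sin(psi)), 0, h - 1).astype(int)
    return dome_img[src_y, src_x]                  # nearest-neighbour sample
```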
In the wide-activation WDSR-B neural network, features are expanded before the ReLU activation layer. Considering the wide-activation effect within a residual block, the most basic approach is to increase the number of channels of all features directly: a 3×3 convolution widens the channels, the activation function is applied, and another 3×3 convolution narrows them again. This structure mainly suits small expansion factors (between 2 and 4); when the factor is large its effectiveness drops rapidly, and it is no longer applicable.
As shown in FIG. 1, to overcome this limitation, the WDSR-B neural network splits the large convolution kernel of the original WDSR-A neural network into two low-rank convolution kernels while keeping the number of channels on the identity-mapping path unchanged, thereby saving parameters. The WDSR-B neural network comprises a feature extraction module, a residual module, and an output module; the feature extraction module extracts image features with convolutional layers; the residual module extracts features of potential regions; and the output module convolves the extracted features together with the residual output of the residual module through a convolutional layer and outputs the result. The residual module includes a 1×1 convolutional layer, a nonlinear (ReLU) activation, a 1×1 convolutional layer, and two 3×3 convolutional layers.
First the number of channels is expanded with a 1×1 convolutional layer (Conv), and the nonlinear function (ReLU) is applied after it; an efficient linear low-rank convolution is then introduced, decomposing one large convolution kernel into two low-rank kernels: a 1×1 Conv that reduces the number of channels combined with a 3×3 Conv that extracts spatial features. With the same number of parameters this further increases the number of feature-map channels before the ReLU, so resolution can be improved without increasing time complexity, making the dome screen video image clearer.
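A PyTorch sketch of one such residual block; the channel count, the 6× expansion, and the 0.8 low-rank ratio are assumptions (values commonly paired with WDSR-B), not figures fixed by the invention:

```python
import torch.nn as nn

class WDSRBBlock(nn.Module):
    """Wide-activation residual block: 1x1 expansion before the single
    ReLU, then a low-rank pair (1x1 reduce + 3x3 spatial) replacing one
    large kernel; the identity path keeps the channel count unchanged."""
    def __init__(self, channels=32, expansion=6, low_rank_ratio=0.8):
        super().__init__()
        mid = channels * expansion
        low = int(channels * low_rank_ratio)
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1),   # widen before ReLU
            nn.ReLU(inplace=True),                     # single nonlinearity
            nn.Conv2d(mid, low, kernel_size=1),        # low-rank reduction
            nn.Conv2d(low, channels, 3, padding=1),    # spatial 3x3 conv
        )

    def forward(self, x):
        return x + self.body(x)  # residual (skip) connection
```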
Without increasing the computational overhead, the number of filters of the convolution kernels before the ReLU activation layer is increased to widen the feature maps.
The residual network of the wide-activation WDSR-B effectively reduces time complexity: the redundant convolutional layers are removed, lowering the time complexity and meeting the real-time requirement of dome screen video.
Weight normalization reparameterizes the weight vectors of a neural network, decoupling the length of each weight vector from its direction. It has no dependence on the samples in a mini-batch and uses the same formula during training and testing.
When the model reaches a certain depth (around 180 layers) it becomes difficult to train, which motivates the weight normalization operation. Its aim is to recompute the weight distribution within the network; introducing weight normalization permits training with a higher learning rate and improves training and testing accuracy.
The weight normalization algorithm is as follows:
y = w·x + b (1)
where w is a k-dimensional weight vector; x is a k-dimensional vector of input features; y is the output feature; b is a bias term for adjusting balance; and k is the dimension.
weight normalization parameterized weight vector algorithm:
w = (g / ||v||) · v (2)
where v is a k-dimensional vector, k being the dimension of the extracted feature; g is a scalar that has no direction and represents only magnitude; ||v|| is the Euclidean norm of v, so w is obtained with its norm fixed to g, independent of the parameter v; w ranges between 0 and 1.
w in equation (2) enters y in equation (1); since y is the output feature, weight-normalizing w affects the output feature vector, and adding weight normalization improves the accuracy of the output features.
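A short sketch of equation (2), followed by the equivalent built-in PyTorch wrapper applied to a convolution; the layer sizes and the value k = 64 are assumptions:

```python
import torch
import torch.nn as nn

# Equation (2) by hand: w = (g / ||v||) * v, so ||w|| == g is fixed
# independently of the direction parameter v.
k = 64                        # dimension of the extracted feature (assumed)
v = torch.randn(k)            # k-dimensional direction vector
g = torch.tensor(1.0)         # scalar magnitude, learnable during training
w = g * v / v.norm()          # reparameterized weight vector

# The same length/direction decoupling via the built-in wrapper,
# here applied to one convolution of the network:
conv = nn.utils.weight_norm(nn.Conv2d(32, 32, kernel_size=3, padding=1))
```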
The neural network model is trained with samples and packaged once it meets the accuracy requirement; super-resolution reconstruction is then performed on the corrected images.
Referring to FIG. 3, the invention provides a super-resolution reconstruction method for dome screen video, implemented mainly through the following steps:
S1, extracting images frame by frame from the dome screen video, and introducing distortion correction parameters into the images to perform 3D correction;
S2, extracting features from the corrected images;
S3, performing super-resolution reconstruction on the extracted features through a wide-activation WDSR-B neural network.
During the wide-activation WDSR-B processing, weight normalization is adopted to improve the accuracy of the dome screen video image.
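Tying the steps together, a hedged end-to-end sketch that reuses correct_frame and extract_features from the sketches above; network stands for an assumed trained wide-activation WDSR-B model, and its interface is illustrative only:

```python
def reconstruct_dome_video(frames, f, focal_plane, network):
    """Run S1 -> S2 -> S3 over a sequence of dome video frames."""
    results = []
    for frame in frames:
        corrected = correct_frame(frame, f, focal_plane)  # S1: 3D correction
        feats = extract_features(corrected)               # S2: features
        results.append(network(corrected, feats))         # S3: reconstruction
    return results
```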
Through these steps, the skip connections and concatenation operations between shallow and deep layers are realized. Based on a residual SISR network, simply expanding the features before the ReLU activation yields a marked improvement without additional parameters or computation; the expansion lets more information pass through while preserving the high nonlinearity of the deep network. Low-level SISR features from the shallow layers can be passed more easily to the last layer for better dense pixel-value prediction.
In summary, the invention relates to a super-resolution reconstruction method for dome screen video, comprising: introducing distortion correction parameters into the dome screen video image and performing 3D correction; extracting features from the dome screen video image; and performing super-resolution reconstruction on the extracted features through a wide-activation WDSR-B neural network. Skip connections and concatenation between shallow and deep layers are introduced; based on a residual SISR network, simply expanding the features before the ReLU activation yields a marked improvement without additional parameters or computation, letting more information pass through while preserving the high nonlinearity of the deep network. Low-level SISR features from the shallow layers can be passed more easily to the last layer for better dense pixel-value prediction. Applied in dome cinemas, the technique improves image definition and gives the audience a better viewing experience.
It is to be understood that the above embodiments merely illustrate the principles of the invention and are not to be construed as limiting it. Any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the invention falls within its protection scope, and the appended claims are intended to cover all such variations and modifications as fall within their scope or its equivalents.

Claims (7)

1. A super-resolution reconstruction method for dome screen video, characterized by comprising the following steps:
S1, extracting images frame by frame from the dome screen video, and introducing distortion correction parameters into the images to perform 3D correction;
S2, extracting features from the corrected images;
S3, performing super-resolution reconstruction on the extracted features through a wide-activation WDSR-B neural network.
2. The super-resolution reconstruction method for dome screen video according to claim 1, wherein the 3D correction in S1 comprises: setting a projection plane, and calculating the distance r from a projection point on the dome screen to the optical axis center from the light incidence angle θ′ at the corresponding point on the projection plane; the azimuth angle of the projection point on the dome screen is the same as that of the point on the projection plane, from which the corresponding projection point on the dome screen is obtained.
3. The method of claim 2, wherein the 3D correction is calculated with different formulas for different light incidence angles θ′:
when the light incidence angle is less than or equal to a first threshold, an orthogonal model is used:
r = f·θ′;
when the incidence angle is greater than the first threshold and less than or equal to a second threshold, an equiangular model is used:
r = 2f·sin(θ′/2);
when the incidence angle is greater than the second threshold, an equidistant model is used:
r = f·sin(θ′);
where f is the distance from the optical axis center to the dome screen plane.
4. The super-resolution reconstruction method for dome screen video according to claim 1 or 2, wherein in S2 the feature extraction on the corrected image includes extracting brightness, edge, texture and/or color features.
5. The method of claim 1 or 2, wherein the wide-activation WDSR-B neural network comprises a feature extraction module, a residual module, and an output module; the feature extraction module extracts image features with convolutional layers; the residual module extracts features of potential regions; and the output module convolves the extracted features together with the residual output of the residual module through a convolutional layer and outputs the result.
6. The method according to claim 5, wherein the residual module comprises, arranged in sequence, a 1×1 convolutional layer for expanding the number of channels, a convolutional layer for feature extraction, a ReLU activation link for adding nonlinear features, a 1×1 convolutional layer for shallow feature extraction, and two 3×3 convolutional layers.
7. The super-resolution reconstruction method for dome screen video according to claim 1 or 2, wherein said S3 further comprises weight-normalizing the WDSR-B neural network, the weight-normalized parameterized weight vector being:
w = (g / ||v||) · v
where v is a k-dimensional vector, k being the dimension of the extracted feature; g is a scalar with ||w|| = g; ||v|| is the Euclidean norm of v, so that the norm of w is fixed to g independently of the parameter v.
CN202010083678.6A 2020-02-10 2020-02-10 Super-resolution reconstruction method of dome screen video Pending CN111340695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010083678.6A CN111340695A (en) 2020-02-10 2020-02-10 Super-resolution reconstruction method of dome screen video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010083678.6A CN111340695A (en) 2020-02-10 2020-02-10 Super-resolution reconstruction method of dome screen video

Publications (1)

Publication Number Publication Date
CN111340695A true CN111340695A (en) 2020-06-26

Family

ID=71183852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010083678.6A Pending CN111340695A (en) 2020-02-10 2020-02-10 Super-resolution reconstruction method of dome screen video

Country Status (1)

Country Link
CN (1) CN111340695A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104067372A (en) * 2012-01-27 2014-09-24 塞莫费雪科学(不来梅)有限公司 Multi-reflection mass spectrometer
CN102663734A (en) * 2012-03-15 2012-09-12 天津理工大学 Fish eye lens calibration and fish eye image distortion correction method
CN103533235A (en) * 2013-09-17 2014-01-22 北京航空航天大学 Quick digital panoramic device based on linear array charge coupled device (CCD) for great case/event scene
CN106469448A (en) * 2015-06-26 2017-03-01 康耐视公司 Carry out automatic industrial inspection using 3D vision
CN106842520A (en) * 2017-03-30 2017-06-13 中山联合光电科技股份有限公司 A kind of high definition panorama looks around optical imaging system
CN107240066A (en) * 2017-04-28 2017-10-10 天津大学 Image super-resolution rebuilding algorithm based on shallow-layer and deep layer convolutional neural networks
CN108961151A (en) * 2018-05-08 2018-12-07 中德(珠海)人工智能研究院有限公司 A method of the three-dimensional large scene that ball curtain camera obtains is changed into sectional view
CN108983394A (en) * 2018-08-23 2018-12-11 上海帛视光电科技有限公司 A kind of fish eye lens system, image acquisition device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAHUI YU et al., "Wide Activation for Efficient and Accurate Image Super-Resolution", arXiv, 21 December 2018 (2018-12-21), pages 3-4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111832463A (en) * 2020-07-07 2020-10-27 哈尔滨理工大学 Deep learning-based traffic sign detection method

Similar Documents

Publication Publication Date Title
US11037278B2 (en) Systems and methods for transforming raw sensor data captured in low-light conditions to well-exposed images using neural network architectures
US10257501B2 (en) Efficient canvas view generation from intermediate views
US10708525B2 (en) Systems and methods for processing low light images
US11880939B2 (en) Embedding complex 3D objects into an augmented reality scene using image segmentation
US20210124985A1 (en) System and method for deep machine learning for computer vision applications
Chen et al. Deep exposure fusion with deghosting via homography estimation and attention learning
US11871110B2 (en) Single image ultra-wide fisheye camera calibration via deep learning
CN111553841B (en) Real-time video splicing method based on optimal suture line updating
KR20230146649A (en) Color and infrared 3D reconstruction using implicit radiance functions.
EP3886044A1 (en) Robust surface registration based on parameterized perspective of image templates
CN115375536A (en) Image processing method and apparatus
Chang et al. Deep learning based image Super-resolution for nonlinear lens distortions
CN115115516A (en) Real-world video super-resolution algorithm based on Raw domain
CN111340695A (en) Super-resolution reconstruction method of dome screen video
JP2006221599A (en) Method and apparatus for generating mapping function, and compound picture develop method, and its device
Bergmann et al. Gravity alignment for single panorama depth inference
Hong et al. PAR $^{2} $ Net: End-to-end Panoramic Image Reflection Removal
Liang et al. Multi-scale and multi-patch transformer for sandstorm image enhancement
Yue et al. Rvideformer: Efficient raw video denoising transformer with a larger benchmark dataset
Okura et al. Aerial full spherical HDR imaging and display
CN114257733A (en) Method and system for image processing of omni-directional image with viewpoint offset
CN112991174A (en) Method and system for improving resolution of single-frame infrared image
CN111709880B (en) Multi-path picture splicing method based on end-to-end neural network
US11651475B2 (en) Image restoration method and device
Jung et al. Deep low-contrast image enhancement using structure tensor representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination