CN112037129A - Image super-resolution reconstruction method, device, equipment and storage medium

Info

Publication number
CN112037129A
CN112037129A
Authority
CN
China
Prior art keywords
image
pixel
images
resolution
frame
Prior art date
Legal status
Granted
Application number
CN202010873181.4A
Other languages
Chinese (zh)
Other versions
CN112037129B (en)
Inventor
邹超洋
Current Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202010873181.4A
Publication of CN112037129A
Application granted
Publication of CN112037129B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4076 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application discloses an image super-resolution reconstruction method, device, equipment and storage medium, relating to the field of image processing. The method comprises: acquiring N consecutive frames of to-be-processed images in video data, wherein the to-be-processed images are low-resolution images, N ≥ 2, and the N frames include the current frame to be processed; aligning the N frames of to-be-processed images to obtain N frames of aligned images; determining a consistent pixel set corresponding to each frame of aligned image; performing consistency processing on the N frames of aligned images based on the consistent pixel sets to obtain N frames of consistent images; and taking the N frames of consistent images as the input of a neural network model to obtain a super-resolution reconstructed image of the current frame to be processed. The method solves the technical problem in the prior art that mis-registration produced when super-resolution reconstruction relies on image registration easily degrades the reconstruction result.

Description

Image super-resolution reconstruction method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an image super-resolution reconstruction method, device, equipment and storage medium.
Background
Super-resolution improves the resolution of an original image by hardware or software means; the process of obtaining one high-resolution image from a series of low-resolution images is super-resolution reconstruction. In the prior art, super-resolution reconstruction is usually performed on each frame of video data independently, but this does not fully exploit the information between adjacent frames, so the reconstruction effect is limited. To use that information effectively, the prior art registers (i.e. aligns) consecutive frames of the video and then performs super-resolution reconstruction using the sub-pixel misalignment information in the registered images. In the process of implementing the invention, the inventor found the following defect in the prior art: when an occluded area or a moving area exists in the frames, the registration of the multiple frames is prone to mis-registration, which greatly degrades the effect of the super-resolution reconstruction.
Disclosure of Invention
The embodiment of the application provides an image super-resolution reconstruction method, device, equipment and storage medium, and aims to solve the technical problem that in the prior art, the super-resolution reconstruction effect is easily influenced by mis-registration generated when image registration is used for super-resolution reconstruction.
In a first aspect, an embodiment of the present application provides an image super-resolution reconstruction method, including:
acquiring N consecutive frames of to-be-processed images in video data, wherein the to-be-processed images are low-resolution images, N ≥ 2, and the N frames of to-be-processed images include the current frame to be processed;
aligning the N frames of images to be processed to obtain N frames of aligned images;
determining a consistent pixel set corresponding to each frame of the aligned image;
carrying out consistency processing on the N frames of the aligned images based on the consistency pixel set to obtain N frames of consistent images;
and taking the N frames of the consistent images as the input of a neural network model to obtain a super-resolution reconstruction image of the current frame of the image to be processed.
Further, the determining a consistent pixel set corresponding to each frame of the aligned image includes:
taking each frame of the alignment image as a reference image respectively;
calculating the gray scale distance between the neighborhood pixel block corresponding to each pixel point in the reference image and the neighborhood pixel block corresponding to the pixel point with the same pixel coordinate in each frame of the aligned image;
determining a pixel consistency degree value between pixel points with the same pixel coordinates in the reference image and the corresponding aligned image according to the gray scale distance, and constructing a sub-consistency pixel set between the reference image and the corresponding aligned image according to the pixel consistency degree value;
and combining the sub-consistent pixel sets corresponding to each frame of the reference image to obtain a consistent pixel set of the reference image.
Further, the pixel consistency degree value corresponding to the pixel point with pixel coordinates (i, j) in the sub-consistency pixel set is recorded as $\mathrm{Map}_{o,n}(i,j)$, where

$$\mathrm{Map}_{o,n}(i,j)=\begin{cases}1, & D(i,j)<\tau \\ 0, & D(i,j)\geq\tau\end{cases}$$

the n-th frame of aligned image is the reference image, 1 ≤ n ≤ N, D(i, j) represents the gray-scale distance between the neighborhood pixel block corresponding to the pixel point with pixel coordinates (i, j) in the reference image and the neighborhood pixel block corresponding to the pixel point with pixel coordinates (i, j) in the o-th frame of aligned image, 1 ≤ o ≤ N, and τ is a set distance threshold.
Further, the consistency processing on the N frames of the aligned images based on the consistency pixel set to obtain N frames of consistent images includes:
aiming at the pixel points of each frame of the aligned image, carrying out weighted average by using the consistent pixel set corresponding to the aligned image to obtain a corresponding de-noised image;
and performing point multiplication on pixel points in each frame of the denoised image and the corresponding sub-consistent pixel set to obtain a corresponding consistent image, wherein the reference image corresponding to the sub-consistent pixel set and the current frame image to be processed are the same frame, and the alignment image corresponding to the sub-consistent pixel set and the denoised image are the same frame.
Further, the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of denoised image is:

$$\hat{I}_n(i,j)=\frac{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)\,\mathrm{Src}_k(i,j)}{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)}$$

wherein $\hat{I}_n(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of denoised image, 1 ≤ n ≤ N, $\mathrm{Src}_k(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the k-th frame of aligned image, and $\mathrm{Map}_{k,n}(i,j)$ represents the pixel consistency degree value recorded, for the pixel point with pixel coordinates (i, j), in the sub-consistency pixel set between the n-th and k-th frames of aligned images.
Further, the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of consistent image is:

$$I'_n(i,j)=\hat{I}_n(i,j)\cdot\mathrm{Map}_{n,p}(i,j)$$

wherein $I'_n(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of consistent image, 1 ≤ n ≤ N, $\hat{I}_n(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of denoised image, $\mathrm{Map}_{n,p}(i,j)$ represents the pixel consistency degree value recorded, for the pixel point with pixel coordinates (i, j), in the sub-consistency pixel set between the p-th and n-th frames of aligned images, and the p-th frame of aligned image is the aligned image corresponding to the current frame to be processed.
Further, the aligning the N frames of images to be processed to obtain N frames of aligned images includes:
calculating a homography transformation matrix between the current frame image to be processed and other frames of images to be processed;
and carrying out coordinate transformation on each pixel point in the other frames of images to be processed according to the homography transformation matrix so as to obtain an aligned image.
Further, the method also comprises the following steps:
acquiring a high-resolution image set and a low-resolution image set, wherein the high-resolution image set comprises multiple frames of continuous high-resolution training images, each frame of high-resolution training image has a corresponding low-resolution training image, and each low-resolution training image forms the low-resolution image set;
selecting a current-frame high-resolution training image from the high-resolution image set as supervision information, and selecting N frames of low-resolution training images from the low-resolution image set as the input of a neural network model to train the neural network model, wherein the N frames of low-resolution training images include the low-resolution training image corresponding to the current-frame high-resolution training image.
Further, the loss function of the neural network model is:

$$\mathrm{Loss}=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left(I^{HR}(i,j)-I^{SR}(i,j)\right)^{2}$$

wherein W × H is the pixel size of the current-frame high-resolution training image, $I^{HR}(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the current-frame high-resolution training image, and $I^{SR}(i,j)$ represents the pixel value of the pixel point with pixel coordinates (i, j) in the current-frame super-resolution reconstructed image, the latter being the output image obtained after the N frames of low-resolution training images are input into the neural network model.
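For illustration only (not part of the claims), a minimal Python sketch of this loss, assuming the squared-error form reconstructed above; the function name and NumPy usage are illustrative assumptions:

```python
import numpy as np

def reconstruction_loss(hr_image: np.ndarray, sr_image: np.ndarray) -> float:
    """Mean per-pixel squared error between the current-frame high-resolution
    training image and the network's super-resolution output."""
    assert hr_image.shape == sr_image.shape
    diff = hr_image.astype(np.float64) - sr_image.astype(np.float64)
    # Equals 1/(W*H) times the sum of squared pixel errors
    return float(np.mean(diff ** 2))
```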
Further, the acquiring the high resolution image set and the low resolution image set comprises:
acquiring a high-resolution image set;
and downsampling each frame of the high-resolution training image in the high-resolution image set to obtain a corresponding low-resolution training image, and forming the low-resolution training image into a low-resolution image set.
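For illustration only (not part of the claims), a minimal sketch of the down-sampling step; bicubic interpolation and the scale factor are assumptions, since the text does not fix them:

```python
import cv2

def build_low_resolution_set(hr_frames, scale=2):
    """Down-sample each high-resolution training frame to create the paired
    low-resolution training set (bicubic interpolation assumed)."""
    lr_frames = []
    for hr in hr_frames:
        h, w = hr.shape[:2]
        lr = cv2.resize(hr, (w // scale, h // scale),
                        interpolation=cv2.INTER_CUBIC)
        lr_frames.append(lr)
    return lr_frames
```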
Further, the selecting N frames of low resolution training images in the set of low resolution images as input to a neural network model comprises:
selecting N frames of low resolution training images in the set of low resolution images;
obtaining corresponding N frames of consistency training images based on the N frames of low-resolution training images;
and taking N frames of the consistency training images as the input of the neural network model.
In a second aspect, an embodiment of the present application further provides an image super-resolution reconstruction apparatus, including:
the image acquisition module is used for acquiring N consecutive frames of to-be-processed images in the video data, wherein the to-be-processed images are low-resolution images, N ≥ 2, and the N frames of to-be-processed images include the current frame to be processed;
the image alignment module is used for aligning the N frames of images to be processed to obtain N frames of aligned images;
a set determining module, configured to determine a consistent pixel set corresponding to each frame of the aligned image;
the consistency processing module is used for carrying out consistency processing on the aligned images of the N frames based on the consistency pixel set so as to obtain consistent images of the N frames;
and the super-resolution reconstruction module is used for taking the N frames of the consistent images as the input of a neural network model so as to obtain a super-resolution reconstruction image of the current frame of image to be processed.
In a third aspect, an embodiment of the present application further provides an image super-resolution reconstruction apparatus, including:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image super-resolution reconstruction method according to the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the image super-resolution reconstruction method according to the first aspect.
According to the image super-resolution reconstruction method, device, equipment and storage medium provided above, N consecutive frames of low-resolution to-be-processed images are acquired from video data; the N frames are aligned to obtain N aligned images; the consistent pixel set of each aligned frame is calculated; consistency processing is performed on the aligned images based on those sets to obtain consistent images; and the consistent images are used as the input of a neural network model to obtain the super-resolution reconstructed image. This solves the technical problem in the prior art that mis-registration produced during image registration easily degrades the super-resolution reconstruction result. Calculating the consistent pixel sets determines the degree of pixel-level consistency between each aligned frame and every other aligned frame, and the consistent images derived from those sets have their low-consistency (i.e. unaligned) pixel points eliminated, so the neural network model is less affected by them. This effectively mitigates the impact of mis-registration caused by object motion, occlusion and the like when reconstructing from consecutive frames, and ensures the accuracy of the super-resolution result.
Further, calculating the consistent pixel sets determines the degree of consistency of pixel points between each aligned frame and the others (unaligned pixel points have low consistency, aligned ones high), and weight-averaging the pixel points of the aligned images with these consistency degrees yields the denoised images, realizing temporal denoising and effectively mitigating the influence of noise on the super-resolution reconstruction.
Furthermore, obtaining each consistent image as the point multiplication of a denoised image with the corresponding sub-consistency pixel set eliminates low-consistency pixel points from the consistent image, reduces their influence during neural network processing, and ensures the accuracy of the super-resolution reconstruction result.
Drawings
Fig. 1 is a flowchart of an image super-resolution reconstruction method according to an embodiment of the present application;
fig. 2 is a flowchart of a super-resolution image reconstruction method according to another embodiment of the present application;
fig. 3 is a data flow diagram of an image super-resolution reconstruction method provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not limitation. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
The image super-resolution reconstruction method provided by the embodiment of the application can be executed by an image super-resolution reconstruction device, the image super-resolution reconstruction device can be realized in a software and/or hardware mode, and the image super-resolution reconstruction device can be formed by two or more physical entities or one physical entity. For example, the image super-resolution reconstruction device may be an intelligent device with data operation and analysis capabilities, such as a computer, a mobile phone, a tablet or an interactive smart tablet.
Fig. 1 is a schematic flow chart of an image super-resolution reconstruction method according to an embodiment of the present application. Referring to fig. 1, the image super-resolution reconstruction method specifically includes:
and 110, acquiring N continuous frames of images to be processed in the video data, wherein the images to be processed are low-resolution images, N is more than or equal to 2, and the N frames of images to be processed comprise the current frame of images to be processed.
Specifically, the video data is the data requiring super-resolution reconstruction; it contains multiple frames of images, each with the same, relatively low resolution. The source and content of the video data are not limited; for example, the video data may be video-conference data obtained from the Internet. In the embodiment, super-resolution reconstruction is performed on the consecutive frames contained in the video data, and high-resolution video data is then formed from the reconstructed images.
In one embodiment, N consecutive frames are captured from the video data and recorded as the to-be-processed images. It should be noted that the N captured frames are to-be-processed images of the same scene; optionally, after capture, the similarity between the frames is calculated and used to determine whether they belong to the same scene. It can be understood that frames of the same scene contain the same or highly similar background, objects, and so on. Further, if the N frames belong to the same scene, subsequent processing is performed; otherwise the frames are not processed and N frames belonging to the same scene are captured again.
The N frames of to-be-processed images share the same image format, the specific format not being limited. "N consecutive frames" means that the frames are consecutive in the time domain. Further, the N frames include the current frame to be processed. The capture method is not limited; for example, the current frame together with several frames before and after it is obtained from the video data, giving the N consecutive to-be-processed images. In one embodiment, the to-be-processed images are low-resolution images, the specific resolution not being limited. Optionally, the value of N can be set according to actual conditions, with N ≥ 2 in general. Performing super-resolution reconstruction on the N consecutive frames ensures that information in adjacent frames is taken into account in the reconstructed image, and hence the accuracy of the reconstructed image.
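As an illustrative sketch of this acquisition step (the embodiment does not prescribe an API), N consecutive frames can be read with OpenCV as follows; function and parameter names are assumptions:

```python
import cv2

def grab_consecutive_frames(video_path: str, start_index: int, n: int = 3):
    """Read N consecutive low-resolution frames starting at start_index,
    e.g. centred on the current frame to be processed."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_index)
    frames = []
    for _ in range(n):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```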
Step 120, aligning the N frames of to-be-processed images to obtain N frames of aligned images.
In an embodiment, the alignment process (image alignment) may also be called registration (image registration): it converts two frames shot from different angles into images with the same shooting angle. In the process, the shooting angle of one frame is taken as the reference angle, and the other frame is coordinate-transformed so that its shooting angle is mapped onto the reference angle.
It should be noted that the alignment may use any existing image alignment method; the embodiment takes alignment via a homography transformation matrix as an example. Specifically, a homography transformation matrix between two frames of to-be-processed images is calculated first, and alignment is then performed according to that matrix. The homography transformation matrix is the transformation matrix used for the coordinate transformation and may also be called a homography matrix. Typically, the homography transformation can be understood as describing the position mapping of an object between the world coordinate system and the pixel coordinate system, where the world coordinate system is an absolute coordinate system and the pixel coordinate system is a relative coordinate system established from the image resolution. The homography transformation matrix may be obtained by existing methods. For example, pixel coordinates of corresponding feature points are first found in the two frames; if both frames contain the same face, the feature points may be the pixel points of the eyes, the nose or the mouth. The feature points may be detected with the SIFT or ORB algorithm, where SIFT (Scale-Invariant Feature Transform) is a descriptor used in the image-processing field to detect feature points in an image, and ORB (Oriented FAST and Rotated BRIEF) is an algorithm for fast feature point extraction and description. The transformation matrix that maps the pixel coordinates of the feature points in one frame to the pixel coordinates of the corresponding feature points in the other frame is then computed and recorded as the homography transformation matrix. Pixel coordinates are coordinates in the pixel coordinate system, and a coordinate point in that system can be understood as a pixel point. The homography transformation matrix is then used to coordinate-transform every pixel point of one frame; the transformed frame and the other frame are both recorded as aligned images. It can be understood that in the embodiment the current frame to be processed is selected, the homography transformation matrix between each other frame and the current frame is calculated, the other frames are coordinate-transformed accordingly, and the transformed frames together with the current frame are recorded as the aligned images.
Namely, the N frames of images to be processed correspond to the N frames of aligned images.
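A minimal OpenCV sketch of the alignment described above; it is illustrative only, and the choice of ORB features with brute-force Hamming matching and RANSAC is an assumption (the text allows SIFT or ORB):

```python
import cv2
import numpy as np

def align_to_current(frame, current):
    """Estimate a homography from ORB feature matches and warp `frame`
    into the viewpoint of `current` (the current frame to be processed)."""
    g1 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(current, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(g1, None)
    kp2, des2 = orb.detectAndCompute(g2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects mismatched feature pairs when fitting the homography
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    h, w = current.shape[:2]
    # warp() in the text; bilinear interpolation as in the embodiment
    return cv2.warpPerspective(frame, H, (w, h), flags=cv2.INTER_LINEAR)
```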
Step 130, determining the consistent pixel set corresponding to each frame of aligned image.
Unaligned pixel points may remain after alignment. For example, the arm of a user is straight in the current frame to be processed but slightly bent in another frame; after the other frame is coordinate-transformed, the pixel coordinates of the arm do not fully coincide with those of the arm in the current frame, so unaligned pixel points appear in the two aligned images. This situation can also be understood as mis-registration. If such unaligned pixel points of the two aligned images are used for super-resolution reconstruction, the accuracy of the affected pixel points suffers.
Therefore, in the embodiment, the influence of the above situation on super-resolution reconstruction is reduced by means of consistency. Specifically, each frame of aligned image corresponds to one consistent pixel set, and the aligned image that a consistent pixel set corresponds to is regarded as the reference image. The consistent pixel set represents the degree of consistency of pixel points between each frame of aligned image and the reference image; the higher the consistency, the more similar the pixels, and hence the two frames. In one embodiment, when calculating the consistent pixel set, a pixel point is selected from an aligned image (the reference image included), and the pixel point with the same pixel coordinates is selected from the reference image; a neighborhood pixel block, preferably square, is then taken around each of the two pixel points, the two blocks covering the same pixel coordinates. The distance between the two pixel blocks, preferably a gray-scale distance, is then calculated, for example as a Euclidean distance. It can be understood that the smaller the distance, the more similar the pixels contained in the two blocks, and the higher the consistency between the selected pixel point and the reference pixel point. In this way, the distance between each pixel point in all aligned images and its corresponding reference pixel point can be obtained. All pixel points in all aligned images are then marked based on the calculated distances, recording the consistency between each pixel point and the corresponding reference pixel point: pixel points with high consistency are marked with a first character and pixel points with low consistency with a second character, the contents of the two characters being settable according to actual conditions. Consistency can be distinguished with a distance threshold (its specific value settable according to actual conditions): when the distance is greater than or equal to the threshold, the similarity between the two pixel blocks is low, hence their consistency is low, so the corresponding pixel point is determined as a low-consistency pixel point and marked with the second character; otherwise, the pixel point is determined as a high-consistency pixel point and marked with the first character. The marks of the pixel points are then combined, according to the positional relation of the pixel coordinates, into the consistent pixel set of the current reference image. In this way, after each frame of aligned image has served as the reference image, the consistent pixel set corresponding to that frame is obtained. The consistent pixel set of one frame of aligned image contains N × W′ × H′ characters, where W′ × H′ is the pixel size of the aligned image; each character represents the consistency between a pixel point in that frame and the pixel point in the corresponding aligned image.
It can be understood that because temporally consecutive frames are considered when calculating the consistent pixel set of each aligned image, the consistency of the same object across the aligned images is better reflected. For example, when the same object appears in each aligned image, the closer the pixel coordinates it occupies across the images, the more high-consistency pixel points the calculated consistent pixel sets contain. The consistency of a moving or occluded object across the aligned frames can therefore be identified accurately.
Step 140, performing consistency processing on the N frames of aligned images based on the consistent pixel sets to obtain N frames of consistent images.
Specifically, the consistent pixel set of a frame of aligned image is used to perform consistency processing on that frame, and the processed image is recorded as a consistent image; one frame of aligned image corresponds to one frame of consistent image. In one embodiment, the consistent image eliminates the pixel points of the corresponding aligned image that have low consistency with the other aligned images, reducing their influence on the result of the subsequent super-resolution reconstruction. The embodiment is described by taking as an example the elimination of pixel points in the other aligned frames that have low consistency with the aligned image of the current frame.
When performing consistency processing, the pixel points of a frame of aligned image may be point-multiplied with the characters (first or second character) that mark their consistency in the consistent pixel set corresponding to the current frame of aligned image, yielding a consistent image. Alternatively, the aligned images are first denoised to obtain a denoised image for each frame. The denoising method may be set according to actual conditions; for example, select the pixel point with the same pixel coordinates in each aligned frame, perform a weighted average of these pixel points based on the characters marking their consistency in the consistent pixel set of a given aligned frame, and take the result as the corresponding denoised pixel point. The pixel points of a frame of denoised image are then point-multiplied with the characters (first or second character) that mark the consistency of the pixel points of the corresponding aligned frame in the consistent pixel set of the current aligned frame, yielding the consistent image. It can be understood that since low-consistency pixel points are marked 0 in the consistent pixel set, the point multiplication turns the pixel at the corresponding position into 0, thereby eliminating it (in the consistent image, an eliminated pixel has value 0 and therefore appears as a black point). Each resulting frame of consistent image thus has its low-consistency pixel points removed, avoiding their influence on the super-resolution reconstruction result.
Step 150, taking the N frames of consistent images as the input of a neural network model to obtain a super-resolution reconstructed image of the current frame to be processed.
Specifically, the N frames of consistent images currently obtained are used as the input of a neural network model, which performs super-resolution reconstruction on them and outputs one frame of high-resolution image; this high-resolution image can be understood as the model's super-resolution reconstruction result for the current frame to be processed. In one embodiment, the neural network model is a convolutional neural network whose specific structure (number of convolution layers, convolution kernel size, input and output channels of each layer, etc.) can be set according to actual conditions. In practical applications, other neural networks (such as a residual neural network) may also be used.
It can be understood that, according to the above manner, after each frame of image to be processed in the video data is processed, the corresponding high-resolution video data can be obtained.
It should be noted that the calculation of the pixel point in steps 130 to 140 may be understood as the calculation of the pixel value at the pixel point.
By acquiring N consecutive frames of low-resolution to-be-processed images in the video data, aligning them to obtain N aligned images, calculating the consistent pixel set of each aligned frame, performing consistency processing on the aligned images based on those sets to obtain consistent images, and using the consistent images as the input of a neural network model to obtain the super-resolution reconstructed image, the method solves the technical problem in the prior art that mis-registration produced during image registration easily degrades super-resolution reconstruction. Calculating the consistent pixel sets determines the degree of pixel-level consistency between each aligned frame and the others; the consistent images derived from those sets have their low-consistency (i.e. unaligned) pixel points eliminated, so the neural network model is less affected by them. This effectively mitigates the impact of mis-registration caused by object motion, occlusion and the like when reconstructing from consecutive frames, and ensures the accuracy of the super-resolution result.
Fig. 2 is a flowchart of an image super-resolution reconstruction method according to another embodiment of the present application. This embodiment builds on the embodiments described above. Referring to Fig. 2, the image super-resolution reconstruction method specifically includes:
Step 210, obtaining N consecutive frames of to-be-processed images in the video data, where the to-be-processed images are low-resolution images, N ≥ 2, and the N frames of to-be-processed images include the current frame to be processed.
Specifically, the embodiment takes N = 3 as an example. Fig. 3 is a data flow diagram of an image super-resolution reconstruction method provided in an embodiment of the present application. Referring to Fig. 3, the N frames of to-be-processed images are denoted as the (n-1)-th to-be-processed frame $F_{n-1}$, the n-th to-be-processed frame $F_n$ and the (n+1)-th to-be-processed frame $F_{n+1}$, with n ≥ 2; the n-th to-be-processed frame is the current frame to be processed.
Step 220, calculating a homography transformation matrix between the current frame image to be processed and other frames of images to be processed.
Specifically, the current frame to be processed, namely the n-th frame, is selected, and the homography transformation matrix between the (n-1)-th and the n-th to-be-processed frames and the homography transformation matrix between the (n+1)-th and the n-th to-be-processed frames are calculated. In the embodiment, the former is recorded as $H_{n-1,n}$ and the latter as $H_{n,n+1}$. The homography transformation matrices may be calculated as described in step 120.
Step 230, performing coordinate transformation on each pixel point in the other frames of to-be-processed images according to the homography transformation matrices to obtain the aligned images.
Specifically, referring to Fig. 3, the to-be-processed image $F_{n-1}$ is homography-transformed with $H_{n-1,n}$ to obtain the aligned image $I_{n-1} = \mathrm{warp}(F_{n-1}, H_{n-1,n})$, and the to-be-processed image $F_{n+1}$ is homography-transformed with $H_{n,n+1}$ to obtain the aligned image $I_{n+1} = \mathrm{warp}(F_{n+1}, H_{n,n+1})$, where warp() denotes the homography transformation. Optionally, when performing the homography transformation, bilinear interpolation may be applied in the horizontal and vertical directions of the aligned image to obtain the final aligned image.
Meanwhile, the current to-be-processed image $F_n$ is directly taken as its own aligned image $I_n$.
Step 240, taking each frame of aligned image in turn as the reference image.
Step 250, calculating the gray-scale distance between the neighborhood pixel block of each pixel point in the reference image and the neighborhood pixel block of the pixel point with the same pixel coordinates in each frame of aligned image.
Specifically, the aligned images $I_{n-1}$, $I_n$ and $I_{n+1}$ are taken in turn as the reference image. The neighborhood pixel blocks are set to m × m pixels, where m can be chosen according to actual conditions, e.g. m = 3, 5 or 7. Taking the aligned image $I_n$ as the reference image as an example, the pixel range of the reference image is W′ × H′, where W′ denotes the range in the horizontal direction and H′ the range in the vertical direction. A pixel point with coordinates (i, j), 0 ≤ i ≤ W′ and 0 ≤ j ≤ H′, is selected in the reference image, and an m × m pixel block centered on (i, j) is selected as the neighborhood pixel block of that point. The pixel point (i, j) is likewise selected in the aligned image $I_{n-1}$, together with the m × m neighborhood pixel block centered on it in $I_{n-1}$. The distance between the two neighborhood pixel blocks is then calculated; in the embodiment it is a gray-scale distance, which can be understood as the distance between the grayscale images corresponding to the neighborhood blocks, and is recorded as the gray-scale distance of pixel point (i, j) between the reference image and the aligned image $I_{n-1}$. It can be understood that the smaller the gray-scale distance, the more similar the two neighborhood blocks and the higher the consistency of the two pixel points. All pixel points of the reference image are traversed to obtain the gray-scale distance of every pixel point between the reference image and $I_{n-1}$. The gray-scale distances of every pixel point between the reference image and $I_n$, and between the reference image and $I_{n+1}$, are then calculated in the same way. At this point, 3 × W′ × H′ distances are obtained for the current reference image, i.e. each pixel point of the reference image corresponds to 3 gray-scale distances, one per frame of aligned image. Thereafter, $I_{n-1}$ is taken as the reference image and the gray-scale distances are calculated again, and likewise with $I_{n+1}$ as the reference image.
Step 260, determining a pixel consistency degree value between pixel points with the same pixel coordinates in the reference image and the corresponding aligned image according to the gray-scale distance, and constructing a sub-consistency pixel set between the reference image and the corresponding aligned image according to the pixel consistency degree values.
Specifically, the sub-consistency pixel set records the consistency of pixel points between the current reference image and one corresponding frame of aligned image. In the embodiment, different values represent different degrees of consistency, each value being recorded as a pixel consistency degree value: a large value indicates that the consistency of the corresponding pixel points is high, and a small value that it is low. The pixel consistency degree value recorded for a pixel point in the sub-consistency pixel set is determined from the gray-scale distance corresponding to that pixel point. In one embodiment, the pixel consistency degree value corresponding to the pixel point with pixel coordinates (i, j) in the sub-consistency pixel set is recorded as $\mathrm{Map}_{o,n}(i,j)$:

$$\mathrm{Map}_{o,n}(i,j)=\begin{cases}1, & D(i,j)<\tau \\ 0, & D(i,j)\geq\tau\end{cases}$$

wherein the n-th frame of aligned image is the reference image, 1 ≤ n ≤ N, D(i, j) represents the gray-scale distance between the neighborhood pixel block of pixel point (i, j) in the reference image and the neighborhood pixel block of pixel point (i, j) in the o-th frame of aligned image (i.e. the gray-scale distance between pixel (i, j) of the n-th aligned frame and pixel (i, j) of the o-th aligned frame), 1 ≤ o ≤ N, and τ is a set distance threshold. Understandably, $\mathrm{Map}_{o,n}$ represents the sub-consistency pixel set between the n-th and o-th frames of aligned images. τ can be set according to actual conditions; the smaller τ is, the stricter the consistency requirement. When D(i, j) < τ, the consistency between pixel point (i, j) of the reference image and pixel point (i, j) of the o-th aligned frame is high, so the pixel consistency degree value for (i, j) is recorded as 1 in the sub-consistency pixel set between the reference image and the o-th aligned frame, retaining that pixel point in subsequent calculations. Conversely, when D(i, j) ≥ τ, the consistency is low, so the value for (i, j) is recorded as 0 in that set, eliminating the pixel point from subsequent calculations. It can be understood that $\mathrm{Map}_{n,o}(i,j)$ is always 1 when o = n.
According to the above formula, the sub-consistency pixel set between each frame of aligned image and the reference image can be calculated. It can be appreciated that for any two frames of aligned images, the sub-consistency pixel set computed when one frame is the reference image equals the one computed when the other frame is the reference image, i.e. $\mathrm{Map}_{n,n-1}(i,j)=\mathrm{Map}_{n-1,n}(i,j)$; therefore, in an embodiment, the sub-consistency pixel set may be computed only once for each pair of aligned frames. For example, referring to Fig. 3, for the 3 aligned frames, the three sub-consistency pixel sets of aligned image $I_n$ are denoted $\mathrm{Map}_{n-1,n}$, $\mathrm{Map}_{n,n}$ and $\mathrm{Map}_{n,n+1}$; those of $I_{n-1}$ are $\mathrm{Map}_{n-1,n-1}$, $\mathrm{Map}_{n-1,n}$ and $\mathrm{Map}_{n-1,n+1}$; and those of $I_{n+1}$ are $\mathrm{Map}_{n-1,n+1}$, $\mathrm{Map}_{n,n+1}$ and $\mathrm{Map}_{n+1,n+1}$. $\mathrm{Map}_{n,n}$, $\mathrm{Map}_{n-1,n-1}$ and $\mathrm{Map}_{n+1,n+1}$ are all 1. Fig. 3 shows the sub-consistency pixel sets in black and white; in practical application, each set takes the form of a matrix.
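A minimal sketch of computing one sub-consistency pixel set $\mathrm{Map}_{o,n}$; it assumes D(i, j) is the mean absolute gray-level difference over the m × m neighborhood blocks, whereas the embodiment only requires some gray-scale block distance (e.g. Euclidean):

```python
import cv2
import numpy as np

def sub_consistency_map(ref, other, m=5, tau=20.0):
    """Map_{o,n}: 1 where the m x m neighborhood of a pixel in the reference
    frame is close in gray level to the neighborhood at the same coordinates
    in the other aligned frame, 0 otherwise."""
    g1 = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g2 = cv2.cvtColor(other, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Mean absolute difference over each m x m neighborhood via a box filter
    d = cv2.boxFilter(np.abs(g1 - g2), -1, (m, m))
    return (d < tau).astype(np.float32)
```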
Step 270, combining the sub-consistency pixel sets corresponding to each frame of reference image to obtain the consistent pixel set of that reference image.
Specifically, sub-consistent pixel sets obtained based on a certain frame of aligned image are combined, and the combined set is used as a consistent pixel set of the frame of aligned image.
For example, the consistent pixel set of aligned image $I_n$ is denoted $\mathrm{Map}_n = (\mathrm{Map}_{n-1,n}, \mathrm{Map}_{n,n}, \mathrm{Map}_{n,n+1})$; that of $I_{n-1}$ is $\mathrm{Map}_{n-1} = (\mathrm{Map}_{n-1,n-1}, \mathrm{Map}_{n-1,n}, \mathrm{Map}_{n-1,n+1})$; and that of $I_{n+1}$ is $\mathrm{Map}_{n+1} = (\mathrm{Map}_{n-1,n+1}, \mathrm{Map}_{n,n+1}, \mathrm{Map}_{n+1,n+1})$.
Step 280, for the pixel points of each frame of aligned image, performing a weighted average using the consistent pixel set corresponding to the aligned image to obtain the corresponding denoised image.
Specifically, the aligned images are denoised with reference to the corresponding consistent pixel set: pixel points with high consistency receive a large weight and those with low consistency a small weight. The pixel value of each pixel point in the denoised image corresponding to the current aligned frame (the aligned frame of the currently used consistent pixel set) is then calculated as a weighted average over the pixel points with the same pixel coordinates in all aligned frames and their weights.
In one embodiment, the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of denoised image is:

$$\hat{I}_n(i,j)=\frac{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)\,\mathrm{Src}_k(i,j)}{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)}$$

wherein $\hat{I}_n(i,j)$ represents the pixel value of pixel point (i, j) in the n-th frame of denoised image, 1 ≤ n ≤ N, $\mathrm{Src}_k(i,j)$ represents the pixel value of pixel point (i, j) in the k-th frame of aligned image, and $\mathrm{Map}_{k,n}(i,j)$ represents the pixel consistency degree value recorded, for pixel point (i, j), in the sub-consistency pixel set between the n-th and k-th frames of aligned images.
Specifically, when $\mathrm{Map}_{k,n}(i,j)=0$, the consistency between pixel point (i, j) of the n-th aligned frame and pixel point (i, j) of the k-th aligned frame is low; multiplying the pixel value of (i, j) in the k-th aligned frame by $\mathrm{Map}_{k,n}(i,j)$ therefore removes its influence on pixel point (i, j) of the n-th frame of denoised image. When $\mathrm{Map}_{k,n}(i,j)=1$, the consistency is high, and the corresponding pixel value contributes to pixel point (i, j) of the n-th frame of denoised image. The pixel value of every pixel point in the denoised image corresponding to the n-th aligned frame is calculated in this way, yielding the denoised image for that frame.
According to the method, the de-noised image corresponding to each frame of the aligned image can be obtained.
For example, with N = 3 corresponding to the (n-1)-th, n-th and (n+1)-th frames, the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of denoised image is:

$$\hat{I}_n(i,j)=\frac{\mathrm{Map}_{n-1,n}(i,j)\,\mathrm{Src}_{n-1}(i,j)+\mathrm{Map}_{n,n}(i,j)\,\mathrm{Src}_{n}(i,j)+\mathrm{Map}_{n,n+1}(i,j)\,\mathrm{Src}_{n+1}(i,j)}{\mathrm{Map}_{n-1,n}(i,j)+\mathrm{Map}_{n,n}(i,j)+\mathrm{Map}_{n,n+1}(i,j)}$$
according to the formula, the information of the pixel points in the continuous frame alignment image is considered by the pixel points in each frame of the de-noised image, so that the de-noising process can be regarded as a time domain de-noising process.
Step 290, performing point multiplication on the pixel points in each frame of denoised image with the corresponding sub-consistency pixel set to obtain the corresponding consistent image, where the reference image corresponding to the sub-consistency pixel set is the same frame as the current frame to be processed, and the aligned image corresponding to the sub-consistency pixel set is the same frame as the denoised image.
Each frame of denoised image corresponds to one frame of consistent image. Since the final goal is the super-resolution reconstructed image of the current frame to be processed, the consistent images are calculated with the sub-consistency pixel sets of the aligned image corresponding to the current frame, so as to eliminate, in each denoised image, the pixel points with low consistency with the current denoised frame (the one corresponding to the current frame to be processed). Specifically, for a frame of denoised image, the sub-consistency pixel set between that frame and the current denoised frame is obtained, and each pixel point of the denoised frame is point-multiplied with that set to obtain the corresponding consistent image.
In one embodiment, the pixel value of the pixel point with pixel coordinates (i, j) in the n-th frame of consistent image is:

$$I'_n(i,j)=\hat{I}_n(i,j)\cdot\mathrm{Map}_{n,p}(i,j)$$

wherein $I'_n(i,j)$ represents the pixel value of pixel point (i, j) in the n-th frame of consistent image, 1 ≤ n ≤ N, $\hat{I}_n(i,j)$ represents the pixel value of pixel point (i, j) in the n-th frame of denoised image, $\mathrm{Map}_{n,p}(i,j)$ represents the pixel consistency degree value recorded, for pixel point (i, j), in the sub-consistency pixel set between the p-th and n-th frames of aligned images, and the p-th frame of aligned image is the aligned image corresponding to the current frame to be processed.
In this way, when $\mathrm{Map}_{n,p}(i,j)=0$, the product $\hat{I}_n(i,j)\cdot\mathrm{Map}_{n,p}(i,j)$ is 0, so the low-consistency pixel point (i, j) of the n-th aligned frame is removed from the consistent image, ensuring the accuracy of the subsequent neural network processing.
For example, referring to Fig. 3, the denoised image $\hat{I}_{n-1}$ is point-multiplied with the corresponding sub-consistency pixel set $\mathrm{Map}_{n-1,n}$ to obtain the consistent image $I'_{n-1}$; the denoised image $\hat{I}_{n}$ is point-multiplied with $\mathrm{Map}_{n,n}$ to obtain the consistent image $I'_{n}$; and the denoised image $\hat{I}_{n+1}$ is point-multiplied with $\mathrm{Map}_{n,n+1}$ to obtain the consistent image $I'_{n+1}$. It can be appreciated that low-consistency pixel points have been culled from the consistent images; in Fig. 3, the pixel points in the user-arm area 21 of $I'_{n+1}$ have been rejected due to low consistency.
Step 2100, using the N frames of consistent images as input of a neural network model to obtain a super-resolution reconstructed image of the current frame of image to be processed.
For example, in Fig. 3, the neural network model includes a convolution layer (conv), a feature concatenation layer (concat) that stacks the convolution results, and a deconvolution layer (deconv). After processing by the neural network model, the super-resolution reconstructed image corresponding to the current frame image to be processed is obtained.
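A minimal PyTorch sketch of this conv -> concat -> deconv topology is given below; the channel counts, kernel sizes, and the x2 upscaling factor are illustrative assumptions, since the patent only names the layer types:

```python
import torch
import torch.nn as nn

class SRNet(nn.Module):
    """Sketch of the conv -> concat -> deconv topology named in the text;
    all layer sizes are illustrative assumptions, not taken from the patent."""
    def __init__(self, n_frames=3, feat=32, scale=2):
        super().__init__()
        # one shared conv branch extracts features from each consistent image
        self.conv = nn.Conv2d(1, feat, kernel_size=3, padding=1)
        # deconvolution upsamples the concatenated features to high resolution
        self.deconv = nn.ConvTranspose2d(n_frames * feat, 1,
                                         kernel_size=scale * 2,
                                         stride=scale, padding=scale // 2)

    def forward(self, frames):           # frames: list of (B, 1, H, W) tensors
        feats = [torch.relu(self.conv(f)) for f in frames]
        fused = torch.cat(feats, dim=1)  # concat along the channel axis
        return self.deconv(fused)        # (B, 1, scale*H, scale*W)
```

Here the deconvolution stride doubles the spatial resolution; in practice the stride would be set to the desired super-resolution factor.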
In the above, consecutive N frames of low-resolution images to be processed are obtained from the video data, and homography transformation is then used to obtain N frames of aligned images, which ensures the alignment effect. The degree of consistency between corresponding pixels in any two aligned frames is determined by calculating the gray-scale distance between neighborhood pixel blocks, so as to construct the consistent pixel sets. Temporal denoising is then applied to the aligned images to obtain denoised images, each denoised image is point-multiplied with the corresponding sub-consistent pixel set to obtain a consistent image, and the consistent images are used as the input of the neural network model to obtain the super-resolution reconstructed image. This technical means solves the technical problem in the prior art that misregistration produced during image registration easily degrades the super-resolution reconstruction effect. By calculating the consistent pixel sets, the degree of consistency between the pixels of a given aligned frame and those of every other aligned frame can be determined (unaligned pixels have low consistency, aligned pixels have high consistency), and the denoised images are obtained by a weighted average over the pixels of the aligned images, which realizes temporal noise reduction and effectively mitigates the influence of noise on the super-resolution reconstruction effect. Furthermore, a consistent image is obtained from the point multiplication of a denoised image with the corresponding sub-consistent pixel set, so low-consistency pixels are eliminated from the consistent image; this reduces their influence during neural network processing and ensures the accuracy of the super-resolution reconstruction result.
On the basis of the above embodiment, the method further includes a training step of the neural network model, and the training step specifically includes steps 2110 to 2120:
step 2110, acquiring a high-resolution image set and a low-resolution image set, wherein the high-resolution image set comprises multiple frames of continuous high-resolution training images, each frame of high-resolution training image has a corresponding low-resolution training image, and each low-resolution training image forms the low-resolution image set.
The high-resolution image set comprises a plurality of frames of continuous images, the images have high resolution and are used for training the neural network model, and in the embodiment, the images are recorded as high-resolution training images. Similarly, the low-resolution image set includes a plurality of consecutive images, which have a low resolution and are also used for training the neural network model, and in the embodiment, the images are recorded as low-resolution training images. Further, the high resolution training images and the low resolution training images are in a one-to-one correspondence relationship, that is, for each high resolution training image, there is a low resolution training image with the same display content and with a resolution that is obviously lower.
In one embodiment, the low resolution training images may be derived from corresponding high resolution training images. At this time, acquiring the high resolution image set and the low resolution image set includes steps 2111 to 2112:
Step 2111, acquiring a high-resolution image set.
Specifically, the high-resolution image set may be obtained by capturing high-resolution video data, and different video data yield different high-resolution image sets. The embodiment does not limit the source of the video data; for example, the high-resolution video data may be captured by a high-precision camera.
Step 2112, down-sampling each frame of high-resolution training image in the high-resolution image set to obtain a corresponding low-resolution training image, and forming the low-resolution training images into a low-resolution image set.
Down-sampling may be understood as shrinking the image, i.e., reducing its resolution. In the embodiment, a bicubic algorithm is used to downsample each frame of high-resolution training image to obtain the corresponding low-resolution training image. The low-resolution training images are then ordered in the same way as the high-resolution training images, forming a sequence of consecutive low-resolution training images, which constitutes the low-resolution image set.
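For illustration, such bicubic downsampling can be done with OpenCV; the x2 scale factor below is an assumption, as the patent does not fix the ratio:

```python
import cv2

def make_lr(hr_img, scale=2):
    """Bicubic downsampling of one high-resolution training frame; the
    scale factor is an illustrative assumption."""
    h, w = hr_img.shape[:2]
    return cv2.resize(hr_img, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)

# Build the LR set in the same order as the HR set, e.g.:
# lr_set = [make_lr(f) for f in hr_set]
```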
Step 2120, selecting a current frame high-resolution training image from the high-resolution image set as supervision information, and selecting N frames of low-resolution training images from the low-resolution image set as the input of the neural network model to train the neural network model, wherein the N frames of low-resolution training images include the low-resolution training image corresponding to the current frame high-resolution training image.
Specifically, in order to ensure the super-resolution reconstruction effect of the neural network model, in the embodiment the low-resolution training images may first be processed in the manner described in the above embodiments to obtain the corresponding consistent images before being input into the neural network model. In this case, selecting N frames of low-resolution training images in the low-resolution image set as the input of the neural network model includes steps 2121 to 2123:
Step 2121, selecting N frames of low-resolution training images from the low-resolution image set.
Specifically, the N frames of low-resolution training images include a current frame of low-resolution training image.
Step 2122, obtaining the corresponding N frames of consistency training images based on the N frames of low-resolution training images.
Specifically, the N frames of low-resolution training images are aligned to obtain the corresponding N aligned frames; the N consistent pixel sets corresponding to those aligned frames are determined; temporal denoising is applied to the N aligned frames to obtain N denoised frames; and each denoised frame is point-multiplied with the corresponding sub-consistent pixel set to obtain the N consistent images.
Step 2123, taking the N frames of consistency training images as the input of the neural network model.
Specifically, N frames of consistent training images are used as input, and a high-resolution training image corresponding to a current frame of low-resolution training image is used as supervision information to train the neural network model.
In one embodiment, the loss function is constructed using the L1 norm; in this case, the loss function of the neural network model is:

$$Loss=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left|I^{HR}(i,j)-I^{SR}(i,j)\right|$$

where W × H is the pixel size of the current frame high-resolution training image, I^{HR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame high-resolution training image, and I^{SR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame super-resolution reconstructed image, i.e., the output image obtained after the N frames of low-resolution training images are input into the neural network model. In other words, the super-resolution reconstruction effect of the neural network model is measured by the difference between the supervision information and the output image. It should be noted that in practical applications the loss function may take other forms, such as the L2 norm.
It can be understood that during training a large number of groups of N low-resolution training frames are used as input, with the corresponding high-resolution training images as supervision information, until the loss function converges. The closer the loss function is to convergence, the smaller the difference between the current frame super-resolution reconstructed image and the supervision information, that is, the higher their similarity, and thus the more accurate the super-resolution reconstruction of the neural network model becomes.
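Putting the pieces together, a hedged sketch of one training step with the L1 loss might look as follows, reusing the SRNet sketch above; the optimizer choice and learning rate are illustrative assumptions:

```python
import torch

# Minimal training step for the SRNet sketch; Adam and lr=1e-4 are assumed,
# not specified by the patent.
model = SRNet(n_frames=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
l1 = torch.nn.L1Loss()  # mean |I_HR - I_SR| over all pixels, as in the text

def train_step(consistent_frames, hr_target):
    opt.zero_grad()
    sr = model(consistent_frames)   # super-resolution reconstruction
    loss = l1(sr, hr_target)        # supervision: current-frame HR image
    loss.backward()
    opt.step()
    return loss.item()
```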
Fig. 4 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present application. Referring to fig. 4, the image super-resolution reconstruction apparatus includes: an image acquisition module 301, an image alignment module 302, a set determination module 303, a consistency processing module 304, and a super-resolution reconstruction module 305.
The image obtaining module 301 is configured to obtain N continuous frames of to-be-processed images in the video data, where the to-be-processed images are low-resolution images, N is greater than or equal to 2, and the N frames of to-be-processed images include a current frame of to-be-processed image; an image alignment module 302, configured to perform alignment processing on the N frames of images to be processed to obtain N frames of aligned images; a set determining module 303, configured to determine a consistent pixel set corresponding to each frame of the aligned image; a consistency processing module 304, configured to perform consistency processing on the N frames of aligned images based on the consistency pixel set to obtain N frames of consistent images; a super-resolution reconstruction module 305, configured to use the N frames of the consistent images as input of a neural network model, so as to obtain a super-resolution reconstructed image of the current frame of to-be-processed image.
By acquiring consecutive N frames of low-resolution images to be processed from the video data, aligning the N frames to obtain N frames of aligned images, calculating the consistent pixel set corresponding to each aligned frame, performing consistency processing on the aligned frames based on the consistent pixel sets to obtain consistent images, and using the consistent images as the input of a neural network model to obtain the super-resolution reconstructed image, the device solves the technical problem in the prior art that misregistration produced during image registration easily degrades the super-resolution reconstruction effect. Calculating the consistent pixel sets determines the degree of consistency between the pixels of a given aligned frame and those of every other aligned frame; the consistent image of each aligned frame is then obtained from its consistent pixel set, so low-consistency pixels (i.e., unaligned pixels) are eliminated from the consistent images. This reduces their influence during neural network processing, effectively mitigates the misregistration caused by object motion, occlusion, and the like when reconstructing from consecutive frames, and ensures the accuracy of the super-resolution reconstruction result.
On the basis of the above embodiment, the set determining module 303 includes: a reference image selection unit, configured to take each frame of aligned image in turn as the reference image; a distance calculation unit, configured to calculate the gray-scale distance between the neighborhood pixel block corresponding to each pixel in the reference image and the neighborhood pixel block corresponding to the pixel with the same pixel coordinates in each frame of aligned image; a sub-set construction unit, configured to determine, according to the gray-scale distance, the pixel consistency degree value between the pixels with the same pixel coordinates in the reference image and the corresponding aligned image, and to construct, according to the pixel consistency degree values, the sub-consistent pixel set between the reference image and the corresponding aligned image; and a set combination unit, configured to combine the sub-consistent pixel sets corresponding to each frame of reference image to obtain the consistent pixel set of that reference image.
On the basis of the above embodiment, the pixel consistency degree value corresponding to the pixel with coordinates (i, j) in the sub-consistent pixel set is recorded as Map_{o,n}(i, j):

$$\mathrm{Map}_{o,n}(i,j)=\begin{cases}1, & D(i,j)\le\tau\\[2pt]0, & D(i,j)>\tau\end{cases}$$

where the n-th frame aligned image is the reference image, 1 ≤ n ≤ N; D(i, j) denotes the gray-scale distance between the neighborhood pixel block corresponding to the pixel with coordinates (i, j) in the reference image and the neighborhood pixel block corresponding to the pixel with coordinates (i, j) in the o-th frame aligned image, 1 ≤ o ≤ N; and τ is a set distance threshold.
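As an illustration of these units, the consistency map can be sketched as follows; the 3x3 block size, the mean-absolute-difference gray-scale distance, and the threshold value are assumptions, since the patent only requires a neighborhood-block distance D(i, j) and a threshold τ:

```python
import numpy as np

def consistency_map(ref, other, radius=1, tau=10.0):
    """Binary Map_{o,n}: 1 where the gray-scale distance D(i, j) between the
    (2*radius+1)^2 neighborhood blocks is within tau, else 0."""
    h, w = ref.shape
    pad_r = np.pad(ref.astype(np.float64), radius, mode="edge")
    pad_o = np.pad(other.astype(np.float64), radius, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    k = 2 * radius + 1
    for i in range(h):
        for j in range(w):
            # mean absolute difference between the two neighborhood blocks
            d = np.abs(pad_r[i:i + k, j:j + k] - pad_o[i:i + k, j:j + k]).mean()
            out[i, j] = 1 if d <= tau else 0
    return out
```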
On the basis of the above embodiment, the consistency processing module 304 includes: a denoising unit, configured to perform, for the pixels of each frame of aligned image, a weighted average using the consistent pixel set corresponding to that aligned image, so as to obtain the corresponding denoised image; and a point multiplication unit, configured to point-multiply the pixels of each frame of denoised image with the corresponding sub-consistent pixel set to obtain the corresponding consistent image, wherein the reference image corresponding to the sub-consistent pixel set is the same frame as the current frame image to be processed, and the aligned image corresponding to the sub-consistent pixel set is the same frame as the denoised image.
On the basis of the above embodiment, the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image is:

$$\hat{I}_n(i,j)=\frac{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)\cdot\mathrm{Src}_k(i,j)}{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)}$$

where Î_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image, 1 ≤ n ≤ N; Src_k(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the k-th frame aligned image; and Map_{k,n}(i, j) denotes the pixel consistency degree value recorded in the sub-consistent pixel set between the n-th frame aligned image and the k-th frame aligned image, corresponding to the pixels with coordinates (i, j) in those two frames.
On the basis of the above embodiment, the pixel value of the pixel with coordinates (i, j) in the n-th frame consistent image is:

$$I'_n(i,j)=\hat{I}_n(i,j)\cdot\mathrm{Map}_{n,p}(i,j)$$

where I'_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame consistent image, 1 ≤ n ≤ N; Î_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image; Map_{n,p}(i, j) denotes the pixel consistency degree value recorded in the sub-consistent pixel set between the p-th frame aligned image and the n-th frame aligned image, corresponding to the pixels with coordinates (i, j) in those two frames; and the p-th frame aligned image is the aligned image corresponding to the current frame image to be processed.
On the basis of the above embodiment, the image alignment module 302 includes: a matrix calculation unit, configured to calculate a homography transformation matrix between the current frame image to be processed and each of the other frames of images to be processed; and a coordinate transformation unit, configured to perform coordinate transformation on each pixel in the other frames of images to be processed according to the homography transformation matrix, so as to obtain the aligned images.
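A hedged OpenCV sketch of these two units follows; the ORB feature matching and RANSAC estimation are illustrative choices, since the patent only requires computing a homography between the two frames and transforming the pixel coordinates:

```python
import cv2
import numpy as np

def align_to_current(cur, other):
    """Homography-based alignment of one grayscale frame to the current
    frame; feature detector and robust estimator are assumed choices."""
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(cur, None)
    k2, d2 = orb.detectAndCompute(other, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d2, d1)
    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # warp every pixel of `other` into the current frame's coordinates
    return cv2.warpPerspective(other, H, (cur.shape[1], cur.shape[0]))
```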
On the basis of the above embodiment, the apparatus further includes: a training set acquisition module, configured to acquire a high-resolution image set and a low-resolution image set, wherein the high-resolution image set includes multiple frames of consecutive high-resolution training images, each frame of high-resolution training image has a corresponding low-resolution training image, and the low-resolution training images form the low-resolution image set; and a model training module, configured to select a current frame high-resolution training image from the high-resolution image set as supervision information, and select N frames of low-resolution training images from the low-resolution image set as the input of the neural network model to train the neural network model, wherein the N frames of low-resolution training images include the low-resolution training image corresponding to the current frame high-resolution training image.
On the basis of the above embodiment, the loss function of the neural network model is:

$$Loss=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left|I^{HR}(i,j)-I^{SR}(i,j)\right|$$

where W × H is the pixel size of the current frame high-resolution training image, I^{HR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame high-resolution training image, and I^{SR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame super-resolution reconstructed image, which is the output image obtained after the N frames of low-resolution training images are input into the neural network model.
On the basis of the above embodiment, the training set acquisition module includes: a set acquisition unit, configured to acquire a high-resolution image set, wherein the high-resolution image set includes multiple frames of consecutive high-resolution training images; and a set determination unit, configured to downsample each frame of high-resolution training image in the high-resolution image set to obtain the corresponding low-resolution training image, and to form the low-resolution training images into the low-resolution image set.
On the basis of the above embodiment, the model training module includes: a supervision determining unit, configured to select a current frame high-resolution training image in the high-resolution image set as supervision information; an image selection unit, configured to select N frames of low-resolution training images from the low-resolution image set; the image processing unit is used for obtaining corresponding N frames of consistency training images based on the N frames of low-resolution training images; the image input unit is used for taking the N frames of the consistency training images as the input of the neural network model; and the network training unit is used for training the neural network model, and the N frames of low-resolution training images comprise low-resolution training images corresponding to the current frame of high-resolution training images.
The image super-resolution reconstruction apparatus provided above can be used to execute the image super-resolution reconstruction method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
It should be noted that, in the embodiment of the image super-resolution reconstruction apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Fig. 5 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present application. As shown in fig. 5, the image super-resolution reconstruction apparatus includes a processor 40, a memory 41, an input device 42, an output device 43, and a communication module 44; the number of the processors 40 in the image super-resolution reconstruction device can be one or more, and one processor 40 is taken as an example in fig. 5. The processor 40, the memory 41, the input device 42, the output device 43, and the communication module 44 in the image super-resolution reconstruction apparatus may be connected by a bus or other means, and fig. 5 illustrates the connection by the bus as an example.
The memory 41, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the image super-resolution reconstruction method in the embodiments of the present invention (e.g., the image acquisition module 301, the image alignment module 302, the set determination module 303, the consistency processing module 304, and the super-resolution reconstruction module 305 in the image super-resolution reconstruction apparatus). The processor 40 executes various functional applications of the image super-resolution reconstruction apparatus and data processing, i.e., implements the image super-resolution reconstruction method described above, by executing software programs, instructions, and modules stored in the memory 41.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the image super-resolution reconstruction apparatus, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 41 may further include a memory remotely located from the processor 40, and these remote memories may be connected to the image super-resolution reconstruction device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the image super-resolution reconstruction device. The output device 43 may include a display device such as a display screen. The communication module 44 may perform data communication, such as acquiring video data, using a network.
The image super-resolution reconstruction device includes the image super-resolution reconstruction apparatus described above, can be used to execute any of the image super-resolution reconstruction methods, and has the corresponding functions and beneficial effects.
In addition, the embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform operations related to the image super-resolution reconstruction method provided in any embodiment of the present application, and have corresponding functions and advantages.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.

These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. An image super-resolution reconstruction method is characterized by comprising the following steps:
acquiring N continuous frames of images to be processed in video data, wherein the images to be processed are low-resolution images, N is more than or equal to 2, and the N frames of images to be processed comprise current frames of images to be processed;
aligning the N frames of images to be processed to obtain N frames of aligned images;
determining a consistent pixel set corresponding to each frame of the aligned image;
carrying out consistency processing on the N frames of the aligned images based on the consistency pixel set to obtain N frames of consistent images;
and taking the N frames of the consistent images as the input of a neural network model to obtain a super-resolution reconstruction image of the current frame of the image to be processed.
2. The method of claim 1, wherein the determining the consistent pixel set corresponding to each frame of the aligned image comprises:
taking each frame of the alignment image as a reference image respectively;
calculating the gray scale distance between the neighborhood pixel block corresponding to each pixel point in the reference image and the neighborhood pixel block corresponding to the pixel point with the same pixel coordinate in each frame of the aligned image;
determining a pixel consistency degree value between pixel points with the same pixel coordinates in the reference image and the corresponding aligned image according to the gray scale distance, and constructing a sub-consistency pixel set between the reference image and the corresponding aligned image according to the pixel consistency degree value;
and combining the sub-consistent pixel sets corresponding to each frame of the reference image to obtain a consistent pixel set of the reference image.
3. The image super-resolution reconstruction method according to claim 2, wherein the pixel consistency degree value corresponding to the pixel with coordinates (i, j) in the sub-consistent pixel set is recorded as Map_{o,n}(i, j):

$$\mathrm{Map}_{o,n}(i,j)=\begin{cases}1, & D(i,j)\le\tau\\[2pt]0, & D(i,j)>\tau\end{cases}$$

wherein the n-th frame aligned image is the reference image, 1 ≤ n ≤ N; D(i, j) denotes the gray-scale distance between the neighborhood pixel block corresponding to the pixel with coordinates (i, j) in the reference image and the neighborhood pixel block corresponding to the pixel with coordinates (i, j) in the o-th frame aligned image, 1 ≤ o ≤ N; and τ is a set distance threshold.
4. The image super-resolution reconstruction method according to claim 2, wherein the consistency processing of the aligned images of N frames based on the consistency pixel set to obtain consistent images of N frames comprises:
aiming at the pixel points of each frame of the aligned image, carrying out weighted average by using the consistent pixel set corresponding to the aligned image to obtain a corresponding de-noised image;
and performing point multiplication on pixel points in each frame of the denoised image and the corresponding sub-consistent pixel set to obtain a corresponding consistent image, wherein the reference image corresponding to the sub-consistent pixel set and the current frame image to be processed are the same frame, and the alignment image corresponding to the sub-consistent pixel set and the denoised image are the same frame.
5. The image super-resolution reconstruction method according to claim 4, wherein the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image is:

$$\hat{I}_n(i,j)=\frac{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)\cdot\mathrm{Src}_k(i,j)}{\sum_{k=1}^{N}\mathrm{Map}_{k,n}(i,j)}$$

wherein Î_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image, 1 ≤ n ≤ N; Src_k(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the k-th frame aligned image; and Map_{k,n}(i, j) denotes the pixel consistency degree value recorded in the sub-consistent pixel set between the n-th frame aligned image and the k-th frame aligned image, corresponding to the pixels with coordinates (i, j) in the n-th and k-th frame aligned images.
6. The image super-resolution reconstruction method according to claim 4 or 5, wherein the pixel value of the pixel with coordinates (i, j) in the n-th frame consistent image is:

$$I'_n(i,j)=\hat{I}_n(i,j)\cdot\mathrm{Map}_{n,p}(i,j)$$

wherein I'_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame consistent image, 1 ≤ n ≤ N; Î_n(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the n-th frame denoised image; Map_{n,p}(i, j) denotes the pixel consistency degree value recorded in the sub-consistent pixel set between the p-th frame aligned image and the n-th frame aligned image, corresponding to the pixels with coordinates (i, j) in the p-th and n-th frame aligned images; and the p-th frame aligned image is the aligned image corresponding to the current frame image to be processed.
7. The image super-resolution reconstruction method according to claim 1, wherein the aligning the N frames of images to be processed to obtain N frames of aligned images comprises:
calculating a homography transformation matrix between the current frame image to be processed and other frames of images to be processed;
and carrying out coordinate transformation on each pixel point in the other frames of images to be processed according to the homography transformation matrix so as to obtain an aligned image.
8. The image super-resolution reconstruction method according to claim 1, further comprising:
acquiring a high-resolution image set and a low-resolution image set, wherein the high-resolution image set comprises multiple frames of continuous high-resolution training images, each frame of high-resolution training image has a corresponding low-resolution training image, and each low-resolution training image forms the low-resolution image set;
selecting a current frame high-resolution training image from the high-resolution image set as supervision information, and selecting N frames of low-resolution training images from the low-resolution image set as input of a neural network model to train the neural network model, wherein the N frames of low-resolution training images comprise low-resolution training images corresponding to the current frame high-resolution training images.
9. The image super-resolution reconstruction method according to claim 8, wherein the loss function of the neural network model is:

$$Loss=\frac{1}{W\times H}\sum_{i=1}^{W}\sum_{j=1}^{H}\left|I^{HR}(i,j)-I^{SR}(i,j)\right|$$

wherein W × H is the pixel size of the current frame high-resolution training image, I^{HR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame high-resolution training image, and I^{SR}(i, j) denotes the pixel value of the pixel with coordinates (i, j) in the current frame super-resolution reconstructed image, the current frame super-resolution reconstructed image being the output image obtained after the N frames of low-resolution training images are input into the neural network model.
10. The image super-resolution reconstruction method according to claim 8, wherein the acquiring the high resolution image set and the low resolution image set comprises:
acquiring a high-resolution image set;
and downsampling each frame of the high-resolution training image in the high-resolution image set to obtain a corresponding low-resolution training image, and forming the low-resolution training image into a low-resolution image set.
11. The image super-resolution reconstruction method according to claim 8, wherein the selecting N frames of low resolution training images in the low resolution image set as input of a neural network model comprises:
selecting N frames of low resolution training images in the set of low resolution images;
obtaining corresponding N frames of consistency training images based on the N frames of low-resolution training images;
and taking N frames of the consistency training images as the input of the neural network model.
12. An image super-resolution reconstruction apparatus, comprising:
the image acquisition module is used for acquiring N continuous frames of images to be processed in the video data, wherein the images to be processed are low-resolution images, N is more than or equal to 2, and the N frames of images to be processed comprise current frames of images to be processed;
the image alignment module is used for aligning the N frames of images to be processed to obtain N frames of aligned images;
a set determining module, configured to determine a consistent pixel set corresponding to each frame of the aligned image;
the consistency processing module is used for carrying out consistency processing on the aligned images of the N frames based on the consistency pixel set so as to obtain consistent images of the N frames;
and the super-resolution reconstruction module is used for taking the N frames of the consistent images as the input of a neural network model so as to obtain a super-resolution reconstruction image of the current frame of image to be processed.
13. An image super-resolution reconstruction device, comprising:
one or more processors; and
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the image super-resolution reconstruction method of any of claims 1-11.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a method for super-resolution image reconstruction as claimed in one of claims 1 to 11.
CN202010873181.4A 2020-08-26 2020-08-26 Image super-resolution reconstruction method, device, equipment and storage medium Active CN112037129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010873181.4A CN112037129B (en) 2020-08-26 2020-08-26 Image super-resolution reconstruction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010873181.4A CN112037129B (en) 2020-08-26 2020-08-26 Image super-resolution reconstruction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112037129A true CN112037129A (en) 2020-12-04
CN112037129B CN112037129B (en) 2024-04-19

Family

ID=73580053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010873181.4A Active CN112037129B (en) 2020-08-26 2020-08-26 Image super-resolution reconstruction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112037129B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6804373B1 (en) * 2000-06-15 2004-10-12 International Business Machines Corporation Method and system using renormalized pixels for public key and compressed images watermarks on prints
CN106600536A (en) * 2016-12-14 2017-04-26 同观科技(深圳)有限公司 Video imager super-resolution reconstruction method and apparatus
CN108665410A (en) * 2017-03-31 2018-10-16 杭州海康威视数字技术股份有限公司 A kind of image super-resolution reconstructing method, apparatus and system
US20190045168A1 (en) * 2018-09-25 2019-02-07 Intel Corporation View interpolation of multi-camera array images with flow estimation and image super resolution using deep learning
CN111369440A (en) * 2020-03-03 2020-07-03 网易(杭州)网络有限公司 Model training method, image super-resolution processing method, device, terminal and storage medium

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113038055B (en) * 2021-01-27 2023-06-23 维沃移动通信有限公司 Image processing method and device and electronic equipment
CN113038055A (en) * 2021-01-27 2021-06-25 维沃移动通信有限公司 Image processing method and device and electronic equipment
CN112907443A (en) * 2021-02-05 2021-06-04 深圳市优象计算技术有限公司 Video super-resolution reconstruction method and system for satellite camera
CN112907443B (en) * 2021-02-05 2023-06-16 深圳市优象计算技术有限公司 Video super-resolution reconstruction method and system for satellite camera
CN112581372B (en) * 2021-02-26 2021-05-28 杭州海康威视数字技术股份有限公司 Cross-space-time mapping super-resolution light field imaging method, device and equipment
CN112581372A (en) * 2021-02-26 2021-03-30 杭州海康威视数字技术股份有限公司 Cross-space-time mapping super-resolution light field imaging method, device and equipment
CN113610705A (en) * 2021-06-23 2021-11-05 珠海全志科技股份有限公司 Image de-interlacing method and device for super-resolution reconstruction
CN114549307A (en) * 2022-01-28 2022-05-27 电子科技大学 High-precision point cloud color reconstruction method based on low-resolution image
CN115994858A (en) * 2023-03-24 2023-04-21 广东海洋大学 Super-resolution image reconstruction method and system
CN115994858B (en) * 2023-03-24 2023-06-06 广东海洋大学 Super-resolution image reconstruction method and system
CN117499558A (en) * 2023-11-02 2024-02-02 北京市燃气集团有限责任公司 Video image optimization processing method and device
CN117788630A (en) * 2024-02-28 2024-03-29 中国科学院自动化研究所 Super-resolution magnetic particle imaging method and system based on pulse square wave excitation
CN117788630B (en) * 2024-02-28 2024-05-31 中国科学院自动化研究所 Super-resolution magnetic particle imaging method and system based on pulse square wave excitation

Also Published As

Publication number Publication date
CN112037129B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN112037129B (en) Image super-resolution reconstruction method, device, equipment and storage medium
CN107566688B (en) Convolutional neural network-based video anti-shake method and device and image alignment device
JP4657367B2 (en) Image processing apparatus, imaging apparatus, and image distortion correction method
CN110827200A (en) Image super-resolution reconstruction method, image super-resolution reconstruction device and mobile terminal
US10410327B2 (en) Shallow depth of field rendering
WO2022000862A1 (en) Method and apparatus for detecting object in fisheye image, and storage medium
CN107564063B (en) Virtual object display method and device based on convolutional neural network
CN110276831B (en) Method and device for constructing three-dimensional model, equipment and computer-readable storage medium
KR102141319B1 (en) Super-resolution method for multi-view 360-degree image and image processing apparatus
US20220398698A1 (en) Image processing model generation method, processing method, storage medium, and terminal
CN112419372A (en) Image processing method, image processing device, electronic equipment and storage medium
CN116486250A (en) Multi-path image acquisition and processing method and system based on embedded type
CN102611842B (en) Image processing devices and image processing methods
WO2019228450A1 (en) Image processing method, device, and equipment, and readable medium
CN111767752B (en) Two-dimensional code identification method and device
CN113095316A (en) Image rotation target detection method based on multilevel fusion and angular point offset
CN117576292A (en) Three-dimensional scene rendering method and device, electronic equipment and storage medium
US20120038785A1 (en) Method for producing high resolution image
CN116403226A (en) Unconstrained fold document image correction method, system, equipment and storage medium
CN116912467A (en) Image stitching method, device, equipment and storage medium
CN115205112A (en) Model training method and device for super-resolution of real complex scene image
CN115358949A (en) Panoramic image processing method, computer device, and storage medium
US11797854B2 (en) Image processing device, image processing method and object recognition system
CN114663284A (en) Infrared thermal imaging panoramic image processing method, system and storage medium
CN114511487A (en) Image fusion method and device, computer readable storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant