CN112825187A - Super-resolution method, medium and device based on machine learning - Google Patents

Super-resolution method, medium and device based on machine learning

Info

Publication number
CN112825187A
Authority
CN
China
Prior art keywords
machine learning
image
resolution
super
noise
Prior art date
Legal status (assumed by Google; not a legal conclusion)
Pending
Application number
CN201911149461.4A
Other languages
Chinese (zh)
Inventor
何平征
林金发
Current Assignee (listed assignees may be inaccurate)
Fuzhou Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date (assumed by Google; not a legal conclusion)
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd filed Critical Fuzhou Rockchip Electronics Co Ltd
Priority to CN201911149461.4A priority Critical patent/CN112825187A/en
Publication of CN112825187A publication Critical patent/CN112825187A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Pure & Applied Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a machine-learning-based super-resolution method, medium and system. The method comprises the following steps: acquiring content statistical characteristics of a video image; classifying the video image according to its content statistical characteristics to obtain at least one group; and selecting the machine learning model corresponding to the group, in combination with the noise statistical characteristics of the video image, to perform super-resolution reconstruction and obtain the corresponding high-resolution image. With this super-resolution method based on machine learning, compression-noise removal and super-resolution reconstruction are realized in a single step, which improves the quality of the super-resolved image.

Description

Super-resolution method, medium and device based on machine learning
Technical Field
The invention belongs to the field of image enhancement, relates to a super-resolution method, and particularly relates to a super-resolution method, medium and system based on machine learning.
Background
Due to the limitation of transmission bandwidth, common internet videos generally have a small resolution and a low bitrate. When such videos are displayed on a large screen, plain interpolation-based upscaling of the small-resolution picture causes problems such as edge jaggies and detail blurring, so super-resolution is generally required to reduce the high-frequency loss caused by upscaling. Super-Resolution (SR) refers to recovering a high-resolution (HR) image from a low-resolution (LR) image or image sequence. An HR image has a high pixel density and can provide more detail, which often plays a critical role in applications. The process of obtaining a high-resolution image from one or more low-resolution images is super-resolution reconstruction. Conventional super-resolution algorithms include interpolation-based methods and learning-based methods such as SRCNN (Super-Resolution Convolutional Neural Network), DRRN (Deep Recursive Residual Network), and SRGAN (Super-Resolution Generative Adversarial Network, single-image super-resolution based on a generative adversarial network).
However, low-resolution compressed video generally suffers from image quality problems such as detail blurring, blocking artifacts, ringing noise and mosquito noise. If super-resolution and detail enhancement are applied directly to the picture, the noise is enhanced as if it were detail while the high-frequency detail lost to upscaling is restored, so a subsequent denoising module cannot filter the noise effectively. If denoising is performed first, part of the detail is filtered out as noise, making it difficult for the subsequent super-resolution step to recover detail. Therefore, the super-resolution effect is unsatisfactory whether denoising is performed before or after super-resolution.
Disclosure of Invention
In view of the above shortcomings of the prior art, an object of the present invention is to provide a machine-learning-based super-resolution method, medium and system to solve the prior-art problem that the super-resolution effect is poor regardless of whether denoising is performed before or after super-resolution.
To achieve the above and other related objects, the present invention provides a machine learning-based super-resolution method, comprising: acquiring content statistical characteristics of a video image; classifying the video images according to the content statistical characteristics of the video images to obtain at least one group; and selecting a corresponding machine learning model for super-resolution reconstruction according to the grouping of the video images and the noise statistical characteristics of the video images to obtain corresponding high-resolution images.
In an embodiment of the present invention, an implementation method for selecting a corresponding machine learning model to perform super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images to obtain a corresponding high-resolution image includes: carrying out noise estimation on the video image to obtain a corresponding noise level distribution map; channel splicing is carried out on the video image and the corresponding noise level distribution map to obtain a corresponding high-dimensional spliced image; and taking the high-dimensional spliced image as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
In an embodiment of the present invention, the machine learning model corresponding to each group is trained; the training method of the machine learning model comprises the following steps: acquiring multiple groups of data as first training data; each group of data comprises a high-dimensional spliced image corresponding to the low-resolution image with the compressed noise belonging to the group and a corresponding high-resolution image without the compressed noise; and training a machine learning model by using the first training data to obtain the machine learning model corresponding to the group.
In an embodiment of the present invention, an implementation method for classifying the video image according to the content statistical characteristics and the noise level of the video image to obtain at least one group includes: acquiring the signal characteristics and noise level of the video image; classifying the video images according to the content statistical characteristics to obtain at least one class of images; performing secondary classification on each class of images according to the noise level to obtain at least one subclass; and classifying the subclasses again according to the signal characteristics to obtain at least one group.
In an embodiment of the present invention, an implementation method for selecting a corresponding machine learning model to perform super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images to obtain a corresponding high-resolution image includes: and taking the video image as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
In an embodiment of the present invention, the machine learning model corresponding to each group is trained; the training method of the machine learning model comprises the following steps: acquiring a plurality of groups of data as second training data, each set of data comprising a compression-noisy low-resolution image belonging to the group and a corresponding compression-noise-free high-resolution image; and training a machine learning model by using the second training data to obtain the machine learning model corresponding to the group.
In an embodiment of the invention, the machine learning model is a neural network model.
In an embodiment of the invention, the machine learning model is a plurality of filter banks.
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the machine-learning-based super-resolution method of the present invention.
The invention also provides a super-resolution system based on machine learning, which comprises: the noise estimation module is used for acquiring a noise level distribution map of the video image; the image splicing module is connected with the noise estimation module and used for carrying out channel splicing on the video image and the noise level distribution map to obtain a corresponding high-dimensional spliced image; the content statistical characteristic acquisition module is used for acquiring the content statistical characteristics of the video image; the classification module is connected with the content statistical characteristic acquisition module and is used for classifying the video images according to the content statistical characteristics of the video images to obtain at least one group; the model switching module is connected with the classification module and used for selecting a corresponding machine learning model according to the group to which the video image belongs and by combining the noise statistical characteristics of the video image; and the machine learning module is respectively connected with the image splicing module and the model switching module and is used for calling the corresponding machine learning model to carry out super-resolution reconstruction on the video image so as to obtain a corresponding high-resolution image.
The invention also provides a super-resolution system based on machine learning, which is characterized by comprising the following components: the noise estimation module is used for acquiring the noise level of the video image; the content statistical characteristic acquisition module is used for acquiring the content statistical characteristics of the video image; the signal characteristic acquisition module is used for acquiring the signal characteristics of the video image; the classification module is respectively connected with the noise estimation module, the content statistical characteristic acquisition module and the signal characteristic acquisition module and is used for classifying the video images according to the noise level, the content statistical characteristic and/or the signal characteristic of the video images to obtain at least one group; the model switching module is connected with the classification module and used for selecting a corresponding machine learning model according to the group to which the video image belongs and by combining the noise statistical characteristics of the video image; and the machine learning module is connected with the model switching module and used for calling the corresponding machine learning model to carry out super-resolution reconstruction on the video image so as to obtain a corresponding high-resolution image.
As described above, the super-resolution method, medium, and system based on machine learning according to the present invention have the following advantages:
in the super-resolution method based on machine learning, denoising and super-resolution are performed simultaneously by one machine learning model, which effectively avoids the prior-art problem that the super-resolution effect is unsatisfactory whether denoising is performed before or after super-resolution;
in addition, merging the two-step operation of denoising and super-resolution into one step allows the machine learning model to be reused, avoids training and testing denoising and super-resolution separately, and reduces the amount of computation; meanwhile, an adaptive balance can be struck between adding high-frequency detail in super-resolution reconstruction and filtering out high-frequency ringing noise;
the super-resolution method based on machine learning estimates the level of compression noise and uses the compression-noise level distribution as both an input to and a classification basis for the machine learning model, which improves the robustness of the video super-resolution effect across various image qualities, avoids repeatedly loading many different sets of model parameters during playback, reduces the storage space required by hardware, and improves playback efficiency;
the super-resolution method based on machine learning classifies video images using the statistical characteristics, compression noise and signal characteristics of the video as classification bases and selects the corresponding machine learning model according to the group to which each video image belongs, so the method is highly targeted and accurate and achieves a better image super-resolution effect.
Drawings
FIG. 1 is a flowchart illustrating a super-resolution method based on machine learning according to an embodiment of the present invention.
Fig. 2A is a flowchart of step S13 of the super resolution method based on machine learning according to an embodiment of the present invention.
FIG. 2B shows a video image of the super-resolution method based on machine learning according to an embodiment of the present invention.
FIG. 2C shows a noise level image of the super-resolution method based on machine learning according to an embodiment of the present invention.
FIG. 2D shows an image channel slice of the super-resolution method based on machine learning according to an embodiment of the present invention.
FIG. 2E shows an image channel slice of the super-resolution method based on machine learning according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating the training of the machine learning model according to the super-resolution method based on machine learning of the present invention.
Fig. 4 is a flowchart of step S12 of the super resolution method based on machine learning according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating the training of the machine learning model according to the super-resolution method based on machine learning of the present invention.
FIG. 6A is a flowchart illustrating the training of a neural network model according to the super-resolution method based on machine learning of the present invention.
Fig. 6B is a flowchart illustrating super-resolution reconstruction according to the super-resolution method based on machine learning of the present invention.
FIG. 7A is a flowchart illustrating the training of a neural network model according to the super-resolution method based on machine learning of the present invention.
FIG. 7B is a flowchart illustrating super-resolution reconstruction according to an embodiment of the super-resolution method based on machine learning of the present invention.
FIG. 8A is a flowchart illustrating the filter bank training process according to the super-resolution method based on machine learning of the present invention.
Fig. 8B is a flowchart illustrating super-resolution reconstruction according to the super-resolution method based on machine learning of the present invention.
FIG. 9 is a block diagram of a super-resolution system based on machine learning according to an embodiment of the present invention.
FIG. 10 is a block diagram of a super-resolution system based on machine learning according to an embodiment of the present invention.
Description of the element reference numerals
900 super-resolution system based on machine learning
910 noise estimation module
920 image mosaic module
930 content statistical characteristic obtaining module
940 classification module
950 model switching module
960 machine learning module
1000 super-resolution system based on machine learning
1010 noise estimation module
1020 content statistical characteristic acquisition module
1030 signal characteristic acquisition module
1040 classification module
1050 model switching module
1060 machine learning module
S11-S13 steps
S131-S133 steps
S31-S32 steps
S121-S124 steps
S51-S52 steps
S61a-S64a steps
S61b-S65b steps
S71a-S74a steps
S71b-S74b steps
S81a-S83a steps
S81b-S84b steps
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Super-resolution improves the resolution of an original image by hardware or software means; the super-resolution process adds high-frequency detail to the original low-resolution image. Low-resolution compressed video generally suffers from image quality problems such as detail blurring, blocking artifacts, ringing noise and mosquito noise. If super-resolution and detail enhancement are applied directly to the picture, the noise is enhanced as if it were detail while the high-frequency detail lost to upscaling is restored, so a subsequent denoising module cannot filter the noise effectively. If denoising is performed first, part of the detail is filtered out as noise, making it difficult for the subsequent super-resolution step to recover detail. Therefore, the super-resolution effect is unsatisfactory whether denoising is performed before or after super-resolution. In addition, internet video is usually transcoded at a constant bitrate (CBR), and macroblocks of different picture complexity may use different QPs (quantization parameters), which produces compression noise of varying type and intensity within the same video sequence; therefore, the compression of an actual online video cannot be fully modeled by one or a few global rate parameters.
In view of the above problems, the present invention provides a super-resolution method based on machine learning, which includes: acquiring the content statistical characteristics and noise level of a video image; classifying the video image according to its content statistical characteristics and noise level to obtain at least one group; and selecting the corresponding machine learning model according to the group to which each video image belongs, in combination with the noise statistical characteristics of the video image, to perform super-resolution reconstruction and obtain the corresponding high-resolution image. Because the machine learning model corresponding to the group to which a video image belongs is selected for super-resolution reconstruction, denoising and super-resolution are realized simultaneously by the neural network, which solves the prior-art problem that the super-resolution effect is poor whether denoising is performed before or after super-resolution.
Referring to fig. 1, in an embodiment of the present invention, the super-resolution method based on machine learning includes:
and S11, acquiring the content statistical characteristics of the video image. The video image refers to an image corresponding to a part of or all frames in a segment of video, and is a low-resolution image with compression noise.
S12, classifying the video images according to the content statistical characteristics of the video images to obtain at least one group.
And S13, selecting a corresponding machine learning model for super-resolution reconstruction according to the group to which each video image belongs and the noise statistical characteristics of the video images, and obtaining a corresponding super-resolution reconstruction image. The super-resolution reconstructed image is a high-resolution image without compressed noise.
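For readers who prefer code, the sketch below illustrates the S11-S13 flow in Python: one model per group is kept in a dictionary and chosen by a grouping function. The function content_group, its thresholds, and the placeholder 2x nearest-neighbor "models" are illustrative assumptions only and are not specified by the patent.

```python
# Minimal sketch of the S11-S13 dispatch loop; the grouping rule and the
# per-group models are placeholders, not the patent's actual parameters.
import numpy as np

def content_group(frame: np.ndarray) -> str:
    """S11/S12: derive a coarse content group from simple statistics."""
    variance = float(frame.var())
    gradient = float(np.abs(np.diff(frame.astype(np.float32), axis=0)).mean())
    # Hypothetical thresholds: sharp, low-variance frames treated as graphics.
    return "cgi" if (gradient > 20.0 and variance < 500.0) else "nsi"

# S13: one super-resolution model per group (placeholder 2x upscalers here).
models = {
    "cgi": lambda lr: np.repeat(np.repeat(lr, 2, axis=0), 2, axis=1),
    "nsi": lambda lr: np.repeat(np.repeat(lr, 2, axis=0), 2, axis=1),
}

def super_resolve(frame: np.ndarray) -> np.ndarray:
    group = content_group(frame)
    return models[group](frame)   # a trained model would also use the noise map
```

In the full method each entry of the dictionary would be a trained model as described in the embodiments below.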
Machine Learning (ML) is a multi-domain interdisciplinary subject, and is mainly used for studying how a computer simulates or realizes human Learning behaviors to acquire new knowledge or skills and reorganize an existing knowledge structure to continuously improve the performance of the computer. Common machine learning models include linear models, kernel methods and support vector machines, decision tree models, neural network models, filter banks, and the like. Inside machine learning, the whole machine learning establishment and operation mechanism comprises the following steps: the method comprises the steps of collecting data, preparing the data (including dividing the data into training data and evaluation data), selecting a model, training the model, evaluating the model, finely adjusting parameters and actually applying, wherein the selection and the training of the model are crucial links in the whole machine learning process.
Referring to fig. 2A, in an embodiment of the present invention, an implementation method for selecting a corresponding machine learning model to perform super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images to obtain a corresponding high-resolution image includes:
S131, performing noise estimation on the video image to obtain a corresponding noise level distribution map; the noise is mainly compression noise. Noise estimation is performed on each pixel of the video image to obtain the noise level of each pixel, and the noise level distribution map corresponding to the video image is obtained from these per-pixel noise levels. In practical applications, the video image may instead be divided into a number of n × n coding blocks, noise estimation may be performed on each coding block to obtain the noise level of each block, and the noise level distribution map of the whole video image may be obtained from the per-block noise levels. The value of n is determined by the video coding standard, for example 8, 16 or 32.
The noise level distribution map is a gray-scale map whose pixel values are obtained by quantization, so the value space of the gray values on the noise level distribution map is limited; for example, 64 noise levels correspond to the gray-value space {0, 4, 8, ..., 252}. The gray value of each pixel in the noise level distribution map depends on the noise level at that point.
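A minimal sketch of such a block-wise, quantized noise level map is given below. Block variance is used as a stand-in noise estimator (the embodiments further below also allow QP maps or a learned estimator), and the variance-to-level mapping is a hypothetical choice; the 64-level quantization to gray values {0, 4, ..., 252} follows the example above.

```python
# Sketch of a block-wise noise level map, quantized to 64 gray levels
# {0, 4, ..., 252}. Block variance is a stand-in estimator for illustration.
import numpy as np

def noise_level_map(gray: np.ndarray, n: int = 16, levels: int = 64) -> np.ndarray:
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, n):
        for x in range(0, w, n):
            block = gray[y:y + n, x:x + n].astype(np.float32)
            # Hypothetical mapping from block variance to a noise level index.
            level = min(levels - 1, int(block.var() / 50.0))
            out[y:y + n, x:x + n] = level * 4     # gray value in {0, 4, ..., 252}
    return out
```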
An image can be stored as a three-dimensional matrix whose three dimensions are (width, height, channel), where width is the image width, height is the image height, and channel is the number of image channels. In this embodiment, the video image has channel = 3, corresponding to the R, G and B channels; the noise level distribution map is a gray-scale map, so its channel is 1.
S132, performing channel splicing on the video image and the corresponding noise level distribution map to obtain a corresponding high-dimensional spliced image.
Conventional methods generally analyze only the spatial information of the noisy image, whereas in this embodiment the video image is measured in two dimensions: the spatial information of the original image and auxiliary noise-level information, where the auxiliary noise-level information is represented by the gray values of the pixels in the noise level distribution map. In this embodiment, the video image and the noise level distribution map, which have the same resolution, are spliced along the channel dimension to obtain a corresponding high-dimensional spliced image that carries the original pixels of the video image together with its noise-level information. As described above, both the video image and the noise level map can be stored as three-dimensional matrices; for a video image and a noise level map of the same height and width, splicing them along the channel dimension realizes channel splicing and yields the corresponding high-dimensional spliced image.
S133, taking the high-dimensional spliced image and the video image as the input of the corresponding machine learning model, whose output is the corresponding high-resolution image.
In this embodiment, different groups correspond to different machine learning models. In order to realize super-resolution reconstruction, a machine learning model corresponding to a group to which the video image belongs is selected according to the group, and the high-dimensional stitched image and the video image are processed by using the machine learning model, so that a noise-free high-resolution image corresponding to the video image is obtained.
In this embodiment, the noise level distribution map and the video image are subjected to channel splicing in a third dimension to obtain a corresponding high-dimensional spliced image, so that the high-dimensional spliced image can sufficiently reflect noise levels of different macro blocks, thereby simulating a compression condition of an actual online video.
In an embodiment of the invention, please refer to fig. 2B, which shows an example of a video image obtained in this embodiment. The channel of the three-dimensional matrix corresponding to the video image is 3, that is, the video image comprises the three channels R/G/B. Please refer to fig. 2C, which shows the noise level distribution map corresponding to the video image. The noise level distribution map is a gray-scale map that can be represented by a single channel value, so the channel of its corresponding three-dimensional matrix is 1, that is, the noise level distribution map contains only one noise channel. Referring to fig. 2D, the three R/G/B channels of the video image and the single channel of the noise level distribution map are shown, where the first three slices correspond to the R, G and B channels respectively. Channel splicing the R, G and B channels of the video image with the single channel of the noise level distribution map yields a four-channel image whose three-dimensional matrix has channel = 4, that is, the four-channel image includes an R channel, a G channel, a B channel and a noise channel.
In another embodiment of the present invention, the video image is converted to the Y-Cb-Cr space. Since human eyes are more sensitive to the luminance information Y than to the chrominance information Cb/Cr, in order to improve computational efficiency only the Y channel is upscaled by machine learning, while Cb/Cr are upscaled by a fast conventional interpolation algorithm. The video image input to the model then has only one luminance channel, so its channel is 1. Referring to fig. 2E, the luminance channel of the video image is channel-spliced with the single channel of the noise level distribution map to obtain a two-channel image whose three-dimensional matrix has channel = 2, that is, the two-channel image includes a luminance channel and a noise channel.
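A short sketch of the channel splicing described in these two embodiments, using NumPy arrays in the (height, width, channel) layout mentioned above; the frame contents here are zero-filled placeholders.

```python
# Sketch of channel splicing: RGB + noise map -> 4-channel input, or
# luminance only + noise map -> 2-channel input.
import numpy as np

rgb   = np.zeros((720, 1280, 3), dtype=np.uint8)     # decoded video frame
noise = np.zeros((720, 1280), dtype=np.uint8)        # noise level map (gray)

four_channel = np.concatenate([rgb, noise[..., None]], axis=-1)   # (720, 1280, 4)

y_channel   = np.zeros((720, 1280), dtype=np.uint8)  # Y plane in Y-Cb-Cr mode
two_channel = np.stack([y_channel, noise], axis=-1)               # (720, 1280, 2)
```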
In an embodiment of the present invention, the noise level distribution map is obtained by a quantization parameter map (QP map) output by a decoder, and the larger the quantization parameter, the larger the corresponding compression noise. Therefore, the compression noise can be evaluated according to the size of the quantization parameter displayed in the quantization parameter map. The quantization parameter map can be directly obtained by using the existing decoder, and the detailed description thereof is omitted here.
In an embodiment of the invention, the noise level distribution map is obtained from statistics of local features of the decoded image. For example, the compression noise level may be estimated using information such as the mean, variance and dynamic range within a local window or coding block.
In an embodiment of the invention, the noise level distribution map is obtained by a machine learning method. For example, another neural network is trained by using images with different noise levels and noise levels corresponding to the images as training data, and the trained neural network is used to realize compression noise estimation and image quality prediction.
Referring to fig. 3, in an embodiment of the present invention, the machine learning model corresponding to each group is trained; the training method of the machine learning model comprises the following steps:
S31, acquiring multiple groups of data as first training data; each group of data contains the high-dimensional spliced image corresponding to a compression-noisy low-resolution image belonging to the group, together with the compression-noise-free high-resolution image corresponding to that low-resolution image. The low-resolution image and its corresponding high-resolution image may be obtained from an external database, and the corresponding high-dimensional spliced image may be obtained through step S132.
In this embodiment, each group corresponds to one set of first training data. For any group, the content statistical characteristics and noise level of all the low-resolution images in the corresponding first training data match that group, that is, all the low-resolution images in the first training data belong to the group under the grouping manner of steps S11 and S12.
And S32, training a machine learning model by using the first training data to obtain the machine learning model corresponding to the group.
In this embodiment, each group corresponds to one set of first training data and one machine learning model. A machine learning model is trained with the first training data to obtain the machine learning model corresponding to the group. After training, the machine learning model corresponding to each group can simultaneously perform super-resolution reconstruction and compression-noise removal on the low-resolution images of that group, which avoids the prior-art problem that the super-resolution effect is poor whether denoising is performed before or after super-resolution.
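The sketch below illustrates how such per-group first training data could be assembled; group_of and estimate_noise_map are assumed helper functions (the grouping of steps S11/S12 and the noise estimation of step S131), and the (lr, hr) pairs are assumed to come from an external database.

```python
# Sketch of assembling the first training data (steps S31-S32): for each group,
# pair the high-dimensional spliced image of a noisy LR frame with its clean HR frame.
import numpy as np
from collections import defaultdict

def stitch(lr_rgb: np.ndarray, noise_map: np.ndarray) -> np.ndarray:
    """Channel-splice an (H, W, 3) frame with an (H, W) noise level map."""
    return np.concatenate([lr_rgb, noise_map[..., None]], axis=-1)

def build_first_training_data(pairs, group_of, estimate_noise_map):
    data = defaultdict(list)            # group key -> list of (input, target)
    for lr, hr in pairs:
        group = group_of(lr)
        data[group].append((stitch(lr, estimate_noise_map(lr)), hr))
    return data                         # one training set per group
```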
In an embodiment of the present invention, an implementation method for selecting a corresponding machine learning model to perform super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images to obtain a corresponding high-resolution image includes: carrying out noise estimation on the video image to obtain a corresponding noise level distribution map; and taking the video image and the corresponding noise level distribution map as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
In this embodiment, the machine learning model corresponding to each group is trained; the training method of the machine learning model comprises the following steps: acquiring a plurality of groups of data as third training data; each group of data comprises a low-resolution image with compressed noise belonging to the group, a noise level distribution graph corresponding to the low-resolution image with compressed noise and a corresponding high-resolution image without compressed noise; and training a machine learning model by using the third training data to obtain the machine learning model corresponding to the group. The low resolution image and its corresponding high resolution image may be obtained from an external database and the corresponding noise level profile may be obtained by step S131.
Referring to fig. 4, in an embodiment of the present invention, an implementation method for classifying the video image according to the content statistical characteristics of the video image to obtain at least one group includes:
S121, acquiring the signal characteristics and noise level of the video image; the signal characteristics of the video image, such as the spatial local characteristics of its pixels, reflect the local features of the video image;
and S122, classifying the video images according to the content statistical characteristics to obtain at least one type of images.
Because of the diversity of played video content, the videos need to be pre-classified according to their content statistical characteristics without increasing the complexity of the model. In this embodiment, the content statistical characteristics may be the values of the following parameters: the information content of the image, the gray-level mean, the gray-level mode, the gray-level median, the gradient and/or the variance, or the maximum and/or average values of these parameters. Video images can be divided into a number of categories according to these statistical properties. For example, compared with a Natural Scene Image (NSI), a Computer Graphic Image (CGI) has more noise-free flat areas and sharper text and edges; the flatness of an image area can be characterized by the maximum and average of the variance, and the sharpness of text and edges can be characterized by the gradient, so video images can be classified into natural scene images and computer graphic images according to their variance and gradient characteristics. In practical applications, video images can also be classified into buildings, human faces, natural scenery and so on according to the specific content statistical characteristics of the video.
In particular, if the content statistical characteristics of all the video images are identical or differ only slightly, all the video images may, depending on actual requirements, be treated as a single class.
S123, carrying out secondary classification on each type of image according to the noise level to obtain at least one subclass;
Information loss caused by video compression tends to produce blocking artifacts, ringing noise, mosquito noise, etc. in the image. In this embodiment, the noise level of the video image is taken as a basis of the grouping process, specifically: the video images that have already been classified are classified again according to their noise level to obtain at least one group. For example, if the video images were divided into the two classes CGI and NSI in step S122, then in step S123 the CGI and NSI images are classified a second time according to their noise level, yielding 4 sub-classes: high-noise CGI, low-noise CGI, high-noise NSI and low-noise NSI.
In particular, when the noise levels of all video images contained in a class of images are not very different, all the images in the class of images can be regarded as a sub-class according to actual requirements.
Preferably, the same number of sub-classes are divided for each class of images.
And S124, reclassifying the subclasses according to the signal characteristics to obtain at least one group.
In this step, signal features are extracted by spatial-domain or frequency-domain analysis of the video images so that the images in each sub-class can be classified again. For example, pixels can be divided into groups such as flat, weak texture and edge by evaluating their local gradient characteristics, luminance variance, frequency bandwidth and so on; edge direction information of the image can also be obtained from the signal-feature analysis. As another example, if the images were divided in step S123 into the 4 sub-classes high-noise CGI, low-noise CGI, high-noise NSI and low-noise NSI, then in this step each sub-class may be classified again according to the signal features to obtain 8 groups: flat high-noise CGI, weak-texture high-noise CGI, flat low-noise CGI, weak-texture low-noise CGI, flat high-noise NSI, weak-texture high-noise NSI, flat low-noise NSI and weak-texture low-noise NSI (a simplified grouping sketch follows this list).
Preferably, each sub-class is divided into the same number of groups.
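The simplified grouping sketch below compresses the three stages S122-S124 into a single key function with hypothetical thresholds; note that the patent describes the signal features at pixel level, whereas this sketch uses whole-image statistics for brevity.

```python
# Sketch of the three-stage grouping S122-S124: content class, then noise
# subclass, then signal-feature group. All thresholds are hypothetical.
import numpy as np

def group_key(gray: np.ndarray, noise_level: float) -> tuple:
    variance = float(gray.var())
    gradient = float(np.abs(np.gradient(gray.astype(np.float32))[0]).mean())

    content = "cgi" if (gradient > 20.0 and variance < 500.0) else "nsi"   # S122
    noise = "high" if noise_level > 4.0 else "low"                          # S123
    signal = ("flat" if variance < 50.0
              else "weak_texture" if gradient < 5.0
              else "edge")                                                  # S124
    return (content, noise, signal)     # e.g. ("nsi", "low", "edge")
```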
In an embodiment of the present invention, an implementation method for selecting a corresponding machine learning model to perform super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images to obtain a corresponding high-resolution image includes: and taking the video image as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
Referring to fig. 5, in an embodiment of the present invention, the machine learning model corresponding to each group is trained; the training method of the machine learning model comprises the following steps:
S51, acquiring multiple groups of data as second training data; each set of data comprises a compression-noisy low-resolution image belonging to the group and the corresponding compression-noise-free high-resolution image. The low-resolution image and its corresponding high-resolution image can be obtained from an external database, and each set of data constitutes an LR-image-to-HR-image mapping. In this embodiment, each group corresponds to one set of second training data. For any group, the content statistical characteristics and noise level of all the LR images in its corresponding second training data match that group, that is, all the LR images in the second training data belong to the group under the grouping manner of steps S11 and S12.
And S52, training a machine learning model by using the second training data to obtain the machine learning model corresponding to the group.
In the present embodiment, each group corresponds to one machine learning model. A machine learning model is trained with the second training data to obtain the machine learning model corresponding to the group. After training, the machine learning model corresponding to each group can simultaneously perform super-resolution reconstruction and compression-noise removal on the low-resolution images of that group, which avoids the prior-art problem that the super-resolution effect is poor whether denoising is performed before or after super-resolution.
In an embodiment of the invention, the machine learning model is a neural network model.
The neural network is an operational model, which is formed by connecting a large number of nodes (or called neurons) with each other, wherein each node represents a specific output function, which is called an excitation function; every connection between two nodes represents a weighted value, called weight, for the signal passing through the connection; the output of the network is different according to the connection mode of the network, the weight value and the excitation function. The neural network can form a complete network structure through proper training, and then for any input, a corresponding output can be obtained through the network structure.
Preferably, the neural Network model used in this embodiment is a CARN (Cascading Residual Network) or an RCAN (Residual Channel Attention Network).
Referring to fig. 6A, in an embodiment of the invention, the step of training the neural network includes:
S61a, obtaining the content statistical characteristics (such as gradient and variance) of the video images and classifying the video images according to these characteristics to obtain a number of groups;
S62a, performing noise estimation on the video images to obtain the corresponding noise level distribution maps; the noise level distribution map can be obtained from the QP map output by the decoder, from statistics of the local features of the decoded image, or through a neural network;
S63a, performing channel splicing on the noise level distribution map and the video image to obtain the corresponding high-dimensional spliced image;
and S64a, for each group, training a neural network model by using the high-dimensional spliced image and the corresponding high-resolution image to obtain the neural network model corresponding to the group. The training of the neural network may be obtained by existing training methods, such as: and training the neural network is realized through a back propagation training model.
Referring to fig. 6B, in the present embodiment, a method for performing super-resolution reconstruction on any LR image with compression noise includes:
S61b, obtaining the content statistical characteristics of the LR image and determining the group to which the LR image belongs according to these characteristics;
S62b, performing noise estimation on the LR image to obtain the corresponding noise level distribution map;
S63b, performing channel splicing on the noise level distribution map and the LR image to obtain the corresponding high-dimensional spliced image;
S64b, selecting the corresponding neural network model according to the group to which the LR image belongs;
S65b, inputting the high-dimensional spliced image corresponding to the LR image into the neural network model selected in step S64b; the output of the neural network model is the high-resolution image corresponding to the LR image.
Preferably, the neural network is CARN, RCAN, etc. Experiments show that by adopting proper training data, the neural network structure not only can optimize the super-resolution effect, but also can remove compression noise. In this embodiment, the number of neural networks used in the super-resolution method based on machine learning is N, where N is the number of groups obtained by classifying video images according to content statistical characteristics.
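Tying the reconstruction steps S61b-S65b together, the sketch below processes one LR frame. It assumes the helper functions from the earlier sketches (content_group, noise_level_map), a dictionary of trained per-group models, and uses the R channel as a stand-in gray image for the statistics; none of these details are mandated by the patent.

```python
# Sketch of the reconstruction path S61b-S65b for one LR frame.
import numpy as np
import torch

def reconstruct(lr_rgb: np.ndarray, models: dict) -> np.ndarray:
    group = content_group(lr_rgb[..., 0])                       # S61b
    nmap = noise_level_map(lr_rgb[..., 0])                      # S62b
    stitched = np.concatenate([lr_rgb, nmap[..., None]], -1)    # S63b
    model = models[group]                                       # S64b
    x = torch.from_numpy(stitched).float().permute(2, 0, 1)[None] / 255.0
    with torch.no_grad():                                       # S65b
        hr = model(x)[0].permute(1, 2, 0).clamp(0, 1).numpy()
    return (hr * 255.0).astype(np.uint8)
```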
Referring to fig. 7A, in an embodiment of the invention, the step of training the neural network includes:
S71a, acquiring content statistical characteristics, noise level and signal characteristics of the video image;
S72a, classifying the video images according to the content statistical characteristics, the noise level and the signal characteristics of the video images to obtain at least one group;
and S73a, for each group, training a neural network model by using the video images in the group and the corresponding high-resolution images thereof to obtain the neural network model corresponding to the group. The training of the neural network may be obtained by existing training methods, such as: the training of the neural network is realized through a back propagation training model, and the video images and the corresponding high-resolution images can be obtained from an external database.
Referring to fig. 7B, in the present embodiment, a method for performing super-resolution reconstruction on any LR image with compression noise includes:
S71b, acquiring the content statistical characteristics, noise level and signal characteristics of the LR image; the noise level may be characterized by a noise grade, for example the noise may be divided into grades 0 to 8;
S72b, determining the group to which the LR image belongs according to the content statistical characteristics, the noise level and the signal characteristics of the LR image;
S73b, selecting a corresponding neural network model according to the group to which the LR image belongs;
S74b, the LR image is input to the neural network model selected in step S73b, and the output of the neural network model is the high-resolution image corresponding to the LR image.
Compared with the previous embodiment, this embodiment does not need to splice the LR image and the noise level distribution map along the third dimension, which saves the image-splicing operation. However, in this embodiment the LR images are classified according to their content statistical characteristics, noise level and signal characteristics, so many groups are obtained, each corresponding to one neural network model; the algorithm complexity of this embodiment is therefore higher and its running speed relatively lower.
In this embodiment, the global feature, the noise level, and the local feature (pixel-level feature) of the video image are taken into consideration for the classification of the video image, so that the classification result can sufficiently reflect the compression noise level of each macroblock, thereby simulating the compression condition of the actual online video.
In an embodiment of the invention, the machine learning model is a plurality of filter banks.
Referring to fig. 8A, in an embodiment of the invention, the step of training the filter bank includes:
S81a, acquiring the content statistical characteristics, noise level and signal characteristics of the video images; the noise level may be characterized by a noise grade, for example the noise may be divided into grades 0 to 8;
S82a, classifying the video images according to the content statistical characteristics, the noise level and the signal characteristics of the video images to obtain at least one group;
S83a, for each group, training a plurality of filter banks using the video images in the group and the corresponding high resolution images to obtain a filter bank model corresponding to the group.
Referring to fig. 8B, in the present embodiment, a method for performing super-resolution reconstruction on any LR image with compression noise includes:
S81b, acquiring content statistical characteristics, noise level and signal characteristics of the LR image;
S82b, determining the group to which the LR image belongs according to the content statistical characteristics, the noise level and the signal characteristics of the LR image;
S83b, selecting a corresponding filter bank model according to the group to which the LR image belongs;
S84b, the LR image is input to the filter bank model selected in step S83b, and the output of the filter bank model is the high-resolution image corresponding to the LR image.
Compared with a neural network model, the filter bank model has a simpler structure and trains and runs faster. However, in this embodiment the video images are classified according to the content statistical characteristics, noise level and signal characteristics simultaneously, so the classification is finer and more groups are obtained, and correspondingly more filter banks need to be stored. In addition, since the grouping in this embodiment takes the local characteristics (noise level and signal characteristics) of the video images into account, adjacent pixels in the same video image may fall into different groups and thus require different filter models. Switching between different filter models for adjacent pixels may cause picture discontinuities.
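One concrete (assumed) realization of such a filter bank is a RAISR-style set of small linear filters, one per group, each fitted by least squares on patches from that group and applied to a pre-upscaled image; the sketch below shows the fit and the application. The filter size and the patch/target format are illustrative choices, not specified by the patent.

```python
# Sketch of a per-group filter bank: each group learns a small 2D filter by
# least squares, folding denoising and detail restoration into one filter.
import numpy as np

K = 5  # filter size (hypothetical)

def fit_group_filter(lr_patches: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """lr_patches: (N, K*K) flattened patches; targets: (N,) center HR values."""
    coef, *_ = np.linalg.lstsq(lr_patches, targets, rcond=None)
    return coef.reshape(K, K)

def apply_group_filter(img: np.ndarray, filt: np.ndarray) -> np.ndarray:
    h, w = img.shape
    pad = K // 2
    padded = np.pad(img.astype(np.float32), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.float32)
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(padded[y:y + K, x:x + K] * filt)
    return out
```

In practice one such filter (or a set of them) would be fitted for each group obtained above, and the per-pixel switching between filters is what can cause the discontinuities mentioned in the preceding paragraph.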
The present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the machine-learning-based super-resolution method of the present invention.
Referring to fig. 9, the present invention further provides a super-resolution system based on machine learning, where the super-resolution system 900 based on machine learning includes:
a noise estimation module 910, configured to implement step S131, that is, perform noise estimation on the video image to obtain a corresponding noise level distribution map;
an image stitching module 920, connected to the noise estimation module 910, configured to implement step S132, that is: channel splicing is carried out on the video image and the corresponding noise level distribution map to obtain a corresponding high-dimensional spliced image;
the content statistical characteristic obtaining module 930 is configured to implement step S11, that is: acquiring content statistical characteristics of the video image;
a classifying module 940, connected to the content statistical characteristic obtaining module 930, configured to classify the video image according to the content statistical characteristic of the video image, so as to obtain at least one group;
a model switching module 950, connected to the classifying module 940, configured to select a corresponding machine learning model according to the group to which the video image belongs and according to the noise statistical characteristic of the video image in step S13;
and a machine learning module 960, connected to the image stitching module 920 and the model switching module 950, respectively, and configured to invoke the corresponding machine learning model to perform super-resolution reconstruction on the video image in step S13, so as to obtain a corresponding high-resolution image.
Referring to fig. 10, the present invention further provides a super-resolution system based on machine learning, wherein the super-resolution system 1000 based on machine learning comprises:
a noise estimation module 1010, configured to implement the obtaining of the noise level of the video image in step S121;
a content statistical characteristic obtaining module 1020, configured to implement step S11, that is: acquiring content statistical characteristics of a video image;
a signal feature obtaining module 1030, configured to obtain a signal feature of the video image in S121;
a classification module 1040, connected to the noise estimation module 1010, the content statistical characteristic obtaining module 1020, and the signal feature obtaining module 1030, respectively, and configured to implement step S12, classify the video image according to the noise level, the content statistical characteristic, and/or the signal feature of the video image, and obtain at least one group;
a model switching module 1050, connected to the classification module 1040, and configured to implement, in step S13, selecting a corresponding machine learning model according to the group to which the video image belongs in combination with the noise statistical characteristics of the video image;
and a machine learning module 1060, connected to the model switching module 1050, and configured to invoke the corresponding machine learning model in step S13 to perform super-resolution reconstruction on the video image, so as to obtain a corresponding high-resolution image.
The protection scope of the super-resolution method based on machine learning according to the present invention is not limited to the execution sequence of the steps listed in the embodiment, and all the solutions implemented by adding, subtracting, and replacing the steps in the prior art according to the principles of the present invention are included in the protection scope of the present invention.
The present invention also provides a machine-learning-based super-resolution system that can implement the machine-learning-based super-resolution method of the present invention; however, the apparatus implementing the machine-learning-based super-resolution method of the present invention includes, but is not limited to, the structure of the super-resolution system exemplified in this embodiment, and all structural modifications and substitutions of the prior art made according to the principles of the present invention are included in the scope of the present invention.
In the present invention, super-resolution reconstruction and compression noise removal are combined in a single machine learning model, so they do not need to be trained and tested separately; the neural network structure or filter bank can be reused, which reduces the amount of computation;
the method performs super-resolution reconstruction and compression noise removal at the same time, and compared with conventional super-resolution methods the resulting image quality is significantly improved: aliasing and jagged edges caused by image magnification are eliminated, ringing noise caused by video compression quantization is suppressed, and noise amplification caused by enlarging the image is reduced;
in the present invention, super-resolution reconstruction and compression noise removal are realized simultaneously by machine learning, and by adjusting the corresponding training data an adaptive balance can be struck between adding high-frequency detail during super-resolution reconstruction and filtering out high-frequency ringing noise;
according to the method, the video images are classified according to their content statistical characteristics and a corresponding machine learning model is selected according to the classification result; by establishing a separate machine learning model for video images with different content and performing super-resolution reconstruction with it, the output noise-free HR image has higher image quality;
in the present invention, when classifying the video images, the compression noise level and the local signal characteristics are also taken into account, and the weights of the neural network or the coefficients of the filter bank are determined by AI learning, which improves the robustness of video super-resolution across different image qualities, avoids repeatedly loading several different network models or filter bank models during playback, reduces the storage space required by the hardware, and improves playback efficiency;
in the present invention, compression noise estimation can be performed in three ways: obtaining it from the QP map output by the decoder; obtaining it from local statistics of the decoded image; or estimating the compression noise and predicting the image quality with a neural network trained on images with different noise levels; in practical applications the appropriate estimation method can be selected for the specific situation, which provides great flexibility.
In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.
The foregoing embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the present invention shall still be covered by the claims of the present invention.

Claims (11)

1. A super-resolution method based on machine learning is characterized in that the super-resolution method based on machine learning comprises the following steps:
acquiring content statistical characteristics of a video image;
classifying the video images according to the content statistical characteristics of the video images to obtain at least one group;
and selecting a corresponding machine learning model for super-resolution reconstruction according to the grouping of the video images and the noise statistical characteristics of the video images to obtain corresponding high-resolution images.
2. The super-resolution method based on machine learning of claim 1, wherein an implementation method for obtaining a corresponding high-resolution image by selecting a corresponding machine learning model for super-resolution reconstruction according to a group to which each video image belongs and by combining noise statistical characteristics of the video images comprises:
carrying out noise estimation on the video image to obtain a corresponding noise level distribution map;
channel splicing is carried out on the video image and the corresponding noise level distribution map to obtain a corresponding high-dimensional spliced image;
and taking the high-dimensional spliced image as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
3. The machine learning-based super-resolution method of claim 2, wherein:
training a corresponding machine learning model of each group; the training method of the machine learning model comprises the following steps:
acquiring multiple groups of data as first training data; each group of data comprises a high-dimensional spliced image corresponding to a low-resolution image with compression noise belonging to the group, and a corresponding high-resolution image without compression noise;
and training a machine learning model by using the first training data to obtain the machine learning model corresponding to the group.
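A minimal training sketch for the per-group model of claim 3 is given below for illustration; the small convolutional network, the 2x scale, the single-channel (grayscale) assumption, and the PyTorch framework are all illustrative choices rather than part of the claim.

```python
# Illustrative per-group training on first training data: inputs are
# noise-map-stitched LR images, targets are compression-noise-free HR images.
import torch
import torch.nn as nn

def make_sr_model(scale: int = 2) -> nn.Module:
    # 2-channel input: LR image channel + noise-level-map channel
    return nn.Sequential(
        nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, scale * scale, 3, padding=1),
        nn.PixelShuffle(scale),            # rearrange channels into a 2x HR image
    )

def train_group_model(stitched_lr: torch.Tensor, clean_hr: torch.Tensor,
                      epochs: int = 100) -> nn.Module:
    """stitched_lr: (N, 2, h, w) spliced LR inputs of one group.
    clean_hr:    (N, 1, 2h, 2w) corresponding noise-free HR targets."""
    model = make_sr_model(scale=2)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(stitched_lr), clean_hr)
        loss.backward()                    # backpropagation
        opt.step()
    return model
```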
4. The super-resolution method based on machine learning of claim 1, wherein the video images are classified according to their content statistical characteristics, and one implementation method for obtaining at least one group comprises:
acquiring signal characteristics and a noise level of the video image;
classifying the video images according to the content statistical characteristics to obtain at least one type of images;
carrying out secondary classification on each class of images according to the noise level to obtain at least one subclass;
and classifying the subclasses again according to the signal characteristics to obtain at least one group.
5. The super-resolution method based on machine learning of claim 4, wherein an implementation method for selecting a corresponding machine learning model for super-resolution reconstruction according to the grouping to which each video image belongs and the noise statistical characteristics of the video images to obtain the corresponding high-resolution image comprises:
and taking the video image as the input of the corresponding machine learning model, wherein the output of the corresponding machine learning model is the corresponding high-resolution image.
6. The machine learning-based super-resolution method of claim 5, wherein:
training a corresponding machine learning model of each group; the training method of the machine learning model comprises the following steps:
acquiring a plurality of groups of data as second training data; each group of data comprises a low-resolution image with compression noise belonging to said group and a corresponding high-resolution image without compression noise;
and training a machine learning model by using the second training data to obtain the machine learning model corresponding to the group.
7. The machine learning-based super resolution method according to any one of claims 1 to 6, wherein: the machine learning model is a neural network model.
8. The super-resolution method based on machine learning according to any one of claims 4-6, wherein: the machine learning model is a plurality of filter banks.
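For the filter-bank model of claim 8, an illustrative sketch is given below; the per-phase filter layout (one small 2D filter per HR sub-pixel position, similar in spirit to RAISR-style filter banks) and the random placeholder coefficients are assumptions made only to show the mechanism, not the form actually used by the invention.

```python
# Illustrative application of one group's filter bank for 2x super-resolution.
import numpy as np

def apply_filter_bank(lr: np.ndarray, bank: np.ndarray, scale: int = 2) -> np.ndarray:
    """bank has shape (scale*scale, k, k): one k x k filter per HR sub-pixel phase."""
    k = bank.shape[-1]
    pad = k // 2
    padded = np.pad(lr, pad, mode="edge")
    h, w = lr.shape
    hr = np.zeros((h * scale, w * scale))
    for phase, filt in enumerate(bank):
        dy, dx = divmod(phase, scale)            # HR sub-pixel position for this filter
        for y in range(h):
            for x in range(w):
                patch = padded[y:y + k, x:x + k]
                hr[y * scale + dy, x * scale + dx] = np.sum(patch * filt)
    return hr

# Example: a random 4-filter bank (placeholder coefficients) on a 32x32 frame.
bank = np.random.randn(4, 5, 5) / 25.0
hr = apply_filter_bank(np.random.rand(32, 32), bank)
```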
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the machine learning-based super resolution method of any one of claims 1-8.
10. A machine learning based super resolution system, comprising:
the noise estimation module is used for acquiring a noise level distribution map of the video image;
the image splicing module is connected with the noise estimation module and used for carrying out channel splicing on the video image and the noise level distribution map to obtain a corresponding high-dimensional spliced image;
the content statistical characteristic acquisition module is used for acquiring the content statistical characteristics of the video image;
the classification module is connected with the content statistical characteristic acquisition module and is used for classifying the video images according to the content statistical characteristics of the video images to obtain at least one group;
the model switching module is connected with the classification module and used for selecting a corresponding machine learning model according to the group to which the video image belongs and by combining the noise statistical characteristics of the video image;
and the machine learning module is respectively connected with the image splicing module and the model switching module and is used for calling the corresponding machine learning model to carry out super-resolution reconstruction on the video image so as to obtain a corresponding high-resolution image.
11. A machine learning based super resolution system, comprising:
the noise estimation module is used for acquiring the noise level of the video image;
the content statistical characteristic acquisition module is used for acquiring the content statistical characteristics of the video image;
the signal characteristic acquisition module is used for acquiring the signal characteristics of the video image;
the classification module is respectively connected with the noise estimation module, the content statistical characteristic acquisition module and the signal characteristic acquisition module and is used for classifying the video images according to the noise level, the content statistical characteristic and/or the signal characteristic of the video images to obtain at least one group;
the model switching module is connected with the classification module and used for selecting a corresponding machine learning model according to the group to which the video image belongs and by combining the noise statistical characteristics of the video image;
and the machine learning module is connected with the model switching module and used for calling the corresponding machine learning model to carry out super-resolution reconstruction on the video image so as to obtain a corresponding high-resolution image.
CN201911149461.4A 2019-11-21 2019-11-21 Super-resolution method, medium and device based on machine learning Pending CN112825187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911149461.4A CN112825187A (en) 2019-11-21 2019-11-21 Super-resolution method, medium and device based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911149461.4A CN112825187A (en) 2019-11-21 2019-11-21 Super-resolution method, medium and device based on machine learning

Publications (1)

Publication Number Publication Date
CN112825187A true CN112825187A (en) 2021-05-21

Family

ID=75907667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911149461.4A Pending CN112825187A (en) 2019-11-21 2019-11-21 Super-resolution method, medium and device based on machine learning

Country Status (1)

Country Link
CN (1) CN112825187A (en)

Similar Documents

Publication Publication Date Title
CN111193923B (en) Video quality evaluation method and device, electronic equipment and computer storage medium
Ponomarenko et al. Image database TID2013: Peculiarities, results and perspectives
Pan et al. Exposing image splicing with inconsistent local noise variances
Lin et al. Perceptual visual quality metrics: A survey
JP2960386B2 (en) Signal adaptive filtering method and signal adaptive filter
US8908989B2 (en) Recursive conditional means image denoising
JP2018527687A (en) Image processing system for reducing an image using a perceptual reduction method
CN100379259C (en) Apparatus and method of smoothing video signal using pattern adaptive filtering
Temel et al. Perceptual image quality assessment through spectral analysis of error representations
CN111105376B (en) Single-exposure high-dynamic-range image generation method based on double-branch neural network
CN116051428A (en) Deep learning-based combined denoising and superdivision low-illumination image enhancement method
Rabie Adaptive hybrid mean and median filtering of high-ISO long-exposure sensor noise for digital photography
CN115131229A (en) Image noise reduction and filtering data processing method and device and computer equipment
Burger et al. Histograms and image statistics
Ponomaryov et al. Fuzzy color video filtering technique for sequences corrupted by additive Gaussian noise
Ramadan Monochromatic-based method for impulse noise detection and suppression in color images
US8897378B2 (en) Selective perceptual masking via scale separation in the spatial and temporal domains using intrinsic images for use in data compression
Barai et al. Human visual system inspired saliency guided edge preserving tone-mapping for high dynamic range imaging
CN112825187A (en) Super-resolution method, medium and device based on machine learning
Siddiqui et al. Hierarchical color correction for camera cell phone images
Yang et al. Subjective quality evaluation of compressed digital compound images
Ponomarenko et al. Color image lossy compression based on blind evaluation and prediction of noise characteristics
Bonanomi et al. I3D: a new dataset for testing denoising and demosaicing algorithms
JP2009095004A (en) Method of filtering pixels in sequence of images
US7495805B1 (en) Enhancing an image, such as an image having bi-valued pixel values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: Ruixin Microelectronics Co.,Ltd.

Address before: 350003 building 18, No.89, software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU ROCKCHIP ELECTRONICS Co.,Ltd.