CN112184568A - Image processing method and device, electronic equipment and readable storage medium

Info

Publication number
CN112184568A
Authority
CN
China
Prior art keywords
filter
preset
resolution
picture
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010925547.8A
Other languages
Chinese (zh)
Inventor
刘永劼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aixin Technology Co ltd
Original Assignee
Beijing Aixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aixin Technology Co ltd filed Critical Beijing Aixin Technology Co ltd
Priority to CN202010925547.8A priority Critical patent/CN112184568A/en
Publication of CN112184568A publication Critical patent/CN112184568A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The application provides an image processing method and apparatus, an electronic device and a readable storage medium, wherein the method comprises: obtaining a first filter corresponding to a preset first resolution by using a neural network model; rendering, according to the first filter, a target filter corresponding to a preset target resolution; and processing a preset picture to be filtered by using the target filter. The preset target resolution is greater than the preset first resolution, and the resolution of the picture to be filtered is the preset target resolution. In this way, the target filter is no longer obtained end to end but indirectly, from the low-resolution filter output by the neural network model, so the coefficients of each intermediate layer can be adjusted. Moreover, because the target filter is obtained by rendering upward step by step, the training process is more controllable and overfitting is less likely. The method also saves computing resources.

Description

Image processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a readable storage medium.
Background
To make up for the shortcomings of conventional image processing algorithms, neural-network-based algorithms for improving image quality have been widely used. A conventional algorithm processes an image with filter coefficients adjusted by hand from experience, whereas a neural-network-based image quality algorithm is data-driven: the filter coefficients are learned from training data.
Current neural-network-based image quality algorithms are trained end to end: a training-set picture is input and an output picture is produced directly, so the intermediate process that yields the filter coefficients is a black box that cannot be adjusted manually. Meanwhile, in an end-to-end pipeline every pixel of the image is allocated the same computational effort, so these algorithms consume a large amount of computing resources. In addition, overfitting to the training data is unavoidable, and even slight overfitting may produce serious artifacts on images at test or deployment time, degrading image quality.
Disclosure of Invention
An object of the present application is to provide an image processing method and apparatus, an electronic device and a readable storage medium, so as to solve the above problems.
An embodiment of the present application provides an image processing method, which comprises: obtaining a first filter corresponding to a preset first resolution by using a neural network model; rendering, according to the first filter, a target filter corresponding to a preset target resolution, the preset target resolution being greater than the preset first resolution; and processing a preset picture to be filtered by using the target filter, the resolution of the picture to be filtered being the preset target resolution.
In the embodiment of the present application, a first filter corresponding to a preset first resolution is obtained through a neural network model, and a target filter at the actually required target resolution is then obtained by rendering from the first filter. In this way the high-resolution filter is rendered from the low-resolution one. The target filter is no longer obtained end to end but indirectly, from the low-resolution filter that the neural network model is trained to produce, so the coefficient of each intermediate layer (i.e., of each intermediate filter) can be adjusted. This alleviates, to a certain extent, the problem in conventional algorithms that the intermediate process yielding the filter coefficients is a black box that cannot be adjusted manually. Meanwhile, because the target filter is obtained by rendering upward step by step, the training process is more controllable and overfitting is less likely (and even if overfitting occurs, an engineer can easily correct and adjust it based on the per-level filters, whereas coefficients cannot be adjusted in the current end-to-end approach).
In addition, with this step-by-step upward rendering, only one small-resolution filter needs to be obtained initially; the coefficients of the corresponding high-resolution filter are then rendered on top of it, so the same computation does not have to be repeated for every point, which saves computing resources.
Further, obtaining a first filter corresponding to a preset first resolution by using the neural network model includes: inputting a preset input picture into the neural network model to obtain a first initial filter; filtering the input picture down-sampled to the first resolution by using the first initial filter to obtain a first filtered picture; calculating a loss value between the color of the first filtered picture and the color of a preset standard picture down-sampled to the first resolution; when the loss value is not converged, iteratively updating parameters of the neural network model until the loss value is converged; and correcting the high-frequency coefficient in the first initial filter when the loss value is converged by using a preset network to obtain the first filter.
In this implementation, the low-resolution first filter obtained by iterating the neural network model serves as the rendering basis, so that when the target filter is rendered, the basis filter already meets the corresponding image-processing requirement, which improves the accuracy of the target filter learned by this scheme.
Further, rendering, according to the first filter, a target filter corresponding to a preset target resolution comprises: rendering the first filter to obtain a second initial filter corresponding to a preset second resolution, the preset second resolution being greater than the preset first resolution; correcting the high-frequency coefficient in the second initial filter by using a preset network to obtain a corrected second filter; and, if the preset second resolution is not the preset target resolution, continuing to render the second filter until a target filter corresponding to the preset target resolution is obtained.
In the implementation process, the target filter corresponding to the preset target resolution is rendered from the low-resolution filter step by step, and the filter coefficient of the target filter depends on the coefficient of each low-resolution filter, so that an engineer can adjust the target filter by analyzing the coefficient of each low-resolution filter according to needs, and the risk of generating serious artifacts on an image during testing or application due to overfitting can be reduced.
Further, rendering the first filter to obtain a second initial filter includes: interpolating the first filter according to the resolution ratio between the preset second resolution and the preset first resolution to obtain the second initial filter.
In the implementation process, when the upward rendering is performed, the upward rendering is performed in a simple interpolation mode, so that the calculation amount can be reduced, and the learning efficiency can be improved.
Further, correcting the high-frequency coefficient in the second initial filter by using a preset network to obtain a corrected second filter includes: acquiring a first picture to be processed by down-sampling a preset input picture to the preset second resolution; filtering the first picture to be processed by using the second initial filter to obtain a second filtered picture; extracting high-frequency feature data corresponding to the second filtered picture, and regressing the high-frequency feature data by using the preset network to obtain regressed high-frequency coefficients; and replacing the high-frequency coefficients in the second initial filter with the regressed high-frequency coefficients to obtain the second filter.
It should be understood that in an image, the low-frequency part usually represents the interior of an object, where pixel colors vary little, while the high-frequency part represents boundaries between different objects, where pixel colors vary strongly. Since the filter coefficients are expanded by interpolation during up-sampling, they come out relatively smooth: they perform well in low-frequency regions but poorly in high-frequency regions, and therefore need correction. In the embodiment of the present application, the high-frequency feature data corresponding to the second filtered picture are extracted and regressed, and the regressed high-frequency coefficients replace those in the second initial filter, so that the second filter meets the corresponding image-processing requirement.
In an end-to-end scheme, the input is an unprocessed picture and the output is a processed picture, and every pixel of the input undergoes the same processing (the same convolutions). In the solution provided by the embodiment of the present application, the neural network is trained to produce a coarse-scale (i.e., low-resolution) filter, and the filter at the target resolution is obtained by rendering upward. The pixels of the input picture are therefore no longer all treated alike: pixels that need no recomputation (those in low-frequency regions) are obtained directly by interpolation, while the important pixels (those in high-frequency regions) are recomputed, which markedly reduces the computation of neural-network picture processing compared with the end-to-end approach.
Further, the extracting the high-frequency feature data corresponding to the second filtered picture includes: acquiring the position of a high-frequency region in the second filtering picture; and extracting the high-frequency feature data corresponding to the position of the high-frequency region in the second filtering picture from the feature map with the preset second resolution output by the neural network model.
It should be understood that, in the process of training the neural network model to obtain the first filter, features are extracted level by level, yielding a feature map at each level, and each level's feature map can be used to learn a filter at a different resolution. In the embodiment of the present application, only the high-frequency coefficients of the second initial filter need correction, so only the high-frequency feature data corresponding to the position of the high-frequency region in the second filtered picture need to be extracted from the feature map at the preset second resolution already output by the neural network model; the neural network does not need to be run again, which saves computing resources.
Further, the preset network is a multilayer perceptron.
An embodiment of the present application further provides an image processing apparatus, comprising an acquisition module, a rendering module and a processing module. The acquisition module is configured to obtain a first filter corresponding to a preset first resolution by using a neural network model; the rendering module is configured to render, according to the first filter, a target filter corresponding to a preset target resolution, the preset target resolution being greater than the preset first resolution; the processing module is configured to process a preset picture to be filtered by using the target filter, the resolution of the picture to be filtered being the preset target resolution.
An embodiment of the present application further provides an electronic device, including: a processor, a memory, and a communication bus; the communication bus is used for realizing connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement any of the image processing methods described above.
Also provided in an embodiment of the present application is a readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement any of the image processing methods described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic basic flow chart of an image processing method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a first filter obtaining process according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a calibration process for a second initial filter according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a neural network provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The first embodiment is as follows:
Current neural-network-based image processing algorithms are trained end to end: a training-set picture is input and an output picture is produced directly, so the intermediate process that yields the filter coefficients is a black box that cannot be adjusted manually. Overfitting during neural network training is very common, so the filter coefficients obtained by training are often poor, and artifacts easily appear at test or deployment time.
In order to solve the problem that end-to-end learning easily produces artifacts at test or deployment time, an indirect learning method based on the idea of rendering is provided in the embodiment of the present application. Referring to fig. 1, fig. 1 illustrates the basic flow of an image processing method provided by an embodiment of the present application, which includes:
s101: and acquiring a first filter corresponding to a preset first resolution by using the neural network model.
In the embodiment of the present application, a low-resolution first filter can be obtained by iterating a neural network model, so that the filter used as the rendering basis already meets the actual image-processing requirement; in other words, the coefficients of the basis filter are guaranteed to be trustworthy.
In the embodiment of the present application, in order to obtain a final target filter by learning, an input picture and a standard picture corresponding to the input picture are preset.
It should be noted that the standard picture corresponding to the input picture is a picture representing the processing effect of the input picture. For example, in the picture enhancement algorithm, the standard picture is a picture after the input picture is successfully enhanced.
In obtaining the first filter, see fig. 2 for illustration:
a preset input picture may be input into the neural network model to obtain a first initial filter.
Assume the resolution of the first initial filter is 1/m of that of the input picture. The input picture can then be down-sampled by a factor of m to obtain a target input picture whose resolution is 1/m of the input picture's.
And filtering the target input picture by using a first initial filter to obtain a first filtered picture, and calculating a loss value between the color of the first filtered picture and the color of the standard picture down-sampled to the first resolution.
If the loss value is not converged, iteratively updating the neural network model parameters, updating the first initial filter, filtering the target input picture by using the first initial filter again to obtain a first filtered picture, and calculating the loss value between the color of the first filtered picture and the color of the standard picture down-sampled to the first resolution.
And if the loss value is converged, ending iteration, and correcting the high-frequency coefficient in the first initial filter when the loss value is converged by using a preset network to obtain the first filter.
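To make the above iteration concrete, the following is a minimal sketch of the fig. 2 loop in PyTorch. The model interface, the apply_filter routine, the L1 color loss and the convergence test are illustrative placeholders rather than choices fixed by the patent.

```python
import torch
import torch.nn.functional as F

def train_first_initial_filter(model, input_pic, standard_pic, m,
                               apply_filter, optimizer,
                               delta_thresh=1e-5, max_iters=10000):
    # Down-sample the input and standard pictures to the first resolution (1/m).
    in_small = F.interpolate(input_pic, scale_factor=1.0 / m, mode='bilinear')
    std_small = F.interpolate(standard_pic, scale_factor=1.0 / m, mode='bilinear')
    prev = float('inf')
    for _ in range(max_iters):
        # The model is assumed to return (filter coefficients, feature maps);
        # apply_filter is assumed to handle the coefficient layout.
        coeffs, _ = model(input_pic)
        filtered = apply_filter(coeffs, in_small)
        loss = F.l1_loss(filtered, std_small)   # color loss (choice is an assumption)
        if abs(prev - loss.item()) < delta_thresh:
            break                               # loss has converged
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prev = loss.item()
    return coeffs
```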
As described with reference to fig. 3, the correction process for the first initial filter when the loss value converges includes:
s301: and acquiring a picture to be processed.
It should be understood that, in the embodiment of the present application, a preset input picture may be downsampled to the first resolution in advance, so as to obtain a to-be-processed picture.
S302: and filtering the picture to be processed by using the first initial filter when the loss value is converged to obtain a filtered picture.
S303: and extracting high-frequency characteristic data corresponding to the filtering picture, and regressing the high-frequency characteristic data by using a preset network to obtain a regressed high-frequency coefficient.
It should be understood that the term high frequency feature data refers to feature data that can be used to train the high frequency coefficients of the filter.
In the embodiment of the present application, the position of the high-frequency region in the filtered picture may be obtained first, and then, during the training, the high-frequency feature data corresponding to the position of the high-frequency region in the filtered picture may be extracted from the feature map with the first resolution output by the neural network model.
In the embodiment of the present application, to determine the position of the high-frequency region in the filtered picture, one feasible manner is to apply a high-pass filter to the filtered picture, thereby obtaining the position of its high-frequency region.
However, different image-processing requirements use different algorithms for different purposes, so where the high-frequency region lies may differ. Therefore, to determine its position accurately, in the embodiment of the present application the standard picture corresponding to the input picture may be down-sampled to the first resolution, the loss value between each pixel of the filtered picture and the corresponding pixel of the down-sampled standard picture computed, and the position of the high-frequency region determined from those loss values.
For example, the positions of pixels whose loss value exceeds a preset threshold may be taken as the high-frequency region; alternatively, the N pixels with the largest loss values in the filtered picture, or the top M% of pixels by loss value, may be taken. The values of N and M can be set by engineers according to actual needs; the selection rules are sketched below.
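Purely as an illustration, the three selection rules above might look as follows, assuming PyTorch tensors and a per-pixel L1 color loss; none of the names come from the patent.

```python
import torch

def high_freq_positions(filtered, standard, threshold=None, top_n=None, top_pct=None):
    # filtered / standard: (C, H, W) tensors at the same resolution
    per_pixel_loss = (filtered - standard).abs().mean(dim=0)          # (H, W)
    if threshold is not None:
        mask = per_pixel_loss > threshold                             # rule 1: fixed threshold
    else:
        k = top_n if top_n is not None else int(per_pixel_loss.numel() * top_pct / 100)
        idx = per_pixel_loss.flatten().topk(k).indices                # rule 2/3: N or M% largest
        mask = torch.zeros_like(per_pixel_loss, dtype=torch.bool)
        mask.view(-1)[idx] = True
    return mask.nonzero()                                             # (num_points, 2) row/col
```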
It should be understood that in a neural network a feature map is represented as a matrix. After the position of the high-frequency region is determined, the data at the corresponding positions can be taken from the first-resolution feature-map matrix, yielding the high-frequency feature data of the embodiment of the present application.
In the embodiment of the present application, any regression model may be used to perform regression on the high-frequency feature data, for example, MLP (multi-layer perceptron) may be used to perform regression on the high-frequency feature data to obtain a high-frequency coefficient after regression.
S304: and replacing the high-frequency coefficient in the first initial filter by using the regressed high-frequency coefficient so as to obtain a first filter.
Regression-correcting the high-frequency coefficients of the obtained first filter makes it theoretically more reliable, so the upward rendering based on it works better; the gather-regress-replace step is sketched below.
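A minimal sketch of steps S301-S304 under the same assumptions: the positions come from the selection sketch above, the feature map has shape (C, H, W) at the matching resolution, the per-pixel filter coefficients have shape (H, W, n, 3), and the MLP widths are illustrative.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim, n, hidden=128):
    # The preset network: a multilayer perceptron regressing n*3 coefficients.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, n * 3))

def correct_high_freq(filter_coeffs, feature_map, positions, mlp):
    rows, cols = positions[:, 0], positions[:, 1]
    feats = feature_map[:, rows, cols].t()              # (num_points, C) feature vectors
    regressed = mlp(feats)                              # (num_points, n*3)
    corrected = filter_coeffs.clone()
    n, c = filter_coeffs.shape[2], filter_coeffs.shape[3]
    corrected[rows, cols] = regressed.view(-1, n, c)    # replace high-frequency coefficients
    return corrected
```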
To ensure that the obtained first filter is reliable, in the embodiment of the present application the first filter may be used to filter the picture to be processed at the first resolution, and the loss value between the color of the filtered result and the color of the preset standard picture down-sampled to the first resolution may then be calculated.
If the loss value is not converged, the first filter may be used as a first initial filter, the correction process shown in fig. 3 is executed again, the first filter is used again to filter the picture to be processed, and then the loss value between the color of the filtered picture to be processed and the color of the predetermined standard picture down-sampled to the first resolution is calculated.
If the loss values converge, it is determined that the final first filter is obtained.
S102: and rendering according to the first filter to obtain a target filter corresponding to the preset target resolution.
It should be noted that, in the embodiment of the present application, rendering refers to the process of successively computing higher-resolution filters upward from a lower-resolution filter; it may be achieved by an up-sampling operation.
It should also be noted that in the embodiments of the present application, the resolution of the high-level filter should be higher than that of the low-level filter. For example, the resolution of the target filter should be greater than the resolution of the remaining filters, and the second resolution should be greater than the first resolution.
It should be understood that in the embodiments of the present application, the target resolution required by the target filter may be determined by an engineer according to the actual required image processing requirements.
In the embodiment of the present application, the first filter may be rendered to obtain a second initial filter corresponding to a preset second resolution.
And further, correcting the high-frequency coefficient in the second initial filter by using a preset network to obtain a corrected second filter.
At this time, if the target resolution required by the target filter of the current learning is the second resolution, the current learning is finished, and the second filter is the required target filter obtained by learning.
And if the target resolution required by the target filter learned this time is not the second resolution, continuing to render the second filter to obtain a third initial filter corresponding to the preset third resolution. And then, correcting the high-frequency coefficient in the third initial filter by using a preset network to obtain a corrected third filter.
And if the target resolution required by the target filter learned this time is not the third resolution, continuing to render the third filter until the target filter corresponding to the preset target resolution is obtained.
In the embodiment of the present application, upward rendering can be performed according to the resolution ratio between adjacent levels to obtain the next level's initial filter.
For example, when rendering the first filter to obtain the second initial filter, the second initial filter may be obtained by up-sampling by the factor between the preset second resolution and the first resolution.
For instance, if the second resolution is 2h × 2w and the first resolution is h × w, the factor in both length and width is 2, so the first filter may be up-sampled by 2× to obtain the second initial filter.
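As an illustration, doubling a filter's spatial resolution by interpolating its coefficients might be written as follows. This is a sketch: F.interpolate expects an (N, C, H, W) layout, so the n × 3 coefficient planes are first moved into the channel dimension, and the (H, W, n, 3) layout itself is an assumption.

```python
import torch.nn.functional as F

def render_up(filter_coeffs, scale=2):
    # filter_coeffs: (H, W, n, 3) per-pixel coefficients (layout is an assumption)
    h, w, n, c = filter_coeffs.shape
    planes = filter_coeffs.permute(2, 3, 0, 1).reshape(1, n * c, h, w)
    up = F.interpolate(planes, scale_factor=scale, mode='bilinear', align_corners=False)
    return up.reshape(n, c, h * scale, w * scale).permute(2, 3, 0, 1)
```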
In this embodiment of the application, the resolution of each level's filter may be preset by an engineer; that is, how many rendering steps are needed and what resolution each level's filter has can be configured in advance.
Alternatively, by configuring the rendering factor and the resolution of the first filter, the engineer can let the electronic device decide automatically whether the most recently rendered and corrected filter is the required target filter. For example, suppose the first filter is configured at 1/16 of the input image resolution (meaning 1/16 in both length and width, likewise below), the target resolution is the input image resolution, and each rendering factor is 2. Then the first, second and third renderings produce filters at 1/8, 1/4 and 1/2 of the input resolution, and the electronic device determines that these are not yet the required target filter and rendering must continue. The filter obtained by the fourth rendering and correction is at the input image resolution, so the electronic device can determine that the target filter has been obtained and stop rendering. The level schedule is sketched below.
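Under the same assumptions, the automatic stop condition reduces to a short loop; correct_fn stands for the high-frequency correction described in this section, and render_up is the interpolation sketch above.

```python
def render_to_target(first_filter, target_hw, correct_fn, scale=2):
    filt = first_filter
    while filt.shape[:2] != tuple(target_hw):   # not yet the required target filter
        filt = render_up(filt, scale=scale)     # next level's initial filter
        filt = correct_fn(filt)                 # correct its high-frequency coefficients
    return filt                                 # the target filter at the target resolution
```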
It should be noted that, in the embodiment of the present application, upsampling may be implemented by interpolation or the like.
It should be noted that, for the up-sampling process, the embodiment of the present application replaces front-to-back iterative processing with back-to-front expansion and derivation.
It should be understood that an initial filter obtained by up-sampling such as interpolation usually performs well in low-frequency regions but poorly in high-frequency regions (low-frequency regions usually represent areas of the image where color varies little, such as the interiors of objects, while high-frequency regions represent areas where color varies strongly, such as boundaries between objects; the filter coefficients corresponding to low-frequency regions are therefore relatively smooth, while those corresponding to high-frequency regions change drastically). For this reason, the high-frequency coefficients of the initial filter need to be corrected.
In the embodiment of the present application, the correction of the high frequency coefficient may be implemented by regressing the high frequency coefficient. It should be understood that the way of correcting the high frequency coefficients in each initial filter is consistent in the present application, and therefore, the correction process can be referred to the aforementioned high frequency coefficient correction process for the first initial filter.
Illustratively, the correction process for the second initial filter is as follows:
first, a first picture to be processed is acquired.
It should be understood that, in the embodiment of the present application, a preset input picture may be downsampled to the second resolution size in advance, so as to obtain the first to-be-processed picture.
It should also be understood that for the remaining levels of the initial filter, the input picture is then downsampled to the resolution size of the respective level.
And then, filtering the first picture to be processed by using a second initial filter to obtain a second filtered picture.
And then, extracting high-frequency characteristic data corresponding to the second filtering picture, and regressing the high-frequency characteristic data by using a preset network to obtain a regressed high-frequency coefficient.
It should be understood that the term high frequency feature data refers to feature data that can be used to train the high frequency coefficients of the filter.
In the embodiment of the present application, the position of the high-frequency region in the second filtered picture may be obtained first, and then, in the feature map with the size of the second resolution output by the neural network model, the high-frequency feature data corresponding to the position of the high-frequency region in the second filtered picture is extracted.
It should be understood that, in the process of training the neural network model to obtain the first filter, features are extracted level by level to obtain a feature map at each level. For example, consider the neural network model shown in fig. 4, composed of 4 nonlinear convolution units (conv + bn + relu, where conv denotes convolution, bn batch normalization, and relu the activation function) and one 1 × n × 3 convolution layer (the first filter being an n × 3 filter); the stride of conv in each of the 4 nonlinear convolution units is 2. Thus the feature map output by the first nonlinear convolution unit has 1/2 the length and width of the input image; the second unit's output is 1/2 of the first unit's, i.e., 1/4 of the input image; the third unit's output is 1/8 of the input image; and the fourth unit's output is 1/16 of the input image. Fig. 4 therefore yields 4 feature maps of different sizes; a model of this shape is sketched below.
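A sketch of a network with the fig. 4 shape, assuming PyTorch; the kernel size of the nonlinear units and the channel width are assumptions, since the text fixes only the four stride-2 conv + bn + relu units and the final layer's 1 × n × 3 channels.

```python
import torch.nn as nn

def conv_unit(in_ch, out_ch):
    # one nonlinear convolution unit: conv (stride 2) + bn + relu
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class FilterNet(nn.Module):
    def __init__(self, in_ch=1, width=64, n=9):
        super().__init__()
        self.units = nn.ModuleList(
            [conv_unit(in_ch, width)] + [conv_unit(width, width) for _ in range(3)])
        self.head = nn.Conv2d(width, n * 3, kernel_size=1)   # the 1 x n x 3 layer

    def forward(self, x):
        feats = []                      # feature maps at 1/2, 1/4, 1/8, 1/16 resolution
        for unit in self.units:
            x = unit(x)
            feats.append(x)
        return self.head(x), feats      # (B, n*3, h/16, w/16) coefficients + feature maps
```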
Therefore, the high-frequency feature data corresponding to the position of the high-frequency region in the second filtered picture can be extracted directly from the second-resolution feature map obtained when the first filter was trained; there is no need to run the neural network on the input picture again to obtain that feature map.
Of course, the embodiment of the present application may instead run the neural network on the input picture again to obtain a fresh feature map at the second resolution.
And finally, replacing the high-frequency coefficient in the second initial filter by using the regressed high-frequency coefficient to obtain a second filter.
Similarly, in order to ensure that the obtained second filter is reliable, in the embodiment of the present application, the second filter may be used to filter the first to-be-processed picture, and then a loss value between the color of the filtered first to-be-processed picture and the color of the preset standard picture down-sampled to the second resolution is calculated.
If the loss value is not converged, the second filter may be used as a second initial filter to perform the above-mentioned correction process again, and the second filter is used again to filter the first to-be-processed picture, and then the loss value between the color of the filtered first to-be-processed picture and the color of the preset standard picture down-sampled to the second resolution is calculated.
If the loss values converge, it is determined that the final second filter is obtained.
In the embodiment of the present application, the target filter meeting the image processing requirement can be learned through the above manner, and thereafter, the corresponding image processing task can be executed by using the target filter.
It should be noted that loss convergence, as described in the embodiment of the present application, may mean that the loss value is smaller than a preset loss threshold, or that the change in the loss value is below a preset change threshold; a sketch of either test follows.
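For illustration only; the threshold values here are assumptions.

```python
def loss_converged(loss, prev_loss, loss_thresh=1e-3, delta_thresh=1e-5):
    # converged if the loss is small enough, or if it has stopped changing
    return loss < loss_thresh or abs(prev_loss - loss) < delta_thresh
```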
It should be noted that, in the embodiment of the present application, the target resolution corresponding to the target filter obtained by upward rendering may be different from the resolution of the input picture, and may also be equal to the resolution of the input picture. The specific size of the target resolution is limited by the algorithm in which the target filter is actually used. For example, for the demosaic algorithm, the target resolution may be equal to the resolution of the input picture.
S103: and processing the preset picture to be filtered by using the target filter.
It should be understood that, in the embodiment of the present application, the picture to be filtered is an image that needs to be processed and is adapted to the processing capability of the target filter after the target filter is obtained.
It should be noted that the term "adapting to the processing capability of the target filter" means that the resolution of the picture to be filtered is the target resolution corresponding to the target filter, and the processing function to be implemented is also the function of the target filter.
It should be understood that the function of the target filter matches the acquisition process described above. For example, if, when obtaining the target filter, the standard picture used is the successfully enhanced version of the input picture, then processing the picture to be filtered with the target filter enhances its display effect; one plausible form of the filtering step itself is sketched below.
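The text does not spell out how the per-pixel coefficients are applied to the picture. One plausible reading, sketched here purely as an assumption, is that each output pixel's three channels are a linear combination of an n-value neighbourhood of the input (n = k × k for an odd k × k window, e.g. n = 9, k = 3):

```python
import torch
import torch.nn.functional as F

def apply_filter(filter_coeffs, picture):
    # filter_coeffs: (H, W, n, 3); picture: (1, 1, H, W) single-channel input
    h, w, n, _ = filter_coeffs.shape
    k = int(n ** 0.5)                                            # assumed odd k x k window
    patches = F.unfold(picture, kernel_size=k, padding=k // 2)   # (1, n, H*W)
    patches = patches.view(n, h, w).permute(1, 2, 0)             # (H, W, n) neighbourhoods
    # each pixel: dot its n neighbourhood values with its own n x 3 coefficient block
    return torch.einsum('hwn,hwnc->hwc', patches, filter_coeffs)
```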
According to the image processing method provided by the embodiment of the present application, a preset neural network model is trained to obtain the first filter corresponding to the preset first resolution, and the target filter at the actually required target resolution is then rendered from the first filter. In this way the high-resolution filter is rendered from the low-resolution one through multi-level filter rendering. The target filter is no longer obtained end to end but indirectly, from the low-resolution filter output by the neural network model, so an engineer can correct and adjust the target filter's coefficients based on the per-level filters.
Meanwhile, because every level's filter is available for inspection, the whole image-processing algorithm is convenient to analyze and debug.
In addition, with the rendering-based indirect learning method, the obtained target filter is less prone to overfitting, and even if overfitting occurs an engineer can easily correct and adjust it based on the per-level filters, unlike the conventional end-to-end approach, in which coefficients cannot be adjusted.
In an end-to-end scheme, the input is an unprocessed picture and the output is a processed picture, and every pixel of the input undergoes the same processing (the same convolutions). In the solution provided by the embodiment of the present application, the neural network is trained to produce a coarse-scale (i.e., low-resolution) filter, and the filter at the target resolution is obtained by rendering upward, so the pixels of the input picture are no longer all treated alike: pixels that need no recomputation (e.g., those in low-frequency regions) are obtained directly by interpolation, while important pixels (e.g., those in high-frequency regions) are recomputed. Compared with the end-to-end approach, this markedly reduces the computation of neural-network picture processing.
In addition, the image processing method provided by the embodiment of the application can be widely applied to various image processing algorithms, and the training of the filter in the algorithm is realized, so that the reliability of each image processing algorithm is improved.
Example two:
in this embodiment, on the basis of the first embodiment, the scheme of the present application is exemplified by taking an application in the demosaic algorithm as an example.
Let the input picture be raw, with length and width h × w, and let RAW be the standard picture corresponding to raw, also of length and width h × w.
In the demosaic algorithm, the target resolution is the resolution of the input picture, i.e., h × w.
The neural network structure is shown in fig. 4, and the number of final convolutional layer channels is set to 1 × n × 3.
raw is input into the neural network, which outputs a first initial filter of size (h/16) × (w/16) × n × 3.
The picture raw1, obtained by down-sampling raw to (h/16) × (w/16), is filtered using the first initial filter, resulting in filtered picture r1.
The loss is computed between the filtered picture r1 and the picture RAW1 obtained by down-sampling RAW to (h/16) × (w/16).
If the loss value has not converged, the neural network parameters are updated, a new (h/16) × (w/16) × n × 3 first initial filter is output, and raw1 is filtered with it to obtain a new filtered picture r1. The loss between the new r1 and RAW1 is computed, and if it still has not converged the process repeats.
Once the loss converges, raw1 is filtered with the first initial filter at convergence, yielding filtered picture r11.
The per-pixel loss between picture r11 and RAW1 is calculated.
According to the position, in picture r11, of the pixel set A1 whose loss values exceed the preset threshold, the position of A1 in the feature map output by the fourth nonlinear convolution unit of the neural network is determined, and the values corresponding to A1 are extracted from the feature-map matrix.
This value set is regressed with the MLP, and, according to the positions of the values in the feature-map matrix, the data at the corresponding positions of the first initial filter's filter matrix are replaced with the regressed values, correcting the high-frequency coefficients of the first initial filter and yielding the first filter.
The first filter is up-sampled by 2× to obtain a second initial filter of (h/8) × (w/8) × n × 3; the coefficients of the second initial filter beyond those of the first filter are obtained by interpolating the first filter's coefficients.
The picture raw2, obtained by down-sampling raw to (h/8) × (w/8), is filtered using the second initial filter, resulting in filtered picture r2.
The loss value between picture r2 and the picture RAW2 obtained by down-sampling RAW to (h/8) × (w/8) is calculated.
According to the position, in picture r2, of the pixel set A2 whose loss values exceed the preset threshold, the position of A2 in the feature map output by the third nonlinear convolution unit of the neural network is determined, and the values corresponding to A2 are extracted from the feature-map matrix.
This value set is regressed with the MLP, and, according to the positions of the values in the feature-map matrix, the data at the corresponding positions of the second initial filter's filter matrix are replaced with the regressed values, correcting the high-frequency coefficients of the second initial filter and yielding the second filter.
The second filter is up-sampled by 2× to obtain a third initial filter of (h/4) × (w/4) × n × 3; the coefficients of the third initial filter beyond those of the second filter are obtained by interpolating the second filter's coefficients.
The picture raw3, obtained by down-sampling raw to (h/4) × (w/4), is filtered using the third initial filter, resulting in filtered picture r3.
The loss value between picture r3 and the picture RAW3 obtained by down-sampling RAW to (h/4) × (w/4) is calculated.
According to the position, in picture r3, of the pixel set A3 whose loss values exceed the preset threshold, the position of A3 in the feature map output by the second nonlinear convolution unit of the neural network is determined, and the values corresponding to A3 are extracted from the feature-map matrix.
This value set is regressed with the MLP, and, according to the positions of the values in the feature-map matrix, the data at the corresponding positions of the third initial filter's filter matrix are replaced with the regressed values, correcting the high-frequency coefficients of the third initial filter and yielding the third filter.
The third filter is up-sampled by 2× to obtain a fourth initial filter of (h/2) × (w/2) × n × 3; the coefficients of the fourth initial filter beyond those of the third filter are obtained by interpolating the third filter's coefficients.
The picture raw4, obtained by down-sampling raw to (h/2) × (w/2), is filtered using the fourth initial filter, resulting in filtered picture r4.
The loss value between picture r4 and the picture RAW4 obtained by down-sampling RAW to (h/2) × (w/2) is calculated.
According to the position, in picture r4, of the pixel set A4 whose loss values exceed the preset threshold, the position of A4 in the feature map output by the first nonlinear convolution unit of the neural network is determined, and the values corresponding to A4 are extracted from the feature-map matrix.
This value set is regressed with the MLP, and, according to the positions of the values in the feature-map matrix, the data at the corresponding positions of the fourth initial filter's filter matrix are replaced with the regressed values, correcting the high-frequency coefficients of the fourth initial filter and yielding the fourth filter.
The fourth filter is up-sampled by 2× to obtain the target initial filter of h × w × n × 3; the coefficients of the target initial filter beyond those of the fourth filter are obtained by interpolating the fourth filter's coefficients.
raw is filtered using the target initial filter, resulting in filtered picture r5.
The loss value between pictures r5 and RAW is calculated.
According to the position, in picture r5, of the pixel set A5 whose loss values exceed the preset threshold, the position of A5 in the feature map output by the first nonlinear convolution unit of the neural network is determined, and the values corresponding to A5 are extracted from the feature-map matrix.
This value set is regressed with the MLP, and, according to the positions of the values in the feature-map matrix, the data at the corresponding positions of the target initial filter's filter matrix are replaced with the regressed values, correcting the high-frequency coefficients of the target initial filter and yielding the target filter.
Finally, the target filter is applied to the original picture raw to obtain the final output. The whole flow is condensed in the sketch below.
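Pulling the example together, the whole flow might be scripted as below. Every helper (FilterNet, train_first_initial_filter, render_up, high_freq_positions, correct_high_freq, apply_filter) refers to the earlier sketches, the threshold is illustrative, and the reshaping between the network's (n*3)-channel output and the (H, W, n, 3) layout used by the helpers is omitted for brevity.

```python
import torch.nn.functional as F

def demosaic(model, raw, RAW, mlp, optimizer, threshold=0.05):
    # train the (h/16) x (w/16) first filter, then render and correct upward
    filt = train_first_initial_filter(model, raw, RAW, m=16,
                                      apply_filter=apply_filter,
                                      optimizer=optimizer)
    _, feats = model(raw)                  # feature maps at 1/2, 1/4, 1/8, 1/16

    def correct_level(filt, feat, scale):
        down = raw if scale == 1 else F.interpolate(raw, scale_factor=1.0 / scale)
        ref = RAW if scale == 1 else F.interpolate(RAW, scale_factor=1.0 / scale)
        r = apply_filter(filt, down).permute(2, 0, 1)          # filtered picture (3, H, W)
        pos = high_freq_positions(r, ref.squeeze(0), threshold=threshold)
        return correct_high_freq(filt, feat.squeeze(0), pos, mlp)

    filt = correct_level(filt, feats[3], 16)                   # correct the 1/16 filter
    # per the text: 1/8 uses the 3rd unit's map, 1/4 the 2nd, 1/2 and full the 1st
    for feat_idx, scale in [(2, 8), (1, 4), (0, 2), (0, 1)]:
        filt = render_up(filt, scale=2)
        filt = correct_level(filt, feats[feat_idx], scale)
    return apply_filter(filt, raw)                             # final demosaiced output
```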
With the scheme of the embodiment of the present application, the conventional end-to-end design is abandoned and the high-resolution filter is rendered level by level from the learned low-resolution filter. Because of this rendering-based indirect learning, the obtained target filter is less prone to overfitting and therefore less likely to produce serious artifacts. Moreover, since the target filter is rendered up to high resolution, no effort is wasted on the many pixels (e.g., in low-frequency regions) that need no extra processing. In addition, every filter coefficient can be inspected by engineers, which makes the algorithm convenient to analyze and debug.
Example three:
based on the same inventive concept, the embodiment of the application also provides an image processing device. Referring to fig. 5, fig. 5 shows an image processing apparatus 100 corresponding to the method according to the first embodiment. It should be understood that the specific functions of the image processing apparatus 100 can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy. The image processing apparatus 100 includes at least one software functional module that can be stored in a memory in the form of software or firmware or solidified in an operating system of the image processing apparatus 100. Specifically, the method comprises the following steps:
referring to fig. 5, the image processing apparatus 100 includes: the device comprises an acquisition module 101, a rendering module 102 and a processing module 103. Wherein:
the obtaining module 101 is configured to train a preset neural network model to obtain a first filter corresponding to a preset first resolution.
The rendering module 102 is configured to render the target filter corresponding to the preset target resolution according to the first filter; the preset target resolution is greater than the preset first resolution.
The processing module 103 is configured to process a preset picture to be filtered by using the target filter, the resolution of the picture to be filtered being the preset target resolution.
In this embodiment of the present application, the obtaining module 101 is specifically configured to input a preset input picture into the neural network model, so as to obtain a first initial filter; filtering the input picture down-sampled to the first resolution by using the first initial filter to obtain a first filtered picture; calculating a loss value between the color of the first filtered picture and the color of a preset standard picture down-sampled to the first resolution; when the loss value is not converged, iteratively updating the neural network model parameters until the loss value is converged; and correcting the high-frequency coefficient in the first initial filter when the loss value is converged by using a preset network to obtain the first filter.
In this embodiment of the application, the rendering module 102 is specifically configured to render the first filter to obtain a second initial filter corresponding to a preset second resolution; the preset second resolution is greater than the preset first resolution; correcting the high-frequency coefficient in the second initial filter by using a preset network to obtain a corrected second filter; if the preset second resolution is not the preset target resolution, continuing to render the second filter until a target filter corresponding to the preset target resolution is obtained.
In a feasible implementation manner of the embodiment of the present application, the rendering module 102 is specifically configured to interpolate the first filter according to the resolution ratio between the preset second resolution and the preset first resolution to obtain the second initial filter.
In another possible implementation manner of the embodiment of the present application, the rendering module 102 is specifically configured to: acquire a first picture to be processed by down-sampling a preset input picture to the preset second resolution; filter the first picture to be processed by using the second initial filter to obtain a second filtered picture; extract high-frequency feature data corresponding to the second filtered picture, and regress the high-frequency feature data by using the preset network to obtain regressed high-frequency coefficients; and replace the high-frequency coefficients in the second initial filter with the regressed high-frequency coefficients to obtain the second filter.
In the above possible implementation, the rendering module 102 obtains the position of the high-frequency region in the second filtered picture, and extracts the high-frequency feature data corresponding to that position from the feature map at the preset second resolution output by the neural network model.
In the above possible implementation, the preset network is a multi-layer perceptron.
It should be understood that, for the sake of brevity, the contents described in some embodiments are not repeated in this embodiment.
Example four:
the embodiment provides an electronic device, which can be seen in fig. 6 and includes a processor 601, a memory 602 and a communication bus 603. Wherein:
the communication bus 603 is used for connection communication between the processor 601 and the memory 602.
The processor 601 is configured to execute one or more programs stored in the memory 602 to implement the image processing method in the first embodiment or the second embodiment.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative and that the electronic device may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6, for example, may also have components such as a keyboard, a communications module, a display screen, etc.
The present embodiment also provides a readable storage medium, such as a floppy disk, an optical disc, a hard disk, a flash memory, a USB flash drive, an SD (Secure Digital) card, an MMC (Multimedia Card), etc., in which one or more programs implementing the above steps are stored; the one or more programs can be executed by one or more processors to implement the image processing method of the first or second embodiment. Details are not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In this context, a plurality means two or more.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An image processing method, comprising:
acquiring a first filter corresponding to a preset first resolution by using a neural network model;
rendering according to the first filter to obtain a target filter corresponding to a preset target resolution; the preset target resolution is greater than the preset first resolution;
processing a preset picture to be filtered by using the target filter; wherein the resolution of the picture to be filtered is the preset target resolution.
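To make the three claimed steps concrete, here is a minimal sketch under strong simplifying assumptions: the filter is a single 2-D kernel, "rendering" is plain interpolation (the high-frequency correction of claims 3-5 is omitted), and the kernel and resolutions are illustrative stand-ins.

```python
import numpy as np
from scipy.ndimage import convolve, zoom

def run_pipeline(first_filter, first_res, target_res, picture):
    factor = target_res / first_res                       # difference multiple
    target_filter = zoom(first_filter, factor, order=3)   # render to target resolution
    target_filter /= target_filter.sum()                  # keep the kernel normalized
    return convolve(picture, target_filter, mode="reflect")

# Usage with a 5x5 binomial kernel standing in for the network's first filter:
k = np.outer([1., 4., 6., 4., 1.], [1., 4., 6., 4., 1.]) / 256.0
picture = np.random.default_rng(0).normal(size=(512, 512))
out = run_pipeline(k, first_res=128, target_res=512, picture=picture)
```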
2. The image processing method of claim 1, wherein obtaining the first filter corresponding to the preset first resolution using the neural network model comprises:
inputting a preset input picture into the neural network model to obtain a first initial filter;
filtering the input picture down-sampled to the first resolution by using the first initial filter to obtain a first filtered picture;
calculating a loss value between the color of the first filtered picture and the color of a preset standard picture down-sampled to the first resolution;
when the loss value is not converged, iteratively updating parameters of the neural network model until the loss value is converged;
and when the loss value converges, correcting the high-frequency coefficients in the first initial filter by using a preset network to obtain the first filter.
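A PyTorch sketch of this training loop follows. The 4-D NCHW tensor shapes, the per-pixel-gain stand-in for the filtering step, and the convergence test are all assumptions; the final high-frequency correction step is left out.

```python
import torch
import torch.nn.functional as F

def train_first_filter(model, input_pic, standard_pic, first_res,
                       tol=1e-4, max_iter=1000):
    # Downsample the preset input and standard pictures to the first resolution.
    small_in = F.interpolate(input_pic, size=(first_res, first_res),
                             mode="bilinear", align_corners=False)
    small_ref = F.interpolate(standard_pic, size=(first_res, first_res),
                              mode="bilinear", align_corners=False)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    prev = float("inf")
    for _ in range(max_iter):
        init_filter = model(small_in)           # first initial filter
        filtered = small_in * init_filter       # stand-in filtering: per-pixel gain
        loss = F.mse_loss(filtered, small_ref)  # loss between the two pictures' colours
        if abs(prev - loss.item()) < tol:       # treat as "loss value converged"
            break
        prev = loss.item()
        opt.zero_grad(); loss.backward(); opt.step()
    return init_filter.detach()
```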
3. The image processing method according to claim 1 or 2, wherein rendering, according to the first filter, the target filter corresponding to the preset target resolution comprises:
rendering the first filter to obtain a second initial filter corresponding to a preset second resolution; the preset second resolution is greater than the preset first resolution;
correcting the high-frequency coefficient in the second initial filter by using a preset network to obtain a corrected second filter;
if the preset second resolution is not the preset target resolution, continuing to render the second filter until a target filter corresponding to the preset target resolution is obtained.
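A sketch of the resulting render loop, assuming (illustratively) that each render step doubles the resolution and that the target resolution is a power-of-two multiple of the first; `correct` stands in for the preset-network correction of claim 5:

```python
from scipy.ndimage import zoom

def render_to_target(first_filter, first_res, target_res, correct):
    filt, res = first_filter, first_res
    while res < target_res:
        filt = zoom(filt, 2.0, order=3)   # render: second initial filter
        filt = correct(filt)              # corrected second filter
        res *= 2
    return filt

# e.g. render_to_target(k, 128, 512, correct=lambda f: f) with a 2-D kernel k
# doubles twice: 128 -> 256 -> 512.
```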
4. The image processing method of claim 3, wherein rendering the first filter results in a second initial filter, comprising:
interpolating the first filter according to the difference multiple between the preset second resolution and the preset first resolution to obtain the second initial filter.
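This step reduces to a single interpolation call; cubic interpolation below is an illustrative choice, as the claim only fixes the scale factor:

```python
from scipy.ndimage import zoom

def interpolate_filter(first_filter, first_res, second_res):
    # Scale the filter by the difference multiple between the two resolutions.
    return zoom(first_filter, second_res / first_res, order=3)
```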
5. The image processing method of claim 3, wherein correcting the high-frequency coefficients in the second initial filter by using the preset network to obtain the corrected second filter comprises:
acquiring a first to-be-processed picture by downsampling a preset input picture to the preset second resolution;
filtering the first to-be-processed picture by using the second initial filter to obtain a second filtered picture;
extracting high-frequency feature data corresponding to the second filtered picture, and regressing the high-frequency feature data by using the preset network to obtain regressed high-frequency coefficients;
and replacing the high-frequency coefficients in the second initial filter with the regressed high-frequency coefficients to obtain the second filter.
6. The image processing method according to claim 5, wherein extracting the high-frequency feature data corresponding to the second filtered picture comprises:
acquiring the position of a high-frequency region in the second filtered picture;
and extracting the high-frequency feature data corresponding to the position of the high-frequency region in the second filtered picture from the feature map at the preset second resolution output by the neural network model.
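One concrete (assumed) reading of this claim: mark as high-frequency the pixels whose Laplacian magnitude falls in the top decile of the second filtered picture, then gather the feature vectors at those positions. Both the Laplacian criterion and the decile threshold are illustrative choices.

```python
import numpy as np
from scipy.ndimage import laplace

def high_freq_features(filtered_pic, feature_map, quantile=0.9):
    energy = np.abs(laplace(filtered_pic))            # local high-frequency energy
    mask = energy >= np.quantile(energy, quantile)    # positions of high-frequency regions
    # Feature data at those positions, taken from the feature map output by
    # the neural network model at the preset second resolution (assumed (H, W, C)).
    return feature_map[mask], mask
```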
7. The image processing method of claim 5, wherein the preset network is a multi-layer perceptron.
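For concreteness, the preset network could be as small as the following; all layer sizes are illustrative, since the patent only requires a multi-layer perceptron:

```python
import torch.nn as nn

# A small multi-layer perceptron mapping high-frequency feature vectors to
# regressed kernel coefficients; 64-dimensional features and a 5x5 kernel
# (25 outputs) are assumptions for illustration only.
preset_network = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 25),
)
```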
8. An image processing apparatus, characterized by comprising an acquisition module, a rendering module and a processing module, wherein:
the acquisition module is used for acquiring a first filter corresponding to a preset first resolution by using a neural network model;
the rendering module is used for rendering, according to the first filter, a target filter corresponding to a preset target resolution; the preset target resolution is greater than the preset first resolution;
the processing module is used for processing a preset picture to be filtered by using the target filter; wherein the resolution of the picture to be filtered is the preset target resolution.
9. An electronic device, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute one or more programs stored in the memory to implement the image processing method of any one of claims 1 to 7.
10. A readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the image processing method according to any one of claims 1 to 7.
CN202010925547.8A 2020-09-04 2020-09-04 Image processing method and device, electronic equipment and readable storage medium Pending CN112184568A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010925547.8A CN112184568A (en) 2020-09-04 2020-09-04 Image processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112184568A true CN112184568A (en) 2021-01-05

Family

ID=73925771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010925547.8A Pending CN112184568A (en) 2020-09-04 2020-09-04 Image processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112184568A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976318A (en) * 2016-04-28 2016-09-28 北京工业大学 Image super-resolution reconstruction method
US20200126205A1 (en) * 2018-10-18 2020-04-23 Boe Technology Group Co., Ltd. Image processing method, image processing apparatus, computing device and computer-readable storage medium
CN109889800A (en) * 2019-02-28 2019-06-14 深圳市商汤科技有限公司 Image enchancing method and device, electronic equipment, storage medium
CN110428378A (en) * 2019-07-26 2019-11-08 北京小米移动软件有限公司 Processing method, device and the storage medium of image
CN111161146A (en) * 2019-12-25 2020-05-15 大连理工大学 Coarse-to-fine single-image super-resolution reconstruction method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658549A (en) * 2021-08-17 2021-11-16 晟合微电子(肇庆)有限公司 Sub-pixel rendering method, display device and storage medium
CN113658549B (en) * 2021-08-17 2022-10-21 晟合微电子(肇庆)有限公司 Sub-pixel rendering method, display device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination