CN115700726A - Image processing method and device, training method and computer readable storage medium - Google Patents

Publication number: CN115700726A
Application number: CN202110878340.4A
Authority: CN (China)
Inventors: 那彦波, 卢运华
Assignee: BOE Technology Group Co Ltd
Legal status: Pending
Other languages: Chinese (zh)
Classification: Image Processing (AREA)
Abstract

The present disclosure provides an image processing method, including: acquiring an input image and momentum terms of N-1 levels, where N is a positive integer greater than 2; generating N levels of initial feature images with resolutions arranged from high to low based on the input image; performing iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1; and generating an output image based on the updated feature image of level 1. The disclosure also provides an image processing apparatus, a neural network training method, and a non-transitory computer-readable storage medium.

Description

Image processing method and device, training method and computer readable storage medium
Technical Field
The present disclosure relates to the field of display technologies, and in particular, to an image processing method and apparatus, a neural network training method, and a non-transitory computer-readable storage medium.
Background
Currently, deep learning techniques based on artificial neural networks have made tremendous progress in areas such as image classification, image capture and search, face recognition, age estimation, and speech recognition. The advantage of deep learning is that a generic structure can be used to solve very different technical problems with relatively similar systems. A convolutional neural network (CNN) is an artificial neural network that has developed rapidly and attracted wide attention in recent years; as a specialized method for image recognition, it is a highly effective feed-forward network.
Disclosure of Invention
The present disclosure proposes an image processing method, an image processing apparatus, a neural network training method, and a non-transitory computer-readable storage medium.
In a first aspect, the present disclosure provides an image processing method, including:
acquiring an input image and momentum terms of N-1 levels, where N is a positive integer greater than 2;
generating N levels of initial feature images with resolutions arranged from high to low based on the input image;
for the N levels of initial feature images, performing iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1; and
generating an output image based on the updated feature image of level 1.
In some embodiments, the iterative back-projection processing of each level comprises: downsampling processing, joining processing, upsampling processing, first superposition processing, and second superposition processing;
the downsampling processing of the i-th level includes: performing downsampling based on the input of the iterative back-projection processing of the i-th level to generate a downsampled output of the i-th level;
the joining processing of the i-th level includes: performing a joining operation based on the downsampled output of the i-th level and the initial feature image of the (i+1)-th level to generate a joint output of the i-th level;
the upsampling processing of the i-th level includes: generating an upsampled output of the i-th level based on the joint output of the i-th level;
the first superposition processing of the i-th level includes: superposing the first superposition input of the i-th level and the upsampled output of the i-th level to generate a first superposition output of the i-th level;
the second superposition processing of the i-th level includes: superposing the input of the iterative back-projection processing of the i-th level and the first superposition output of the i-th level to generate the output of the iterative back-projection processing of the i-th level;
the iterative back-projection processing of the (j+1)-th level is nested between the downsampling processing and the joining processing of the j-th level, and the input of the iterative back-projection processing of the (j+1)-th level comprises the downsampled output of the j-th level, where j = 1, 2, …, N-2. The iterative back-projection processing of at least one level is executed several times in succession, and the input of each subsequent iterative back-projection processing comprises the output of the preceding one; the first superposition input of the first superposition processing in a subsequent iterative back-projection processing comprises the first superposition output of the first superposition processing in the preceding iterative back-projection processing, the first superposition input in the first iterative back-projection processing comprises the momentum term of the current level, and the updated feature image of level 1 comprises the output of the last iterative back-projection processing of level 1.
In some embodiments, generating the joint output of the i-th level based on the joining of the downsampled output of the i-th level and the initial feature image of the (i+1)-th level specifically includes:
taking the downsampled output of the i-th level as the input of the iterative back-projection processing of the (i+1)-th level to generate the output of the iterative back-projection processing of the (i+1)-th level; and joining the output of the iterative back-projection processing of the (i+1)-th level with the initial feature image of the (i+1)-th level to generate the joint output of the i-th level.
In some embodiments, generating the initial feature images of N levels with resolution arranged from high to low based on the input image comprises:
performing N different levels of analysis processing on the input image to generate initial feature images of the N levels with resolution arranged from high to low, respectively.
In some embodiments, generating an output image based on the level 1 updated feature image comprises:
converting the updated feature image of level 1 to generate the output image.
In some embodiments, generating the initial feature images of N levels with resolution arranged from high to low based on the input image comprises:
taking the input image as the 1st-level intermediate input image, and downsampling the input image to generate the 2nd- to Nth-level intermediate input images with resolutions arranged from high to low;
analyzing the intermediate input image of each level to generate an input feature image of each level, and taking the input feature image of the Nth level as the initial feature image of the Nth level;
for each of the first N-1 levels: sequentially downsampling and analyzing the intermediate input image of the level to generate an intermediate feature image; joining the intermediate feature image of the current level with the initial feature image of the next level, upsampling the joined image, and superposing the upsampled image with the momentum term of the current level to generate a first momentum term; and superposing the first momentum term and the input feature image of the current level to generate the initial feature image of the current level.
In some embodiments, the iterative back-projection processing of each level is performed M times consecutively, M being an integer greater than 1, each iterative back-projection processing including downsampling processing, joining processing, upsampling processing, first superposition processing, and second superposition processing;
the downsampling processing in the m-th iterative back-projection processing of the i-th level includes: performing downsampling based on the input of the m-th iterative back-projection processing of the i-th level to generate a downsampled output of the m-th iterative back-projection processing of the i-th level; the initial feature image of the i-th level comprises the input of the 1st iterative back-projection processing of the i-th level, and the input of each iterative back-projection processing after the 1st comprises the output of the preceding one;
the joining processing in the m-th iterative back-projection processing of the i-th level includes: performing a joining operation based on the downsampled output of the m-th iterative back-projection processing of the i-th level and the output of the m-th iterative back-projection processing of the (i+1)-th level to generate the m-th compensation feature image of the i-th level;
the upsampling processing in the m-th iterative back-projection processing of the i-th level includes: performing upsampling based on the m-th compensation feature image of the i-th level to generate an upsampled output of the m-th iterative back-projection processing of the i-th level;
the first superposition processing in the 1st iterative back-projection processing of the i-th level includes: performing a superposition operation based on the upsampled output of the 1st iterative back-projection processing of the i-th level and the first momentum term of the i-th level to generate a first superposition output of the 1st iterative back-projection processing of the i-th level; the first superposition processing in each iterative back-projection processing after the 1st includes: performing a superposition operation based on the upsampled output of the current iterative back-projection processing and the first superposition output of the preceding one to generate the first superposition output of the current iterative back-projection processing;
the second superposition processing in the 1st iterative back-projection processing of the i-th level includes: performing a superposition operation based on the first superposition output of the 1st iterative back-projection processing of the i-th level and the initial feature image of the i-th level to generate a second superposition output of the 1st iterative back-projection processing of the i-th level; the second superposition processing in each iterative back-projection processing after the 1st includes: performing a superposition operation based on the first superposition output of the current iterative back-projection processing and the second superposition output of the preceding one to generate the second superposition output of the current iterative back-projection processing, the second superposition output of each iterative back-projection processing serving as the output of that iterative back-projection processing;
where m = 1, 2, …, M, and the second superposition output of the last iterative back-projection processing of level 1 serves as the updated feature image of level 1.
In some embodiments, generating an output image based on the updated feature image of level 1 includes:
converting the updated feature image of level 1, and superposing the converted image with the input image to generate the output image.
In some embodiments, generating the N hierarchical levels of initial feature images with resolutions ranging from high to low based on the input image comprises:
concatenating the input image with a random noise image to generate a joint input image;
and performing N different levels of analysis processing on the joint input image to respectively generate the N levels of initial feature images with the resolution ranging from high to low.
In some embodiments, of the N levels of initial feature images, the 1 st level of initial feature image has the highest resolution, and the 1 st level of initial feature image has the same resolution as the input image;
the resolution of the initial feature image of the previous level is an integer multiple of the resolution of the initial feature image of the subsequent level.
In a second aspect, the present disclosure provides a method of training a neural network, the neural network comprising an analysis network, an iterative back-projection processing network, and an output network; the training method includes:
acquiring a training input image and preset momentum terms of N-1 levels, where N is a positive integer greater than 2;
processing the training input image with the analysis network to generate training initial feature images of N levels with resolutions arranged from high to low;
performing iterative back-projection processing of the i-th level with the iterative back-projection processing network, based on the training initial feature image of the (i+1)-th level and the momentum term of the i-th level, to generate a training updated feature image of the i-th level, i = 1, 2, …, N-1;
generating a training output image based on the training updated feature image of level 1 with the output network; and
calculating a loss value of the neural network through a loss function based on the training output image, and correcting parameters of the neural network according to the loss value.
In some embodiments, the loss function comprises a mean square error between the training output image and the training standard image corresponding to the training input image.
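By way of illustration only, a minimal PyTorch sketch of one training step with this loss follows; the tiny stand-in network, the optimizer choice, and all tensor shapes are assumptions, not details from the disclosure:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins; the actual network is described in the embodiments.
neural_net = torch.nn.Conv2d(3, 3, 3, padding=1)
optimizer = torch.optim.Adam(neural_net.parameters(), lr=1e-4)
train_input = torch.rand(1, 3, 64, 64)      # training input image
train_standard = torch.rand(1, 3, 64, 64)   # training standard image

train_output = neural_net(train_input)              # training output image
loss = F.mse_loss(train_output, train_standard)     # mean square error loss
optimizer.zero_grad()
loss.backward()                                     # loss value -> gradients
optimizer.step()                                    # correct the network parameters
```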
In a third aspect, the present disclosure provides an image processing apparatus comprising:
an image acquisition module configured to acquire an input image and preset momentum terms of N-1 levels, where N is a positive integer greater than 2; and
an image processing module configured to generate N levels of initial feature images with resolutions arranged from high to low based on the input image; for the N levels of initial feature images, perform iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1; and generate an output image based on the updated feature image of level 1.
In a fourth aspect, the present disclosure provides an image processing apparatus comprising:
a memory and a processor, the memory having stored thereon a computer program, wherein the computer program, when executed by the processor, implements the image processing method in the above embodiment or the training method in the above embodiment.
In a fifth aspect, the present disclosure provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the image processing method in the above embodiments or the training method in the above embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
fig. 1 is a flow chart of an image processing method provided in some embodiments of the present disclosure.
Fig. 2A is a schematic flow diagram of an image processing method provided in some embodiments of the present disclosure.
Fig. 2B is a schematic flow chart diagram of an image processing method provided in further embodiments of the present disclosure.
Fig. 3 is a schematic flow chart block diagram of an image processing method provided in still further embodiments of the present disclosure.
Fig. 4 is a schematic flow chart of an image processing method in one specific example of the present disclosure.
Fig. 5 is a schematic diagram of each image processing sub-process provided in some embodiments of the present disclosure.
Fig. 6 is a schematic structural block diagram of a neural network provided in some embodiments of the present disclosure.
Fig. 7 is a flowchart of a method for training a neural network provided in some embodiments of the present disclosure.
Fig. 8 is a schematic architecture block diagram of a training method of a neural network provided in some embodiments of the present disclosure.
Fig. 9 is a schematic block diagram of an image processing apparatus provided in some embodiments of the present disclosure.
Fig. 10 is a schematic block diagram of an image processing apparatus according to other embodiments of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without inventive step, are within the scope of protection of the disclosure.
Image enhancement is one of the research hotspots in the field of image processing. Image quality is often greatly degraded during image acquisition by various physical limitations (for example, the small image sensor of a mobile phone camera and other software and hardware constraints) and by interference from environmental noise. The purpose of image enhancement is to improve the gray-level histogram and the contrast of an image, thereby highlighting its detail information and improving its visual effect.
Fig. 1 is a flowchart of an image processing method provided in some embodiments of the present disclosure, wherein the image processing method may be performed using a trained neural network. As shown in fig. 1, the image processing method includes:
s101, acquiring an input image and preset momentum items (momentum) of N-1 levels, wherein N is a positive integer and is greater than 2. The momentum term is a feature for increasing the error convergence rate (i.e., increasing the parameter update rate of the neural network) in the training process of the neural network.
And S102, generating N levels of initial characteristic images with the resolution arranged from high to low based on the input image.
S103, for the N levels of initial feature images, performing iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1.
The iterative back-projection processing of the i-th level may compensate the initial feature image of the i-th level with the initial feature image of the (i+1)-th level, using the momentum term of the i-th level in the compensation process.
And S104, generating an output image based on the updated characteristic image of the 1 st level.
In the embodiments of the disclosure, multiple initial feature images with different resolutions are generated based on an input image, and the iterative back-projection processing of the i-th level is performed based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level, which improves the quality of the output image. Introducing a momentum term at each level also lets the neural network form a reversible residual neural network; during training, the momentum terms increase the training speed and reduce memory usage. Because a momentum term is introduced at every level, the training speed of the neural network is improved further still.
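As a minimal sketch of steps S101 to S104 (an illustration, not the patent's reference implementation), assuming PyTorch; analysis_net, ibp_net, and merge_net are hypothetical stand-ins for the analysis, iterative back-projection, and output networks described below:

```python
import torch

def enhance(inp, analysis_net, ibp_net, merge_net):
    # S102: N initial feature images, resolutions arranged high to low.
    feats = analysis_net(inp)
    # S101: preset momentum terms for levels 1..N-1 (all zeros here),
    # each sized to match the feature image of its level.
    momenta = [torch.zeros_like(f) for f in feats[:-1]]
    # S103: iterative back projection over levels i = N-1, ..., 1.
    updated_f1 = ibp_net(feats, momenta)
    # S104: output image generated from the level-1 updated feature image.
    return merge_net(updated_f1)
```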
Fig. 2A is a schematic flow chart diagram of an image processing method provided in some embodiments of the present disclosure, and fig. 2B is a schematic flow chart diagram of an image processing method provided in other embodiments of the present disclosure, and the image processing method of the present disclosure is described in detail below with reference to fig. 1 to 2B.
As shown in fig. 1, the image processing method includes:
s101, acquiring an input image and preset momentum items of N-1 levels, wherein N is a positive integer and is greater than 2.
For example, as shown in fig. 2A and 2B, the input image is labeled INP. The input image may include a photo captured by a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, or a network camera, and the like, and may include a human image, an animal image, a plant image, a landscape image, and the like, which is not limited in this disclosure.
The input image INP may be a grayscale image or a color image. For example, color images include, but are not limited to, 3 channels of RGB images, and the like. Note that, in the embodiment of the present disclosure, when the input image INP is a grayscale image, the output image OUTP is also a grayscale image; when the input image INP is a color image, the output image OUTP is also a color image.
The image processing method provided by the embodiment of the disclosure can enhance the image, thereby improving the image quality.
It should be noted that each image in the embodiments of the present disclosure is represented in matrix form. In some embodiments, the momentum terms may likewise be represented as matrices, the matrix of a momentum term having the same dimensions as the feature image of the same level. The momentum terms of the N-1 levels may be preset; for example, every element of each level's momentum term may be zero.
S102, generating N levels of initial feature images with the resolution ranging from high to low based on the input image.
The resolution of the 1 st level initial feature image is the highest among the N levels of initial feature images, and the resolution of the 1 st level initial feature image is the same as that of the input image;
the resolution of the initial feature image of the previous level (e.g., i-th level) is an integer multiple of the resolution of the initial feature image of the subsequent level (e.g., i + 1-th level). For example, when the resolution of the initial feature image of the previous level is twice the resolution of the initial feature image of the next level, assuming that the resolution of the initial feature image of a certain level is 32 × 32, the resolution of the initial feature image of the next level is 16 × 16.
In some embodiments, as shown in fig. 2A, the input image INP may be subjected to N different levels of analysis processing by an analysis network, thereby generating N levels of initial feature images F01 to F0N (e.g., F01 to F05 in fig. 2A) with resolutions arranged from high to low. For example, as shown in fig. 4, the analysis network includes N analysis sub-networks ASN, and each analysis sub-network ASN performs analysis processing at a different level to generate one of the N initial feature images. Each analysis sub-network ASN may be implemented with convolutional network modules such as a convolutional neural network (CNN), a residual network (ResNet), or a dense network (DenseNet); for example, each analysis sub-network ASN may include convolutional layers, downsampling layers, normalization layers, and the like, but is not limited thereto. The network parameters of different analysis sub-networks ASN may differ, so as to generate the initial feature images F01 to F0N with different resolutions.
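A minimal sketch of one possible analysis sub-network ASN, assuming PyTorch; the channel width, kernel sizes, and the use of stride-2 convolutions for per-level downsampling are illustrative choices, not prescribed by the disclosure (for the joint input image of fig. 2B, in_ch would be 4 instead of 3):

```python
import torch.nn as nn

def make_analysis_subnet(in_ch=3, feat_ch=32, n_down=0):
    # A convolution stack mapping the input image to a feature image,
    # followed by n_down stride-2 convolutions to reach the level's resolution.
    layers = [nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(n_down):
        layers += [nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

# Level i uses i-1 downsampling steps, so resolutions run from high to low:
analysis_subnets = [make_analysis_subnet(n_down=i) for i in range(5)]
```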
For example, in some embodiments, as shown in fig. 2B, the input image INP may first be joined (as indicated by the circle labeled "C" in figs. 2A and 2B) with a random noise image to generate a joint input image; the joint input image is then subjected to N different levels of analysis processing by the analysis network to generate the N levels of initial feature images F01 to F0N with resolutions arranged from high to low. The joining process can be viewed as stacking the channel images of the images to be joined (e.g., two or more), so that the number of channels of the joined image is the sum of the channel counts of the images being joined; here, the channels of the joint input image are the channels of the input image together with those of the random noise image. The random noise in the random noise image may conform to a Gaussian distribution, but is not limited thereto. The specific process and details of the analysis processing in the embodiment shown in fig. 2B may refer to the description of the analysis processing in the embodiment shown in fig. 2A, and are not repeated here.
It should be noted that, in image enhancement processing, detailed features (for example, hairs and lines) in the output image are often related to noise. When the neural network is applied to image enhancement, the amplitude of the input noise is adjusted according to actual needs (whether details need to be highlighted, to what degree, and so on) so that the output image meets those needs. For example, in some embodiments the noise amplitude of the random noise image may be 0; in other embodiments it may be nonzero. Embodiments of the present disclosure are not limited in this regard.
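For illustration, the joining with a random noise image is a channel-wise concatenation; the shapes and the noise amplitude below are assumptions:

```python
import torch

inp = torch.rand(1, 3, 64, 64)                   # 3-channel input image INP
amplitude = 0.1                                  # tune for detail emphasis; 0 disables noise
noise = amplitude * torch.randn(1, 1, 64, 64)    # Gaussian random noise image
joint = torch.cat([inp, noise], dim=1)           # joint input image: 3 + 1 = 4 channels
```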
For example, in fig. 2A and 2B, the order of the respective hierarchies is determined in the top-down direction.
For example, in some embodiments, the resolution of the initial feature image F01 of the 1 st level, which has the highest resolution, may be the same as the resolution of the input image INP. For example, in some embodiments, the input image is obtained by performing resolution conversion processing (for example, image super-resolution reconstruction processing) on the original input image, in which case, the resolution of the initial feature image of the nth level with the lowest resolution may be the same as that of the original input image, and it should be noted that the embodiments of the present disclosure include but are not limited thereto.
It should be noted that, although fig. 2A and 2B each show a case where 5 levels of initial feature images F01 to F05 (i.e., N = 5) are generated, this should not be construed as limiting the present disclosure, that is, the value of N may be set according to actual needs.
S103, performing iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1.
The updated feature image of each level has the same resolution as the initial feature image of that level.
In some embodiments, the iterative back-projection processing of each level includes downsampling processing, joining processing, upsampling processing, first superposition processing, and second superposition processing; moreover, the iterative back-projection processes of levels 1 through N-1 are nested within one another.
For example, as shown in figs. 2A and 2B, the downsampling processing of the i-th level includes: performing downsampling based on the input of the iterative back-projection processing of the i-th level to generate a downsampled output of the i-th level. The joining processing of the i-th level includes: performing a joining operation based on the downsampled output of the i-th level and the initial feature image of the (i+1)-th level to generate a joint output of the i-th level. The upsampling processing of the i-th level includes: generating an upsampled output of the i-th level based on the joint output of the i-th level. The first superposition processing of the i-th level includes: superposing the first superposition input of the i-th level and the upsampled output of the i-th level to generate a first superposition output of the i-th level. The second superposition processing of the i-th level includes: superposing the input of the iterative back-projection processing of the i-th level and the first superposition output of the i-th level to generate the output of the iterative back-projection processing of the i-th level.
The iterative back-projection processing of the (j+1)-th level is nested between the downsampling processing and the joining processing of the j-th level, and the input of the iterative back-projection processing of the (j+1)-th level comprises the downsampled output of the j-th level, where j = 1, 2, …, N-2. It should be noted that in the present disclosure, "nested" means that one object contains another object similar or identical to itself, where the object includes, but is not limited to, a flow or a network structure.
The iterative back-projection processing of at least one level is executed several times in succession, and the input of each subsequent iterative back-projection processing comprises the output of the preceding one; the first superposition input of a subsequent iterative back-projection processing comprises the first superposition output of the first superposition processing in the preceding iterative back-projection processing, and the first superposition input of the first iterative back-projection processing comprises the momentum term of the current level. For example, as shown in figs. 2A and 2B, the iterative back-projection processing of each level may be executed twice in succession, which improves the quality of the output image while keeping the network structure from becoming overly complicated.
Since the iterative back-projection processing of the (j+1)-th level is nested between the downsampling processing and the joining processing of the j-th level, when the iterative back-projection processing of each level is executed twice in succession, taking N = 5 as an example: the 4th-level iterative back-projection processing is executed twice between the downsampling and joining of each 3rd-level pass; the 3rd-level iterative back-projection processing is executed twice between the downsampling and joining of each 2nd-level pass; and the 2nd-level iterative back-projection processing is executed twice between the downsampling and joining of each 1st-level pass. That is, the 4th-level iterative back-projection processing is executed 16 times in total, the 3rd-level 8 times, the 2nd-level 4 times, and the 1st-level twice. In this case, when the 4th-level iterative back-projection processing is executed the 1st time, the first superposition input of its first superposition processing comprises the 4th-level momentum term; the 2nd time, it comprises the first superposition output of the first superposition processing in the 1st pass; the 3rd time, it comprises the first superposition output of the 2nd pass; and so on. Similarly, when the 3rd-level iterative back-projection processing is executed the 1st time, the first superposition input comprises the 3rd-level momentum term; the 2nd time, the first superposition output of the 1st pass; and so on. The other levels are handled in the same way: at each level, the first superposition input of the first superposition processing in the first pass comprises the momentum term of that level, and in every later pass it comprises the first superposition output of the preceding pass at that level.
It should be noted that the embodiments of the present disclosure do not limit the number of times the iterative back-projection processing of each level is executed.
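The nested scheme above can be made concrete with the following sketch, assuming PyTorch; the stride-2 convolution for DS, the transposed convolution for US, and the channel width are illustrative stand-ins for the learned layers, which the disclosure leaves open:

```python
import torch
import torch.nn as nn

class IterativeBackProjection(nn.Module):
    # Sketch of the nested iterative back projection of figs. 2A/2B.
    def __init__(self, n_levels=5, ch=32, repeats=2):
        super().__init__()
        self.n, self.repeats = n_levels, repeats
        self.down = nn.ModuleList(nn.Conv2d(ch, ch, 3, stride=2, padding=1)
                                  for _ in range(n_levels - 1))
        # US consumes the joint output (2*ch channels after joining).
        self.up = nn.ModuleList(nn.ConvTranspose2d(2 * ch, ch, 4, stride=2, padding=1)
                                for _ in range(n_levels - 1))

    def run_level(self, i, x, feats, state):
        # i is 0-based: i == 0 is level 1. state[i] is seeded with the level-(i+1)
        # momentum term and carries the first-superposition output across passes.
        for _ in range(self.repeats):
            d = self.down[i](x)                              # downsampling DS
            if i < self.n - 2:                               # nested deeper-level IBP
                d = self.run_level(i + 1, d, feats, state)
            joint = torch.cat([d, feats[i + 1]], dim=1)      # joining C with next F0
            state[i] = state[i] + self.up[i](joint)          # first superposition AD1
            x = x + state[i]                                 # second superposition AD2
        return x

    def forward(self, feats, momenta):
        state = list(momenta)                 # one running term per level 1..N-1
        return self.run_level(0, feats[0], feats, state)  # level-1 updated image

# With n_levels = 5 and repeats = 2, the level-4 block runs 16 times, level 3
# runs 8 times, level 2 runs 4 times, and level 1 runs twice, as described above.
feats = [torch.randn(1, 32, 64 >> k, 64 >> k) for k in range(5)]
momenta = [torch.zeros_like(f) for f in feats[:-1]]
udp1 = IterativeBackProjection()(feats, momenta)   # 1x32x64x64 updated feature image
```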
In some embodiments, as shown in fig. 2A and 2B, the joining process at the i-th level specifically includes: taking the down-sampled output of the ith level as an input of the iterative back-projection processing of the (i + 1) th level to generate an output of the iterative back-projection processing of the (i + 1) th level; and performing a join operation on the output of the iterative back projection processing of the (i + 1) th level and the initial feature image of the (i + 1) th level to generate a joint output of the (i) th level.
The downsampling processing DS is used to reduce the size of the feature map, thereby reducing its data amount, and may be performed by a downsampling layer, for example, but is not limited thereto. For example, the downsampling layer may implement downsampling using methods such as max pooling, average pooling, or strided convolution.
The upsampling processing US is used to increase the size of the feature map, thereby increasing its data amount, and may be performed by an upsampling layer, for example, but is not limited thereto. For example, the upsampling layer may implement upsampling using methods such as transposed convolution or an interpolation algorithm. The interpolation algorithm may include, for example, bilinear interpolation, bicubic interpolation, or Lanczos interpolation. When upsampling is performed with an interpolation algorithm, the original pixel values may be retained alongside the interpolated values, thereby increasing the size of the feature map.
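For illustration, these downsampling and upsampling operators map onto standard PyTorch calls as follows (factor 2, matching the down/up factor pairing noted further below):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 32, 64, 64)
# Downsampling DS options named above (factor 1/2):
x_max = F.max_pool2d(x, kernel_size=2)    # max pooling
x_avg = F.avg_pool2d(x, kernel_size=2)    # average pooling
# strided convolution would use nn.Conv2d(..., stride=2)

# Upsampling US options (factor 2, so DS and US sizes match):
y_near = F.interpolate(x, scale_factor=2, mode='nearest')
y_bil  = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
y_bic  = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
# transposed convolution would use nn.ConvTranspose2d(..., stride=2)
```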
The input and output of the first superposition processing AD1 have the same size. Taking feature images as an example, the first superposition processing may directly superpose two feature images, that is, add the two feature matrices element by element. Alternatively, the first superposition processing may superpose two feature images in a weighted manner: the values of one feature image are multiplied by a first weight, the values of the other by a second weight, and the weighted values are then added element by element; the sum of the first weight and the second weight may be 1. The first superposition processing may be implemented by a convolutional network including convolutional layers.
The second superposition processing AD2 is similar to the first superposition processing AD1 and may likewise be implemented by a convolutional network including convolutional layers; its input and output also have the same size. Taking feature images as an example, the second superposition processing AD2 may superpose two feature images directly or in a weighted manner. Through the second superposition processing AD2, a certain proportion of the input of the iterative back-projection processing of each level can be retained in its output.
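For illustration, the direct and weighted superposition variants amount to the following element-wise operations (the weight value is an assumption):

```python
import torch

a = torch.rand(1, 32, 64, 64)
b = torch.rand(1, 32, 64, 64)

direct = a + b                       # direct superposition: element-wise addition
w = 0.7                              # first weight; second weight = 1 - w
weighted = w * a + (1 - w) * b       # weighted superposition, weights summing to 1
```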
It should be noted that, in some embodiments of the present disclosure, the down-sampling factor of the down-sampling process of the same level corresponds to the up-sampling factor of the up-sampling process, that is: when the down-sampling factor of the down-sampling process is 1/y, then the up-sampling factor of the up-sampling process is y, where y is a positive integer, and y is typically equal to or greater than 2. Thus, it is possible to ensure that the output of the up-sampling process and the input of the down-sampling process of the same level are the same in size.
It should be noted that, in some embodiments of the present disclosure (not limited to the present embodiment), parameters of downsampling processes at different levels (i.e., parameters of network structures corresponding to the downsampling processes) may be the same or different; parameters of the upsampling process of different levels (i.e. parameters of the network structure corresponding to the upsampling process) may be the same or different; the parameters of the first superposition processing of different levels can be the same or different; the parameters of the second superimposition processing at different levels may be the same or different. Embodiments of the present disclosure are not limited thereto.
In some embodiments, the updated feature image of the same level is at the same resolution as the initial feature image.
And S104, generating an output image based on the updated feature image of the level 1.
For example, in some embodiments, the updated feature image of level 1 may be converted to generate the output image. For example, when the updated feature image includes multiple channels, as shown in figs. 2A and 2B, the updated feature image of level 1 may be subjected to synthesis processing by the synthesis network MERG to generate the output image OUTP. In some embodiments, the synthesis network MERG may include convolutional layers and the like. The output image may include a 1-channel grayscale image or, for example, a 3-channel RGB image (i.e., a color image). It should be noted that the embodiments of the present disclosure do not limit the structure and parameters of the synthesis network MERG, as long as it can convert the updated feature image of level 1 into the output image OUTP.
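A minimal sketch of one possible synthesis network MERG, assuming PyTorch: a single convolution mapping a 32-channel level-1 updated feature image to a 3-channel RGB output (the channel counts and kernel size are illustrative):

```python
import torch.nn as nn

# Hypothetical MERG; use out_channels=1 for a grayscale output image.
merge_net = nn.Conv2d(in_channels=32, out_channels=3, kernel_size=3, padding=1)
```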
In addition, when the above image processing method is used directly to process a high-resolution input image (for example, 4K resolution or above), it places high demands on the hardware (for example, the video memory) of the image processing apparatus. Therefore, in some embodiments, the input image may first be cropped into a plurality of sub-input images having overlapping regions; the sub-input images are then processed separately by the image processing method (for example, steps S101 to S104 above) to generate corresponding sub-output images; finally, the sub-output images are stitched into the output image.
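A sketch of this crop-process-stitch strategy, assuming PyTorch; the tile size, the overlap, and the naive stitching (later tiles simply overwrite the overlap; a real implementation might blend) are assumptions:

```python
import torch

def tile_process(inp, process, tile=512, overlap=32):
    # Cut overlapping tiles, enhance each with `process` (assumed
    # shape-preserving), and paste the results back; assumes the input
    # is at least `tile` pixels on each side.
    _, _, h, w = inp.shape
    out = torch.zeros_like(inp)
    step = tile - overlap
    for top in range(0, h, step):
        for left in range(0, w, step):
            t0, l0 = min(top, h - tile), min(left, w - tile)
            out[:, :, t0:t0 + tile, l0:l0 + tile] = \
                process(inp[:, :, t0:t0 + tile, l0:l0 + tile])
    return out

out = tile_process(torch.rand(1, 3, 2048, 2048), process=lambda t: t)  # identity demo
```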
Fig. 3 is a schematic flow chart diagram of an image processing method provided in still other embodiments of the present disclosure; this further embodiment is described in detail below with reference to figs. 1 and 3.
As shown in fig. 1, the image processing method includes:
s101, acquiring an input image and preset momentum items of N-1 levels, wherein N is a positive integer and is greater than 2.
As shown in fig. 3, the input image is labeled INP. The input image may include a photo captured by a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a monitoring camera, or a network camera, and may include a person image, an animal image, a plant image, a landscape image, or the like, which is not limited in this disclosure.
The input image INP may be a grayscale image or a color image. For example, color images include, but are not limited to, 3 channels of RGB images, and the like. Note that, in the embodiment of the present disclosure, when the input image INP is a grayscale image, the output image OUTP is also a grayscale image; when the input image INP is a color image, the output image OUTP is also a color image.
In some embodiments, the momentum terms of the N-1 levels may be preset, the momentum term of each level having the same resolution as the feature images of that level. For example, every element of each level's momentum term may be zero.
S102, based on the input image, N-level initial feature images, such as F01 to F04 in fig. 3, are generated, with resolutions ranging from high to low.
Among the N levels of initial feature images, the resolution of the 1 st level of initial feature image is the highest, and the resolution of the 1 st level of initial feature image is the same as the resolution of the input image INP.
The resolution of the initial feature image of the previous level (e.g., ith level) is an integer multiple of the resolution of the initial feature image of the next level (e.g., ith +1 level). For example, if the resolution of the initial feature image of the previous level is twice the resolution of the initial feature image of the next level, assuming that the resolution of the initial feature image of a certain level is 32 × 32, the resolution of the initial feature image of the next level is 16 × 16.
In some embodiments, the input images may be processed to generate N levels of intermediate input images, such as INM 1-INM 4 in fig. 3, with resolutions ranging from high to low.
Thereafter, the intermediate input images (i.e., INM1 to INM4) of the levels are analyzed to generate the input feature images of the levels; for example, the intermediate input images are analyzed by the analysis sub-networks ASN, whose network parameters may differ from one another. The input feature image of the Nth level is taken as the initial feature image of the Nth level. For each of the first N-1 levels: the intermediate input image of the level is sequentially downsampled (DS) and analyzed to generate an intermediate feature image; the intermediate feature image of the current level is joined with the initial feature image of the next level, the joined image is upsampled (US), and the upsampled image is superposed with the momentum term of the current level to generate a first momentum term; and the first momentum term is superposed with the input feature image of the current level to generate the initial feature image of the current level.
For example, the input image INP is taken as the 1st-level intermediate input image, and the input image is downsampled N-1 times to generate the 2nd- to Nth-level intermediate input images. The downsampling in fig. 3 follows the same principle as in figs. 2A and 2B, the upsampling likewise, and the superposition in fig. 3 follows the same principle as the first superposition in figs. 2A and 2B. The joining operation can be viewed as stacking the channel images of the images to be joined (e.g., two or more), so that the number of channels of the generated image is the sum of the channel counts of the joined images.
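The per-level construction just described can be sketched as follows, assuming PyTorch; the stand-in analysis sub-networks, the reuse of the next level's ASN for the downsampled intermediate image, and all channel counts are assumptions, not details fixed by the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def initial_features(inm, asn, momenta, up):
    # inm[k]: level-(k+1) intermediate input image; asn[k]: level-(k+1)
    # analysis sub-network; momenta[k]: level-(k+1) momentum term;
    # up[k]: learned upsampling consuming the joined (2*C-channel) tensor.
    n = len(inm)
    inputs = [asn[k](inm[k]) for k in range(n)]        # input feature images IF
    feats, z1 = [None] * n, [None] * (n - 1)
    feats[-1] = inputs[-1]                             # level N: IF_N = F0_N
    for k in range(n - 2, -1, -1):                     # levels N-1 .. 1
        mid = asn[k + 1](F.avg_pool2d(inm[k], 2))      # DS + analysis -> intermediate
        joint = torch.cat([mid, feats[k + 1]], dim=1)  # join with next-level F0
        z1[k] = momenta[k] + up[k](joint)              # first momentum term Z_k1
        feats[k] = z1[k] + inputs[k]                   # initial feature image F0_k
    return feats, z1

n = 4
asn = nn.ModuleList(nn.Conv2d(3, 32, 3, padding=1) for _ in range(n))
up = nn.ModuleList(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
                   for _ in range(n - 1))
inm = [torch.rand(1, 3, 64 >> k, 64 >> k) for k in range(n)]        # INM1..INM4
momenta = [torch.zeros(1, 32, 64 >> k, 64 >> k) for k in range(n - 1)]
feats, z1 = initial_features(inm, asn, momenta, up)
```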
S103, performing iterative back-projection processing of the i-th level based on the initial feature image of the (i+1)-th level and the momentum term of the i-th level to generate an updated feature image of the i-th level, i = 1, 2, …, N-1. The updated feature image of each level has the same resolution as the initial feature image of that level.
For example, the iterative back projection processing of each hierarchy is continuously performed M times, M being an integer greater than 1, and each iterative back projection processing includes a down-sampling processing, a joining processing, an up-sampling processing, a first superimposition processing, and a second superimposition processing.
Wherein the down-sampling process in the mth iterative back-projection process of the ith level comprises: and performing downsampling based on the input of the mth iteration back projection processing of the ith level to generate downsampled output of the mth iteration back projection processing of the ith level. The input of the 1 st iterative back projection processing of the ith level is the initial feature image of the ith level, and the input of each iterative back projection processing after the 1 st iterative back projection processing of the ith level is the output of the previous iterative back projection processing. Wherein m is an integer of [1, M ].
The join processing in the m-th iterative back projection processing of the ith level comprises the following steps: performing a join operation based on the downsampled output of the mth iterative backprojection process of the i-th level and the output of the mth iterative backprojection process of the i + 1-th level to generate an mth compensated feature image of the i-th level.
The up-sampling process in the mth iteration back projection process of the ith level comprises the following steps: and performing upsampling based on the m-th compensation characteristic image of the i-th level to generate an upsampled output of the m-th iterative back projection processing of the i-th level.
The first superposition processing in the 1st iterative back-projection processing of the i-th level includes: performing a superposition operation based on the upsampled output of the 1st iterative back-projection processing of the i-th level and the first momentum term of the i-th level to generate a first superposition output of the 1st iterative back-projection processing of the i-th level. The first superposition processing in each later iterative back-projection processing of the i-th level includes: performing a superposition operation based on the upsampled output of the current iterative back-projection processing and the first superposition output of the preceding one to generate the first superposition output of the current iterative back-projection processing.
The second superposition processing in the 1st iterative back-projection processing of the i-th level includes: performing a superposition operation based on the first superposition output of the 1st iterative back-projection processing of the i-th level and the initial feature image of the i-th level to generate a second superposition output of the 1st iterative back-projection processing of the i-th level. The second superposition processing in each later iterative back-projection processing of the i-th level includes: performing a superposition operation based on the first superposition output of the current iterative back-projection processing and the second superposition output of the preceding one to generate the second superposition output of the current iterative back-projection processing; the second superposition output of each iterative back-projection processing serves as the output of that processing.
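A sketch of this M-pass scheme, assuming PyTorch and reusing feats and z1 (initial feature images and first momentum terms) from the sketch above; levels are processed deepest-first so that each level's m-th pass can join with the m-th output of the next level:

```python
import torch
import torch.nn as nn

def ibp_level(i, feats, z1, deeper_outs, down, up, M=3):
    # M consecutive back-projection passes at level i+1 (0-based i): the m-th
    # pass joins with the m-th output of the next level, or with the level-N
    # initial feature image at the deepest processed level. Modules are
    # hypothetical stand-ins for the learned DS/US layers.
    x, s, outs = feats[i], z1[i], []
    for m in range(M):
        d = down[i](x)                                     # downsampling DS
        partner = deeper_outs[m] if deeper_outs else feats[i + 1]
        joint = torch.cat([d, partner], dim=1)             # m-th compensation image
        s = s + up[i](joint)                               # first superposition AD1
        x = x + s                                          # second superposition AD2
        outs.append(x)                                     # output of the m-th pass
    return outs

down = nn.ModuleList(nn.Conv2d(32, 32, 3, stride=2, padding=1) for _ in range(3))
up2 = nn.ModuleList(nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1)
                    for _ in range(3))
outs = []
for i in range(2, -1, -1):            # levels 3, 2, 1 for N = 4
    outs = ibp_level(i, feats, z1, outs if i < 2 else [], down, up2, M=3)
udp = outs[-1]                        # level-1 updated feature image UDP
```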
S104, generating an output image based on the updated feature image UDP of level 1.
For example, in some embodiments, the updated feature image UDP may be converted to generate the output image OUTP. For example, when the level-1 updated feature image UDP includes multiple channels, as shown in fig. 3, it may be subjected to synthesis processing by the synthesis network MERG, and the synthesized image is superimposed with the input image INP (i.e., the intermediate input image INM1) to generate the output image OUTP. In some embodiments, the synthesis network MERG may include convolutional layers and the like. The output image OUTP may include a 1-channel grayscale image or, for example, a 3-channel RGB image (i.e., a color image). It should be noted that the resolution and channel count of the image output by the synthesis network MERG are the same as those of the input image INP, so that the two can be superimposed. The embodiments of the present disclosure do not limit the structure and parameters of the synthesis network MERG, as long as the converted image can be superimposed with the input image INP to generate the output image OUTP.
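For illustration, the conversion-plus-superposition at the output stage reduces to the following (MERG as a hypothetical single convolution; shapes are assumptions):

```python
import torch
import torch.nn as nn

inp = torch.rand(1, 3, 64, 64)            # input image INP (= INM1)
udp = torch.rand(1, 32, 64, 64)           # level-1 updated feature image UDP
merge = nn.Conv2d(32, 3, 3, padding=1)    # hypothetical synthesis network MERG
outp = merge(udp) + inp                   # converted image superimposed with INP
```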
Fig. 4 is a schematic flowchart of an image processing method in a specific example of the present disclosure, and fig. 4 illustrates the image processing method shown in fig. 3 by taking N =4 and M =3 as examples. The image processing method shown in fig. 4 will be specifically described below.
For example, a first intermediate input image INM1, a second intermediate input image INM2, a third intermediate input image INM3, and a fourth intermediate input image INM4, with resolutions arranged from high to low, are generated based on the input image INP: the input image is taken as the first intermediate input image INM1 and downsampled to generate the second intermediate input image INM2; INM2 is downsampled to generate the third intermediate input image INM3; and INM3 is downsampled to generate the fourth intermediate input image INM4.
Analyzing the first intermediate input image INM1 generates a first input feature image IF1; analyzing the second intermediate input image INM2 generates a second input feature image IF2; analyzing the third intermediate input image INM3 generates a third input feature image IF3; and analyzing the fourth intermediate input image INM4 generates a fourth input feature image IF4.
The fourth input feature image IF4 is taken as the 4th-level initial feature image F04. Sequentially carrying out downsampling processing and analysis processing on the third intermediate input image INM3 generates a third intermediate feature image; the third intermediate feature image is connected with the 4th-level initial feature image F04, the connected image is up-sampled, and the up-sampled image is superimposed with the 3rd-level momentum term Z3 to generate a 3rd-level first momentum term Z31; the 3rd-level first momentum term Z31 is superimposed on the third input feature image IF3, and an initial feature image F03 of the 3rd level is generated.
Sequentially performing downsampling processing and analysis processing on the second intermediate input image INM2 generates a second intermediate feature image; the second intermediate feature image is connected with the initial feature image F03 of the 3rd level, the connected image is up-sampled, and the up-sampled image is superimposed with the momentum term Z2 of the 2nd level to generate a first momentum term Z21 of the 2nd level; the first momentum term Z21 of level 2 is superimposed on the second input feature image IF2, and an initial feature image F02 of the 2nd level is generated.
Sequentially performing downsampling processing and analysis processing on the first intermediate input image INM1 generates a first intermediate feature image; the first intermediate feature image is connected with the 2nd-level initial feature image F02, the connected image is up-sampled, and the up-sampled image is superimposed with the 1st-level momentum term Z1 to generate a 1st-level first momentum term Z11; the first momentum term Z11 at level 1 is superimposed on the first input feature image IF1 to generate an initial feature image F01 at level 1.
Down-sampling the 3 rd level initial feature image F03, and connecting the down-sampled image with the 4 th level initial feature image F04 to generate a connected image IMC1; the joint image IMC1 is up-sampled and the up-sampled image is superimposed with the 3 rd level first momentum term Z31 to generate a 3 rd level first superimposed output IA11_3. The first superimposed output IA11_3 is superimposed on the 3 rd-level initial feature image F03, and a first second superimposed output IA21_3 of the 3 rd level is generated.
Downsampling the first second superimposed output IA21_3 of the 3rd level, and connecting the downsampled image with the initial feature image F04 of the 4th level to generate a connected image IMC4; the joint image IMC4 is up-sampled and the up-sampled image is superimposed with the first superimposed output IA11_3 of the 3rd level to generate a second first superimposed output IA12_3 of the 3rd level. The first superimposed output IA12_3 is superimposed on the first second superimposed output IA21_3 at the 3rd level, and a second second superimposed output IA22_3 of the 3rd level is generated.
Downsampling the second superimposed output IA22_3 at the 3 rd level, and connecting the downsampled image with the initial feature image F04 at the 4 th level to generate a connected image IMC7; the joint image IMC7 is up-sampled and the up-sampled image is superimposed with the second first superimposed output IA12_3 of the 3 rd level, generating a third first superimposed output IA13_3 of the 3 rd level. The first superimposed output IA13_3 is superimposed on the second superimposed output IA22_3 at the 3 rd level, and a third second superimposed output IA23_3 at the 3 rd level is generated.
The 2nd-level initial feature image F02 is down-sampled, and the down-sampled image is concatenated with IA21_3 to generate a concatenated image IMC2; IMC2 is up-sampled, and the up-sampled image is superimposed with the 2nd-level first momentum term Z21 to generate the first superimposed output IA11_2 of the 2nd level's first iteration. IA11_2 is superimposed with the 2nd-level initial feature image F02 to generate the corresponding second superimposed output IA21_2.
IA21_2 is down-sampled, and the down-sampled image is concatenated with IA22_3 to generate a concatenated image IMC5; IMC5 is up-sampled, and the up-sampled image is superimposed with IA11_2 to generate the first superimposed output IA12_2 of the 2nd level's second iteration. IA12_2 is superimposed with IA21_2 to generate the corresponding second superimposed output IA22_2.
IA22_2 is down-sampled, and the down-sampled image is concatenated with IA23_3 to generate a concatenated image IMC8; IMC8 is up-sampled, and the up-sampled image is superimposed with IA12_2 to generate the first superimposed output IA13_2 of the 2nd level's third iteration. IA13_2 is superimposed with IA22_2 to generate the corresponding second superimposed output IA23_2.
The 1st-level initial feature image F01 is down-sampled, and the down-sampled image is concatenated with IA21_2 to generate a concatenated image IMC3; IMC3 is up-sampled, and the up-sampled image is superimposed with the 1st-level first momentum term Z11 to generate the first superimposed output IA11_1 of the 1st level's first iteration. IA11_1 is superimposed with the 1st-level initial feature image F01 to generate the corresponding second superimposed output IA21_1.
IA21_1 is down-sampled, and the down-sampled image is concatenated with IA22_2 to generate a concatenated image IMC6; IMC6 is up-sampled, and the up-sampled image is superimposed with IA11_1 to generate the first superimposed output IA12_1 of the 1st level's second iteration. IA12_1 is superimposed with IA21_1 to generate the corresponding second superimposed output IA22_1.
IA22_1 is down-sampled, and the down-sampled image is concatenated with IA23_2 to generate a concatenated image IMC9; IMC9 is up-sampled, and the up-sampled image is superimposed with IA12_1 to generate the first superimposed output IA13_1 of the 1st level's third iteration. IA13_1 is superimposed with IA22_1 to generate the corresponding second superimposed output IA23_1, which is the 1st-level updated feature image UDP.
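The nine iterations above follow a single repeated pattern, so they can be expressed as one helper applied nine times. A minimal sketch, reusing the helpers and tensors from the previous snippet (again an illustration, not the reference implementation):

```python
def backprojection_step(prev_second, lower, prev_first):
    """One iteration at a level; returns (first superimposed output, second superimposed output).

    prev_second -- previous second superimposed output (the initial feature image on iteration 1)
    lower       -- the lower level's output used in the concatenation
    prev_first  -- previous first superimposed output (the first momentum term on iteration 1)
    """
    imc = torch.cat([down(prev_second), lower], dim=1)  # the concatenated image IMC*
    first = up(fuse(imc)) + prev_first                  # first superposition
    second = first + prev_second                        # second superposition
    return first, second

# Level 3 (its lower-level partner is always F04):
IA11_3, IA21_3 = backprojection_step(F0[2], F0[3], Z1st[2])
IA12_3, IA22_3 = backprojection_step(IA21_3, F0[3], IA11_3)
IA13_3, IA23_3 = backprojection_step(IA22_3, F0[3], IA12_3)
# Level 2 (partnered with the level-3 second superimposed outputs):
IA11_2, IA21_2 = backprojection_step(F0[1], IA21_3, Z1st[1])
IA12_2, IA22_2 = backprojection_step(IA21_2, IA22_3, IA11_2)
IA13_2, IA23_2 = backprojection_step(IA22_2, IA23_3, IA12_2)
# Level 1 (partnered with the level-2 second superimposed outputs):
IA11_1, IA21_1 = backprojection_step(F0[0], IA21_2, Z1st[0])
IA12_1, IA22_1 = backprojection_step(IA21_1, IA22_2, IA11_1)
IA13_1, IA23_1 = backprojection_step(IA22_1, IA23_2, IA12_1)
UDP = IA23_1                                            # the 1st-level updated feature image
```

Note how each level's second superimposed outputs become the lower-level partners for the next level's three iterations.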
The 1st-level updated feature image UDP (i.e., the second superimposed output IA23_1) is subjected to synthesis processing by the synthesis network MERG, and the synthesized image is superimposed with the input image to generate the output image OUTP.
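Continuing the sketch, the synthesis network MERG can be stood in for by a single convolution back to image channels; the patent does not fix MERG's internals here, so this stand-in is an assumption:

```python
MERG = nn.Conv2d(feat, 3, 3, padding=1)  # stand-in for the synthesis network (assumed)
OUTP = MERG(UDP) + INP                   # synthesize, then superimpose with the input image
```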
The image processing processes in figs. 3 and 4 can be decomposed into a number of image processing sub-processes. Fig. 5 is a schematic diagram of one such sub-process provided in some embodiments of the present disclosure. As shown in fig. 5, each sub-process has four inputs (input 1 to input 4) and four outputs (output 1 to output 4): input 1 and input 2 are superimposed, and the result is output 4; output 4 is superimposed with input 3, and the result is output 2; output 2 is joined with input 4, and the joined result is output 3; at the same time, output 2 is down-sampled to produce output 1. It should be noted that a sub-process lacking one or more of these inputs skips the corresponding operations. For example, when only input 1, input 2, input 3, output 1, output 2, and output 4 are present, the join and up-sampling operations in fig. 5 need not be performed. For another example, if input 1, input 2, output 1, and output 4 are absent, input 3 is taken directly as output 2, and input 3 is joined with input 4, the joined result being output 3.
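The sub-process can be sketched as one function whose absent inputs are passed as None, with the corresponding operations skipped. Pairing the join with an up-sampling step follows the skip rule above ("the join and up-sampling operations"); down, up, and fuse are the illustrative helpers from the earlier snippets.

```python
import torch
from typing import Optional

def sub_process(in1: Optional[torch.Tensor], in2: Optional[torch.Tensor],
                in3: torch.Tensor, in4: Optional[torch.Tensor],
                need_out1: bool = True):
    """Fig. 5 sub-process; returns (output1, output2, output3, output4), absent ones as None."""
    if in1 is not None and in2 is not None:
        out4 = in1 + in2                          # superimpose input 1 with input 2
        out2 = out4 + in3                         # superimpose the result with input 3
    else:
        out4 = None                               # inputs 1/2 absent: skip both superpositions
        out2 = in3                                # input 3 passes straight through as output 2
    out1 = down(out2) if need_out1 else None      # down-sampled output toward the lower level
    out3 = None
    if in4 is not None:                           # join with input 4, then up-sample
        out3 = up(fuse(torch.cat([out2, in4], dim=1)))
    return out1, out2, out3, out4
```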
Fig. 6 is a schematic block diagram of a structure of a neural network provided in some embodiments of the present disclosure, fig. 7 is a flowchart of a training method of a neural network provided in some embodiments of the present disclosure, and fig. 8 is a schematic architecture block diagram of a training method of a neural network provided in some embodiments of the present disclosure.
The neural network is used to execute the image processing method provided in the foregoing embodiments.
For example, as shown in fig. 6, the neural network 100 includes: an analysis network 110, an iterative back-projection processing network 120, and an output network 130. The analysis network 110 is used to execute step S102 of the above image processing method, that is, it processes the input image to generate N levels of initial feature images with resolutions arranged from high to low, where N is a positive integer and N > 2. The iterative back-projection processing network 120 is configured to perform step S103, that is, it performs the i-th level of iterative back-projection processing based on the (i+1)-th level initial feature image and the i-th level momentum term to generate the i-th level updated feature image; i = 1, 2, …, N-1. The output network 130 is used to execute step S104, that is, it generates an output image based on the 1st-level updated feature image.
For the specific structure of the neural network 100 and its corresponding processing procedure, reference may be made to the related description of the foregoing image processing method; details are not repeated here.
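Tying figs. 3 to 6 together, the three sub-networks can be sketched as a single forward function that reuses the illustrative helpers defined earlier (down, up, analysis, mid_analysis, fuse, backprojection_step, MERG); the wiring below is a sketch under those assumptions, not the patent's reference code.

```python
def forward(inp, Z):
    # Analysis network 110 (step S102): intermediate images, input features, initial features.
    INM = [inp]
    for _ in range(3):
        INM.append(down(INM[-1]))
    IF = [analysis[k](INM[k]) for k in range(4)]
    F0, Z1st = [None, None, None, IF[3]], [None, None, None]
    for i in (2, 1, 0):
        joined = fuse(torch.cat([mid_analysis(down(INM[i])), F0[i + 1]], dim=1))
        Z1st[i] = up(joined) + Z[i]
        F0[i] = Z1st[i] + IF[i]
    # Iterative back-projection processing network 120 (step S103): three iterations per level.
    lower = [F0[3]] * 3                  # level-3 iterations all partner with F04
    for i in (2, 1, 0):
        first, second, outs = Z1st[i], F0[i], []
        for m in range(3):
            first, second = backprojection_step(second, lower[m], first)
            outs.append(second)
        lower = outs                     # the next (higher-resolution) level partners with these
    # Output network 130 (step S104): synthesize and superimpose with the input image.
    return MERG(second) + inp
```

Calling forward(INP, Z) with the tensors from the first snippet reproduces, in one pass, the OUTP computed step by step above.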
As shown in figs. 7 and 8, the training method of the neural network includes steps S201 to S205.
Step S201: obtain a training input image and preset momentum terms of N-1 levels, where N is a positive integer and N > 2.
For example, similar to the input image in step S101, the training input image may be a photo captured by the camera of a smartphone, tablet computer, or personal computer, the lens of a digital camera, a surveillance camera, or a web camera, and may contain a human image, an animal image, a plant image, a landscape image, or the like; the embodiments of the present disclosure are not limited in this regard. The training input image may be a grayscale image or a color image; color images include, but are not limited to, 3-channel RGB images.
Step S202: process the training input image by using the analysis network 110 to generate N levels of training initial feature images with resolutions arranged from high to low.
Step S203: using the iterative back-projection processing network 120, perform the i-th level of iterative back-projection processing based on the (i+1)-th level training initial feature image and the i-th level momentum term to generate the i-th level training updated feature image; i = 1, 2, …, N-1.
For the process by which the iterative back-projection processing network performs the i-th level of iterative back-projection processing, reference may be made to the description above; it is not repeated here.
Step S204: generate a training output image based on the 1st-level training updated feature image by using the output network 130.
The specific process by which the output network generates the training output image based on the 1st-level training updated feature image may refer to the description of step S104 above and is not repeated here.
Step S205: calculate a loss value of the neural network through a loss function based on the training output image, and correct the parameters of the neural network according to the loss value.
The parameters of the neural network may include the parameters of the analysis network, the parameters of the iterative back-projection processing network, and the parameters of the output network.
In some embodiments, the loss function Loss may include the mean squared error between the training standard image corresponding to the training input image and the training output image, i.e.:
Loss = E[(x - y)²]
where x is the training output image, y is the training standard image, and E[·] denotes a matrix-energy calculation; for example, E[·] may take the maximum or the average of the elements of the matrix within the brackets. The training standard image and the training input image have the same scene, i.e., the same content, while the quality of the training standard image is higher than that of the training input image; the training standard image corresponds to the target output image of the neural network.
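Written out directly, with E[·] taken as the mean over matrix elements (one of the two options just mentioned), the loss is simply:

```python
import torch

def training_loss(x, y):
    """Loss = E[(x - y)^2], with E taken as the mean over the matrix elements."""
    return ((x - y) ** 2).mean()  # equivalent to torch.nn.functional.mse_loss(x, y)
```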
For example, the training standard image may be a photograph taken with a digital single-lens reflex camera. In some embodiments, the training standard image may be subjected to degradation processing to generate the corresponding training input image.
During training, the training target of the neural network is to minimize the loss value: as the parameters of the neural network are continuously corrected, the training output image comes ever closer to the training standard image and the loss value keeps decreasing. It should be noted that the loss function above is exemplary, and the embodiments of the present disclosure include but are not limited thereto; for example, the loss function may also include an L1 loss term between the training standard image and the training output image.
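Steps S201 to S205 then amount to a standard supervised training loop. The sketch below uses a bicubic down/up degradation, an Adam optimizer, zero-valued preset momentum terms, and a stand-in model so that the loop runs; all of these are illustrative assumptions, and in practice the model would be the fig. 6 network sketched earlier.

```python
def degrade(y):  # assumed degradation: 2x bicubic down-sample, then up-sample back
    small = F.interpolate(y, scale_factor=0.5, mode="bicubic", align_corners=False)
    return F.interpolate(small, scale_factor=2.0, mode="bicubic", align_corners=False)

class StandInModel(nn.Module):  # placeholder so the loop runs; see the fig. 6 sketch above
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, x, momenta):      # the real network would also consume the momentum terms
        return self.body(x) + x

model = StandInModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(100):
    y = torch.rand(4, 3, 64, 64)        # stand-in batch of training standard images
    x_in = degrade(y)                   # S201: synthesize the training input image
    momenta = [torch.zeros(4, 32, 64 >> k, 64 >> k) for k in range(3)]  # S201: preset momentum terms
    out = model(x_in, momenta)          # S202 to S204: forward pass
    loss = training_loss(out, y)        # S205: loss value
    optimizer.zero_grad()
    loss.backward()                     # S205: correct the parameters via the gradients
    optimizer.step()
```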
An embodiment of the present disclosure further provides an image processing apparatus. Fig. 9 is a schematic block diagram of an image processing apparatus provided in some embodiments of the present disclosure. As shown in fig. 9, the image processing apparatus includes an image acquisition module 301 and an image processing module 302, and may be configured to execute the image processing method of the above embodiments. The image acquisition module 301 may be configured to execute step S101, i.e., to acquire an input image and a plurality of levels of momentum terms. For example, the image acquisition module 301 may include a memory storing the input image and the momentum terms; alternatively, it may include one or more cameras to acquire the input image, together with an information input structure to acquire the momentum terms.
The image processing module 302 may be configured to perform steps S102 to S104 of the aforementioned image processing method. In some embodiments, the image acquisition module 301 and the image processing module 302 may be implemented as hardware, software, firmware, or any feasible combination thereof.
Fig. 10 is a schematic block diagram of an image processing apparatus according to other embodiments of the present disclosure. As shown in fig. 10, the image processing apparatus 200 includes a memory 201 and a processor 202. For example, the memory 201 is used for non-transitory storage of computer-readable instructions, and the processor 202 is used for executing those instructions; when executed by the processor, the instructions carry out the image processing method and/or the neural network training method provided by any embodiment of the present disclosure.
The memory 201 and the processor 202 may communicate with each other directly or indirectly. For example, in some examples, the image processing apparatus 200 may further include a system bus through which the memory 201 and the processor 202 communicate; for instance, the processor 202 may access the memory through the system bus. In other examples, components such as the memory 201 and the processor 202 may communicate over a network connection. The network may include a wireless network, a wired network, and/or any combination thereof, and may include a local area network, the Internet, a telecommunications network, an Internet of Things (IoT) based on the Internet and/or a telecommunications network, and/or any combination of the above. The wired network may communicate via twisted pair, coaxial cable, or optical fiber, for example; the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, ZigBee, or WiFi. The present disclosure does not limit the type and functionality of the network.
For example, the processor 202 may control other components in the image processing apparatus to perform desired functions. The processor may be a device having data processing capability and/or program execution capability, such as a Central Processing Unit (CPU), a Tensor Processing Unit (TPU), or a Graphics Processing Unit (GPU). The CPU may be of the X86 or ARM architecture, among others. The GPU may be integrated directly on the motherboard or built into the motherboard's north bridge chip; the GPU may also be built into the CPU.
For example, the memory 201 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like.
For example, one or more computer instructions may be stored in the memory 201 and executed by the processor 202 to implement various functions. The computer-readable storage medium may also store various applications and various data, such as input images, output images, first/second training input images, first/second training output images, first/second training standard images, and various data used and/or generated by the applications.
For example, some of the computer instructions stored in the memory 201, when executed by the processor, may perform one or more steps of the image processing method described above; other computer instructions stored in the memory 201, when executed by the processor, may perform one or more steps of the neural network training method described above.
The image processing apparatus 200 may further include an input interface that allows external devices to communicate with it; for example, the input interface may be used to receive instructions from an external computer device or from a user. The image processing apparatus may further include an output interface connecting it to one or more external devices; for example, the image processing apparatus may display images through the output interface.
For a detailed description of the processing procedure of the image processing method, reference may be made to the related description in the embodiments of the image processing method; for a detailed description of the neural network training method, reference may be made to the related description in the embodiments of the training method; repeated details are omitted.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the image processing method or the neural network training method of the above embodiments. For example, the computer-readable storage medium may include a storage component of a tablet computer, a hard disk of a personal computer, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), flash memory, any combination of the above, or other suitable storage media.
It will be understood that the above embodiments are merely exemplary embodiments employed to illustrate the principles of the present disclosure, and the present disclosure is not limited thereto. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the disclosure, and such changes and modifications are also to be considered within the scope of the disclosure.

Claims (15)

1. An image processing method, comprising:
acquiring an input image and momentum terms of N-1 levels, wherein N is a positive integer and N > 2;
generating N levels of initial feature images with resolutions arranged from high to low based on the input image;
for the N levels of initial feature images, performing the i-th level of iterative back-projection processing based on the (i+1)-th level initial feature image and the i-th level momentum term to generate the i-th level updated feature image; i = 1, 2, …, N-1;
generating an output image based on the updated feature image of level 1.
2. The image processing method of claim 1, wherein each level of iterative back-projection processing comprises: down-sampling processing, connection processing, up-sampling processing, first superimposition processing, and second superimposition processing;
the down-sampling process of the i-th level includes: performing downsampling based on the input of the iterative back projection processing of the ith level to generate downsampled output of the ith level;
the i-th level of joining processing includes: performing a connection operation based on the downsampled output of the ith level and the initial feature image of the (i + 1) th level to generate a joint output of the ith level;
the up-sampling process of the ith level includes: generating an upsampled output of an ith level based on the joint output of the ith level;
the first superimposition processing of the i-th level includes: superimposing the first superimposition input of the i-th level with the up-sampled output of the i-th level to generate a first superimposed output of the i-th level;
the second superimposition processing of the i-th level includes: superimposing the input of the iterative back-projection processing of the i-th level with the first superimposed output of the i-th level to generate the output of the iterative back-projection processing of the i-th level;
the iterative back-projection processing of the (j+1)-th level is nested between the down-sampling processing of the j-th level and the connection processing of the j-th level, and the input of the (j+1)-th level of iterative back-projection processing comprises the down-sampled output of the j-th level, wherein j = 1, 2, …, N-2; the iterative back-projection processing of at least one level is executed a plurality of times in succession, and the input of each subsequent iterative back-projection processing comprises the output of the previous iterative back-projection processing; the first superimposition input of the first superimposition processing in the last iterative back-projection processing comprises the first superimposed output of the first superimposition processing in the previous iterative back-projection processing, the first superimposition input in the first iterative back-projection processing comprises the momentum term of the current level, and the updated feature image of the 1st level comprises the output of the last iterative back-projection processing of the 1st level.
3. The image processing method according to claim 2, wherein generating the i-th level joint output based on the i-th level down-sampled output and the (i+1)-th level initial feature image comprises:
taking the down-sampled output of the i-th level as an input of the (i+1)-th level of iterative back-projection processing to generate an output of the (i+1)-th level of iterative back-projection processing; and concatenating the output of the (i+1)-th level of iterative back-projection processing with the (i+1)-th level initial feature image to generate the joint output of the i-th level.
4. The image processing method according to claim 2, wherein generating N-level initial feature images having resolutions arranged from high to low based on the input image comprises:
performing N different levels of analysis processing on the input image to generate initial feature images of the N levels with resolution arranged from high to low, respectively.
5. The image processing method according to claim 2, wherein generating an output image based on the updated feature image of level 1 comprises:
converting the updated feature image of level 1 to generate the output image.
6. The image processing method according to claim 1, wherein generating N levels of initial feature images having resolutions arranged from high to low based on the input image comprises:
taking the input image as the 1st-level intermediate input image, and down-sampling the input image to generate the 2nd- to N-th-level intermediate input images with resolutions arranged from high to low, respectively;
performing analysis processing on the intermediate input image of each level to generate the input feature image of each level, and taking the N-th-level input feature image as the N-th-level initial feature image;
for each of the first N-1 levels, sequentially performing down-sampling processing and analysis processing on the intermediate input image of the level to generate an intermediate feature image; connecting the intermediate feature image of the current level with the initial feature image of the next level, up-sampling the connected image, and superimposing the up-sampled image with the momentum term of the current level to generate a first momentum term; and superimposing the first momentum term with the input feature image of the current level to generate the initial feature image of the current level.
7. The image processing method according to claim 6, wherein the iterative back-projection processing of each hierarchy is successively performed M times, M being an integer greater than 1, each iterative back-projection processing including a down-sampling processing, a joining processing, an up-sampling processing, a first superimposing processing, and a second superimposing processing;
the down-sampling process in the mth iterative back-projection process at the ith level includes: performing downsampling based on the input of the mth iterative back projection processing of the ith level to generate downsampled output of the mth iterative back projection processing of the ith level; wherein the initial feature image of the ith level comprises the input of the 1 st iterative back projection processing of the ith level, and the input of each iterative back projection processing after the 1 st iterative back projection processing of the ith level comprises the output of the previous iterative back projection processing;
the join processing in the m-th iterative back projection processing of the ith level comprises the following steps: performing a join operation based on a downsampled output of the mth iterative backprojection process of the i-th level and an output of the mth iterative backprojection process of the i + 1-th level to generate an mth compensated feature image of the i-th level;
the up-sampling process in the mth iteration back projection process of the ith level comprises the following steps: performing upsampling on the basis of the mth compensation characteristic image of the ith level to generate upsampled output of the mth iterative back projection processing of the ith level;
the first superimposition processing in the 1 st iterative back-projection processing of the i-th hierarchy includes: performing superposition operation based on the up-sampling output of the 1 st iterative back projection processing of the ith level and the first momentum item of the ith level to generate a first superposition output of the 1 st iterative back projection processing of the ith level; the first superimposition processing in each iterative backprojection processing after the 1 st time of the i-th level includes: performing a superposition operation based on the up-sampling output of the current iterative back projection processing and the first superposition output of the previous iterative back projection processing to generate the first superposition output of the current iterative back projection processing;
the second superimposition processing in the 1 st iterative back projection processing of the i-th hierarchy includes: performing superposition operation based on the first superposition output of the 1 st iterative back projection processing of the ith level and the initial characteristic image of the ith level to generate a second superposition output of the 1 st iterative back projection processing of the ith level; the second superimposition processing in each iterative back-projection processing after 1 st time of the i-th level includes: performing superposition operation based on the first superposition output of the current iterative back projection processing and the second superposition output of the last iterative back projection processing to generate a second superposition output of the current iterative back projection processing, wherein the second superposition output of each iterative back projection processing is used as the output of the current iterative back projection processing;
wherein m = 1, 2, …, M, and the second superimposed output of the last iterative back-projection processing of the 1st level serves as the updated feature image of the 1st level.
8. The image processing method according to claim 6, wherein generating an output image based on the updated feature image of level 1 comprises:
and converting the updated feature image of the 1 st level, and superposing the image generated after conversion and the input image to generate the output image.
9. The image processing method according to claim 2 or 6, wherein generating the N-level initial feature images having the resolutions arranged from high to low based on the input image comprises:
concatenating the input image with a random noise image to generate a joint input image;
performing N different levels of analysis processing on the joint input image to generate initial feature images of the N levels with resolution arranged from high to low respectively.
10. The image processing method according to any one of claims 1 to 8, wherein, of the N levels of initial feature images, an initial feature image of level 1 has a highest resolution, and the resolution of the initial feature image of level 1 is the same as the resolution of the input image;
the resolution of the initial feature image of the previous level is an integer multiple of the resolution of the initial feature image of the next level.
11. A training method of a neural network, the neural network comprising an analysis network, an iterative back-projection processing network, and an output network, the training method comprising:
acquiring a training input image and preset momentum terms of N-1 levels, wherein N is a positive integer and N > 2;
processing the training input image by using the analysis network to generate N levels of training initial feature images with resolutions arranged from high to low;
performing, by using the iterative back-projection processing network, the i-th level of iterative back-projection processing based on the (i+1)-th level training initial feature image and the i-th level momentum term to generate the i-th level training updated feature image; i = 1, 2, …, N-1;
generating a training output image based on the 1st-level training updated feature image by using the output network;
calculating a loss value of the neural network through a loss function based on the training output image, and correcting the parameters of the neural network according to the loss value.
12. The training method according to claim 11, wherein the loss function comprises: the mean squared error between the training standard image corresponding to the training input image and the training output image.
13. An image processing apparatus characterized by comprising:
an image acquisition module configured to acquire an input image and preset momentum terms of N-1 levels, wherein N is a positive integer;
an image processing module configured to: generate, based on the input image, N levels of initial feature images with resolutions arranged from high to low, N being a positive integer and N > 2; for the N levels of initial feature images, perform the i-th level of iterative back-projection processing based on the (i+1)-th level initial feature image and the i-th level momentum term to generate the i-th level updated feature image, i = 1, 2, …, N-1; and generate an output image based on the updated feature image of level 1.
14. An image processing apparatus characterized by comprising:
a memory storing a computer program, and a processor, wherein the computer program, when executed by the processor, implements the image processing method of any one of claims 1 to 10 or the training method of any one of claims 11 to 12.
15. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the image processing method of any one of claims 1 to 10 or the training method of any one of claims 11 to 12.
