CN114549319A - Image processing method, image processing apparatus, electronic device, storage medium, and program


Info

Publication number
CN114549319A
Authority
CN
China
Prior art keywords
image
sample
relationship
network
reference image
Prior art date
Legal status
Pending
Application number
CN202210177467.8A
Other languages
Chinese (zh)
Inventor
姜瑜铭
陈焯杰
吕健勤
刘子纬
Current Assignee
Nanyang Technological University
Sensetime International Pte Ltd
Original Assignee
Nanyang Technological University
Sensetime International Pte Ltd
Priority date
Filing date
Publication date
Application filed by Nanyang Technological University, Sensetime International Pte Ltd filed Critical Nanyang Technological University
Publication of CN114549319A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks


Abstract

The embodiments disclose an image processing method, an image processing apparatus, an electronic device, and a computer storage medium. The method includes: acquiring an image to be processed and a reference image, where the resolution of the reference image is higher than that of the image to be processed; processing the image to be processed and the reference image by using a first contrast relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image, where the first contrast relationship network is obtained by performing supervised training based on a second sample image and on a first relationship between features of a first sample image and features of a sample reference image; and fusing features of the reference image into the image to be processed according to the first corresponding relationship to obtain a super-resolution image of the image to be processed.

Description

Image processing method, image processing apparatus, electronic device, storage medium, and program
Technical Field
The present disclosure relates to computer vision processing technologies, and in particular, to an image processing method, an apparatus, an electronic device, a computer storage medium, and a computer program.
Background
In the related art, Reference-based Super-Resolution (Ref-SR) technology has attracted wide attention. Compared with single-image super-resolution, whose input is a single low-resolution image, reference-based image super-resolution uses an additional high-resolution image as a reference image while performing super-resolution processing on a given low-resolution image to obtain a super-resolution image; during this processing, the reference image can provide additional image details.
The key to reference-based image super-resolution is finding the corresponding relationship between the pixel points of the low-resolution image and those of the reference image. However, owing to the resolution difference between the low-resolution image and the reference image, this corresponding relationship is difficult to establish accurately, which may degrade the quality of the generated super-resolution image.
Disclosure of Invention
The embodiment of the disclosure provides a technical scheme for image processing.
The embodiment of the present disclosure provides an image processing method, including:
acquiring an image to be processed and a reference image, wherein the resolution of the reference image is higher than that of the image to be processed;
processing the image to be processed and the reference image by using a first contrast relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image; the first contrast relationship network is obtained by performing supervised training based on a second sample image and on a first relationship between features of a first sample image and features of a sample reference image, where the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolutions of the first sample image and the sample reference image are both higher than the resolution of the second sample image;
and according to the first corresponding relation, fusing the characteristics of the reference image into the image to be processed to obtain a super-resolution image of the image to be processed.
In some embodiments, the training process of the first contrast relationship network includes:
determining a second relationship between features of the second sample image and features of the sample reference image;
and adjusting the network parameter values of the first contrast relationship network according to the first relationship and the second relationship.
In some embodiments, the first relationship is used to characterize correlation data between features in the first sample image and the sample reference image, and the second relationship is used to characterize correlation data between features in the second sample image and the sample reference image;
the adjusting a network parameter value of the first contrast relationship network according to the first relationship and the second relationship includes:
determining a loss of the first contrast relationship network according to the first relationship and the second relationship; and adjusting the network parameter values of the first contrast relationship network according to the loss; the loss of the first contrast relationship network includes a correlation loss representing a difference between the first relationship and the second relationship.
In some embodiments, the second relationship is further used to characterize distance information between features of pixel points in the second sample image and features of pixel points in the sample reference image;
the determining the loss of the first contrast relationship network according to the first relationship and the second relationship includes:
determining a triplet loss according to the distance information characterized by the second relationship; and determining the loss of the first contrast relationship network according to the first relationship, the second relationship, and the triplet loss, where the loss of the first contrast relationship network further includes the triplet loss.
In some embodiments, at least one set of sample images used to train the second contrast relationship network has a higher resolution than the second sample image.
In some embodiments, the method further comprises:
acquiring the first sample image; performing homography transformation on the first sample image to obtain a sample reference image; and carrying out downsampling processing on the first sample image to obtain the second sample image.
In some embodiments, the fusing the features of the reference image into the image to be processed according to the first corresponding relationship to obtain a super-resolution image of the image to be processed includes:
for each pixel point in the image to be processed, performing:
determining, according to the first corresponding relationship, a target pixel point in the reference image that corresponds to the pixel point;
determining a position offset for the target pixel point in the reference image according to the position difference between the pixel point and its corresponding target pixel point; determining, according to the position of the target pixel point and the position offset, the pixel point reached after the target pixel point is offset; and obtaining a reference feature, where the reference feature represents the feature of the pixel point reached after the target pixel point is offset;
and fusing the feature of each pixel point in the image to be processed with the reference feature corresponding to that pixel point to obtain fused features, and obtaining a super-resolution image of the image to be processed according to the fused features.
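Purely as an illustration of the steps above (the patent provides no code), the following PyTorch sketch determines the position offset of each target pixel point, samples the reference features at the offset positions, and fuses them with the features of the image to be processed. All names, tensor shapes, and the concatenation-based fusion are assumptions:

```python
import torch
import torch.nn.functional as F

def fuse_reference_features(lr_feat, ref_feat, corr_index):
    """Warp reference features to each matched position and fuse them with
    the features of the image to be processed.

    lr_feat:    (B, C, H, W) features of the image to be processed
    ref_feat:   (B, C, H, W) features of the reference image (same grid
                size here for simplicity)
    corr_index: (B, 2, H, W) for each pixel point, the (x, y) position of
                its matched target pixel point in the reference image
    """
    B, C, H, W = lr_feat.shape
    # Grid of each pixel point's own (x, y) position.
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys], dim=0).float().to(lr_feat.device)  # (2, H, W)
    # Position offset of the target pixel point, then the position after
    # the offset (mirroring the two steps described in the text).
    offset = corr_index - base.unsqueeze(0)        # (B, 2, H, W)
    pos = base.unsqueeze(0) + offset               # (B, 2, H, W)
    # Normalize positions to [-1, 1] and sample the reference features
    # there; the sampled values play the role of the "reference features".
    gx = 2.0 * pos[:, 0] / (W - 1) - 1.0
    gy = 2.0 * pos[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)           # (B, H, W, 2)
    ref_sampled = F.grid_sample(ref_feat, grid, align_corners=True)
    # Fuse by channel concatenation; a recovery network would turn the
    # fused features into the super-resolution image.
    return torch.cat([lr_feat, ref_sampled], dim=1)
```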
In some embodiments, the features of the reference image comprise texture features.
In some embodiments, the training data of the second contrast relationship network includes at least some of the first sample images and the sample reference images.
An embodiment of the present disclosure further provides an image processing apparatus, including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an image to be processed and a reference image, and the resolution of the reference image is higher than that of the image to be processed;
the first processing module is configured to process the image to be processed and the reference image by using a first contrast relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image; the first contrast relationship network is obtained by performing supervised training based on a second sample image and on a first relationship between features of a first sample image and features of a sample reference image, where the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolutions of the first sample image and the sample reference image are both higher than the resolution of the second sample image;
and the second processing module is used for fusing the characteristics of the reference image into the image to be processed according to the first corresponding relation to obtain a super-resolution image of the image to be processed.
In some embodiments, the apparatus further includes a training module configured to train the first contrast relationship network by:
determining a second relationship between features of the second sample image and features of the sample reference image;
and adjusting the network parameter values of the first contrast relationship network according to the first relationship and the second relationship.
In some embodiments, the first relationship is used to characterize correlation data between features in the first sample image and the sample reference image, and the second relationship is used to characterize correlation data between features in the second sample image and the sample reference image;
the training module is specifically configured to:
determining a loss of the first contrast relationship network according to the first relationship and the second relationship; and adjusting the network parameter values of the first contrast relationship network according to the loss; the loss of the first contrast relationship network includes a correlation loss representing a difference between the first relationship and the second relationship.
In some embodiments, the second relationship is further used to characterize distance information between features of pixel points in the second sample image and features of pixel points in the sample reference image;
the training module is specifically configured to:
determining a triplet loss according to the distance information characterized by the second relationship; and determining the loss of the first contrast relationship network according to the first relationship, the second relationship, and the triplet loss, where the loss of the first contrast relationship network further includes the triplet loss.
In some embodiments, at least one set of sample images used to train the second contrast relationship network has a higher resolution than the second sample image.
In some embodiments, the obtaining module is further configured to:
acquiring the first sample image; performing homography transformation on the first sample image to obtain a sample reference image; and carrying out downsampling processing on the first sample image to obtain the second sample image.
In some embodiments, the second processing module is specifically configured to:
and aiming at each pixel point in the image to be processed, executing:
determining a target pixel point corresponding to the pixel point in the image to be processed in the reference image according to the first corresponding relation;
determining a position offset for the target pixel point in the reference image according to the position difference between the pixel point and its corresponding target pixel point; determining, according to the position of the target pixel point and the position offset, the pixel point reached after the target pixel point is offset; and obtaining a reference feature, where the reference feature represents the feature of the pixel point reached after the target pixel point is offset;
and fusing the feature of each pixel point in the image to be processed with the reference feature corresponding to that pixel point to obtain fused features, and obtaining a super-resolution image of the image to be processed according to the fused features.
In some embodiments, the features of the reference image comprise texture features.
In some embodiments, the training data of the second contrast relationship network includes at least some of the first sample images and the sample reference images.
Embodiments of the present disclosure also provide an electronic device comprising a processor and a memory for storing a computer program capable of running on the processor; wherein,
the processor is configured to run the computer program to perform any one of the image processing methods described above.
The disclosed embodiments also provide a computer storage medium having a computer program stored thereon, which when executed by a processor implements any of the image processing methods described above.
The embodiments of the present disclosure further provide a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes any one of the image processing methods described above.
In the image processing method and apparatus, electronic device, and computer storage medium provided by the embodiments of the present disclosure, an image to be processed and a reference image are first acquired, where the resolution of the reference image is higher than that of the image to be processed. The image to be processed and the reference image are then processed by using a first contrast relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image; the first contrast relationship network is obtained by performing supervised training based on a second sample image and on a first relationship between features of a first sample image and features of a sample reference image, where the features of the first sample image and of the sample reference image are data obtained based on a trained second contrast relationship network, and the resolutions of the first sample image and the sample reference image are both higher than the resolution of the second sample image. Finally, the features of the reference image are fused into the image to be processed according to the first corresponding relationship to obtain a super-resolution image of the image to be processed.
It can be seen that, in the embodiments of the present disclosure, the first contrast relationship network can be trained based on the relationship between the features of the first sample image and the features of the sample reference image, where the resolutions of both images are higher than that of the second sample image; in other words, the first contrast relationship network can be trained based on the relationship between two higher-resolution images. Compared with the matching relationship between a low-resolution image and a high-resolution image, the matching relationship between two higher-resolution images provides more accurate and comprehensive matching information. Performing supervised training based on the second sample image and on the first relationship between the features of the first sample image and the features of the sample reference image therefore helps the trained first contrast relationship network obtain the corresponding relationship between pixel points in the image to be processed and the reference image more accurately, and, on that basis, improves the quality of the super-resolution image of the image to be processed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart of an image processing method of an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a training procedure of a first contrast relationship network and a second contrast relationship network in an embodiment of the present disclosure;
FIG. 3 is a block flow diagram of super resolution processing using a recovery network in an embodiment of the disclosure;
FIG. 4A is a schematic diagram of a correspondence between an image to be processed and a reference image in an embodiment of the present disclosure and related technologies;
FIG. 4B is a super-resolution image of a to-be-processed image in the related art;
FIG. 4C is a super-resolution image of a to-be-processed image in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a component structure of an image processing apparatus according to an embodiment of the disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The present disclosure will be described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the examples provided herein are merely illustrative of the present disclosure and are not intended to limit the present disclosure. In addition, the embodiments provided below are some embodiments for implementing the disclosure, not all embodiments for implementing the disclosure, and the technical solutions described in the embodiments of the disclosure may be implemented in any combination without conflict.
It should be noted that, in the embodiments of the present disclosure, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the explicitly recited elements but also other elements not explicitly listed or inherent to the method or apparatus. Without further limitation, the phrase "comprising a(n) ..." does not exclude the presence of other related elements (e.g., steps in a method, or elements in a device such as portions of circuits, processors, programs, or software) in the method or apparatus that includes the element.
For example, the image processing method provided by the embodiment of the present disclosure includes a series of steps, but the image processing method provided by the embodiment of the present disclosure is not limited to the described steps, and similarly, the image processing apparatus provided by the embodiment of the present disclosure includes a series of modules, but the apparatus provided by the embodiment of the present disclosure is not limited to include the explicitly described modules, and may also include modules that are required to be configured for acquiring relevant information or performing processing based on the information.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
The disclosed embodiments may be implemented in computer systems comprising terminals and/or servers and may be operational with numerous other general purpose or special purpose computing system environments or configurations. Here, the terminal may be a thin client, a thick client, a hand-held or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronics, a network personal computer, a small computer system, etc., and the server may be a small computer system, a large computer system, a distributed cloud computing environment including any of the above, etc.
The electronic devices such as the terminal and the server can realize corresponding functions through the execution of the program modules. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so forth. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
The key to reference-based image super-resolution lies in finding the corresponding relationship between the pixel points of the low-resolution image and the reference image. In the related art, the low-resolution image and the high-resolution image serving as the reference image may be matched implicitly to determine the corresponding relationship between their pixel points, the correspondence being obtained from the content similarity and appearance similarity of the images; once the correspondence is determined, super-resolution processing can be performed on the low-resolution image based on it.
However, because of the resolution difference between the low-resolution image and the high-resolution image serving as the reference image, it is difficult to obtain this correspondence accurately. Since the amounts of information contained in the two images are unbalanced, the high-resolution image is usually downsampled to obtain a downsampled image with the same resolution as the low-resolution image. The downsampling process, however, inevitably causes information loss; for example, information in fine-texture regions of the high-resolution image is lost, which again makes it difficult to obtain the corresponding relationship between the pixel points of the low-resolution image and the reference image accurately.
In view of the above technical problems, the present disclosure provides an image processing method to perform super-resolution processing on various types of low-resolution images, where the low-resolution images may include, but are not limited to, scene images (e.g., outdoor scene images, indoor scene images), object images (e.g., building images, animal and plant images), and the like.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the disclosure, and as shown in fig. 1, the flowchart may include:
step 101: and acquiring an image to be processed and a reference image, wherein the resolution of the reference image is higher than that of the image to be processed.
Here, the image to be processed may be an image that needs super-resolution processing. It may be acquired from a local storage area or over a network; for example, it may be an image captured by a mobile terminal or a camera. Its format may be Joint Photographic Experts Group (JPEG), Bitmap (BMP), Portable Network Graphics (PNG), or another format. It should be noted that the format and source of the image to be processed are only examples; the embodiments of the present disclosure do not limit them.
In the embodiments of the present disclosure, the reference image is a pre-acquired image. In a first example, a reference image whose similarity to the image to be processed exceeds a similarity threshold may be selected from a pre-stored image set, where the similarity threshold may be an empirically set value, for example 80%, 85%, or 90%. In a second example, a reference image similar to the image to be processed may be selected manually from a pre-stored image set. In a third example, a number of images may be retrieved from a search engine by search-by-image using the image to be processed, and a reference image whose similarity to the image to be processed exceeds the similarity threshold may be selected from the retrieved images, or a more similar reference image may be selected manually.
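A minimal sketch of the first example, assuming global image features and cosine similarity as the similarity measure (both are assumptions; the patent does not fix them):

```python
import torch
import torch.nn.functional as F

def pick_reference(query_feat, candidate_feats, threshold=0.85):
    """Pick, from a pre-stored image set, a reference image whose
    similarity to the image to be processed exceeds a threshold.

    query_feat:      (D,)   global feature of the image to be processed
    candidate_feats: (N, D) global features of the candidate images
    Returns the index of the most similar qualifying candidate, or None.
    """
    sims = F.cosine_similarity(candidate_feats, query_feat.unsqueeze(0), dim=1)
    best = torch.argmax(sims)
    return best.item() if sims[best] > threshold else None
```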
Step 102: processing the image to be processed and the reference image by using a first contrast relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image; the first contrast relationship network is obtained by performing supervised training based on a second sample image and on a first relationship between features of a first sample image and features of a sample reference image, where the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolutions of the first sample image and the sample reference image are both higher than the resolution of the second sample image.
In the embodiment of the present disclosure, the first contrast relationship network may be a neural network that employs a deep learning technique; in some embodiments, the first contrast relationship network comprises a feature extraction sub-network for extracting image features; illustratively, the feature extraction sub-network comprises a first feature extraction unit and a second feature extraction unit, wherein the first feature extraction unit is used for extracting the features of the image to be processed, and the second feature extraction unit is used for extracting the features of the reference image; for example, each feature extraction unit may include a convolutional layer and a pooling layer.
The first feature extraction unit and the second feature extraction unit may be two different network units, for example, network units with the same network structure but different network parameters, and the network parameters may be weights or other parameters, considering that the resolutions of the image to be processed and the reference image are different. Of course, the network structures of the two network units may also be different, and are not limited herein.
The first contrast relationship network may also include a feature processing sub-network. After the features of the image to be processed and of the reference image are extracted, the feature processing sub-network can process them to obtain the first corresponding relationship between pixel points in the image to be processed and the reference image; illustratively, the feature processing sub-network may include a convolutional layer, an activation layer, and an output layer.
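A minimal PyTorch sketch of the structure just described, assuming specific layer sizes; whether the feature processing sub-network is shared between the two branches is likewise an assumption:

```python
import torch.nn as nn

class FirstContrastRelationshipNet(nn.Module):
    """Two feature extraction units (same structure, separate weights,
    since the two inputs differ in resolution) followed by a feature
    processing sub-network with convolution, activation, and output
    layers."""

    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()

        def extraction_unit():
            # Each feature extraction unit contains convolutional and
            # pooling layers, as described in the text.
            return nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
                nn.MaxPool2d(2),
            )

        self.to_process_unit = extraction_unit()  # image to be processed
        self.reference_unit = extraction_unit()   # reference image
        self.feature_processing = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),  # convolutional layer
            nn.ReLU(inplace=True),                      # activation layer
            nn.Conv2d(feat_ch, feat_ch, 1),             # output layer
        )

    def forward(self, img_to_process, ref_img):
        f_lr = self.feature_processing(self.to_process_unit(img_to_process))
        f_ref = self.feature_processing(self.reference_unit(ref_img))
        return f_lr, f_ref  # the correspondence is computed from these
```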
It can be seen that, in order to train the first contrast relationship network, the second contrast relationship network needs to be trained in advance; the trained second contrast relationship network then processes the input first sample image and sample reference image to obtain the first relationship between their features. That is, the input data of the second contrast relationship network differs from that of the first contrast relationship network: the second takes the first sample image and the sample reference image, two images of higher resolution (both higher than the second sample image), while the first takes the image to be processed and the reference image.
In the embodiment of the present disclosure, since it is necessary to determine the first relationship between the features of the first sample image and the features of the sample reference images, the number of sample reference images may be the same as the number of first sample images. Of course, different first sample images may also correspond to the same sample reference image, i.e. the number of first sample images may also be more than the number of sample reference images.
The second contrast relationship network may include a feature extraction sub-network used to extract the features of the first sample image and of the sample reference image; its structure may be the same as or different from that of the feature extraction sub-network in the first contrast relationship network.
The following describes how to acquire the first sample image, the second sample image, the sample reference image, and the training data of the second contrast relationship network.
In a first example, a sample reference image may be acquired after acquiring the first sample image or the second sample image, in the manner described above for acquiring the reference image.
In the second example, the training data of the first sample image, the second sample image, the sample reference image, or the second contrast relationship network may be obtained from a common data set, or the training data of the first sample image, the second sample image, the sample reference image, or the second contrast relationship network may be obtained from a local storage area, for example, the common data set may be a CUFED5 data set, a SUN data set, a comic (Manga)109 data set, or the like.
In a third example, after the first sample image and the sample reference image are acquired, at least some of them may be used directly as training data of the second contrast relationship network; alternatively, images entirely different from the first sample image and the sample reference image may be used as its training data.
Considering that the second contrast relationship network needs to obtain the features of the two images, the training data of the second contrast relationship network may include at least one set of sample images, and each set of sample images in the at least one set of sample images may include two sample images, i.e., a first sample image and a sample reference image corresponding to the first sample image.
As can be seen from the foregoing description, the second contrast relationship network needs to output the relationship between the features of two higher-resolution images, namely the first relationship between the features of the first sample image and the features of the sample reference image; thus, as an example, the resolution of at least one group of sample images used for training the second contrast relationship network is higher than the resolution of the second sample image.
Understandably, since the second contrast relationship network is trained based on two images with higher resolutions, and the resolutions of the first sample image and the sample reference image are both higher than that of the second sample image, that is, the first sample image and the sample reference image are both images with higher resolutions; therefore, the trained second contrast relationship network is beneficial to accurately extracting corresponding features from the first sample image and the sample reference image, and further beneficial to accurately obtaining the first relationship between the features of the first sample image and the features of the sample reference image.
With reference to the above description, the following describes an exemplary training process of the second contrast relationship network.
The training process of the second contrast relationship network may include: inputting the training data of the second contrast relationship network into the second contrast relationship network, and processing each group of sample images in the training data by using the second contrast relationship network to obtain the characteristics of each sample image in each group of sample images; and then, adjusting the network parameter value of the second contrast relationship network according to the corresponding relationship between the pixel points of the two sample images in each group of sample images and the characteristics of each sample image in each group of sample images.
It can be seen that, in order to train the second contrast relationship network, the corresponding relationship between the pixel points of the two sample images in each group needs to be known in advance. For example, one sample image of each group may be homographically transformed to obtain the other sample image of the group; here, the homography transformation may include at least one of rotation and scaling. Since the homography transformation reflects the pixel-point correspondence between images, the corresponding relationship between the pixel points of the two sample images in each group can be determined.
One implementation of adjusting the network parameter values of the second contrast relationship network may be: obtaining the loss of a second contrast relationship network according to the corresponding relationship between the pixel points of the two sample images in each group of sample images and the characteristics of each sample image in each group of sample images; and adjusting the network parameter value of the second contrast relationship network according to the loss of the second contrast relationship network.
In the embodiment of the disclosure, under the condition that the second contrast relationship network after the adjustment of the network parameter value does not meet the training end condition, the steps of obtaining the characteristics of each sample image in each group of sample images, obtaining the loss of the second contrast relationship network, and adjusting the network parameter value of the second contrast relationship network according to the loss of the second contrast relationship network are re-executed; and under the condition that the second comparison relationship network after the adjustment of the network parameter value meets the training end condition, taking the second comparison relationship network after the adjustment of the network parameter value as the trained second comparison relationship network.
The training end condition may be that the number of training iterations of the second contrast relationship network reaches a set number, or that the image processing of the second contrast relationship network with adjusted parameter values meets a set accuracy requirement. The set number may be an empirically chosen value, for example set with reference to a feature extraction network similar to the second contrast relationship network. The set accuracy requirement may relate to the loss of the second contrast relationship network, for example that this loss is smaller than a set loss, which may likewise be chosen with reference to a similar feature extraction network.
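A schematic training loop reflecting the two end conditions above; the optimizer, learning rate, iteration budget, and loss interface are placeholders, not values from the patent:

```python
import torch

def train_second_network(net, loader, loss_fn, max_iters=200_000, set_loss=1e-3):
    """Train the second contrast relationship network until either end
    condition holds: the iteration count reaches a set number, or the
    loss falls below a set loss."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    it = 0
    while it < max_iters:
        for img_a, img_b, gt_corr in loader:  # one group of sample images
            feat_a, feat_b = net(img_a, img_b)
            # Loss computed from the known pixel-point correspondence and
            # the features of each sample image in the group.
            loss = loss_fn(feat_a, feat_b, gt_corr)
            opt.zero_grad()
            loss.backward()
            opt.step()
            it += 1
            if loss.item() < set_loss or it >= max_iters:
                return net
    return net
```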
Step 103: and according to the first corresponding relation, fusing the characteristics of the reference image into the image to be processed to obtain a super-resolution image of the image to be processed.
The features of the reference image may comprise texture features, such that the super-resolution image of the image to be processed may comprise texture features of the reference image.
Step 103 can be implemented with a recovery network. Specifically, the features of the image to be processed and the features of the reference image are input into the trained recovery network, which fuses the features of the reference image into the image to be processed according to the first corresponding relationship and outputs the super-resolution image of the image to be processed.
In practical applications, the steps 101 to 103 may be implemented by a Processor in an electronic Device, where the Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
It can be seen that, in the embodiments of the present disclosure, the first contrast relationship network can be trained based on the relationship between the features of the first sample image and the features of the sample reference image, where the resolutions of both images are higher than that of the second sample image; that is, the first contrast relationship network can be trained based on the relationship between two higher-resolution images. Compared with the matching relationship between a low-resolution image and a high-resolution image, the matching relationship between two higher-resolution images provides more accurate matching information. Performing supervised training based on the second sample image and on the first relationship between the features of the first sample image and the features of the sample reference image therefore helps the trained first contrast relationship network obtain the corresponding relationship between the pixels in the image to be processed and the reference image (i.e., between a lower-resolution image and a higher-resolution image) more accurately, and, on that basis, improves the quality of the super-resolution image of the image to be processed.
With the technical solution of the embodiments of the present disclosure, when a low-resolution image needs to be restored, super-resolution processing can be performed with the help of a reference image depicting a similar scene, and textures in the low-resolution image (such as text and trees) can be recovered through accurate relationship matching between the low-resolution image and the high-resolution reference image.
The training process of the first contrast-relation network is exemplified below.
The training process of the first contrast relationship network may include: determining a second relationship between the features of the second sample image and the features of the sample reference image; and adjusting the network parameter value of the first contrast relationship network according to the first relationship and the second relationship.
In the embodiments of the present disclosure, the second relationship may be distance information, similarity information, or other information between the feature of the second sample image and the feature of the sample reference image. The number of sample reference images may be the same as the number of second sample images due to the need to determine a second relationship between the features of the second sample images and the features of the sample reference images.
It can be seen that the embodiments of the present disclosure may train the first contrast relationship network with both the first relationship and the second relationship. The first relationship represents the relationship between the features of the first sample image and the features of the sample reference image, both of which have higher resolution than the second sample image; the second relationship represents the relationship between the lower-resolution second sample image and the higher-resolution sample reference image. Training on both relationships helps fuse the matching relationship between the two high-resolution images into the second relationship, so that the trained first contrast relationship network can obtain the corresponding relationship between pixel points in the image to be processed and the reference image more accurately. In other words, with the second contrast relationship network as a teacher network and the first contrast relationship network as a student network, the features extracted by the teacher can be transferred to the student through knowledge distillation, and the student can be trained under the teacher's supervision, so that the trained first contrast relationship network accurately obtains the corresponding relationship between pixel points in the image to be processed and the reference image.
The first relationship described above may be used to characterize correlation data between features in the first sample image and the sample reference image, and the second relationship may be used to characterize correlation data between features in the second sample image and the sample reference image.
Accordingly, adjusting the network parameter value of the first contrast-relation network according to the first relation and the second relation may include: determining the loss of the first contrast relationship network according to the first relationship and the second relationship; adjusting a network parameter value of the first contrast-relationship network based on the loss of the first contrast-relationship network.
Here, the loss of the first contrast-relation network may include a correlation loss representing a difference between the first relation and the second relation.
In the embodiments of the present disclosure, for each feature of the second sample image, correlation data between that feature and the features of the sample reference image may be determined. For example, if the first sample image and the second sample image each have N features and the sample reference image has M features, the correlation data between the features of the second sample image and those of the sample reference image can be obtained by computing the correlation between each feature of the second sample image and each feature of the sample reference image, and can be represented by a matrix of size N × M, where N and M are integers greater than 1. Each feature in the first sample image, the second sample image, or the sample reference image may, for example, be a vector.
For example, the correlation data between the feature of pixel point p in the second sample image and the feature of pixel point q in the sample reference image can be calculated according to formula (1):

$$\mathrm{cor}_{pq} = \frac{\exp\left(f_p \cdot f_q / \tau\right)}{\sum_{k \in I'} \exp\left(f_p \cdot f_k / \tau\right)} \tag{1}$$

where $\mathrm{cor}_{pq}$ denotes the correlation data between the feature of pixel point p in the second sample image and the feature of pixel point q in the sample reference image, $f_p$ denotes the feature of pixel point p in the second sample image, $f_q$ denotes the feature of pixel point q in the sample reference image, $f_k$ denotes the feature of pixel point k in the sample reference image, $I'$ denotes the set of pixel points in the sample reference image, and $\tau$ is a preset parameter, for example 0.14, 0.15, or 0.16.
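A minimal sketch of formula (1), with features flattened to one row per pixel point; the L2 normalization of features is an added assumption:

```python
import torch
import torch.nn.functional as F

def correlation(f_src, f_ref, tau=0.15):
    """Correlation data of formula (1): row p holds cor_{pq} over all
    reference pixel points q.

    f_src: (N, D) features of the source image's pixel points
    f_ref: (M, D) features of the sample reference image's pixel points
    """
    f_src = F.normalize(f_src, dim=1)  # L2-normalize (added assumption)
    f_ref = F.normalize(f_ref, dim=1)
    logits = f_src @ f_ref.t() / tau   # dot products with temperature tau
    return torch.softmax(logits, dim=1)  # (N, M) correlation matrix
```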
In the embodiments of the present disclosure, the correlation data between the feature of each pixel point in the first sample image and the feature of each pixel point in the sample reference image may be calculated by referring to formula (1), so as to determine the correlation data between the features in the first sample image and the sample reference image. Here, the correlation data between the features of the first sample image and the features of the sample reference image may be written as $\mathrm{Cor}^{HR}$, and the correlation data between the features of the second sample image and the features of the sample reference image may be written as $\mathrm{Cor}^{LR}$.
The correlation loss may represent any of the following: the KL divergence (Kullback-Leibler divergence), the JS divergence (Jensen-Shannon divergence), the cross entropy, or the Wasserstein distance.
In the case where the correlation loss represents the KL divergence, the correlation loss can be calculated according to formulas (2) and (3):

$$L_{kl} = \sum_{p \in I} D_{\mathrm{KL}}\left(\mathrm{Cor}^{HR}_{p} \,\middle\|\, \mathrm{Cor}^{LR}_{p}\right) \tag{2}$$

$$D_{\mathrm{KL}}\left(\mathrm{Cor}^{HR}_{p} \,\middle\|\, \mathrm{Cor}^{LR}_{p}\right) = \sum_{k \in I'} \mathrm{Cor}^{HR}_{pk} \log \frac{\mathrm{Cor}^{HR}_{pk}}{\mathrm{Cor}^{LR}_{pk}} \tag{3}$$

where $\mathrm{Cor}^{HR}_{pk}$ denotes, in the correlation data $\mathrm{Cor}^{HR}$, the correlation between the feature of pixel point p in the first sample image and the feature of pixel point k in the sample reference image; $\mathrm{Cor}^{LR}_{pk}$ denotes, in the correlation data $\mathrm{Cor}^{LR}$, the correlation between the feature of pixel point p in the second sample image and the feature of pixel point k in the sample reference image; $I$ denotes the set of pixel points in the second sample image; and $L_{kl}$ denotes the correlation loss.
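Formulas (2) and (3) as a sketch, operating on the correlation matrices produced by the formula (1) sketch above (the eps term is added for numerical stability):

```python
import torch

def correlation_loss(cor_hr, cor_lr, eps=1e-8):
    """Correlation loss of formulas (2)-(3): KL divergence between the
    teacher row Cor^HR_p and the student row Cor^LR_p, summed over the
    pixel points p in I.

    cor_hr, cor_lr: (N, M) correlation matrices from formula (1)
    """
    kl = cor_hr * (torch.log(cor_hr + eps) - torch.log(cor_lr + eps))
    return kl.sum()
```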
It can be understood that, since the correlation loss represents the difference between the first relationship and the second relationship, and the first relationship characterizes the correlation data between the features in the first sample image and the sample reference image, training the first contrast relationship network with the correlation loss helps migrate, via knowledge distillation, the matching relationship between the first sample image and the sample reference image into the matching relationship between the second sample image and the sample reference image, so that the trained first contrast relationship network can more accurately obtain the corresponding relationship between pixel points in the lower-resolution image to be processed and the higher-resolution reference image.
In this disclosure, the data in the second relationship may not be limited to the correlation data between the features in the second sample image and the sample reference image, for example, the second relationship may also be used to represent distance information between the features of the pixel points in the second sample image and the features of the pixel points in the sample reference image.
Accordingly, determining the loss of the first contrast relationship network based on the first relationship and the second relationship may include: determining the triplet loss according to the distance information represented by the second relationship; and determining the loss of the first contrast relationship network according to the first relationship, the second relationship, and the triplet loss, where the loss of the first contrast relationship network further includes the triplet loss.
In the embodiments of the present disclosure, the triplet loss is determined from the distance between the features of a positive sample pair and the distance between the features of a negative sample pair. Taking pixel point p of the second sample image as an example: the positive sample pair may consist of pixel point p of the second sample image and pixel point p' in the sample reference image, where p' is the pixel point corresponding to p; the negative sample pair consists of pixel point p of the second sample image and a pixel point k in the sample reference image whose distance from p' exceeds a set threshold T, for example 3.5, 4, or 4.5. Here, pixel point p of the second sample image, pixel point p' in the sample reference image, and pixel point k in the sample reference image constitute a triplet.
The triplet loss can be calculated according to formula (4):

$$L_{margin} = \sum_{p \in I} \max\left(\mathrm{pos}(p) - \mathrm{neg}(p) + m,\; 0\right) \tag{4}$$

where $L_{margin}$ denotes the triplet loss; m is a preset margin value, for example 0.9, 1, or 1.1; $\mathrm{pos}(p)$ is calculated from the feature distance of the positive sample pair; and $\mathrm{neg}(p)$ is calculated from the feature distance of the negative sample pair.

In some embodiments, $\mathrm{pos}(p)$ may be calculated according to formula (5):

$$\mathrm{pos}(p) = \left\| f_p - f_{p'} \right\|_2 \tag{5}$$

where $f_{p'}$ denotes the feature of pixel point p' in the sample reference image, and $\|\cdot\|_2$ denotes the two-norm.

In some embodiments, $\mathrm{neg}(p)$ may be calculated according to formula (6):

$$\mathrm{neg}(p) = \min_{k:\; \left\| k - p' \right\|_\infty > T} \left\| f_p - f_k \right\|_2 \tag{6}$$

where $\|\cdot\|_\infty$ denotes the infinity norm, and $k - p'$ denotes the position difference between pixel point k and pixel point p' in the sample reference image.
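A sketch of formulas (4) to (6); the coordinate convention, the flattened shapes, and the averaging over pixel points are assumptions:

```python
import torch

def triplet_loss(f_lr, f_ref, ref_coords, gt_match, T=4.0, m=1.0):
    """Triplet loss of formulas (4)-(6).

    f_lr:       (N, D) features of the second sample image's pixel points
    f_ref:      (M, D) features of the sample reference image's pixel points
    ref_coords: (M, 2) (x, y) position of each reference pixel point
    gt_match:   (N,)   index of the corresponding pixel point p' for each p
    """
    # pos(p) = || f_p - f_p' ||_2, formula (5).
    pos = (f_lr - f_ref[gt_match]).norm(dim=1)            # (N,)
    feat_d = torch.cdist(f_lr, f_ref)                     # (N, M) two-norms
    # Keep only negatives: pixel points k with ||k - p'||_inf > T.
    p_prime = ref_coords[gt_match]                        # (N, 2)
    spat_d = (ref_coords.unsqueeze(0) - p_prime.unsqueeze(1)).abs().amax(dim=2)
    feat_d = feat_d.masked_fill(spat_d <= T, float("inf"))
    # neg(p) = distance to the hardest valid negative, formula (6).
    neg = feat_d.min(dim=1).values                        # (N,)
    # Formula (4); averaging instead of summing over p is an assumption.
    return torch.clamp(pos - neg + m, min=0).mean()
```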
In the related art, the same object or a similar texture pattern usually appears in both the low-resolution image and the high-resolution image serving as the reference image, but there may be large differences in appearance and content between the two images owing to the transformation relationship (e.g., scaling and rotation) between them. In that case, the pixel-point correspondence computed only from appearance similarity and content similarity is often inaccurate, and the high-resolution textures in the reference image cannot be migrated accurately. In the embodiments of the present disclosure, by contrast, the triplet loss is positively correlated with the feature distance of the positive sample pair and negatively correlated with the feature distance of the negative sample pair. Training the first contrast relationship network with the triplet loss therefore helps the trained network find, in the high-resolution reference image, the pixel point whose features are closest to those of each pixel point of the low-resolution image to be processed. With respect to the transformation relationship between the low-resolution image to be processed and the reference image, the embodiments of the present disclosure can thus obtain the pixel-point correspondence accurately and reliably, reducing the pixel matching errors caused by that transformation relationship.
As can be seen from the above description, the triplet loss is determined from the distance between the features of the positive sample pair and the distance between the features of the negative sample pair, and both pairs must be determined from the correspondence between pixel points in the second sample image and the sample reference image; that is, the real correspondence between pixel points in the sample reference image and pixel points in the second sample image needs to be determined in advance.
In order to facilitate determining the real correspondence between pixel points in the sample reference image and pixel points in the second sample image, after the first sample image is obtained, a homography transformation may be applied to the first sample image to obtain the sample reference image, and downsampling may be applied to the first sample image to obtain the second sample image.
Understandably, because the sample reference image is obtained by applying a homography transformation to the first sample image, the real correspondence between pixel points in the first sample image and the sample reference image can be determined directly; likewise, because the second sample image is obtained by downsampling the first sample image, the real correspondence between pixel points in the first sample image and the second sample image can be determined directly. From these two correspondences, the real correspondence between pixel points in the second sample image and the sample reference image can then be determined.
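As a rough illustration, such training images can be generated with OpenCV and NumPy as below; the random-homography strength, the ×4 scale factor, and the function name make_training_images are illustrative assumptions rather than values from the disclosure.

```python
import cv2
import numpy as np

def make_training_images(first, scale=4, max_shift=0.15):
    """Build (sample reference image, second sample image) from a first sample image.

    first: HxWx3 uint8 array. The homography below randomly perturbs the image
    corners; its strength (max_shift) is an illustrative choice.
    """
    h, w = first.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2).astype(np.float32) - 0.5) * 2 * max_shift
    dst = src + jitter * np.float32([w, h])
    H, _ = cv2.findHomography(src, dst)                  # known ground-truth mapping

    reference = cv2.warpPerspective(first, H, (w, h))    # sample reference image
    second = cv2.resize(first, (w // scale, h // scale),
                        interpolation=cv2.INTER_CUBIC)   # second sample image (LR)
    return reference, second, H                          # H yields true correspondences
```

Because both the homography H and the downsampling factor are known, the real correspondence of any pixel point p in the second sample image follows by scaling p back up and mapping it through H, which is how the positive pairs for the triplet loss can be formed.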
When the loss L of the first contrast relationship network includes both the correlation loss L_kl and the triplet loss L_margin, L can be calculated according to equation (7):
L = L_margin + α_kl · L_kl    (7)
where α_kl represents the weight of the correlation loss L_kl and can be preset according to actual requirements; in one example, α_kl may be 14, 15 or 16. In another example, a larger value of α_kl may be set in a scene where the difference between the resolution of the image to be processed and the resolution of the reference image is larger, and a smaller value of α_kl may be set in a scene where that resolution difference is smaller. The learning rate of the first contrast relationship network can be determined by trial: the learning rate can be set to several candidate values, the first contrast relationship network can be trained under each candidate value, and the trained network can then be evaluated on test data to obtain a test result; finally, the learning rate can be chosen according to the test result corresponding to each candidate value. For example, the learning rate of the first contrast relationship network may be 10^-3.
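A minimal sketch of equation (7) follows. Since equations (2) and (3) appear earlier in the document and are not reproduced here, the KL-divergence form of the correlation loss below, as well as the names and the example weight, are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

ALPHA_KL = 15.0  # illustrative value for the weight in equation (7)

def correlation_kl_loss(corr_student, corr_teacher):
    # Assumed form of the correlation loss L_kl: KL divergence between
    # row-normalized correlation maps; equations (2)-(3) earlier in the
    # document define the actual form, for which this stands in.
    log_p = F.log_softmax(corr_student, dim=-1)
    q = F.softmax(corr_teacher, dim=-1)
    return F.kl_div(log_p, q, reduction="batchmean")

def first_network_loss(l_margin, corr_student, corr_teacher):
    # Equation (7): L = L_margin + alpha_kl * L_kl
    return l_margin + ALPHA_KL * correlation_kl_loss(corr_student, corr_teacher)
```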
Certainly, as one implementation, the network loss of the second contrast relationship network may also include a triplet loss, which measures the distance between pixel-point features of the two sample images in each group of sample images; its calculation may refer to the calculation of the triplet loss in the first contrast relationship network and is not repeated here.
The following describes a training procedure of the first contrast relationship network in the embodiment of the present disclosure with reference to the drawings.
Fig. 2 is a schematic diagram of the training process of the first contrast relationship network and the second contrast relationship network in an embodiment of the present disclosure. Referring to Fig. 2, after the trained second contrast relationship network is obtained, the features of the first sample image and the features of the sample reference image may be extracted using the second contrast relationship network, so as to determine the correlation data between the features of the first sample image and the features of the sample reference image.

In the embodiment of the present disclosure, the triplet loss L_margin in the first contrast relationship network may be calculated with reference to equations (4) to (6) above, and the correlation data between the features of the second sample image and the features of the sample reference image may be determined from the features of the second sample image and the features of the sample reference image.

After the two sets of correlation data are obtained, the correlation loss L_kl can be calculated from equations (2) and (3), and the loss L of the first contrast relationship network can then be calculated according to equation (7).
According to the loss L of the first contrast relationship network, the first contrast relationship network can be trained to obtain the trained first contrast relationship network.
In some embodiments, the implementation manner of step 103 may be:
for each pixel point in the image to be processed, the following steps are performed: determining, according to the first corresponding relationship, the target pixel point in the reference image that corresponds to the pixel point; determining the position offset of the target pixel point in the reference image according to the position difference between the pixel point and its corresponding target pixel point; determining, according to the position of the target pixel point and the position offset, the pixel points of the target pixel point after the position offset; and obtaining reference features, where the reference features represent the features of the pixel points of the target pixel point after the position offset;
after the reference feature corresponding to each pixel point is obtained, the feature of each pixel point in the image to be processed and the reference feature corresponding to each pixel point can be fused to obtain a fusion feature, and the super-resolution image of the image to be processed is obtained according to the fusion feature.
In some embodiments, the number of position offsets of a target pixel point may be one or more. For each pixel point, when there is one position offset, there is one pixel point of the target pixel point after the position offset, and one reference feature can be obtained; when there are K position offsets, with K an integer greater than 1, there are K pixel points of the target pixel point after the position offsets, and K reference features can be obtained. That is, in the embodiment of the present disclosure, the feature of each pixel point can be fused with a plurality of reference features.
In some embodiments, the position difference between each pixel point and its corresponding target pixel point may be denoted p_0; for example, p_0 = p' − p represents the position difference between a pixel point p in the image to be processed and its corresponding target pixel point p' in the reference image.
In some embodiments, for a pixel point p in the image to be processed, the reference feature y (p) may be calculated according to formula (8):
y(p) = Σ_k w_k · Δm_k · x(p + p_0 + p_k + Δp_k)    (8)
where w_k represents the convolution kernel weight, Δp_k represents a learnable offset, Δm_k represents a learnable modulation scalar, p + p_0 + p_k + Δp_k represents the position of pixel point p after being offset by p_0 + p_k + Δp_k, and x(p + p_0 + p_k + Δp_k) represents the original feature at position p + p_0 + p_k + Δp_k in the reference image (i.e., the feature extracted using the first contrast relationship network). p_k is a preset position offset; for example, p_k may take values according to equation (9):

p_k ∈ {(−1,−1), (−1,0), (−1,1), (0,−1), (0,0), (0,1), (1,−1), (1,0), (1,1)}    (9)
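A literal, single-pixel reading of equation (8) can be sketched as follows; the bilinear sampling used for fractional positions and all tensor shapes and names are implementation assumptions (in batch form, this is what a modulated deformable convolution such as torchvision.ops.deform_conv2d computes).

```python
import torch
import torch.nn.functional as F

def reference_feature(x_ref, p, p0, w, dp, dm, pk_grid):
    """Literal sketch of equation (8) for one pixel point p.

    x_ref:   (C, H, W) reference-image features from the first contrast relationship network
    p, p0:   (2,) pixel position (y, x) and its matched offset (target = p + p0)
    w:       (K,) convolution kernel weights w_k
    dp:      (K, 2) learnable offsets delta-p_k
    dm:      (K,) learnable modulation scalars delta-m_k
    pk_grid: (K, 2) the preset offsets p_k of equation (9)
    """
    C, H, W = x_ref.shape
    pos = p + p0 + pk_grid + dp                        # (K, 2) sampling positions
    # bilinear sampling for fractional positions (an implementation choice)
    grid = pos.flip(-1).clone()                        # grid_sample expects (x, y)
    grid[:, 0] = 2 * grid[:, 0] / (W - 1) - 1          # normalize x to [-1, 1]
    grid[:, 1] = 2 * grid[:, 1] / (H - 1) - 1          # normalize y to [-1, 1]
    samples = F.grid_sample(x_ref[None], grid[None, None], align_corners=True)
    samples = samples[0, :, 0]                         # (C, K) features x(p + p0 + p_k + dp_k)
    return (samples * (w * dm)[None]).sum(dim=1)       # y(p), a C-dimensional feature
```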
in some embodiments, referring to fig. 3, the image to be processed and the reference image may be input to the trained first contrast relationship network, and the image to be processed and the reference image are processed by using the first contrast relationship network, so as to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image.
After obtaining the first corresponding relationship, the first corresponding relationship may be input to a recovery network, in which the following steps are performed: determining pixel points of the target pixel points after position deviation to obtain reference characteristics; and fusing the characteristics of each pixel point in the image to be processed and the reference characteristics corresponding to each pixel point to obtain fusion characteristics, and obtaining the super-resolution image of the image to be processed according to the fusion characteristics.
In some embodiments, the feature of each pixel point in the image to be processed and the reference feature corresponding to each pixel point are fused, and the fused feature may be processed by a residual block to obtain a super-resolution image of the image to be processed.
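One plausible realization of this fusion step, assuming channel-wise concatenation followed by a convolution and a residual block (the disclosure does not fix this exact structure), is sketched below.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Illustrative fusion of to-be-processed features with aligned reference features."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.res = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feat_lr, feat_ref_aligned):
        # concatenate the two feature maps, then mix them with one convolution
        fused = self.fuse(torch.cat([feat_lr, feat_ref_aligned], dim=1))
        # residual block applied to the fused feature
        return fused + self.res(fused)
```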
In the disclosed embodiment, the recovery network may be a deep convolutional neural network (DCN) or another type of neural network. In some embodiments, the recovery network may be trained based on a loss of the recovery network, and the loss may include at least one of: a reconstruction loss L_rec, a perceptual loss L_per, and an adversarial loss L_adv. Illustratively, the norm used in the reconstruction loss L_rec may be the l1 norm, and the perceptual loss L_per may be calculated from VGG features.
In some embodiments, the weights of the reconstruction loss L_rec, the perceptual loss L_per, and the adversarial loss L_adv can each be determined by trial: each weight can be set to several candidate values, the recovery network can be trained under each candidate value, and the trained network can then be evaluated on test data to obtain a test result; finally, the weight values can be chosen according to the test results. For example, the weights of the reconstruction loss L_rec, the perceptual loss L_per, and the adversarial loss L_adv may be 1.0, 10^-4, and 10^-6, respectively. Similarly, the learning rate for training the recovery network can also be determined by trial; for example, it may be 10^-4.
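For illustration, the recovery-network loss with the example weights above might be assembled as follows; the VGG feature extractor feat_fn, the discriminator logits d_fake, and the non-saturating adversarial form are assumptions rather than details from the disclosure.

```python
import torch
import torch.nn.functional as F

# Illustrative weights; 1.0, 1e-4 and 1e-6 are the example values given above.
W_REC, W_PER, W_ADV = 1.0, 1e-4, 1e-6

def recovery_loss(sr, hr, feat_fn, d_fake):
    """Sketch of the recovery-network loss; feat_fn and d_fake are assumed inputs."""
    l_rec = F.l1_loss(sr, hr)                        # reconstruction loss (l1 norm)
    l_per = F.mse_loss(feat_fn(sr), feat_fn(hr))     # perceptual loss on VGG features
    l_adv = F.softplus(-d_fake).mean()               # one common adversarial-loss form
    return W_REC * l_rec + W_PER * l_per + W_ADV * l_adv
```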
In some embodiments, Δp_k and Δm_k are network parameters in the recovery network and can be adjusted during training of the recovery network. Using the learnable offset Δp_k, information from pixels surrounding the target pixel point can be exploited dynamically; using the learnable modulation scalar Δm_k, features in the reference image can be aggregated with reasonable weights.
It can be seen that the position difference between each pixel point and its corresponding target pixel point helps determine the reference features accurately, and that the reference features are the features of pixel points around the target pixel point. Fusing the feature of each pixel point in the image to be processed with its corresponding reference features therefore helps migrate the features of the pixel points around the target pixel point into the image to be processed, which in turn helps improve the quality of the super-resolution image of the image to be processed.
The effects of the super-resolution processing method in the embodiment of the present disclosure are compared with those of the related art below with reference to the accompanying drawings.
Fig. 4A is a schematic diagram of a correspondence relationship between an image to be processed and a reference image in the embodiment of the present disclosure and the related art; referring to fig. 4A, a first connection line 401 represents a corresponding relationship between pixel points of the image to be processed and the reference image in the embodiment of the present disclosure, and a second connection line 402 represents a corresponding relationship between pixel points of the image to be processed and the reference image in the related art, it can be seen that the corresponding relationship represented by the second connection line 402 is not accurate, and the corresponding relationship represented by the first connection line 401 is more accurate.
Fig. 4B is a super-resolution image of an image to be processed in the related art, and fig. 4C is a super-resolution image of an image to be processed in the embodiment of the present disclosure.
The super-resolution processing method of the embodiment of the present disclosure may include at least Method 1 and Method 2. The loss function adopted in the training process of Method 1 includes the reconstruction loss L_rec, the perceptual loss L_per, and the adversarial loss L_adv; the loss function adopted in the training process of Method 2 is only the reconstruction loss L_rec. Whichever loss function is adopted, more texture features of the high-resolution image can be effectively fused into the low-resolution image to obtain the super-resolution image.
It will be understood by those of skill in the art that in the above method of the present embodiment, the order of writing the steps does not imply a strict order of execution and does not impose any limitations on the implementation, as the order of execution of the steps should be determined by their function and possibly inherent logic.
On the basis of the image processing method proposed by the foregoing embodiment, an embodiment of the present disclosure proposes an image processing apparatus.
Fig. 5 is a schematic diagram of a composition structure of an image processing apparatus according to an embodiment of the disclosure, and as shown in fig. 5, the apparatus may include: an obtaining module 701, a first processing module 702 and a second processing module 703.
An obtaining module 701, configured to obtain an image to be processed and a reference image, where a resolution of the reference image is higher than a resolution of the image to be processed;
a first processing module 702, configured to process the to-be-processed image and the reference image by using a first contrast relationship network, so as to obtain a first correspondence relationship between pixel points in the to-be-processed image and the reference image; the first contrast relationship network is obtained by performing supervised training based on a first relationship between the features of a first sample image and the features of a sample reference image and a second sample image, and the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolution of the first sample image and the sample reference image are both higher than the resolution of the second sample image;
a second processing module 703 is configured to fuse the features of the reference image into the image to be processed according to the first corresponding relationship, so as to obtain a super-resolution image of the image to be processed.
In some embodiments, the apparatus further comprises a training module configured to train the first contrast relationship network using the steps of:
determining a second relationship between features of the second sample image and features of the sample reference image;
and adjusting the network parameter value of the first contrast relation network according to the first relation and the second relation.
In some embodiments, the first relationship is used to characterize correlation data between features in the first sample image and the sample reference image, and the second relationship is used to characterize correlation data between features in the second sample image and the sample reference image;
the training module is specifically configured to:
determining a loss of the first contrast relationship network according to the first relationship and the second relationship; adjusting a network parameter value of the first contrast relationship network according to the loss of the first contrast relationship network; the loss of the first contrast relationship network includes a correlation loss representing a difference between the first relationship and the second relationship.
In some embodiments, the second relationship is further used to characterize distance information between features of pixel points in the second sample image and features of pixel points in the sample reference image;
the training module is specifically configured to:
determining a triplet loss according to the distance information characterized by the second relationship; determining the loss of the first contrast relationship network according to the first relationship, the second relationship and the triplet loss, wherein the loss of the first contrast relationship network also includes the triplet loss.
In some embodiments, at least one set of sample images used to train the second contrast relationship network has a higher resolution than the second sample image.
In some embodiments, the obtaining module 701 is further configured to:
acquiring the first sample image; performing homography transformation on the first sample image to obtain a sample reference image; and carrying out downsampling processing on the first sample image to obtain the second sample image.
In some embodiments, the second processing module 703 is specifically configured to:
and aiming at each pixel point in the image to be processed, executing:
determining a target pixel point corresponding to the pixel point in the image to be processed in the reference image according to the first corresponding relation;
determining the position offset of a target pixel point in the reference image according to the position difference between the pixel point and the corresponding target pixel point; determining pixel points of the target pixel points after position offset according to the positions of the target pixel points and the position offset; obtaining reference characteristics, wherein the reference characteristics represent the characteristics of the pixel points of the target pixel points after the position deviation;
and fusing the characteristics of each pixel point in the image to be processed and the reference characteristics corresponding to each pixel point to obtain fusion characteristics, and obtaining a super-resolution image of the image to be processed according to the fusion characteristics.
In some embodiments, the features of the reference image comprise texture features.
In some embodiments, the training data of the second contrast relationship network comprises at least a portion of the images in the first sample image and the sample reference image.
In practical applications, the obtaining module 701, the first processing module 702, and the second processing module 703 may be implemented by a processor in a computer device, where the processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on such understanding, the technical solution of this embodiment, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Specifically, the computer program instructions corresponding to an image processing method in the present embodiment may be stored on a storage medium such as an optical disc, a hard disk, a usb disk, or the like, and when the computer program instructions corresponding to an image processing method in the storage medium are read or executed by an electronic device, any one of the image processing methods of the foregoing embodiments is implemented.
Based on the same technical concept of the foregoing embodiment, referring to fig. 6, it illustrates an electronic device 80 provided by an embodiment of the present disclosure, which may include: a memory 801 and a processor 802; wherein,
the memory 801 is used for storing computer programs and data;
the processor 802 is configured to execute the computer program stored in the memory to implement any one of the image processing methods of the foregoing embodiments.
In practical applications, the memory 801 may be a volatile memory (RAM); or a non-volatile memory (non-volatile memory) such as a ROM, a flash memory (flash memory), a Hard Disk (Hard Disk Drive, HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories and provides instructions and data to the processor 802.
The processor 802 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It is understood that the electronic devices for implementing the above-described processor functions may be other devices, and the embodiments of the present disclosure are not particularly limited.
The embodiment of the present disclosure further provides a computer program, which includes computer readable codes, and when the computer readable codes are run in an electronic device, a processor in the electronic device executes a method for implementing any one of the image processing methods.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
The foregoing description of the various embodiments is intended to highlight the differences between the embodiments; the same or similar parts may be referred to across embodiments and are not repeated herein for brevity.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in various product embodiments provided by the application can be combined arbitrarily to obtain new product embodiments without conflict.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (20)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed and a reference image, wherein the resolution of the reference image is higher than that of the image to be processed;
processing the image to be processed and the reference image by utilizing a first comparison relationship network to obtain a first corresponding relationship between pixel points in the image to be processed and the reference image; the first contrast relationship network is obtained by performing supervised training based on a first relationship between the features of a first sample image and the features of a sample reference image and a second sample image, and the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolution of the first sample image and the sample reference image are both higher than the resolution of the second sample image;
and according to the first corresponding relation, fusing the characteristics of the reference image into the image to be processed to obtain a super-resolution image of the image to be processed.
2. The method of claim 1, wherein the training process of the first contrast relationship network comprises:
determining a second relationship between features of the second sample image and features of the sample reference image;
and adjusting the network parameter value of the first contrast relation network according to the first relation and the second relation.
3. The method of claim 2, wherein the first relationship is used to characterize correlation data between features in the first sample image and the sample reference image, and the second relationship is used to characterize correlation data between features in the second sample image and the sample reference image;
the adjusting a network parameter value of the first contrast relationship network according to the first relationship and the second relationship includes:
determining a loss of the first contrast relationship network according to the first relationship and the second relationship; adjusting a network parameter value of the first contrast relationship network according to the loss of the first contrast relationship network; the loss of the first contrast relationship network includes a correlation loss representing a difference between the first relationship and the second relationship.
4. The method of claim 3, wherein the second relationship is further used to characterize distance information between features of pixels in the second sample image and features of pixels in the sample reference image;
the determining the loss of the first contrast relationship network according to the first relationship and the second relationship includes:
determining a triplet loss according to the distance information characterized by the second relationship; determining the loss of the first contrast relationship network according to the first relationship, the second relationship and the triplet loss, wherein the loss of the first contrast relationship network also includes the triplet loss.
5. The method according to any one of claims 1 to 4, wherein a resolution of at least one set of sample images used for training the second contrast relationship network is higher than a resolution of the second sample image.
6. The method according to any one of claims 1 to 5, further comprising:
acquiring the first sample image; performing homography transformation on the first sample image to obtain a sample reference image; and carrying out downsampling processing on the first sample image to obtain the second sample image.
7. The method according to any one of claims 1 to 6, wherein the fusing the features of the reference image into the image to be processed according to the first corresponding relationship to obtain a super-resolution image of the image to be processed comprises:
and aiming at each pixel point in the image to be processed, executing:
determining a target pixel point corresponding to the pixel point in the image to be processed in the reference image according to the first corresponding relation;
determining the position offset of a target pixel point in the reference image according to the position difference between the pixel point and the corresponding target pixel point; determining pixel points of the target pixel points after position offset according to the positions of the target pixel points and the position offset; obtaining reference characteristics, wherein the reference characteristics represent the characteristics of the pixel points of the target pixel points after the position deviation;
and fusing the characteristics of each pixel point in the image to be processed and the reference characteristics corresponding to each pixel point to obtain fusion characteristics, and obtaining a super-resolution image of the image to be processed according to the fusion characteristics.
8. The method of any of claims 1 to 7, wherein the features of the reference image comprise texture features.
9. The method of any one of claims 1 to 8, wherein the training data of the second contrast relationship network comprises at least a portion of the images in the first sample image and the sample reference image.
10. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an image to be processed and a reference image, and the resolution of the reference image is higher than that of the image to be processed;
the first processing module is used for processing the image to be processed and the reference image by utilizing a first comparison relationship network to obtain a first corresponding relationship between the image to be processed and pixel points in the reference image; the first contrast relationship network is obtained by performing supervised training based on a first relationship between the features of a first sample image and the features of a sample reference image and a second sample image, and the features of the first sample image and the features of the sample reference image are data obtained based on a trained second contrast relationship network; the resolution of the first sample image and the sample reference image are both higher than the resolution of the second sample image;
and the second processing module is used for fusing the characteristics of the reference image into the image to be processed according to the first corresponding relation to obtain a super-resolution image of the image to be processed.
11. The apparatus of claim 10, further comprising a training module configured to train the first contrast relationship network using the steps of:
determining a second relationship between features of the second sample image and features of the sample reference image;
and adjusting the network parameter value of the first contrast relation network according to the first relation and the second relation.
12. The apparatus of claim 11, wherein the first relationship is used to characterize correlation data between features in the first sample image and the sample reference image, and wherein the second relationship is used to characterize correlation data between features in the second sample image and the sample reference image;
the training module is specifically configured to:
determining a loss of the first contrast relationship network according to the first relationship and the second relationship; adjusting a network parameter value of the first contrast relationship network according to the loss of the first contrast relationship network; the loss of the first contrast relationship network includes a correlation loss representing a difference between the first relationship and the second relationship.
13. The apparatus of claim 12, wherein the second relationship is further used to characterize distance information between features of pixels in the second sample image and features of pixels in the sample reference image;
the training module is specifically configured to:
determining a triplet loss according to the distance information characterized by the second relationship; determining the loss of the first contrast relationship network according to the first relationship, the second relationship and the triplet loss, wherein the loss of the first contrast relationship network also includes the triplet loss.
14. The apparatus according to any one of claims 10 to 13, wherein a resolution of at least one set of sample images used for training the second contrast relationship network is higher than a resolution of the second sample image.
15. The apparatus according to any one of claims 10 to 14, wherein the obtaining module is further configured to:
acquiring the first sample image; performing homography transformation on the first sample image to obtain a sample reference image; and carrying out downsampling processing on the first sample image to obtain the second sample image.
16. The apparatus according to any one of claims 10 to 15, wherein the second processing module is specifically configured to:
and aiming at each pixel point in the image to be processed, executing:
determining a target pixel point corresponding to the pixel point in the image to be processed in the reference image according to the first corresponding relation;
determining the position offset of a target pixel point in the reference image according to the position difference between the pixel point and the corresponding target pixel point; determining pixel points of the target pixel points after position offset according to the positions of the target pixel points and the position offset; obtaining reference characteristics, wherein the reference characteristics represent the characteristics of the pixel points of the target pixel points after the position deviation;
and fusing the characteristics of each pixel point in the image to be processed and the reference characteristics corresponding to each pixel point to obtain fusion characteristics, and obtaining a super-resolution image of the image to be processed according to the fusion characteristics.
17. The apparatus according to any of claims 10 to 16, wherein the features of the reference image comprise texture features.
18. An electronic device comprising a processor and a memory for storing a computer program operable on the processor; wherein,
the processor is configured to run the computer program to perform the method of any one of claims 1 to 8.
19. A computer storage medium on which a computer program is stored, characterized in that the computer program realizes the method of any one of claims 1 to 8 when executed by a processor.
20. A computer program comprising computer readable code which, when run in an electronic device, a processor in the electronic device performs the method of any of claims 1 to 8.
CN202210177467.8A 2021-05-21 2022-02-25 Image processing method, image processing apparatus, electronic device, storage medium, and program Pending CN114549319A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SG10202105408P 2021-05-21
SG10202105408P 2021-05-21

Publications (1)

Publication Number Publication Date
CN114549319A true CN114549319A (en) 2022-05-27

Family

ID=81679177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210177467.8A Pending CN114549319A (en) 2021-05-21 2022-02-25 Image processing method, image processing apparatus, electronic device, storage medium, and program

Country Status (1)

Country Link
CN (1) CN114549319A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082319A (en) * 2022-07-22 2022-09-20 平安银行股份有限公司 Super-resolution image construction method, computer equipment and storage medium
CN115082319B (en) * 2022-07-22 2022-11-25 平安银行股份有限公司 Super-resolution image construction method, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107330439B (en) Method for determining posture of object in image, client and server
KR20210074360A (en) Image processing method, device and apparatus, and storage medium
CN112115783A (en) Human face characteristic point detection method, device and equipment based on deep knowledge migration
CN111476719A (en) Image processing method, image processing device, computer equipment and storage medium
CN113343982A (en) Entity relationship extraction method, device and equipment for multi-modal feature fusion
WO2023077809A1 (en) Neural network training method, electronic device, and computer storage medium
CN116468919A (en) Image local feature matching method and system
CN115205150A (en) Image deblurring method, device, equipment, medium and computer program product
CN111428656A (en) Mobile terminal identity card identification method based on deep learning and mobile device
CN113343981A (en) Visual feature enhanced character recognition method, device and equipment
CN115546640A (en) Cloud detection method and device for remote sensing image, electronic equipment and storage medium
CN110717405B (en) Face feature point positioning method, device, medium and electronic equipment
CN114549319A (en) Image processing method, image processing apparatus, electronic device, storage medium, and program
CN113516697B (en) Image registration method, device, electronic equipment and computer readable storage medium
CN117934254A (en) Watermark processing model training method, watermark processing method, device and equipment
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN116403142A (en) Video processing method, device, electronic equipment and medium
CN110660091A (en) Image registration processing method and device and photographing correction operation system
CN111932466B (en) Image defogging method, electronic equipment and storage medium
CN112634126A (en) Portrait age reduction processing method, portrait age reduction training device, portrait age reduction equipment and storage medium
CN113066059A (en) Image definition detection method, device, equipment and storage medium
CN116659520B (en) Matching positioning method, device and equipment based on bionic polarization vision enhancement
CN113240802B (en) Three-dimensional reconstruction whole-house virtual dimension installing method, device, equipment and storage medium
CN110866431B (en) Training method of face recognition model, and face recognition method and device
CN111626305B (en) Target detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination