CN116664399A - Image super-resolution processing method, device and equipment based on zero sample learning


Info

Publication number
CN116664399A
Authority
CN
China
Prior art keywords
image
resolution
super
wide
alignment
Prior art date
Legal status
Pending
Application number
CN202310596565.XA
Other languages
Chinese (zh)
Inventor
Xiong Zhiwei (熊志伟)
Xu Ruikang (徐瑞康)
Yao Mingde (姚明德)
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China (USTC)
Priority: CN202310596565.XA
Publication: CN116664399A
Legal status: Pending

Classifications

    • G06T 3/4053: Scaling of whole images or parts thereof, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 3/4046: Scaling of whole images or parts thereof, using neural networks
    • G06T 7/10: Image analysis; Segmentation; Edge detection
    • G06N 3/0464: Neural networks; Convolutional networks [CNN, ConvNet]
    • G06N 3/08: Neural networks; Learning methods
    • G06V 10/761: Image or video pattern matching; Proximity, similarity or dissimilarity measures
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/20112: Image segmentation details
    • G06T 2207/20132: Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides an image super-resolution processing method, device and equipment based on zero sample learning, applicable to the technical field of image processing. The image super-resolution processing method based on zero sample learning comprises the following steps: acquiring a wide-angle image and a tele image; cropping the wide-angle image to obtain a cropped image; aligning the cropped image to the tele image by using a trained alignment network to obtain an aligned image; cropping a first image pair to obtain a plurality of second image pairs, wherein the first image pair denotes the image pair consisting of the aligned image and the tele image; training a super-resolution model by using the plurality of second image pairs to obtain a target super-resolution model; and processing the wide-angle image by using the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image.

Description

Image super-resolution processing method, device and equipment based on zero sample learning
Technical Field
The disclosure relates to the technical field of image processing, in particular to an image super-resolution processing method, device and equipment based on zero sample learning.
Background
Existing mobile devices are typically equipped with an asymmetric camera system, such as the camera module of a smartphone. An asymmetric camera system is used to acquire images with different field-of-view sizes of the same scene, and typically consists of a wide-angle lens and a telephoto lens. The wide-angle lens has a wide field of view relative to the telephoto lens and typically serves as the primary lens of the system. However, owing to the trade-off between field of view and resolution, within the overlapping field of view of the two lenses the resolution of the wide-angle lens is lower than that of the telephoto lens.
In order to acquire images with a large field of view and high resolution, related deep-learning-based super-resolution processing methods predefine a degradation, train a super-resolution model in a supervised manner on a large amount of sample data, and then process the wide-angle image with the trained super-resolution model to obtain the super-resolution image corresponding to the wide-angle image, wherein the sample data are wide-angle images and tele images acquired multiple times by the same asymmetric camera system.
In the process of implementing the disclosed concept, the inventors found that at least the following problem exists in the related art: the quality of the super-resolution image corresponding to the wide-angle image obtained by the related deep-learning-based super-resolution processing method cannot meet the requirements of practical application.
Disclosure of Invention
In view of the above, the present disclosure provides an image super-resolution processing method, apparatus and device based on zero sample learning.
According to a first aspect of the present disclosure, there is provided an image super-resolution processing method based on zero sample learning, including:
acquiring a wide-angle image and a tele image, wherein the wide-angle image and the tele image denote images obtained by two acquisition devices with different focal lengths respectively acquiring the same scene, and the resolution of the wide-angle image is lower than that of the tele image;
cropping the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps that of the tele image;
aligning the cropped image to the tele image by using a trained alignment network to obtain an aligned image;
cropping a first image pair to obtain a plurality of second image pairs, wherein the first image pair denotes an image pair consisting of the aligned image and the tele image;
training a super-resolution model by using the plurality of second image pairs to obtain a target super-resolution model;
and processing the wide-angle image by using the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image.
According to an embodiment of the present disclosure, aligning the cropped image to the tele image using the trained alignment network to obtain the aligned image includes:
downsampling the tele image to obtain a downsampled image, wherein the resolution of the downsampled image is the same as that of the wide-angle image;
inputting the cropped image and the downsampled image into the alignment network to obtain the aligned image.
According to an embodiment of the present disclosure, a training method of the alignment network includes:
inputting the cropped image and the downsampled image into the alignment network to obtain an initial aligned image;
inputting the initial aligned image into a spatial-domain local discriminator to obtain a first discrimination result corresponding to the initial aligned image;
inputting a spectrogram corresponding to the initial aligned image into a frequency-domain global discriminator to obtain a second discrimination result corresponding to the initial aligned image;
inputting the initial aligned image, the cropped image and the downsampled image into a deep convolutional neural network to obtain intermediate feature maps respectively corresponding to the initial aligned image, the cropped image and the downsampled image;
calculating an L2 norm between the downsampled image and the initial aligned image;
obtaining an alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps and the L2 norm;
and updating network parameters of the initial alignment network according to the alignment loss.
According to an embodiment of the present disclosure, a training method of the spatial-domain local discriminator includes:
inputting the cropped image into the spatial-domain local discriminator to obtain a third discrimination result corresponding to the cropped image;
obtaining a spatial-domain loss corresponding to the spatial-domain local discriminator according to the first discrimination result and the third discrimination result;
and updating parameters of the spatial-domain local discriminator according to the spatial-domain loss.
According to an embodiment of the present disclosure, a training method of the frequency-domain global discriminator includes:
inputting a spectrogram corresponding to the cropped image into the frequency-domain global discriminator to obtain a fourth discrimination result corresponding to the cropped image;
obtaining a frequency-domain loss corresponding to the frequency-domain global discriminator according to the second discrimination result and the fourth discrimination result;
and updating parameters of the frequency-domain global discriminator according to the frequency-domain loss.
According to an embodiment of the present disclosure, obtaining the alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps and the L2 norm includes calculating according to the following formula (1):

$$L_{align} = L_2 + \lambda_1 \mathcal{L}_{adv}^{D_s} + \lambda_2 \mathcal{L}_{adv}^{D_f} + \lambda_3 L_{cl} \qquad (1)$$

wherein $L_{align}$ denotes the alignment loss, $L_2$ denotes the L2 norm, $\mathcal{L}_{adv}^{D_s}$ denotes the loss term associated with the first discrimination result, $\mathcal{L}_{adv}^{D_f}$ denotes the loss term associated with the second discrimination result, $L_{cl}$ denotes the loss term associated with the intermediate feature maps, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote weight coefficients.
According to an embodiment of the present disclosure, the loss term associated with the intermediate feature maps is calculated according to the following formula (2):

$$L_{cl} = \frac{\sum_{i=1}^{m} \left\| \phi_i(\hat{Y}) - \phi_i(Y^*) \right\|_1}{\sum_{j=1}^{n} \left\| \phi_j(\hat{Y}) - \phi_j(X^{\downarrow}) \right\|_1} \qquad (2)$$

wherein $L_{cl}$ denotes the loss associated with the intermediate feature maps, $\phi_i(\hat{Y})$ denotes the intermediate feature map output by the $i$-th layer of the deep convolutional neural network for the initial aligned image $\hat{Y}$, $\phi_i(Y^*)$ denotes the intermediate feature map output by the $i$-th layer for the cropped image $Y^*$, $\phi_j(\hat{Y})$ denotes the intermediate feature map output by the $j$-th layer for the initial aligned image $\hat{Y}$, $\phi_j(X^{\downarrow})$ denotes the intermediate feature map output by the $j$-th layer for the downsampled image $X^{\downarrow}$, $m$ denotes the number of layers of the deep convolutional neural network summed over in the numerator, $n$ denotes the number summed over in the denominator, and $m$ and $n$ are integers greater than or equal to 1.
According to an embodiment of the present disclosure, cropping the first image pair to obtain a plurality of second image pairs includes:
processing the aligned image by using the trained spatial-domain local discriminator to obtain a similarity probability map, wherein the pixel value of each pixel in the similarity probability map denotes the similarity between the aligned image and the tele image at that pixel;
and cropping the first image pair according to the similarity probability map to obtain the plurality of second image pairs.
A second aspect of the present disclosure provides an image super-resolution processing apparatus based on zero sample learning, including:
the acquisition module, configured to acquire a wide-angle image and a tele image, wherein the wide-angle image and the tele image denote images obtained by two acquisition devices with different focal lengths respectively acquiring the same scene, and the resolution of the wide-angle image is lower than that of the tele image;
the first obtaining module, configured to crop the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps that of the tele image;
the second obtaining module, configured to align the cropped image to the tele image by using a trained alignment network to obtain an aligned image;
the third obtaining module, configured to crop a first image pair to obtain a plurality of second image pairs, wherein the first image pair denotes an image pair consisting of the aligned image and the tele image;
the fourth obtaining module, configured to train a super-resolution model by using the plurality of second image pairs to obtain a target super-resolution model;
and the fifth obtaining module, configured to process the wide-angle image by using the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method described above.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above method.
According to the embodiments of the present disclosure, a wide-angle image and a tele image are acquired, so that a low-resolution large-field-of-view image (the wide-angle image) and a high-resolution small-field-of-view image (the tele image) of the same scene are obtained; the wide-angle image is cropped to obtain a cropped image overlapping the field of view of the tele image; the cropped image is then aligned to the tele image by a trained alignment network to obtain an aligned image; a first image pair consisting of the aligned image and the tele image is cropped to obtain a plurality of second image pairs; and a super-resolution model is trained with the plurality of second image pairs to obtain a target super-resolution model, which has thereby fully learned the degradation of the current wide-angle image. Processing the wide-angle image with the target super-resolution model then yields the corresponding super-resolution image: the model performs super-resolution enhancement matched to the degradation of the current wide-angle image even though that degradation is unknown in advance, which improves the quality of the resulting super-resolution image so that it meets the requirements of practical applications.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario diagram of an image super-resolution processing method based on zero sample learning according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an image super-resolution processing method based on zero sample learning according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of training an alignment network, a spatial-domain local discriminator and a frequency-domain global discriminator according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of cropping a first image pair, according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a flowchart of an image super-resolution processing method based on zero sample learning according to another embodiment of the present disclosure;
fig. 6 schematically illustrates a schematic diagram of a super-resolution image obtained by an image super-resolution processing method according to an embodiment of the present disclosure;
fig. 7 schematically illustrates a block diagram of an image super-resolution processing apparatus based on zero sample learning according to an embodiment of the present disclosure;
fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement an image super-resolution processing method based on zero-sample learning, according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C" are used, they should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of the data involved (including but not limited to personal information of users) all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
According to embodiments of the present disclosure, an asymmetric camera system may include a plurality of fixed-focus lenses, the plurality of lenses having different focal lengths.
According to the embodiments of the present disclosure, since tiny camera motions occur while an asymmetric camera system acquires image pairs (a wide-angle image and a tele image), the degradation of each wide-angle image acquired by the same asymmetric camera system is different. A super-resolution model trained on a predefined degradation is therefore not suited to super-resolving every wide-angle image, which is why the quality of the super-resolution images produced by the deep-learning-based super-resolution processing methods described above cannot meet the requirements of practical applications.
In order to at least partially solve the technical problems in the related art, embodiments of the present disclosure provide an image super-resolution processing method, apparatus and device based on zero sample learning, which may be applied to the field of image processing.
Embodiments of the present disclosure provide an image super-resolution processing method based on zero sample learning, including: acquiring a wide-angle image and a tele image, wherein the wide-angle image and the tele image denote images obtained by two acquisition devices with different focal lengths respectively acquiring the same scene, and the resolution of the wide-angle image is lower than that of the tele image; cropping the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps that of the tele image; aligning the cropped image to the tele image by using a trained alignment network to obtain an aligned image; cropping a first image pair to obtain a plurality of second image pairs, wherein the first image pair denotes an image pair consisting of the aligned image and the tele image; training a super-resolution model by using the plurality of second image pairs to obtain a target super-resolution model; and processing the wide-angle image by using the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image.
Fig. 1 schematically illustrates an application scenario diagram of an image super-resolution processing method based on zero sample learning according to an embodiment of the present disclosure.
As shown in fig. 1, an application scenario 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use at least one of the first terminal device 101, the second terminal device 102 and the third terminal device 103 to interact with the server 105 through the network 104, to receive or send messages, etc. Various communication client applications may be installed on the first terminal device 101, the second terminal device 102 and the third terminal device 103, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software (by way of example only).
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by the user using the first terminal device 101, the second terminal device 102, and the third terminal device 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the image super-resolution processing method based on zero sample learning provided in the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the image super-resolution processing apparatus based on zero sample learning provided by the embodiments of the present disclosure may be generally provided in the server 105. The image super-resolution processing method based on zero-sample learning provided by the embodiment of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the image super-resolution processing apparatus based on zero-sample learning provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The image super-resolution processing method based on zero sample learning of the disclosed embodiments will be described in detail below with reference to FIGS. 2 to 6, in conjunction with the scenario described in FIG. 1.
Fig. 2 schematically illustrates a flowchart of an image super-resolution processing method based on zero sample learning according to an embodiment of the present disclosure.
As shown in fig. 2, the image super-resolution processing method based on zero sample learning of this embodiment includes operations S210 to S260.
In operation S210, a wide-angle image and a tele image are acquired, wherein the wide-angle image and the tele image denote images obtained by two acquisition devices with different focal lengths respectively acquiring the same scene, and the resolution of the wide-angle image is lower than that of the tele image.
According to an embodiment of the present disclosure, the two collection devices having different focal lengths may be a collection device corresponding to a wide-angle lens and a collection device corresponding to a telephoto lens in an asymmetric camera system. The field of view of the wide-angle lens is greater than the field of view of the tele lens.
According to the embodiment of the disclosure, for the same scene, the scene can be acquired once by using the acquisition device corresponding to the wide-angle lens to obtain a wide-angle image, and the scene can be acquired once by using the acquisition device corresponding to the tele lens to obtain a tele image.
According to an embodiment of the present disclosure, the number of wide-angle images and tele images in operation S210 is 1.
In operation S220, the wide-angle image is cropped to obtain a cropped image, wherein the cropped image overlaps with the field of view of the tele image.
According to the embodiments of the present disclosure, the wide-angle image can be cropped according to the ratio between the focal length of the tele lens and that of the wide-angle lens, thereby obtaining the cropped image.
According to the embodiments of the present disclosure, for example, in the case where the focal length of the telephoto lens is 2 times that of the wide-angle lens, the central region of the large-field-of-view wide-angle image may be cropped to obtain a cropped image overlapping the field of view of the telephoto image, such that the side length of the cropped image is 1/2 of that of the wide-angle image.
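To make the cropping step concrete, here is a minimal sketch (not taken from the patent; the function name and the use of NumPy are illustrative) that extracts the overlapping central region under an assumed tele-to-wide focal-length ratio:

```python
import numpy as np

def center_crop_overlap(wide: np.ndarray, ratio: float = 2.0) -> np.ndarray:
    """Crop the central region of a wide-angle image (H, W, C) that overlaps
    the tele image's field of view, for a tele/wide focal-length ratio `ratio`."""
    h, w = wide.shape[:2]
    ch, cw = int(round(h / ratio)), int(round(w / ratio))
    top, left = (h - ch) // 2, (w - cw) // 2
    return wide[top:top + ch, left:left + cw]

# With a 2x focal-length ratio, a 1024x1024 wide-angle image yields a
# 512x512 cropped image, matching the 1/2 side-length example above.
wide = np.zeros((1024, 1024, 3), dtype=np.uint8)
cropped = center_crop_overlap(wide, ratio=2.0)
assert cropped.shape == (512, 512, 3)
```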
In operation S230, the cropped image is aligned to the tele image using the trained alignment network, resulting in an aligned image.
According to embodiments of the present disclosure, the alignment network may be a network based on a convolutional neural network architecture. Embodiments of the present disclosure do not limit the alignment network, which may be selected according to actual business requirements.
According to embodiments of the present disclosure, the alignment network may be, for example, a supervised optical-flow estimation network (FlowNet, Learning Optical Flow with Convolutional Networks).
According to embodiments of the present disclosure, for example, a pair of spatially misaligned images (the tele image and the cropped image) may be input into a trained FlowNet, which processes the misaligned image pair to obtain an aligned image.
In operation S240, the first image pair is cropped to obtain a plurality of second image pairs, wherein the first image pair denotes an image pair composed of the aligned image and the tele image.
According to the embodiments of the present disclosure, the first image pair can be cropped according to a predetermined size and a predetermined cropping number, thereby obtaining the plurality of second image pairs.
Embodiments of the present disclosure do not limit the predetermined size and the predetermined cropping number, which may be selected according to actual business requirements.
According to an embodiment of the present disclosure, for example, in the case where the size of each image included in the first image pair is 1024×1024 px (pixels), the predetermined size may be 512×512 px and the predetermined cropping number may be 20. The predetermined size may also be 256×256 px, with a predetermined cropping number of 30. Two predetermined sizes may also be used together, 512×512 px and 256×256 px, with a cropping number of 10 at 512×512 px and 50 at 256×256 px.
According to an embodiment of the disclosure, for example, in the case where the size of each image included in the first image pair is 1024×1024 px, the predetermined size is 512×512 px and the predetermined cropping number is 20, the first image pair may be randomly cropped into 20 second image pairs of size 512×512 px, thereby obtaining the plurality of second image pairs.
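The random cropping of the first image pair can be sketched as below. This is a hypothetical helper: it additionally assumes the tele image has `scale` times the resolution of the aligned image, so that the two crops of each second image pair stay spatially corresponding (the patent only states that the first image pair is cropped at a predetermined size and number):

```python
import random
import numpy as np

def random_paired_crops(aligned_lr, tele_hr, lr_size=512, scale=2, num=20, seed=None):
    """Crop the first image pair (aligned image, tele image) at `num` random,
    spatially corresponding positions, yielding second image pairs."""
    rng = random.Random(seed)
    h, w = aligned_lr.shape[:2]
    pairs = []
    for _ in range(num):
        top = rng.randint(0, h - lr_size)
        left = rng.randint(0, w - lr_size)
        lr = aligned_lr[top:top + lr_size, left:left + lr_size]
        hr = tele_hr[top * scale:(top + lr_size) * scale,
                     left * scale:(left + lr_size) * scale]
        pairs.append((lr, hr))
    return pairs
```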
In operation S250, the super-resolution model is trained using the plurality of second image pairs to obtain a target super-resolution model.
According to embodiments of the present disclosure, the super-resolution model may be a model composed of a deep convolutional network based on an attention mechanism. The embodiments of the present disclosure do not limit the super-resolution model, which can be selected according to actual business requirements.
According to embodiments of the present disclosure, the super-resolution model may be, for example, a deep residual channel attention network (RCAN, Image Super-Resolution Using Very Deep Residual Channel Attention Networks).
According to the embodiments of the present disclosure, the super-resolution model is trained with the plurality of second image pairs and can thereby fully learn the degradation of the current wide-angle image from them, yielding a trained target super-resolution model that has fully learned the degradation of the current wide-angle image.
In operation S260, the wide-angle image is processed using the target super-resolution model, resulting in a super-resolution image corresponding to the wide-angle image.
According to the embodiments of the present disclosure, since the target super-resolution model has fully learned the degradation of the current wide-angle image from the plurality of second image pairs, processing the wide-angle image with the target super-resolution model yields a super-resolution image whose enhancement is matched to that degradation. The quality of the super-resolution image corresponding to the current wide-angle image is thereby improved, so that it can meet the requirements of practical applications.
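Operations S250 and S260 together amount to a zero-shot train-then-infer loop over the second image pairs harvested from a single wide/tele capture. The sketch below uses a toy stand-in network (`TinySR` is hypothetical; an RCAN-style model would be used in practice) and an L1 loss, both of which are assumptions rather than the patent's prescription:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Toy stand-in for an RCAN-style super-resolution network."""
    def __init__(self, scale=2, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a `scale`x upsampled image
        )

    def forward(self, x):
        return self.body(x)

def train_zero_shot(model, second_pairs, iters=1000, lr=1e-4, device="cpu"):
    """Fit the SR model on the second image pairs from ONE wide/tele capture."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train().to(device)
    for it in range(iters):
        lr_img, hr_img = second_pairs[it % len(second_pairs)]  # (1, 3, h, w) tensors
        pred = model(lr_img.to(device))
        loss = F.l1_loss(pred, hr_img.to(device))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def super_resolve(model, wide):
    """Apply the target super-resolution model to the full wide-angle image."""
    model.eval()
    return model(wide).clamp(0, 1)
```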
According to the embodiments of the present disclosure, a wide-angle image and a tele image are acquired, so that a low-resolution large-field-of-view image (the wide-angle image) and a high-resolution small-field-of-view image (the tele image) of the same scene are obtained; the wide-angle image is cropped to obtain a cropped image overlapping the field of view of the tele image; the cropped image is then aligned to the tele image by a trained alignment network to obtain an aligned image; a first image pair consisting of the aligned image and the tele image is cropped to obtain a plurality of second image pairs; and a super-resolution model is trained with the plurality of second image pairs to obtain a target super-resolution model, which has thereby fully learned the degradation of the current wide-angle image. Processing the wide-angle image with the target super-resolution model then yields the corresponding super-resolution image: the model performs super-resolution enhancement matched to the degradation of the current wide-angle image even though that degradation is unknown in advance, which improves the quality of the resulting super-resolution image so that it meets the requirements of practical applications.
According to an embodiment of the present disclosure, for operation S230 shown in fig. 2, aligning the cropped image to the tele image using the trained alignment network to obtain the aligned image may include the following operations:
downsampling the tele image to obtain a downsampled image, wherein the resolution of the downsampled image is the same as that of the wide-angle image;
inputting the cropped image and the downsampled image into the alignment network to obtain the aligned image.
According to the embodiment of the disclosure, the method for downsampling the tele image can be selected according to actual business requirements; embodiments of the present disclosure do not limit the downsampling method. For example, a bilinear interpolation algorithm may be used to downsample the tele image to obtain the downsampled image.
According to the embodiment of the disclosure, the cropped image and the downsampled image are input into the alignment network; taking the downsampled image as the alignment guide, the alignment network applies a spatial transformation to the cropped image so that its spatial position is aligned to the tele image, yielding an aligned image registered with the downsampled image.
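A minimal sketch of these two sub-steps, assuming a FlowNet-style network that predicts a dense optical-flow field which is then used to warp the cropped image; the `grid_sample`-based warping shown here is a standard formulation, not a detail fixed by the disclosure:

```python
import torch
import torch.nn.functional as F

def downsample_to(tele: torch.Tensor, size) -> torch.Tensor:
    """Bilinearly downsample the tele image (1, C, H, W) to the wide-angle resolution."""
    return F.interpolate(tele, size=size, mode="bilinear", align_corners=False)

def warp_with_flow(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp `img` (1, C, H, W) by a dense flow field (1, 2, H, W) predicted by
    an alignment network from the (cropped, downsampled) input pair."""
    _, _, h, w = img.shape
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xx, yy), dim=0).unsqueeze(0).float()  # identity grid (1, 2, H, W)
    grid = base + flow
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=3),
                         mode="bilinear", align_corners=True)
```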
Fig. 3 schematically illustrates a flow chart of training the alignment network, the spatial-domain local discriminator and the frequency-domain global discriminator according to an embodiment of the present disclosure.
As shown in fig. 3, the training method of the alignment network includes:
inputting the cropped image Y* "301" and the downsampled image X↓ "302" into the alignment network 310 to obtain an initial aligned image Ŷ "303";
inputting the initial aligned image Ŷ "303" into the spatial-domain local discriminator D_s "320" to obtain the first discrimination result corresponding to the initial aligned image Ŷ "303";
inputting the spectrogram corresponding to the initial aligned image Ŷ "303" into the frequency-domain global discriminator D_f "330" to obtain the second discrimination result corresponding to the initial aligned image Ŷ "303";
inputting the initial aligned image Ŷ "303", the cropped image Y* "301" and the downsampled image X↓ "302" into the deep convolutional neural network 340 to obtain the intermediate feature maps respectively corresponding to them;
calculating the L2 norm 304 between the downsampled image X↓ "302" and the initial aligned image Ŷ "303";
obtaining the alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps and the L2 norm 304;
and updating the network parameters of the initial alignment network according to the alignment loss.
According to an embodiment of the present disclosure, the alignment network 310 shown in fig. 3 may be, for example, FlowNet, and the deep convolutional neural network 340 may be, for example, a VGG network (Very Deep Convolutional Networks for Large-Scale Image Recognition).
According to an embodiment of the present disclosure, the spatial-domain local discriminator D_s "320" shown in fig. 3 may include eight convolutional layers and a global average pooling layer. The frequency-domain global discriminator D_f "330" shown in fig. 3 may consist of eight convolutional layers.
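For illustration, the two discriminators could be realized as below, consistent with the stated layer counts (eight convolutional layers, plus global average pooling for D_s); channel widths, activations and strides are assumptions:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.LeakyReLU(0.2, inplace=True))

class SpatialLocalDiscriminator(nn.Module):
    """Eight conv layers followed by global average pooling. The pre-pooling
    score map plays the role of the degradation identification map M_s of FIG. 4."""
    def __init__(self, ch=64):
        super().__init__()
        layers = [conv_block(3, ch)] + [conv_block(ch, ch) for _ in range(6)]
        layers += [nn.Conv2d(ch, 1, 3, padding=1)]  # 8th conv: per-patch score map
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x, return_map=False):
        m = self.features(x)
        return m if return_map else self.pool(m).flatten(1)

class FrequencyGlobalDiscriminator(nn.Module):
    """Eight conv layers operating on a single-channel spectrogram."""
    def __init__(self, ch=64):
        super().__init__()
        layers = ([conv_block(1, ch, stride=2)] +
                  [conv_block(ch, ch, stride=2) for _ in range(6)] +
                  [nn.Conv2d(ch, 1, 3, padding=1)])
        self.net = nn.Sequential(*layers)

    def forward(self, spec):
        return self.net(spec).mean(dim=(1, 2, 3))  # reduce to one global score
```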
As shown in FIG. 3, the loss term $\mathcal{L}_{adv}^{D_s}$ "305" associated with the first discrimination result can be calculated from the first discrimination result, the loss term $\mathcal{L}_{adv}^{D_f}$ "306" associated with the second discrimination result can be calculated from the second discrimination result, and the loss term $L_{cl}$ "307" associated with the intermediate feature maps can be calculated from the intermediate feature maps. The alignment loss corresponding to the initial alignment network is then obtained from the L2 norm 304 and the loss terms $\mathcal{L}_{adv}^{D_s}$ "305", $\mathcal{L}_{adv}^{D_f}$ "306" and $L_{cl}$ "307", and the network parameters of the initial alignment network are updated according to this alignment loss.
According to an embodiment of the present disclosure, obtaining the alignment loss corresponding to the initial alignment network from the first discrimination result, the second discrimination result, the intermediate feature maps and the L2 norm may include calculating according to the following formula (1):

$$L_{align} = L_2 + \lambda_1 \mathcal{L}_{adv}^{D_s} + \lambda_2 \mathcal{L}_{adv}^{D_f} + \lambda_3 L_{cl} \qquad (1)$$

wherein $L_{align}$ denotes the alignment loss, $L_2$ denotes the L2 norm, $\mathcal{L}_{adv}^{D_s}$ denotes the loss term associated with the first discrimination result, $\mathcal{L}_{adv}^{D_f}$ denotes the loss term associated with the second discrimination result, $L_{cl}$ denotes the loss term associated with the intermediate feature maps, and $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote weight coefficients.
According to an embodiment of the present disclosure, the loss term associated with the intermediate feature maps may be calculated according to the following formula (2):

$$L_{cl} = \frac{\sum_{i=1}^{m} \left\| \phi_i(\hat{Y}) - \phi_i(Y^*) \right\|_1}{\sum_{j=1}^{n} \left\| \phi_j(\hat{Y}) - \phi_j(X^{\downarrow}) \right\|_1} \qquad (2)$$

wherein $L_{cl}$ denotes the loss associated with the intermediate feature maps, $\phi_i(\hat{Y})$ denotes the intermediate feature map output by the $i$-th layer of the deep convolutional neural network for the initial aligned image $\hat{Y}$, $\phi_i(Y^*)$ denotes the intermediate feature map output by the $i$-th layer for the cropped image $Y^*$, $\phi_j(\hat{Y})$ denotes the intermediate feature map output by the $j$-th layer for the initial aligned image $\hat{Y}$, $\phi_j(X^{\downarrow})$ denotes the intermediate feature map output by the $j$-th layer for the downsampled image $X^{\downarrow}$, $m$ denotes the number of layers summed over in the numerator, $n$ denotes the number summed over in the denominator, and $m$ and $n$ are integers greater than or equal to 1.
According to embodiments of the present disclosure, for example, m and n may each be 10.
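A possible realization of formula (2), using a frozen VGG-16 as the deep convolutional neural network; the L1 feature distance and the particular layers chosen are assumptions, while the pull-toward-Y*/push-from-X↓ behaviour follows the disclosure:

```python
import torch
import torchvision

class ContrastiveAlignLoss(torch.nn.Module):
    """L_cl of formula (2): keep the aligned image close to the cropped image
    (same degradation) and far from the downsampled tele image in feature space."""
    def __init__(self, layer_ids=(3, 8, 15, 22)):  # relu1_2, relu2_2, relu3_3, relu4_3
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layer_ids = vgg, set(layer_ids)

    def _feats(self, x):
        feats, h = [], x
        for i, layer in enumerate(self.vgg):
            h = layer(h)
            if i in self.layer_ids:
                feats.append(h)
        return feats

    def forward(self, aligned, cropped, downsampled):
        fa = self._feats(aligned)
        fc = self._feats(cropped)
        fd = self._feats(downsampled)
        num = sum(torch.mean(torch.abs(a - c)) for a, c in zip(fa, fc))
        den = sum(torch.mean(torch.abs(a - d)) for a, d in zip(fa, fd)) + 1e-8
        return num / den
```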
According to an embodiment of the present disclosure, the loss term associated with the first discrimination result may be calculated according to the following formula (3):

$$\mathcal{L}_{adv}^{D_s} = -\,\mathbb{E}_{\hat{Y}}\!\left[\log D_s(\hat{Y})\right] \qquad (3)$$

wherein $\mathcal{L}_{adv}^{D_s}$ denotes the loss associated with the first discrimination result, $D_s(\hat{Y})$ denotes the first discrimination result output by the spatial-domain local discriminator $D_s$ for the initial aligned image $\hat{Y}$, and $\mathbb{E}_{\hat{Y}}[\cdot]$ denotes the expectation taken with $\hat{Y}$ as the variable.
According to an embodiment of the present disclosure, the spatial-domain local discriminator $D_s$ processes the small-size patches corresponding to the initial aligned image and obtains a discrimination result for each patch; these per-patch results together constitute the first discrimination result, and substituting them into formula (3) (i.e., as $D_s(\hat{Y})$) yields the loss term $\mathcal{L}_{adv}^{D_s}$.
According to an embodiment of the present disclosure, the loss term associated with the second discrimination result may be calculated according to the following formula (4):

$$\mathcal{L}_{adv}^{D_f} = -\,\mathbb{E}_{\hat{Y}}\!\left[\log D_f(\mathcal{F}[\hat{Y}])\right] \qquad (4)$$

wherein $\mathcal{L}_{adv}^{D_f}$ denotes the loss associated with the second discrimination result, $D_f(\mathcal{F}[\hat{Y}])$ denotes the second discrimination result output by the frequency-domain global discriminator $D_f$, and $\mathcal{F}[\hat{Y}]$ denotes the spectrogram corresponding to the initial aligned image $\hat{Y}$.
According to embodiments of the present disclosure, the L2 norm between the downsampled image and the initial aligned image may be calculated according to the following formula (5):

$$L_2 = \left\| \hat{Y} - X^{\downarrow} \right\|_2 \qquad (5)$$
According to the embodiment of the disclosure, in the process of training the alignment network, using the L2 norm as supervision ensures that the initial aligned image Ŷ is spatially aligned with the tele image X. Using the loss term associated with the first discrimination result as supervision ensures local statistical consistency between the initial aligned image Ŷ and the cropped image Y* in the spatial domain, and using the loss term associated with the second discrimination result as supervision ensures global statistical consistency between them in the frequency domain. Using the loss term associated with the intermediate feature maps as supervision constrains the initial aligned image Ŷ, in the feature representation space, to remain as close as possible to the cropped image Y* and away from the downsampled image X↓, so that the degradation remains unchanged during alignment, i.e., the degradation consistency between the aligned image and the cropped image is guaranteed.
According to the embodiments of the present disclosure, in the process of training the alignment network, the cropped image and the downsampled image are input into the alignment network to obtain the initial aligned image; the initial aligned image is input into the spatial-domain local discriminator to obtain the first discrimination result; the spectrogram corresponding to the initial aligned image is input into the frequency-domain global discriminator to obtain the second discrimination result; the initial aligned image, the cropped image and the downsampled image are input into the deep convolutional neural network to obtain their respective intermediate feature maps; the L2 norm between the downsampled image and the initial aligned image is calculated; the alignment loss corresponding to the initial alignment network is obtained from the first discrimination result, the second discrimination result, the intermediate feature maps and the L2 norm; and the network parameters of the initial alignment network are updated according to this alignment loss. In this way, the trained alignment network aligns the spatial position of the cropped image to the tele image while fully preserving the degradation consistency between the aligned image and the cropped image.
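Assembled in code, the generator-side update of formula (1) might look like the following sketch; the sigmoid/log adversarial terms and the unit λ values are assumptions, and `l_cl_fn` stands for a contrastive loss such as the one sketched after formula (2):

```python
import torch

def amplitude_spectrogram(img: torch.Tensor) -> torch.Tensor:
    """Log-amplitude spectrum of a (B, C, H, W) image, averaged over channels."""
    spec = torch.fft.fft2(img)
    return torch.log1p(torch.abs(spec)).mean(dim=1, keepdim=True)

def alignment_loss(aligned, cropped, down, d_s, d_f, l_cl_fn,
                   lambdas=(1.0, 1.0, 1.0)):
    """L_align = L2 + l1*L_adv^Ds + l2*L_adv^Df + l3*L_cl, as in formula (1)."""
    l2 = torch.norm(aligned - down, p=2)                                   # formula (5)
    adv_s = -torch.log(torch.sigmoid(d_s(aligned)) + 1e-8).mean()          # formula (3)
    adv_f = -torch.log(torch.sigmoid(d_f(amplitude_spectrogram(aligned)))
                       + 1e-8).mean()                                      # formula (4)
    l_cl = l_cl_fn(aligned, cropped, down)                                 # formula (2)
    w1, w2, w3 = lambdas
    return l2 + w1 * adv_s + w2 * adv_f + w3 * l_cl
```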
As shown in fig. 3, the spatial-domain local discriminator and the frequency-domain global discriminator are trained simultaneously with the alignment network.
As shown in fig. 3, the training method of the spatial-domain local discriminator includes:
inputting the initial aligned image Ŷ "303" into the spatial-domain local discriminator D_s "320" to obtain the first discrimination result corresponding to the initial aligned image Ŷ "303"; inputting the cropped image Y* "301" into the spatial-domain local discriminator D_s "320" to obtain the third discrimination result corresponding to the cropped image; obtaining the spatial-domain loss corresponding to the spatial-domain local discriminator according to the first discrimination result and the third discrimination result; and updating the parameters of the spatial-domain local discriminator according to the spatial-domain loss.
According to an embodiment of the present disclosure, obtaining the spatial-domain loss corresponding to the spatial-domain local discriminator from the first discrimination result and the third discrimination result may include calculating according to the following formula (6):

$$\mathcal{L}_{D_s} = -\,\mathbb{E}_{Y^*}\!\left[\log D_s(Y^*)\right] - \mathbb{E}_{\hat{Y}}\!\left[\log\!\left(1 - D_s(\hat{Y})\right)\right] \qquad (6)$$

wherein $\mathcal{L}_{D_s}$ denotes the spatial-domain loss corresponding to the spatial-domain local discriminator, $D_s(Y^*)$ denotes the third discrimination result output by $D_s$ for the cropped image $Y^*$, and $\mathbb{E}_{Y^*}[\cdot]$ denotes the expectation taken with $Y^*$ as the variable.
As shown in fig. 3, the training method of the frequency-domain global discriminator includes:
inputting the spectrogram corresponding to the initial aligned image Ŷ "303" into the frequency-domain global discriminator D_f "330" to obtain the second discrimination result corresponding to the initial aligned image Ŷ "303"; inputting the spectrogram corresponding to the cropped image Y* "301" into the frequency-domain global discriminator D_f "330" to obtain the fourth discrimination result corresponding to the cropped image; obtaining the frequency-domain loss corresponding to the frequency-domain global discriminator according to the second discrimination result and the fourth discrimination result; and updating the parameters of the frequency-domain global discriminator according to the frequency-domain loss.
According to an embodiment of the present disclosure, obtaining the frequency-domain loss corresponding to the frequency-domain global discriminator from the second discrimination result and the fourth discrimination result may include calculating according to the following formula (7):

$$\mathcal{L}_{D_f} = -\,\mathbb{E}_{Y^*}\!\left[\log D_f(\mathcal{F}[Y^*])\right] - \mathbb{E}_{\hat{Y}}\!\left[\log\!\left(1 - D_f(\mathcal{F}[\hat{Y}])\right)\right] \qquad (7)$$

wherein $\mathcal{L}_{D_f}$ denotes the frequency-domain loss corresponding to the frequency-domain global discriminator, and $D_f(\mathcal{F}[Y^*])$ denotes the fourth discrimination result output by $D_f$ for the spectrogram corresponding to the cropped image $Y^*$.
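The two discriminator updates of formulas (6) and (7) can be sketched as below; the binary-cross-entropy form (the log losses above, up to a sigmoid) is an assumption:

```python
import torch
import torch.nn.functional as F

def d_s_loss(d_s, cropped, aligned_detached):
    """Spatial-domain discriminator loss, formula (6): the cropped image is the
    real sample, the (detached) initial aligned image is the fake sample."""
    real, fake = d_s(cropped), d_s(aligned_detached)
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) +
            F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))

def d_f_loss(d_f, spec_cropped, spec_aligned_detached):
    """Frequency-domain discriminator loss, formula (7), on spectrograms."""
    real, fake = d_f(spec_cropped), d_f(spec_aligned_detached)
    return (F.binary_cross_entropy_with_logits(real, torch.ones_like(real)) +
            F.binary_cross_entropy_with_logits(fake, torch.zeros_like(fake)))
```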
According to the embodiments of the present disclosure, the trained alignment network can be obtained when the number of training iterations of the alignment network reaches a preset number, or when the alignment loss falls below a preset loss value. The termination condition for training the alignment network may be determined according to actual business requirements; embodiments of the present disclosure do not limit the termination condition.
According to an embodiment of the present disclosure, for operation S240 shown in fig. 2, cropping the first image pair to obtain a plurality of second image pairs may include the following operations:
processing the aligned image by using the trained spatial-domain local discriminator to obtain a similarity probability map, wherein the pixel value of each pixel in the similarity probability map denotes the similarity between the aligned image and the tele image at that pixel;
and cropping the first image pair according to the similarity probability map to obtain the plurality of second image pairs.
Fig. 4 schematically illustrates a flowchart of cropping a first image pair according to an embodiment of the disclosure.
As shown in fig. 4, the aligned image Ŷ "403" is fed into the optimized spatial-domain local discriminator D_s 420 to obtain a degradation identification map M_s "405". The degradation identification map M_s "405" is upsampled and normalized to obtain a similarity probability map P_D whose resolution is consistent with that of the aligned image Ŷ "403". The first image pair (the tele image X "404" and the aligned image Ŷ "403") can then be cropped according to the similarity probability map P_D, yielding the set of low/high-resolution image pairs used to train the super-resolution model, i.e., the second image pair set $\{(\hat{Y}_k, X_k)\}_{k=1}^{K}$ "406", wherein $K$ denotes the number of second image pairs, $\hat{Y}_k$ denotes the $k$-th second image corresponding to the aligned image (the set of second images corresponding to the aligned image may be denoted $\{\hat{Y}_k\}_{k=1}^{K}$), and $X_k$ denotes the $k$-th second image corresponding to the tele image (the set of second images corresponding to the tele image may be denoted $\{X_k\}_{k=1}^{K}$).
According to embodiments of the present disclosure, the optimized spatial-domain local discriminator D_s may be, for example, the trained spatial-domain local discriminator D_s of fig. 3. The upsampling method may be, for example, a bilinear interpolation algorithm. P_D also represents, for each pixel, the probability that the pixel is used for a second image pair.
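A sketch of the probability-guided cropping: the degradation identification map M_s is upsampled to the aligned image's resolution and normalized into P_D, and crop centers are then drawn with probability proportional to P_D. The sigmoid used for normalization is an assumption; the disclosure only specifies upsampling and normalization:

```python
import numpy as np
import torch
import torch.nn.functional as F

def similarity_probability_map(m_s: torch.Tensor, out_hw) -> np.ndarray:
    """Upsample M_s (1, 1, h, w) to the aligned-image resolution and normalize
    it into a sampling distribution P_D."""
    p = F.interpolate(m_s, size=out_hw, mode="bilinear", align_corners=False)
    p = torch.sigmoid(p)  # squash scores into (0, 1); an assumption
    return (p / p.sum()).squeeze().cpu().numpy()

def sample_crop_centers(p_d: np.ndarray, num: int, seed=0):
    """Draw crop centers proportionally to P_D, so patches whose degradation
    stayed consistent during alignment are selected more often."""
    rng = np.random.default_rng(seed)
    flat = p_d.ravel() / p_d.sum()
    idx = rng.choice(flat.size, size=num, replace=True, p=flat)
    h, w = p_d.shape
    return [(int(i // w), int(i % w)) for i in idx]
```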
According to the embodiments of the present disclosure, although degradation consistency is constrained during the alignment process, in practical applications the aligned image still contains some regions whose degradation has changed relative to the cropped image.
According to the embodiments of the present disclosure, the aligned image is processed by the trained spatial-domain local discriminator to obtain a similarity probability map, in which the pixel value of each pixel denotes the similarity between the aligned image and the tele image at that pixel; the first image pair is then cropped according to the similarity probability map to obtain the plurality of second image pairs, so that the second image pairs cover more of the regions of the aligned image whose degradation remained unchanged.
According to embodiments of the present disclosure, for training the super-resolution model under an asymmetric camera system, the process may be expressed as

$$\min_{\Theta} \; L\!\left(S_{\Theta}(\hat{Y}_k),\, X_k\right)$$

wherein $S_{\Theta}(\cdot)$ denotes the super-resolution model, $\min_{\Theta}$ denotes taking the minimum with the model parameters $\Theta$ of the super-resolution model as the variable, and $L(\cdot)$ denotes the loss function adopted by the super-resolution model. The degradation by which resolution decreases as the field of view increases may be referred to as resolution-field-of-view degradation $D_{RV}(\cdot)$. Further, the relation of $Y^*$ and $\hat{Y}$ to $X$ can be expressed as $\hat{Y} = D_{RV}(X)$. Therefore, the super-resolution model implicitly learns the degradation of the current wide-angle image in the process of being trained.
Fig. 5 schematically illustrates a flowchart of an image super-resolution processing method based on zero sample learning according to another embodiment of the present disclosure.
As shown in fig. 5, the same scene may be acquired with an asymmetric camera system, resulting in a wide-angle image Y "501" and a tele image X "502". Center-cropping the wide-angle image Y "501" yields a cropped image Y* "503", whose field of view overlaps that of the tele image X.
The tele image X "502" is downsampled to obtain a downsampled image X↓ "504". The cropped image Y* "503" and the downsampled image X↓ "504" are input into the alignment network 510; the alignment network 510 is trained using the method shown in FIG. 3 to obtain a trained alignment network, and the trained alignment network then processes the cropped image Y* "503" and the downsampled image X↓ "504" to obtain an aligned image Ŷ.
The spatial-domain local discriminator is trained while the alignment network 510 is trained, yielding the trained spatial-domain local discriminator 520. Based on the trained spatial-domain local discriminator 520, the aligned image Ŷ and the tele image X are cropped as shown in FIG. 4 to obtain the second image pair set $\{(\hat{Y}_k, X_k)\}_{k=1}^{K}$ "505"; the data in the second image pair set "505" are used to train the super-resolution model 530, obtaining the target super-resolution model. The wide-angle image Y "501" is input into the target super-resolution model to obtain the super-resolution image Y_SR "506" corresponding to the wide-angle image Y "501".
As shown in fig. 5, by acquiring a wide-angle image and a tele image, a low-resolution large-field-of-view image (the wide-angle image) and a high-resolution small-field-of-view image (the tele image) of the same scene are obtained; the wide-angle image is cropped to obtain a cropped image overlapping the field of view of the tele image; the cropped image is aligned to the tele image by the trained alignment network to obtain an aligned image; the first image pair consisting of the aligned image and the tele image is cropped to obtain a plurality of second image pairs; and the super-resolution model is trained with the plurality of second image pairs to obtain the target super-resolution model, which fully learns the degradation of the current wide-angle image. The target super-resolution model then processes the wide-angle image to obtain the corresponding super-resolution image, performing super-resolution enhancement matched to the degradation of the current wide-angle image even when that degradation is unknown, so that the quality of the super-resolution image is improved and meets the requirements of practical applications.
According to embodiments of the present disclosure, the effectiveness of the image super-resolution processing method based on zero sample learning provided by the embodiments of the present disclosure is verified by using two simulation data sets and two real data sets.
In the simulated-dataset experiments, the HCI_new light-field dataset and the Middlebury2021 binocular dataset were employed according to embodiments of the present disclosure. The image data in each dataset are processed by simulation into image data of an asymmetric camera system: the image of the main view is downsampled to obtain a simulated wide-angle image, another view is center-cropped to obtain a simulated tele image, and the original main-view image serves as the ground-truth image. For quantitative comparison, peak signal-to-noise ratio and structural similarity are used as numerical indicators.
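As an illustration of this simulation protocol, the sketch below builds a wide/tele pair from two views of the same scene and computes PSNR; bicubic downsampling stands in for the Gaussian degradations actually used in the experiments, and tensor shapes (N, C, H, W) are assumed.

```python
import torch
import torch.nn.functional as F

def simulate_asymmetric_pair(main_view, side_view, scale=2):
    # Simulated wide-angle image: downsample the main view (bicubic here;
    # the experiments use Gaussian degradations instead).
    wide = F.interpolate(main_view, scale_factor=1.0 / scale,
                         mode='bicubic', align_corners=False)
    # Simulated tele image: center-crop the other view to a 1/scale FoV.
    _, _, h, w = side_view.shape
    ch, cw = h // scale, w // scale
    top, left = (h - ch) // 2, (w - cw) // 2
    tele = side_view[:, :, top:top + ch, left:left + cw]
    return wide, tele, main_view  # the main view serves as ground truth

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio, one of the two numerical indicators.
    mse = F.mse_loss(pred, target)
    return 10.0 * torch.log10(max_val ** 2 / mse)
```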
The experimental results are shown in Table 1 and Table 2, where IG and AG denote isotropic Gaussian degradation and anisotropic Gaussian degradation respectively (i.e., two ways of downsampling the main-view image), and IG_JPEG denotes isotropic Gaussian degradation combined with JPEG compression. The value before the slash "/" is the peak signal-to-noise ratio and the value after the slash is the structural similarity; 2× scale denotes 2× super-resolution relative to the wide-angle image, and 4× scale denotes 4× super-resolution relative to the wide-angle image. The comparison methods fall into three categories: single-image super-resolution methods, reference-based super-resolution methods, and blind super-resolution methods. Single-image super-resolution methods include RCAN (Image super-resolution using very deep residual channel attention networks) and KernelGAN (Blind super-resolution kernel estimation using an internal-GAN). Reference-based super-resolution methods include MASA (MASA-SR: Matching acceleration and spatial adaptation for reference-based image super-resolution), DCSR-SRA (Dual-camera super-resolution with aligned attention modules), and SelfDZSR (Self-supervised learning for real-world super-resolution from dual zoomed observations). Blind super-resolution methods include DANv2 (End-to-end alternating optimization for blind super resolution) and DCLS (Deep constrained least squares for blind image super-resolution). ZeDuSR in Table 1 and outer-ZS in Table 2 denote the image super-resolution processing method provided by embodiments of the present disclosure.
TABLE 1
TABLE 2
The experimental results in Table 1 and Table 2 show that, across the different degradations of the two simulated datasets and the different super-resolution factors relative to the wide-angle image, the quality of the super-resolution image obtained by the image super-resolution processing method provided by embodiments of the present disclosure is better than that of the existing methods.
According to an embodiment of the present disclosure, in the real-data experiments, the asymmetric camera systems of the iPhone 11 and the iPhone 12 are employed to acquire the wide-angle and tele image data. The iPhone 11 data are super-resolved by a factor of 2, and the iPhone 12 data by factors of 2 and 4; the visual comparison results are shown in FIG. 6.
Fig. 6 schematically illustrates super-resolution images obtained by the image super-resolution processing method according to an embodiment of the present disclosure. ZeDuSR in fig. 6 denotes the image super-resolution processing method provided by embodiments of the present disclosure.
As can be seen from fig. 6, for the different super-resolution factors applied to the real data relative to the wide-angle image, the super-resolution image obtained by the image super-resolution processing method provided by embodiments of the present disclosure retains more image detail and is clearer, and its quality is better than that of the existing methods.
It should be noted that, unless a required execution order exists between different operations or such an order is imposed by the technical implementation, the execution order of multiple operations in the embodiments of the present disclosure may vary, and multiple operations may also be executed simultaneously.
Based on the image super-resolution processing method based on zero sample learning, the disclosure also provides an image super-resolution processing device based on zero sample learning. The device will be described in detail below in connection with fig. 7.
Fig. 7 schematically illustrates a block diagram of an image super-resolution processing apparatus based on zero sample learning according to an embodiment of the present disclosure.
As shown in fig. 7, the image super-resolution processing apparatus 700 based on zero sample learning of this embodiment includes an acquisition module 710, a first obtaining module 720, a second obtaining module 730, a third obtaining module 740, a fourth obtaining module 750, and a fifth obtaining module 760.
The acquisition module 710 is configured to acquire a wide-angle image and a tele image, wherein the wide-angle image and the tele image represent images obtained by two acquisition devices with different focal lengths respectively capturing the same scene, and the resolution of the wide-angle image is smaller than the resolution of the tele image. In an embodiment, the acquisition module 710 may be configured to perform the operation S210 described above, and details are not repeated here.

The first obtaining module 720 is configured to crop the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps with that of the tele image. In an embodiment, the first obtaining module 720 may be configured to perform the operation S220 described above, and details are not repeated here.

The second obtaining module 730 is configured to align the cropped image to the tele image using the trained alignment network to obtain an aligned image. In an embodiment, the second obtaining module 730 may be configured to perform the operation S230 described above, and details are not repeated here.

The third obtaining module 740 is configured to crop the first image pair to obtain a plurality of second image pairs, wherein the first image pair characterizes an image pair composed of the aligned image and the tele image. In an embodiment, the third obtaining module 740 may be configured to perform the operation S240 described above, and details are not repeated here.

The fourth obtaining module 750 is configured to train the super-resolution model using the plurality of second image pairs to obtain a target super-resolution model. In an embodiment, the fourth obtaining module 750 may be configured to perform the operation S250 described above, and details are not repeated here.

The fifth obtaining module 760 is configured to process the wide-angle image with the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image. In an embodiment, the fifth obtaining module 760 may be configured to perform the operation S260 described above, and details are not repeated here.
According to an embodiment of the present disclosure, the second obtaining module includes a downsampled image obtaining sub-module and an aligned image obtaining sub-module.
The downsampled image obtaining sub-module is used for downsampling the tele image to obtain a downsampled image, wherein the resolution of the downsampled image is the same as the resolution of the wide-angle image.

The aligned image obtaining sub-module is used for inputting the cropped image and the downsampled image into the alignment network to obtain the aligned image.
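A minimal sketch of these two sub-modules might look as follows, assuming bicubic interpolation for the downsampling and an align_net callable; both are assumptions, not the disclosed implementation.

```python
import torch.nn.functional as F

def obtain_aligned_image(tele_img, cropped_img, align_net, scale=2):
    # Downsampled image obtaining sub-module: bring the tele image down
    # to the sampling density of the wide-angle image (bicubic assumed).
    x_down = F.interpolate(tele_img, scale_factor=1.0 / scale,
                           mode='bicubic', align_corners=False)
    # Aligned image obtaining sub-module: feed the cropped image and the
    # downsampled image into the alignment network.
    return align_net(cropped_img, x_down)
```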
According to an embodiment of the disclosure, the image super-resolution processing device based on zero sample learning includes an alignment network training module, configured to perform the following operations (a training-step sketch follows this list):

inputting the cropped image and the downsampled image into the alignment network to obtain an initial aligned image;

inputting the initial aligned image into a spatial-domain local discriminator to obtain a first discrimination result corresponding to the initial aligned image;

inputting a spectrogram corresponding to the initial aligned image into a frequency-domain global discriminator to obtain a second discrimination result corresponding to the initial aligned image;

inputting the initial aligned image, the cropped image, and the downsampled image into a deep convolutional neural network to obtain intermediate feature maps respectively corresponding to the initial aligned image, the cropped image, and the downsampled image;

calculating a 2-norm between the downsampled image and the initial aligned image;

obtaining an alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps, and the 2-norm;

and updating network parameters of the initial alignment network according to the alignment loss.
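A training-step sketch implementing the above operations is given below. The binary cross-entropy adversarial form, the magnitude spectrum as the spectrogram, and the MSE realization of the 2-norm term are assumptions; contextual_loss refers to the formula (2) sketch given later.

```python
import torch
import torch.nn.functional as F

def alignment_train_step(align_net, spatial_disc, freq_disc, feat_net,
                         optimizer, y_crop, x_down,
                         lambda1=1.0, lambda2=1.0, lambda3=1.0):
    optimizer.zero_grad()
    # Initial aligned image from the alignment network.
    y_init = align_net(y_crop, x_down)
    # First discrimination result (spatial-domain local discriminator).
    d_s = spatial_disc(y_init)
    # Second discrimination result on the spectrogram (frequency-domain
    # global discriminator); magnitude spectrum assumed as the spectrogram.
    d_f = freq_disc(torch.fft.fft2(y_init).abs())
    # 2-norm between the downsampled image and the initial aligned image
    # (realized here as MSE).
    l2 = F.mse_loss(y_init, x_down)
    # Feature-map loss term L_cl (see the formula (2) sketch below).
    l_cl = contextual_loss(feat_net, y_init, y_crop, x_down)
    # Adversarial terms; the non-saturating BCE form is an assumption.
    l_adv_s = F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
    l_adv_f = F.binary_cross_entropy_with_logits(d_f, torch.ones_like(d_f))
    loss = l2 + lambda1 * l_adv_s + lambda2 * l_adv_f + lambda3 * l_cl
    loss.backward()
    optimizer.step()
    return loss.item()
```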
According to an embodiment of the disclosure, the image super-resolution processing device based on zero sample learning includes a spatial-domain local discriminator training module, configured to perform the following operations (a sketch follows this list):

inputting the cropped image into the spatial-domain local discriminator to obtain a third discrimination result corresponding to the cropped image;

obtaining a spatial-domain loss corresponding to the spatial-domain local discriminator according to the first discrimination result and the third discrimination result;

and updating parameters of the spatial-domain local discriminator according to the spatial-domain loss.
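The sketch below illustrates one update of the spatial-domain local discriminator per the steps above, assuming a binary cross-entropy loss with the cropped image treated as real and the initial aligned image as fake.

```python
import torch
import torch.nn.functional as F

def spatial_disc_train_step(spatial_disc, optimizer_d, y_init, y_crop):
    optimizer_d.zero_grad()
    # First discrimination result on the initial aligned image (fake).
    d_fake = spatial_disc(y_init.detach())
    # Third discrimination result on the cropped image (real).
    d_real = spatial_disc(y_crop)
    # Spatial-domain loss from the first and third discrimination results
    # (BCE form assumed).
    loss_d = (F.binary_cross_entropy_with_logits(
                  d_fake, torch.zeros_like(d_fake))
              + F.binary_cross_entropy_with_logits(
                  d_real, torch.ones_like(d_real)))
    loss_d.backward()
    optimizer_d.step()
    return loss_d.item()
```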
According to an embodiment of the disclosure, the image super-resolution processing device based on zero sample learning includes a frequency-domain global discriminator training module, configured to perform the following operations (a sketch follows this list):

inputting a spectrogram corresponding to the cropped image into the frequency-domain global discriminator to obtain a fourth discrimination result corresponding to the cropped image;

obtaining a frequency-domain loss corresponding to the frequency-domain global discriminator according to the second discrimination result and the fourth discrimination result;

and updating parameters of the frequency-domain global discriminator according to the frequency-domain loss.
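Analogously, a sketch of one frequency-domain global discriminator update is given below; the magnitude spectrum as the spectrogram and the binary cross-entropy loss form are assumptions.

```python
import torch
import torch.nn.functional as F

def spectrogram(img):
    # Magnitude spectrum as the spectrogram (an assumed representation).
    return torch.fft.fft2(img).abs()

def freq_disc_train_step(freq_disc, optimizer_d, y_init, y_crop):
    optimizer_d.zero_grad()
    # Second discrimination result: spectrogram of the initial aligned image.
    d_fake = freq_disc(spectrogram(y_init.detach()))
    # Fourth discrimination result: spectrogram of the cropped image.
    d_real = freq_disc(spectrogram(y_crop))
    # Frequency-domain loss from the second and fourth discrimination
    # results (BCE form assumed).
    loss_d = (F.binary_cross_entropy_with_logits(
                  d_fake, torch.zeros_like(d_fake))
              + F.binary_cross_entropy_with_logits(
                  d_real, torch.ones_like(d_real)))
    loss_d.backward()
    optimizer_d.step()
    return loss_d.item()
```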
According to an embodiment of the present disclosure, the alignment network training module includes a calculation sub-module.
A calculation sub-module for calculating the alignment loss according to the following formula (1):

$$L_{align} = L_{2} + \lambda_{1} L_{adv}^{s} + \lambda_{2} L_{adv}^{f} + \lambda_{3} L_{cl}$$

wherein L_align characterizes the alignment loss, L_2 characterizes the 2-norm term, L_adv^s characterizes the loss term associated with the first discrimination result, L_adv^f characterizes the loss term associated with the second discrimination result, L_cl characterizes the loss term associated with the intermediate feature maps, and λ1, λ2 and λ3 characterize weight coefficients.
According to an embodiment of the present disclosure, the calculation submodule includes a calculation unit.
A calculation unit for calculating the loss term associated with the intermediate feature maps according to the following formula (2):

$$L_{cl} = \sum_{i=1}^{n} d\left(\phi_{i}(\hat{Y}),\ \phi_{i}(Y^{*})\right) + \sum_{j=1}^{m} d\left(\phi_{j}(\hat{Y}),\ \phi_{j}(X_{\downarrow})\right)$$

wherein L_cl characterizes the loss associated with the intermediate feature maps, Ŷ characterizes the initial aligned image, φ_i(Ŷ) characterizes the intermediate feature map, output by the i-th layer of the deep convolutional neural network, corresponding to the initial aligned image, φ_i(Y*) characterizes the intermediate feature map, output by the i-th layer, corresponding to the cropped image Y*, φ_j(Ŷ) characterizes the intermediate feature map, output by the j-th layer, corresponding to the initial aligned image, φ_j(X↓) characterizes the intermediate feature map, output by the j-th layer, corresponding to the downsampled image X↓, d(·,·) characterizes the per-layer feature distance, n characterizes the number of layers compared against the cropped image, m characterizes the number of layers compared against the downsampled image, and m and n are integers greater than or equal to 1.
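The following sketch realizes the structure of formula (2), with a cosine feature distance standing in for the per-layer similarity measure d(·,·) and a feat_net assumed to return a list of intermediate feature maps (e.g., from a pretrained deep convolutional network); both choices are assumptions.

```python
import torch.nn.functional as F

def contextual_loss(feat_net, y_init, y_crop, x_down, n=3, m=3):
    # feat_net(img) is assumed to return a list of intermediate feature
    # maps phi_1(img), phi_2(img), ... of the deep convolutional network.
    f_init = feat_net(y_init)
    f_crop = feat_net(y_crop)
    f_down = feat_net(x_down)

    def dist(a, b):
        # Cosine distance between flattened feature maps; stands in for
        # the per-layer similarity measure d(.,.) of formula (2).
        a = F.normalize(a.flatten(1), dim=1)
        b = F.normalize(b.flatten(1), dim=1)
        return (1.0 - (a * b).sum(dim=1)).mean()

    # Compare the initial aligned image with the cropped image over the
    # first n layers and with the downsampled image over the first m layers.
    loss = sum(dist(f_init[i], f_crop[i]) for i in range(n))
    loss = loss + sum(dist(f_init[j], f_down[j]) for j in range(m))
    return loss
```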
According to an embodiment of the disclosure, the third obtaining module includes a probability map obtaining sub-module and a second image pair obtaining sub-module.
The probability map obtaining sub-module is used for processing the aligned image by using the trained spatial-domain local discriminator to obtain a similarity probability map, wherein the pixel value of each pixel in the similarity probability map represents the similarity between the aligned image and the tele image at that pixel.

The second image pair obtaining sub-module is used for cropping the first image pair according to the similarity probability map to obtain the plurality of second image pairs.
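A sketch of this similarity-guided cropping is given below, assuming the discriminator produces a per-pixel output (PatchGAN-style) and using illustrative patch size, stride, and threshold values.

```python
import torch

def crop_second_pairs(y_aligned, tele_img, spatial_disc, scale=2,
                      patch=64, stride=32, threshold=0.5):
    # Similarity probability map from the trained spatial-domain local
    # discriminator; a per-pixel (PatchGAN-style) output is assumed.
    with torch.no_grad():
        prob_map = torch.sigmoid(spatial_disc(y_aligned))
    pairs = []
    _, _, h, w = y_aligned.shape
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            p = prob_map[..., top:top + patch, left:left + patch]
            if p.mean() >= threshold:  # keep only well-aligned regions
                lr = y_aligned[..., top:top + patch, left:left + patch]
                # The tele image is scale x larger, so scale the coordinates.
                hr = tele_img[..., top * scale:(top + patch) * scale,
                              left * scale:(left + patch) * scale]
                pairs.append((lr, hr))
    return pairs
```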
According to embodiments of the present disclosure, any two or more of the acquisition module 710, the first obtaining module 720, the second obtaining module 730, the third obtaining module 740, the fourth obtaining module 750, and the fifth obtaining module 760 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the acquisition module 710, the first obtaining module 720, the second obtaining module 730, the third obtaining module 740, the fourth obtaining module 750, and the fifth obtaining module 760 may be implemented, at least in part, as hardware circuitry, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system-on-chip, a system-on-substrate, a system-in-package, or an application specific integrated circuit (ASIC), as hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or as any one of, or a suitable combination of, the three implementation forms of software, hardware, and firmware. Alternatively, at least one of these modules may be at least partially implemented as a computer program module which, when executed, performs the corresponding functions.
Fig. 8 schematically illustrates a block diagram of an electronic device adapted to implement an image super-resolution processing method based on zero-sample learning, according to an embodiment of the disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 801 may also include on-board memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the disclosure.
In the RAM 803, various programs and data required for the operation of the electronic device 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or the RAM 803. Note that the program may be stored in one or more memories other than the ROM 802 and the RAM 803. The processor 801 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, the input/output (I/O) interface 805 also being connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to an input/output (I/O) interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to an input/output (I/O) interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The above-described computer-readable storage medium carries one or more programs that, when executed, implement the zero-sample learning-based image super-resolution processing method according to the embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 802 and/or RAM 803 and/or one or more memories other than ROM 802 and RAM 803 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. When the computer program product runs in a computer system, the program code is used for enabling the computer system to realize the image super-resolution processing method based on zero sample learning provided by the embodiment of the disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal over a network medium, and downloaded and installed via the communication portion 809, and/or installed from a removable medium 811. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to wireless and wired media, or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 801. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, the program code for carrying out the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, "C", or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be provided in a variety of combinations and/or combinations, even if such combinations or combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or combined without departing from the spirit and teachings of the present disclosure. All such combinations and/or combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (10)

1. An image super-resolution processing method based on zero sample learning comprises the following steps:
acquiring a wide-angle image and a tele image, wherein the wide-angle image and the tele image represent images obtained by two acquisition devices with different focal lengths respectively capturing the same scene, and the resolution of the wide-angle image is smaller than the resolution of the tele image;

cropping the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps with that of the tele image;

aligning the cropped image to the tele image by using a trained alignment network to obtain an aligned image;

cropping a first image pair to obtain a plurality of second image pairs, wherein the first image pair represents an image pair formed by the aligned image and the tele image;
training the super-resolution model by using a plurality of second image pairs to obtain a target super-resolution model;
and processing the wide-angle image by using the target super-resolution model to obtain a super-resolution image corresponding to the wide-angle image.
2. The method of claim 1, wherein the aligning the cropped image to the tele image with the trained alignment network to obtain an aligned image comprises:

downsampling the tele image to obtain a downsampled image, wherein the resolution of the downsampled image is the same as the resolution of the wide-angle image;

inputting the cropped image and the downsampled image into the alignment network to obtain the aligned image.
3. The method according to claim 1 or 2, wherein the training method of the alignment network comprises:
inputting the cropped image and the downsampled image into the alignment network to obtain an initial aligned image;

inputting the initial aligned image into a spatial-domain local discriminator to obtain a first discrimination result corresponding to the initial aligned image;

inputting a spectrogram corresponding to the initial aligned image into a frequency-domain global discriminator to obtain a second discrimination result corresponding to the initial aligned image;

inputting the initial aligned image, the cropped image, and the downsampled image into a deep convolutional neural network to obtain intermediate feature maps respectively corresponding to the initial aligned image, the cropped image, and the downsampled image;

calculating a 2-norm between the downsampled image and the initial aligned image;

obtaining an alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps, and the 2-norm;
and updating network parameters of the initial alignment network according to the alignment loss.
4. The method according to claim 3, wherein the training method of the spatial-domain local discriminator comprises:

inputting the cropped image into the spatial-domain local discriminator to obtain a third discrimination result corresponding to the cropped image;

obtaining a spatial-domain loss corresponding to the spatial-domain local discriminator according to the first discrimination result and the third discrimination result;

and updating parameters of the spatial-domain local discriminator according to the spatial-domain loss.
5. The method of claim 3, wherein the training method of the frequency-domain global discriminator comprises:

inputting a spectrogram corresponding to the cropped image into the frequency-domain global discriminator to obtain a fourth discrimination result corresponding to the cropped image;

obtaining a frequency-domain loss corresponding to the frequency-domain global discriminator according to the second discrimination result and the fourth discrimination result;

and updating parameters of the frequency-domain global discriminator according to the frequency-domain loss.
6. The method according to claim 3, wherein the obtaining an alignment loss corresponding to the initial alignment network according to the first discrimination result, the second discrimination result, the intermediate feature maps and the 2-norm comprises calculating according to the following formula (1):

$$L_{align} = L_{2} + \lambda_{1} L_{adv}^{s} + \lambda_{2} L_{adv}^{f} + \lambda_{3} L_{cl}$$

wherein L_align characterizes the alignment loss, L_2 characterizes the 2-norm term, L_adv^s characterizes the loss term associated with the first discrimination result, L_adv^f characterizes the loss term associated with the second discrimination result, L_cl characterizes the loss term associated with the intermediate feature maps, and λ1, λ2 and λ3 characterize weight coefficients.
7. The method of claim 6, wherein the loss term associated with the intermediate feature maps is calculated according to the following formula (2):

$$L_{cl} = \sum_{i=1}^{n} d\left(\phi_{i}(\hat{Y}),\ \phi_{i}(Y^{*})\right) + \sum_{j=1}^{m} d\left(\phi_{j}(\hat{Y}),\ \phi_{j}(X_{\downarrow})\right)$$

wherein L_cl characterizes the loss associated with the intermediate feature maps, Ŷ characterizes the initial aligned image, φ_i(Ŷ) characterizes the intermediate feature map, output by the i-th layer of the deep convolutional neural network, corresponding to the initial aligned image, φ_i(Y*) characterizes the intermediate feature map, output by the i-th layer, corresponding to the cropped image Y*, φ_j(Ŷ) characterizes the intermediate feature map, output by the j-th layer, corresponding to the initial aligned image, φ_j(X↓) characterizes the intermediate feature map, output by the j-th layer, corresponding to the downsampled image X↓, d(·,·) characterizes the per-layer feature distance, n characterizes the number of layers compared against the cropped image, m characterizes the number of layers compared against the downsampled image, and m and n are integers greater than or equal to 1.
8. The method according to claim 3, wherein the cropping a first image pair to obtain a plurality of second image pairs comprises:

processing the aligned image by using the trained spatial-domain local discriminator to obtain a similarity probability map, wherein the pixel value of each pixel in the similarity probability map represents the similarity between the aligned image and the tele image at that pixel;

and cropping the first image pair according to the similarity probability map to obtain the plurality of second image pairs.
9. An image super-resolution processing device based on zero sample learning, comprising:
an acquisition module for acquiring a wide-angle image and a tele image, wherein the wide-angle image and the tele image represent images obtained by two acquisition devices with different focal lengths respectively capturing the same scene, and the resolution of the wide-angle image is smaller than the resolution of the tele image;

a first obtaining module for cropping the wide-angle image to obtain a cropped image, wherein the field of view of the cropped image overlaps with that of the tele image;

a second obtaining module for aligning the cropped image to the tele image by using a trained alignment network to obtain an aligned image;
a third obtaining module, configured to crop a first image pair to obtain a plurality of second image pairs, where the first image pair characterizes an image pair composed of the aligned image and the tele image;
a fourth obtaining module for training the super-resolution model by using a plurality of second image pairs to obtain a target super-resolution model;
and a fifth obtaining module, configured to process the wide-angle image by using the target super-resolution model, so as to obtain a super-resolution image corresponding to the wide-angle image.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more instructions,
wherein the one or more instructions, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 8.
CN202310596565.XA 2023-05-23 2023-05-23 Image super-resolution processing method, device and equipment based on zero sample learning Pending CN116664399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310596565.XA CN116664399A (en) 2023-05-23 2023-05-23 Image super-resolution processing method, device and equipment based on zero sample learning


Publications (1)

Publication Number Publication Date
CN116664399A true CN116664399A (en) 2023-08-29

Family

ID=87721763




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination