WO2022099710A1 - Image reconstruction method, electronic device, and computer-readable storage medium - Google Patents

Image reconstruction method, electronic device, and computer-readable storage medium

Info

Publication number
WO2022099710A1
WO2022099710A1 · PCT/CN2020/129145 · CN2020129145W
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
maps
input image
feature maps
Application number
PCT/CN2020/129145
Other languages
English (en)
French (fr)
Inventor
那彦波
张文浩
Original Assignee
京东方科技集团股份有限公司
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Priority to US17/433,178 priority Critical patent/US11893710B2/en
Priority to CN202080002800.0A priority patent/CN114830168A/zh
Priority to PCT/CN2020/129145 priority patent/WO2022099710A1/zh
Publication of WO2022099710A1 publication Critical patent/WO2022099710A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4076Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution using the original low-resolution images to iteratively correct the high-resolution images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to an image reconstruction method for an edge device, an electronic device, and a computer-readable storage medium.
  • with the continuous development of deep learning and artificial intelligence, there is growing interest in implementing these techniques efficiently on edge devices; this trend is known as edge artificial intelligence (Edge-AI). Some tasks utilizing AI techniques, such as image classification and object detection, have been successfully implemented on edge devices because the outputs involved (e.g., labels, bounding boxes) are low-dimensional.
  • Image super-resolution (SR) reconstruction has now become an important branch in the field of image processing.
  • current image super-resolution reconstruction methods are mainly interpolation-based methods and learning-based methods.
  • interpolation-based methods cannot introduce additional useful high-frequency information, so the reconstructed image is prone to blurring.
  • Learning-based methods can improve resolution by learning to exploit the relationship between low-resolution images and high-resolution images.
  • however, the currently known learning-based methods are computationally expensive because they involve high-dimensional outputs or complex network structures, making them difficult to implement on edge devices whose resources (such as storage resources and power resources) and computing capability are relatively limited.
  • in view of this, the present disclosure provides a method, an electronic device, and a storage medium that can mitigate, alleviate, or even eliminate the above problems.
  • according to one aspect, an image reconstruction method for an edge device is provided, comprising: extracting low-level features from an input image of a first scale to generate a plurality of first feature maps, each of the plurality of first feature maps having a second scale larger than the first scale of the input image; extracting low-level features from the input image to generate a plurality of second feature maps, each of the plurality of second feature maps having the second scale; generating a plurality of mask maps based on the plurality of second feature maps; generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps, each of the plurality of intermediate feature maps having the second scale; and synthesizing a reconstructed image having the second scale based on the plurality of intermediate feature maps.
  • extracting low-level features from the input image to generate the plurality of first feature maps may comprise performing transposed convolution processing on the input image to generate the plurality of first feature maps; and extracting low-level features from the input image to generate the plurality of second feature maps may comprise performing transposed convolution processing on the input image to generate the plurality of second feature maps.
  • performing transposed convolution processing on the input image to generate the plurality of first feature maps may comprise using a single transposed convolution layer to process the input image to generate the plurality of first feature maps; and performing transposed convolution processing on the input image to generate the plurality of second feature maps may comprise using a single transposed convolution layer to process the input image to generate the plurality of second feature maps.
  • the extracting of low-level features from the input image to generate a plurality of first feature maps may include: convolving the input image to generate a plurality of first initial feature map groups, each first initial feature map group including a plurality of first initial feature maps, each of which has the first scale; and, for each first initial feature map group, rearranging the plurality of first initial feature maps included therein to generate a corresponding first feature map. Likewise, the extracting of low-level features from the input image to generate a plurality of second feature maps may include: convolving the input image to generate a plurality of second initial feature map groups, each second initial feature map group including a plurality of second initial feature maps, each of which has the first scale; and, for each second initial feature map group, rearranging the plurality of second initial feature maps included therein to generate a corresponding second feature map.
  • convolving the input image to generate the plurality of first initial feature map groups may include using a single convolution layer to convolve the input image to generate the plurality of first initial feature map groups, and convolving the input image to generate the plurality of second initial feature map groups may include using a single convolution layer to convolve the input image to generate the plurality of second initial feature map groups.
  • alternatively, convolving the input image to generate the plurality of first initial feature map groups may include using a single convolution layer to convolve the input image to generate the plurality of first initial feature map groups, while convolving the input image to generate the plurality of second initial feature map groups includes using a first convolution layer and a second convolution layer to convolve the input image to generate the plurality of second initial feature map groups.
  • the number of first initial feature maps included in each first initial feature map group depends on the scaling factor between the second scale and the first scale, and the number of second initial feature maps included in each second initial feature map group likewise depends on the scaling factor between the second scale and the first scale.
  • the plurality of mask maps correspond to the plurality of first feature maps, and each mask map represents feature weights of the corresponding first feature map in the plurality of first feature maps; generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps includes weighting the corresponding first feature map based on the feature weights represented by each mask map to generate one of the plurality of intermediate feature maps.
  • the first feature maps, the second feature maps, and the mask maps are equal in number, and the mask maps have the second scale.
  • the generating of the plurality of mask maps based on the plurality of second feature maps may include: forming a plurality of feature element groups based on a plurality of feature elements at corresponding positions in the plurality of second feature maps; performing normalization processing on the feature elements in each feature element group respectively; and generating the corresponding mask maps based on the normalized feature elements.
  • the mask maps are at the pixel level, each mask map having the second scale and containing feature weights for each pixel in its corresponding first feature map.
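For illustration only (the notation m′_c for the second feature maps and the use of SOFTMAX are taken from the detailed description below; the disclosure also permits other normalization functions), the feature weight at pixel (i, j) of the c-th mask map would then be

$$m_c(i,j) = \frac{\exp\big(m'_c(i,j)\big)}{\sum_{c'=1}^{C} \exp\big(m'_{c'}(i,j)\big)},$$

so that the C feature weights at each pixel position sum to 1.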
  • the low-level features include at least one of texture features, edge features, and blob features of the input image.
  • the input image is a luminance channel image of the color image to be reconstructed.
  • the steps of the method are performed using a trained image reconstruction model to reconstruct the input image into a super-resolution image.
  • the image reconstruction model may be obtained by training through the following steps: inputting training data from a training sample set composed of reference images and degraded images into the image reconstruction model to obtain output images; computing the total loss of the image reconstruction model from the output images and the reference images based on a predetermined loss function; and adjusting the parameters of the image reconstruction model according to the total loss to obtain the trained image reconstruction model.
  • an electronic device comprising: a memory configured to store computer-executable instructions; and a processor configured to perform the above image reconstruction method for an edge device when the computer-executable instructions are executed by the processor.
  • a non-volatile computer-readable storage medium storing computer-executable instructions that, when executed, perform the above image reconstruction method for an edge device.
  • FIG. 1 schematically shows an example scenario in which the technical solutions provided by the present disclosure can be applied
  • FIG. 2 schematically shows an example structure of a convolutional neural network in the related art
  • FIG. 3 schematically shows an example flowchart of a method for image reconstruction according to some embodiments of the present disclosure
  • FIG. 4 schematically illustrates an example structure of an image reconstruction model according to some embodiments of the present disclosure
  • Figure 5 schematically illustrates a normalization process according to some embodiments of the present disclosure
  • FIG. 6 schematically illustrates an example structure of another image reconstruction model according to some embodiments of the present disclosure
  • Figure 7 schematically illustrates a rearrangement process according to some embodiments of the present disclosure
  • FIG. 8 schematically illustrates a method of training an image reconstruction model according to some embodiments of the present disclosure
  • FIG. 9 schematically shows an example structure of an apparatus for image reconstruction in an edge device according to some embodiments of the present disclosure.
  • FIG. 10 schematically illustrates an example architecture diagram of a computing device according to some embodiments of the present disclosure.
  • Edge device refers to a device with computing resources and network resources located between data sources and cloud services.
  • a user terminal device can be an edge device between people and cloud services
  • a gateway can be an edge device between a smart home and a cloud center.
  • Edge artificial intelligence refers to the application of artificial intelligence algorithms, technologies, and products on the edge side close to the data generating end, such as building a convolutional neural network model in edge devices.
  • Image super-resolution refers to the technology of reconstructing low-resolution (LR) images into high-resolution (HR) images using image processing methods.
  • Convolutional neural network (CNN), or simply convolutional network, refers to a neural network structure that uses images as input/output and replaces scalar weights with convolutions (filters).
  • generally, convolutional neural networks with about 3 layers are considered shallow, while networks with more than 5 or 10 layers are generally considered deep.
  • Low-level features refers to features that can be extracted by shallow layers (such as layers 1-3) of a convolutional neural network, sometimes also called shallow features. Low-level features usually correspond to information to which the human low-level visual centers are sensitive, such as edge information.
  • Shuffle refers to combining a group of images of a first scale into one image of a second scale larger than the first scale through pixel rearrangement.
  • for example, s² small-scale images of size h × w can be used to form a large-scale image of size sh × sw, by rearranging the pixels at the same position in the s² small-scale images in a certain order to form an s × s patch in the large-scale image.
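A minimal sketch of this shuffle operation, assuming PyTorch (whose built-in PixelShuffle performs exactly this kind of depth-to-space rearrangement; the shapes below are illustrative):

```python
import torch
import torch.nn as nn

s = 2                                 # scale factor
small = torch.randn(1, s * s, 4, 4)   # s^2 small-scale maps of size h x w = 4 x 4
shuffle = nn.PixelShuffle(s)          # rearranges (C*s^2, h, w) -> (C, s*h, s*w)
large = shuffle(small)
print(large.shape)                    # torch.Size([1, 1, 8, 8])
```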
  • "a plurality of" herein refers to at least two, i.e., two or more.
  • the "scale" of an image herein can be a scale described by resolution, or a scale described by other similar physical quantities. It will also be understood that, although the terms first, second, etc. may be used herein to describe various devices, features, or sections, these devices, features, or sections should not be limited by these terms; these terms are only used to distinguish one device, feature, or section from another.
  • FIG. 1 schematically shows an example scenario 100 in which the technical solutions provided by the present disclosure may be applied.
  • the system shown in the scenario 100 includes an edge device 110 , a network 120 and a cloud service 130 .
  • the edge device 110 is deployed at the edge of the entire system and can communicate with the cloud service 130 through the network 120 .
  • edge devices can provide more real-time and rapid responses and higher information security, but their storage capacity, computing power, and available energy (such as power resources) are often very limited, so they are not suitable for performing overly complex computational processing.
  • edge device 110 is illustrated as computer 112 , mobile phone 114 , and television 116 . It will be appreciated that this is merely a schematic representation. In fact, the edge device 110 may be any edge device or combination of edge devices that can be used to perform the image reconstruction technical solutions provided by the present disclosure.
  • edge devices 110 may include user terminal devices, Internet of Things (IoT) devices, and the like. End-user devices may include, for example, desktop computers, laptop computers, tablet computers, mobile phones and other mobile devices, and wearable devices (eg, smart watches, smart glasses, headsets), and the like. IoT devices may include any device capable of participating in and/or capable of communicating with an Internet of Things (IoT) system or network.
  • for example, IoT devices may include equipment and devices associated with vehicles (such as navigation systems and autonomous driving systems), equipment and/or infrastructure associated with industrial manufacturing and production, and various devices in smart entertainment systems (e.g., televisions, audio systems, electronic game systems), smart home or office systems, security systems, and the like.
  • Edge devices 110 may be used to implement artificial intelligence algorithms, technologies, and products.
  • the edge device 110 is configured to perform super-resolution reconstruction on the input image y to obtain an upscaled reconstructed image Y.
  • the input image y may be received by the edge device 110 from other devices via a network, or the input image y may also be a locally stored image.
  • the input image y may be a frame image in a picture or a video, or an image obtained by preprocessing the frame image in the picture or video.
  • the input image may be a single-channel grayscale image, or may be an image obtained based on one or more channels in a multi-channel image.
  • the input image may be a Y-channel image.
  • image super-resolution reconstruction can be achieved by two types of methods, namely interpolation methods and machine learning methods.
  • the former achieves image upsampling by inserting new elements between pixels based on the original image pixels using a suitable interpolation algorithm, such as nearest neighbor interpolation, bilinear interpolation, mean interpolation, median interpolation, etc.
  • Such methods cannot bring more information about the original image, so the quality of the super-resolution reconstructed image will inevitably be affected.
  • the latter usually implements image upsampling through machine learning models. Such methods can learn more features about the original image.
  • the currently used machine learning models usually have relatively complex structures, and thus consume a large amount of computing resources and energy, and thus cannot achieve real-time performance for such applications on edge devices.
  • a technical solution suitable for realizing image super-resolution reconstruction on various edge devices with limited resources is provided. It uses simpler learnable algorithms and can leverage a large number of edge AI devices to perform image enhancement tasks.
  • FIG. 2 schematically shows the structure of a convolutional neural network 200 used in an AI algorithm.
  • the convolutional neural network 200 includes an input layer 210 , a hidden layer 220 and an output layer 230 .
  • the input layer 210 receives 4 input images.
  • the hidden layer 220 includes 3 convolutional units (corresponding to 3 output feature maps).
  • the output layer 230 includes 2 convolutional units, producing 2 output images.
  • each box with a weight $w_{ij}^k$ corresponds to a convolution/filter (e.g., a 3×3 or 5×5 kernel), where k indicates the layer number and i and j indicate the input and output units, respectively.
  • the bias $b_i^k$ is a scalar added to the output of the convolution. The result of several convolutions and bias additions is then passed through the activation box.
  • the activation boxes usually correspond to rectified linear units (ReLU), sigmoid functions, or hyperbolic tangent functions, etc.
  • the filter and bias parameters can be obtained through a training process using a set of input/output sample images, and can be adjusted to fit some optimization criteria, which can be determined according to the specific application. After training, the filters and biases can be fixed during system operation.
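A minimal sketch of the convolution-bias-activation pattern just described, assuming PyTorch; the 4-input/3-hidden/2-output channel counts mirror the example network 200, while the kernel size and input resolution are arbitrary illustrative choices:

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(4, 3, kernel_size=3, padding=1, bias=True),  # hidden layer 220
    nn.ReLU(),                                             # activation box
    nn.Conv2d(3, 2, kernel_size=3, padding=1, bias=True),  # output layer 230
    nn.ReLU(),
)
out = net(torch.randn(1, 4, 16, 16))  # 4 input images -> 2 output images
print(out.shape)                      # torch.Size([1, 2, 16, 16])
```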
  • FIG. 3 schematically illustrates an example flow diagram of a method 300 for image reconstruction in an edge device according to some embodiments of the present disclosure.
  • the image reconstruction may be, for example, a super-resolution reconstruction from a low resolution (LR) image (eg 1080P: 1920*1080) to a high resolution (HR) image (eg 4K: 3840*2160).
  • the image reconstruction may scale the LR image by a desired target scale factor to obtain a reconstructed HR image.
  • low-level features are extracted from the input image having the first scale to generate a plurality of first feature maps.
  • the first feature map has a second scale enlarged by the target scale factor compared to the input image.
  • the first scale may be 1920*1080
  • the second scale may be 3840*2160.
  • the target scale factor is 2.
  • the input image may be an LR image acquired by the edge device from other devices or local storage, which may contain various types of images, such as landscape images, people images, building images, and the like.
  • the input image may be a single-channel grayscale image.
  • the input image may also be an image of a color channel of a color image.
  • the input image may be an image of a Y channel (ie, a luminance channel).
  • the edge device may directly extract the luminance channel from the multi-channel image to be reconstructed.
  • a luminance channel may be a channel characterizing luminance contained in an image encoded in a color space such as YUV, YCbCr, Lab, etc., e.g., the Y channel in YUV and YCbCr, or the L channel in Lab.
  • the edge device may derive the luminance channel based on multiple channels in the multi-channel image to be reconstructed.
  • for example, for an RGB image, its luminance channel can be extracted by converting it into a color space such as YUV, YCbCr, or Lab based on the values of its R, G, and B channels.
  • the luminance channel of the image can also be acquired in other ways.
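As one concrete possibility (the BT.601 luma weights below are a common RGB-to-Y convention; the disclosure itself only says the conversion goes through a color space such as YUV, YCbCr, or Lab without fixing the coefficients):

```python
import numpy as np

def luminance_channel(rgb: np.ndarray) -> np.ndarray:
    """rgb: H x W x 3 array; returns the H x W luminance (Y) image."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b  # BT.601 weights (one convention)
```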
  • low-level features refer to low-level image detail features relative to high-level features such as image semantic features. Such features usually correspond to information perceived by the human lower-level visual center.
  • the low-level features may include edge features, texture features, small blob features, etc. of the image.
  • the low-level features may be features extracted by a shallow layer (e.g., the first convolutional layer) of the convolutional neural network.
  • in one example, the low-level features are the features at the output of a single-hidden-layer convolutional neural network (i.e., after convolution through a single convolutional layer).
  • low-level features are extracted from the input image to generate a plurality of second feature maps.
  • the second feature map has a second scale scaled up by the same target scale factor.
  • a plurality of mask maps are generated based on the plurality of second feature maps.
  • each mask map is used to represent the feature weights of the corresponding first feature map in the plurality of first feature maps.
  • Feature weights can be used to indicate how important a feature is to image reconstruction. The larger the weight, the more important the feature is.
  • the feature weight may be related to how sensitive the human vision is to the feature in the image. Larger weights indicate that the feature is more visually sensitive and thus enhanced more during reconstruction.
  • the plurality of mask maps are generated by normalizing the plurality of second feature maps.
  • the mask map may be at the pixel level.
  • Each mask map has the same second scale as the first feature map and contains feature weights for each pixel in its corresponding first feature map.
  • the mask elements in each mask map represent feature weights of pixels at corresponding positions in its corresponding first feature map.
  • a plurality of intermediate feature maps are generated based on the plurality of mask maps and the plurality of first feature maps.
  • the plurality of intermediate feature maps each have the second scale.
  • a reconstructed image with the second scale is synthesized based on the plurality of intermediate feature maps.
  • the corresponding first feature maps in the plurality of first feature maps may be weighted according to the feature weights represented by the mask map to synthesize the reconstructed image with the second scale.
  • the reconstructed image is an image in which super-resolution reconstruction is realized.
  • when the mask maps are at the pixel level, the image pixel at a specific position in the reconstructed image can be obtained by weighting and summing the mask elements and feature elements at that specific position across the plurality of mask maps and the plurality of first feature maps.
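Written out, this per-pixel weighted summation (stated explicitly for model 400 in the detailed description below) is

$$Y(i,j) = \sum_{c=1}^{C} m_c(i,j)\, f_c(i,j),$$

where m_c and f_c denote the c-th mask map and the c-th first feature map, respectively.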
  • according to the solution of the present disclosure, since a simpler learnable algorithm is adopted, the image enhancement task can be performed using edge devices in which heavy investments have already been made.
  • compared with existing deep learning solutions, since only low-level features of the image are used to realize SR reconstruction, the network structure involved is simpler and requires less computation, thereby achieving a simpler learnable solution.
  • moreover, by assigning different weights to the feature maps of different channels at the feature element level (e.g., the pixel level), the feature elements of important channels are enhanced, thereby ensuring the quality of the reconstructed image.
  • although FIG. 3 shows the various steps in sequence, this is by way of example only. In fact, the steps may be performed in a different order than shown; for example, some of the steps may be performed in the reverse order of that shown, or in parallel. For example, step 310 may be performed in reverse order of, or in parallel with, steps 320 and 330.
  • FIG. 4 illustrates an example structure of an image reconstruction model according to some embodiments of the present disclosure.
  • the image reconstruction model 400 includes a first transposed convolution module 410 , a second transposed convolution module 420 , a normalization module 430 and a weighted summation module 440 .
  • This image reconstruction model can be implemented using a convolutional neural network.
  • the first transposed convolution module 410 is used to extract low-level features from the input image y to generate a plurality of first feature maps.
  • the first transposed convolution module 410 may include C feature channels F_1, ..., F_C (corresponding to C convolution kernels), which apply transposed convolutions to the input image y respectively to generate C first feature maps f_1, ..., f_C. Since the applied convolutions are transposed convolutions, the scales of the first feature maps f_1, ..., f_C are enlarged relative to the input image y.
  • the convolution parameters used by the transposed convolution, such as the size of the convolution kernel, are determined at least in part based on the target magnification ratio r.
  • transposed convolution may be implemented with a single transposed convolution layer.
  • for each feature channel, a single convolution kernel can be used to apply transposed convolution to the input image to obtain the corresponding first feature map; that is, F_1, ..., F_C in FIG. 4 each correspond to a single convolution kernel.
  • the stride of the transposed convolution can be set to s
  • the size of the convolution kernel can be set to rk × rk, where r represents the target enlargement ratio of the reconstructed image Y relative to the input image y, and k represents a positive integer such as 1, 2, etc.
  • the number C of feature channels can be selected according to actual needs, for example, 4, 8, 16 and so on.
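An illustrative sketch of such a single-layer transposed-convolution feature extractor, assuming PyTorch; here k = 1 and C = 4 are example choices within the ranges mentioned above, and the stride is set equal to r so that the output is exactly r times larger than the input:

```python
import torch
import torch.nn as nn

r, k, C = 2, 1, 4                        # target ratio, kernel factor, channels
module_410 = nn.ConvTranspose2d(1, C, kernel_size=r * k, stride=r)
y = torch.randn(1, 1, 540, 960)          # single-channel LR input
f = module_410(y)                        # C first feature maps f_1..f_C
print(f.shape)                           # torch.Size([1, 4, 1080, 1920])
```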
  • the second transposed convolution module 420 is used to extract low-level features from the input image y to generate a plurality of second feature maps.
  • the second transposed convolution module 420 may include C feature channels M_1, ..., M_C, which apply transposed convolutions to the input image y respectively to generate C second feature maps m_1', ..., m_C'.
  • the C feature channels here may have a one-to-one correspondence with the C feature channels in the first transposed convolution module 410. Since the applied convolutions are transposed convolutions, the scales of the second feature maps m_1', ..., m_C' are enlarged relative to the input image y.
  • the convolution parameters used in the transposed convolution, such as the size of the convolution kernel, may be determined in correspondence with the transposed convolution in the first transposed convolution module 410.
  • transposed convolution may be implemented with a single transposed convolution layer.
  • for each feature channel, a single convolution kernel can be used to apply transposed convolution to the input image to obtain the corresponding second feature map; that is, M_1, ..., M_C in FIG. 4 each correspond to a single convolution kernel.
  • the transposed convolutions applied in the feature channels M_1, ..., M_C are bias-enabled transposed convolutions, i.e., the convolution output of each channel is added with a corresponding bias, shown as b_1, ..., b_C in the figure.
  • the normalization module 430 is configured to generate a plurality of mask maps based on the plurality of second feature maps.
  • the plurality of mask maps m_1, ..., m_C may be generated by performing a normalization operation on the plurality of second feature maps with a normalization function (e.g., a SOFTMAX function).
  • FIG. 5 schematically illustrates an example process of normalization processing according to an embodiment of the present disclosure. To simplify the description, only four 2*2 second feature maps m_1', m_2', m_3', m_4' are shown in FIG. 5.
  • a feature element group may be formed based on a plurality of feature elements at corresponding positions in the plurality of second feature maps 510, for example the group a = (a1, a2, a3, a4).
  • normalization processing may then be performed on the feature elements within each feature element group to obtain a normalized feature element group (b1, b2, b3, b4). For example, as shown in FIG. 5, the feature element group a can be input into the SOFTMAX module as a 4-dimensional vector, and the normalized feature element group b can be generated by using the SOFTMAX function to normalize each vector element.
  • the normalization process can also be performed by any other suitable normalization function, such as a sigmoid function, an arctangent function, and the like.
  • each element in the normalized feature element group (b1, b2, b3, b4), which corresponds to the image position at the i-th row and j-th column, is used as the element at the corresponding position (that is, the i-th row and j-th column) in the mask maps m_1, m_2, m_3, m_4.
  • although FIG. 5 shows four second feature maps with a scale of 2*2 (that is, corresponding to four feature channels), this is only an example; in fact, other numbers of feature channels with other scales can be defined as needed.
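A minimal sketch of this per-position normalization, assuming PyTorch; taking the softmax over the channel dimension realizes the feature element groups formed across the second feature maps:

```python
import torch
import torch.nn.functional as F

m_prime = torch.randn(1, 4, 2, 2)   # four 2x2 second feature maps m_1'..m_4'
masks = F.softmax(m_prime, dim=1)   # normalize each (i, j) group across channels
print(masks.sum(dim=1))             # all ones: the weights at each pixel sum to 1
```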
  • the weighted summation module 440 weights the corresponding first feature maps of the plurality of first feature maps according to the feature weights, so as to synthesize the reconstructed image Y with the second scale.
  • as shown in FIG. 4, the first feature maps f_1, ..., f_C can be multiplied element-wise with the corresponding mask maps m_1, ..., m_C, respectively; that is, each element f_c(i, j) in a first feature map is multiplied by the element m_c(i, j) at the corresponding position in its mask map, so that the corresponding elements at the same position in each first feature map are weighted by the feature weights in the mask map.
  • here f_c represents the first feature map of the c-th feature channel, and i, j represent the i-th row and the j-th column in the map.
  • the multiplied feature channels can then be summed, for example by directly adding the elements at corresponding positions, to obtain the reconstructed image Y, that is: $Y(i,j) = \sum_{c=1}^{C} m_c(i,j)\, f_c(i,j)$.
  • alternatively, the multiplied results of the feature channels can be combined as $Y(i,j) = \sum_{c=1}^{C} k_c\, m_c(i,j)\, f_c(i,j)$, where $k_c$ represents the summation weight of the corresponding feature channel, which can be obtained through training together with the other model parameters.
  • in this way, the composition structure of the deep learning model is simplified so that each layer has a simple and clear function, thereby reducing the computation cost and increasing the running speed.
  • the disclosed scheme can be viewed as an "on-the-fly" reconstruction scheme.
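Putting FIG. 4 together, a compact sketch of model 400 could look as follows (assuming PyTorch; the class name and the r, k, C values are illustrative, and bias is disabled in the first branch on the reading that only module 420 is described as bias-enabled):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionModel400(nn.Module):
    """Two single-layer transposed-conv branches, SOFTMAX masks, weighted sum."""

    def __init__(self, r: int = 2, k: int = 1, channels: int = 4):
        super().__init__()
        self.features = nn.ConvTranspose2d(1, channels, r * k, stride=r, bias=False)
        # second branch (420): bias-enabled transposed convolution
        self.mask_logits = nn.ConvTranspose2d(1, channels, r * k, stride=r, bias=True)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f = self.features(y)                        # first feature maps f_1..f_C
        m = F.softmax(self.mask_logits(y), dim=1)   # pixel-level masks m_1..m_C
        return (m * f).sum(dim=1, keepdim=True)     # Y(i,j) = sum_c m_c(i,j) f_c(i,j)

Y = ReconstructionModel400()(torch.randn(1, 1, 540, 960))
print(Y.shape)  # torch.Size([1, 1, 1080, 1920])
```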
  • FIG. 6 illustrates another example structure of an image reconstruction model according to some embodiments of the present disclosure.
  • the image reconstruction model 600 includes a first convolution module 610 , a second convolution module 620 , a normalization module 630 and a weighted summation module 640 .
  • the first convolution module 610 may also be regarded as including C feature channels, but in the process of acquiring the first feature map of each feature channel, each feature channel further includes s² sub-channels.
  • the input image y can be input to C·s² sub-channels, namely F_1 to F_{C·s²}, where every s² sub-channels (corresponding to s² convolution kernels) (for example, F_1 to F_{s²}, or F_{s²+1} to F_{2s²}, etc.) can constitute a sub-channel group and can be regarded as corresponding to one feature channel among the C feature channels.
  • the number s² of sub-channels per feature channel may be determined based at least in part on the target magnification ratio r; for example, s may be set equal to r.
  • the first convolution module 610 may include a single convolution layer to implement convolution processing.
  • for each sub-channel, a single convolution kernel can be used to convolve the input image to obtain the corresponding first initial feature map; that is, F_1 to F_{C·s²} each correspond to a single convolution kernel.
  • the convolution stride may be set to 1
  • the size of the convolution kernel may be set to k × k.
  • the outputs of F_1 to F_{C·s²} are C first initial feature map groups; each first initial feature map group corresponds to a sub-channel group and includes the s² first initial feature maps output through the s² sub-channels in that group, where each first initial feature map has the same first scale as the input image.
  • for each first initial feature map group, the plurality of first initial feature maps included in the group may be rearranged to generate a corresponding first feature map.
  • for example, the s² first initial feature maps included in each first initial feature map group may be rearranged into a single first feature map by the rearrangement module.
  • in one example, the feature elements at corresponding positions across the s² first initial feature maps may be rearranged in a predetermined order to form a block at the corresponding position in the first feature map.
  • FIG. 7 schematically illustrates a rearrangement process according to an embodiment of the present disclosure.
  • as shown in FIG. 7, the four elements a1, a2, a3, a4 at the first row and first column of the first initial feature maps f_1', f_2', f_3', f_4' in a first initial feature map group are used to form the first block of the corresponding first feature map, with the four elements sequentially arranged into the four element positions contained in that block.
  • more generally, the elements at the i-th row and j-th column of each first initial feature map in the group are used to form the block at the corresponding position in the first feature map, and these elements are sequentially arranged into the element positions contained in that block.
  • the second convolution module 620 includes C feature channels corresponding to the feature channels of the first convolution module 610 one-to-one.
  • each feature channel further includes s² sub-channels.
  • these sub-channels may also have a one-to-one correspondence with the sub-channels included in the feature channels of the first convolution module 610.
  • the input image y can be input to C·s² sub-channels, namely M_1 to M_{C·s²}, where every s² sub-channels (e.g., M_1 to M_{s²}, or M_{s²+1} to M_{2s²}, etc.) can form a sub-channel group.
  • M_1 to M_{C·s²} can be in one-to-one correspondence with F_1 to F_{C·s²}.
  • the convolution parameters used by the second convolution module 620 can also be determined based at least in part on the target magnification ratio.
  • the second convolution module 620 may be implemented by a single convolution layer.
  • for each sub-channel, a single convolution kernel can be used to convolve the input image to obtain the corresponding second initial feature map; that is, M_1 to M_{C·s²} each correspond to a single convolution kernel.
  • the convolution stride can be set to 1, and the size of the convolution kernel can be set to k × k.
  • the outputs of M_1 to M_{C·s²} are C second initial feature map groups, each second initial feature map group including s² second initial feature maps.
  • the s² second initial feature maps in each group may be rearranged into a single second feature map through the rearrangement module.
  • the rearrangement process may be the same as that employed by the first convolution module 610 .
  • the second convolution module 620 may include one or more additional two-dimensional convolution layers, which can make the mask decision more accurate, thereby helping to obtain a better-quality reconstructed image.
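A corresponding sketch of the convolution-plus-rearrangement variant (model 600), assuming PyTorch and using nn.PixelShuffle as the rearrangement module; the extra convolution layer in the mask branch reflects the optional additional layer just mentioned, and all hyperparameters are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconstructionModel600(nn.Module):
    def __init__(self, s: int = 2, k: int = 3, channels: int = 4):
        super().__init__()
        c_sub = channels * s * s                      # C*s^2 sub-channels
        self.features = nn.Conv2d(1, c_sub, k, stride=1, padding=k // 2)
        self.mask_logits = nn.Sequential(             # with one additional layer
            nn.Conv2d(1, c_sub, k, stride=1, padding=k // 2),
            nn.Conv2d(c_sub, c_sub, k, stride=1, padding=k // 2),
        )
        self.shuffle = nn.PixelShuffle(s)             # rearrangement module

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f = self.shuffle(self.features(y))            # C first feature maps, upscaled
        m = F.softmax(self.shuffle(self.mask_logits(y)), dim=1)
        return (m * f).sum(dim=1, keepdim=True)

print(ReconstructionModel600()(torch.randn(1, 1, 8, 8)).shape)  # [1, 1, 16, 16]
```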
  • Normalization module 630 and weighted summation module 640 may work in the same manner as normalization module 430 and weighted summation module 440 in FIG. 4 .
  • the method of FIG. 3 can be implemented by means of the image reconstruction models shown in FIGS. 4-7, and can also be similarly implemented by means of any other suitable image reconstruction model.
  • the image reconstruction model according to the embodiment of the present disclosure utilizes a convolutional neural network, which is based on learning, and can obtain better reconstruction effects through training.
  • FIG. 8 shows a training method 800 for training an image reconstruction model according to an embodiment of the present disclosure.
  • the image reconstruction model may be of the form shown in Figures 4-7.
  • Degraded images with reduced resolution can be generated as input images based on a set of reference images.
  • the degraded image may have a first scale
  • the reference image may have a second scale
  • the ratio of the second scale to the first scale may be the target magnification ratio.
  • the target magnification ratio can be selected according to actual needs, for example, it can be selected as 2, 3, 4 and so on.
  • generating the degraded image can be achieved in various ways, such as downsampling, pooling, filtering, etc.
  • the values of adjacent pixels in the reference image can be averaged as the pixel value at the corresponding position in the degraded image.
  • noise can also be added to the degraded image.
  • the above processing may be performed on the luminance channel image of the reference image.
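A sketch of one such degradation pipeline, assuming PyTorch; 2x2 average pooling realizes the adjacent-pixel averaging mentioned above for a target ratio of 2, and the Gaussian noise level is an arbitrary illustrative choice:

```python
import torch
import torch.nn.functional as F

def degrade(reference: torch.Tensor, r: int = 2, noise_std: float = 0.01):
    """reference: N x 1 x H x W luminance image; returns N x 1 x H/r x W/r."""
    lr = F.avg_pool2d(reference, kernel_size=r)    # average adjacent pixels
    return lr + noise_std * torch.randn_like(lr)   # optionally add noise
```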
  • a training sample set can be constructed based on the reference image and the generated degraded image.
  • the training data in the training sample set may be input into the image reconstruction model to obtain a corresponding output image.
  • the training data includes degraded images and reference images.
  • the total loss of the image reconstruction model may be calculated based on a predetermined loss function according to each output image and the corresponding reference image.
  • for example, the predetermined loss function may be $L_{L1}(Y, r) = E[\,|Y - r|\,]$ or $L_{L2}(Y, r) = E[(Y - r)^2]$, where E represents the expected value, approximated by the mean value over a set of training samples, r represents the reference image, and Y represents the output image obtained when the degraded image y corresponding to the reference image is input.
  • L1 and L2 represent the L1 norm and the L2 norm, respectively.
  • the parameters of the image reconstruction model may be adjusted according to the total loss to obtain a trained image reconstruction model.
  • the parameter adjustment will be such that the total loss tends to be minimized during training.
  • the above process can be iterated multiple times, and optionally, when the total loss falls within a predetermined threshold range or does not substantially change, the training can be considered complete.
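A compact training-step sketch under these assumptions (PyTorch; the Adam optimizer and the L1 variant are illustrative choices, since the disclosure fixes only the L1/L2 loss family, and ReconstructionModel400 refers to the model-400 sketch given earlier):

```python
import torch

model = ReconstructionModel400()   # from the model-400 sketch above
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(degraded: torch.Tensor, reference: torch.Tensor) -> float:
    opt.zero_grad()
    output = model(degraded)                  # Y
    loss = (output - reference).abs().mean()  # L_L1(Y, r) = E[|Y - r|]
    loss.backward()
    opt.step()
    return loss.item()
```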
  • optionally, different feature channels in the image reconstruction model can be made to focus on different types of features, such as image features with different gradients, through a preset mechanism, so that in subsequent use the trained image reconstruction model can better extract the features in the input image based on the different feature channels, thus helping to improve the image quality of the reconstructed image.
  • evaluation of the image reconstruction scheme according to embodiments of the present disclosure on images and video frames from publicly available data sets shows that, in terms of model parameter count, running speed, and power consumption for 2x super-resolution, the scheme greatly improves running speed over existing known solutions while consuming less power; with small models it can even reach 120 FHD frames per second.
  • FHD refers to 1080P image frames.
  • the speed is also greatly improved, and output of 4K images at 60 FPS (60 frames per second) can be achieved.
  • Fig. 9 schematically shows an example structure of an apparatus 900 for image reconstruction in an edge device according to some embodiments of the present disclosure.
  • the image reconstruction apparatus 900 includes a first feature extraction module 910 , a second feature extraction module 920 , a mask map generation module 930 and a synthesis module 940 .
  • the first feature extraction module 910 may be configured to extract low-level features from an input image at a first scale to generate a plurality of first feature maps, the first feature maps having a second scale enlarged by a target scale factor compared to the input image;
  • the second feature extraction module 920 may be configured to extract low-level features from the input image to generate a plurality of second feature maps, the second feature maps having the second scale;
  • the mask map generation module 930 may be configured to generate a plurality of mask maps based on the plurality of second feature maps, each mask map having the second scale and representing feature weights of the corresponding first feature map in the plurality of first feature maps;
  • the synthesis module 940 may be configured to weight the corresponding first feature maps of the plurality of first feature maps according to the feature weights, so as to synthesize a reconstructed image having the second scale.
  • the image reconstruction apparatus 900 may be deployed on the edge device 110 shown in FIG. 1 . It should be understood that the image reconstruction apparatus 900 may be implemented in software, hardware or a combination of software and hardware. Multiple different modules may be implemented in the same software or hardware structure, or one module may be implemented by multiple different software or hardware structures.
  • the image reconstruction apparatus 900 may be used to implement the image reconstruction method described above, and the relevant details thereof have been described in detail above, and are not repeated here for the sake of brevity.
  • the image reconstruction apparatus 900 may have the same features and advantages as described with respect to the aforementioned image reconstruction methods.
  • FIG. 10 schematically illustrates an example architecture diagram of an electronic device according to some embodiments of the present disclosure. For example, it may represent edge device 110 in FIG. 1 .
  • the example electronic device 1000 includes a processing system 1001 , one or more computer-readable media 1002 , and one or more I/O interfaces 1003 communicatively coupled to each other.
  • the electronic device 1000 may also include a system bus or other data and command transfer system that couples the various components to each other.
  • the system bus may include any one or a combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus utilizing any of a variety of bus architectures; it may also include, for example, control and data lines.
  • Processing system 1001 represents functionality that uses hardware to perform one or more operations. Accordingly, processing system 1001 is illustrated as including hardware elements 1004 that may be configured as processors, functional blocks, and the like. This may include implementing in hardware an application specific integrated circuit or other logic device formed using one or more semiconductors.
  • the hardware element 1004 is not limited by the material from which it is formed or the processing mechanism employed therein.
  • a processor may be composed of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)).
  • the processor-executable instructions may be electronically-executable instructions.
  • Computer readable medium 1002 is illustrated as including memory/storage 1005 .
  • Memory/storage 1005 represents memory/storage associated with one or more computer-readable media.
  • Memory/storage 1005 may include volatile storage media (such as random access memory (RAM)) and/or non-volatile storage media (such as read only memory (ROM), flash memory, optical disks, magnetic disks, etc.).
  • Memory/storage 1005 may include fixed media (e.g., RAM, ROM, fixed hard drives, etc.) as well as removable media (e.g., flash memory, removable hard drives, optical discs, etc.).
  • the memory/storage device 1005 may be used to store various image data and the like mentioned in the above embodiments.
  • the computer-readable medium 1002 may be configured in various other ways as described further below.
  • One or more input/output interfaces 1003 represent functionality that allows a user to enter commands and information into electronic device 1000 and also allows information to be presented to the user and/or sent to other components or devices using various input/output devices.
  • examples of input devices include keyboards, cursor control devices (e.g., mice), microphones (e.g., for voice input), scanners, touch capabilities (e.g., capacitive or other sensors configured to detect physical touch), cameras (which may, for example, employ visible or invisible wavelengths such as infrared frequencies to detect touchless motion as gestures), network cards, receivers, and the like.
  • Examples of output devices include display devices (eg, monitors or projectors), speakers, printers, haptic responsive devices, network cards, transmitters, and the like.
  • the input image or the image to be reconstructed may be received through an input device, and the reconstructed image may be presented through an output device.
  • the electronic device 1000 also includes an image reconstruction strategy 1006 .
  • Image reconstruction strategy 1006 may be stored in memory/storage device 1005 as computer program instructions.
  • the image reconstruction strategy 1006 may, together with the processing system 1001 and the like, implement the full functionality of the various modules of the image reconstruction apparatus 900 described with respect to FIG. 9 .
  • modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the term "module" generally refers to software, firmware, hardware, or a combination thereof.
  • the features of the techniques described herein are platform independent, meaning that the techniques can be implemented on a variety of computing platforms with a variety of processors.
  • Computer-readable media may include various media that can be accessed by electronic device 1000 .
  • Computer-readable media may include "computer-readable storage media” and "computer-readable signal media.”
  • Computer-readable storage medium refers to media and/or devices capable of persistent storage of information, and/or tangible storage devices, as opposed to mere signal transmissions, carrier waves, or signals per se.
  • Computer-readable storage media refers to non-signal bearing media.
  • computer-readable storage media include volatile and nonvolatile, removable and non-removable media and/or hardware storage devices implemented by methods or technologies suitable for storing information such as computer-readable instructions, data structures, program modules, logic elements/circuits, or other data.
  • examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROMs, digital versatile disks (DVDs) or other optical storage devices, hard disks, cassettes, magnetic tape, magnetic disk storage devices or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture suitable for storing the desired information and accessible by a computer.
  • Computer-readable signal medium refers to a signal-bearing medium configured to transmit instructions to hardware of electronic device 1000, such as via a network.
  • Signal media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave, data signal or other transport mechanism.
  • Signal media also includes any information delivery media.
  • signal media include wired media, such as a wired network or direct wire, and wireless media, such as acoustic, RF, infrared, and other wireless media.
  • hardware elements 1004 and computer-readable medium 1002 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form that, in some embodiments, may be used to implement at least some aspects of the techniques described herein. Hardware elements may include components of integrated circuits or systems-on-chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or other hardware devices. In this context, a hardware element may act as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device used to store instructions for execution, e.g., the computer-readable storage medium described previously.
  • software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage medium and/or by one or more hardware elements 1004.
  • the electronic device 1000 may be configured to implement specific instructions and/or functions corresponding to software and/or hardware modules.
  • a module implemented as software executable by the electronic device 1000 may be realized at least partially in hardware, e.g., using a computer-readable storage medium and/or the hardware elements 1004 of the processing system.
  • Instructions and/or functions may be executed/operable by, for example, one or more electronic devices 1000 and/or processing system 1001 to implement the techniques, modules, and examples described herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides an image reconstruction method for an edge device, an electronic device, and a storage medium. The image reconstruction method includes: extracting low-level features from an input image of a first scale to generate a plurality of first feature maps, each of the plurality of first feature maps having a second scale larger than the first scale of the input image; extracting low-level features from the input image to generate a plurality of second feature maps, each having the second scale; generating a plurality of mask maps based on the plurality of second feature maps; generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps, each intermediate feature map having the second scale; and synthesizing a reconstructed image having the second scale based on the plurality of intermediate feature maps. The method helps achieve a good image super-resolution reconstruction effect at relatively low resource consumption.

Description

Image reconstruction method, electronic device, and computer-readable storage medium — Technical Field
The present disclosure relates to the field of image processing, and in particular to an image reconstruction method for an edge device, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of deep learning and artificial intelligence technologies, there is growing interest in implementing these technologies efficiently on edge devices such as mobile terminals. This trend is known as edge artificial intelligence (Edge-AI). Some tasks utilizing artificial intelligence techniques, such as image classification and object detection, have been successfully implemented on edge devices because the outputs involved (e.g., labels, bounding boxes) are low-dimensional.
Super-resolution (SR) image reconstruction has become an important branch of the image processing field. Current image super-resolution reconstruction methods are mainly interpolation-based methods and learning-based methods. Interpolation-based methods cannot introduce additional useful high-frequency information, so the reconstructed image is prone to blurring. Learning-based methods can improve resolution by learning to exploit the relationship between low-resolution and high-resolution images, but the currently known learning-based methods are computationally expensive because they involve high-dimensional outputs or complex network structures, making them difficult to implement on edge devices whose resources (such as storage and power resources) and computing capability are relatively limited.
Summary
In view of this, the present disclosure provides a method, an electronic device, and a storage medium that can mitigate, alleviate, or even eliminate the above problems.
According to one aspect of the present disclosure, an image reconstruction method for an edge device is provided. The method includes: extracting low-level features from an input image of a first scale to generate a plurality of first feature maps, each of the plurality of first feature maps having a second scale larger than the first scale of the input image; extracting low-level features from the input image to generate a plurality of second feature maps, each of the plurality of second feature maps having the second scale; generating a plurality of mask maps based on the plurality of second feature maps; generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps, each of the plurality of intermediate feature maps having the second scale; and synthesizing a reconstructed image having the second scale based on the plurality of intermediate feature maps.
In some embodiments, extracting low-level features from the input image to generate the plurality of first feature maps includes performing transposed convolution processing on the input image to generate the plurality of first feature maps; and extracting low-level features from the input image to generate the plurality of second feature maps includes performing transposed convolution processing on the input image to generate the plurality of second feature maps.
In some embodiments, performing transposed convolution processing on the input image to generate the plurality of first feature maps includes using a single transposed convolution layer to process the input image to generate the plurality of first feature maps; and performing transposed convolution processing on the input image to generate the plurality of second feature maps includes using a single transposed convolution layer to process the input image to generate the plurality of second feature maps.
In some embodiments, extracting low-level features from the input image to generate the plurality of first feature maps includes: performing convolution processing on the input image to generate a plurality of first initial feature map groups, each first initial feature map group including a plurality of first initial feature maps, each of which has the first scale; and, for each first initial feature map group, rearranging the plurality of first initial feature maps included therein to generate a corresponding first feature map. Extracting low-level features from the input image to generate the plurality of second feature maps includes: performing convolution processing on the input image to generate a plurality of second initial feature map groups, each second initial feature map group including a plurality of second initial feature maps, each of which has the first scale; and, for each second initial feature map group, rearranging the plurality of second initial feature maps included therein to generate a corresponding second feature map.
In some embodiments, performing convolution processing on the input image to generate the plurality of first initial feature map groups includes using a single convolution layer to convolve the input image to generate the plurality of first initial feature map groups; and performing convolution processing on the input image to generate the plurality of second initial feature map groups includes using a single convolution layer to convolve the input image to generate the plurality of second initial feature map groups.
In some embodiments, performing convolution processing on the input image to generate the plurality of first initial feature map groups includes using a single convolution layer to convolve the input image to generate the plurality of first initial feature map groups; and performing convolution processing on the input image to generate the plurality of second initial feature map groups includes using a first convolution layer and a second convolution layer to convolve the input image to generate the plurality of second initial feature map groups.
In some embodiments, the number of first initial feature maps included in each first initial feature map group depends on the scaling factor between the second scale and the first scale, and the number of second initial feature maps included in each second initial feature map group likewise depends on the scaling factor between the second scale and the first scale.
In some embodiments, the plurality of mask maps correspond to the plurality of first feature maps, and each mask map represents feature weights of the corresponding first feature map. Generating the plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps includes weighting the corresponding first feature map based on the feature weights represented by each mask map to generate one of the plurality of intermediate feature maps.
In some embodiments, the first feature maps, the second feature maps, and the mask maps are equal in number, and the mask maps have the second scale.
In some embodiments, generating the plurality of mask maps based on the plurality of second feature maps includes: forming a plurality of feature element groups based on a plurality of feature elements at corresponding positions in the plurality of second feature maps; performing normalization processing on the feature elements in each feature element group respectively; and generating the corresponding mask maps based on the normalized feature elements.
In some embodiments, the mask maps are at the pixel level; each mask map has the second scale and contains a feature weight for each pixel in its corresponding first feature map.
In some embodiments, the low-level features include at least one of texture features, edge features, and blob features of the input image.
In some embodiments, the input image is a luminance channel image of a color image to be reconstructed.
In some embodiments, the steps of the method are performed using a trained image reconstruction model to reconstruct the input image into a super-resolution image.
In some embodiments, the image reconstruction model is trained through the following steps: inputting training data from a training sample set composed of reference images and degraded images into the image reconstruction model to obtain output images; computing the total loss of the image reconstruction model from the output images and the reference images based on a predetermined loss function; and adjusting the parameters of the image reconstruction model according to the total loss to obtain the trained image reconstruction model.
According to another aspect of the present disclosure, an electronic device is provided, including: a memory configured to store computer-executable instructions; and a processor configured to perform the above image reconstruction method for an edge device when the computer-executable instructions are executed by the processor.
According to yet another aspect of the present disclosure, a non-volatile computer-readable storage medium is provided, which stores computer-executable instructions that, when executed, perform the above image reconstruction method for an edge device.
These and other aspects of the present disclosure will be apparent from, and elucidated with reference to, the embodiments described hereinafter.
Brief Description of the Drawings
Further details, features, and advantages of the present disclosure are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:
FIG. 1 schematically shows an example scenario in which the technical solutions provided by the present disclosure can be applied;
FIG. 2 schematically shows an example structure of a convolutional neural network in the related art;
FIG. 3 schematically shows an example flowchart of a method for image reconstruction according to some embodiments of the present disclosure;
FIG. 4 schematically shows an example structure of an image reconstruction model according to some embodiments of the present disclosure;
FIG. 5 schematically shows a normalization process according to some embodiments of the present disclosure;
FIG. 6 schematically shows an example structure of another image reconstruction model according to some embodiments of the present disclosure;
FIG. 7 schematically shows a rearrangement process according to some embodiments of the present disclosure;
FIG. 8 schematically shows a method of training an image reconstruction model according to some embodiments of the present disclosure;
FIG. 9 schematically shows an example structure of an apparatus for image reconstruction in an edge device according to some embodiments of the present disclosure;
FIG. 10 schematically shows an example architecture diagram of a computing device according to some embodiments of the present disclosure.
Detailed Description
Before describing embodiments of the present disclosure in detail, some related concepts are first explained:
Edge device: a device with computing resources and network resources located between data sources and cloud services. For example, a user terminal device can be an edge device between people and cloud services, and a gateway can be an edge device between a smart home and a cloud center.
Edge artificial intelligence: the application of artificial intelligence algorithms, technologies, and products on the edge side close to where the data is generated, such as building a convolutional neural network model in an edge device.
Image super-resolution: the technology of reconstructing low-resolution (LR) images into high-resolution (HR) images using image processing methods.
Convolutional neural network (CNN): also simply called a convolutional network, a neural network structure that uses images as input/output and replaces scalar weights with convolutions (filters). Generally, convolutional neural networks with about 3 layers are considered shallow, while networks with more than 5 or 10 layers are generally considered deep.
Low-level features: features that can be extracted by shallow layers (such as layers 1-3) of a convolutional neural network, sometimes also called shallow features. Low-level features usually correspond to information to which the human low-level visual centers are sensitive, such as edge information.
Shuffle (rearrangement): combining a group of images of a first scale into one image of a second scale larger than the first scale through pixel rearrangement. For example, s² small-scale images of size h × w can be used to form a large-scale image of size sh × sw, by rearranging the pixels at the same position in the s² small-scale images in a certain order to form an s × s block in the large-scale image.
It should be understood that "a plurality of" herein means at least two, i.e., two or more. The "scale" of an image herein can be a scale described by resolution, or a scale described by other similar physical quantities. It should also be understood that although the terms first, second, etc. may be used herein to describe various devices, features, or sections, these devices, features, or sections should not be limited by these terms; these terms are only used to distinguish one device, feature, or section from another.
Fig. 1 schematically illustrates an example scenario 100 to which the technical solutions provided by the present disclosure may be applied. As shown in Fig. 1, the system in scenario 100 includes an edge device 110, a network 120, and a cloud service 130. The edge device 110 is deployed at the edge of the overall system and can communicate with the cloud service 130 via the network 120. Compared with the cloud, an edge device can provide faster, more real-time responses and higher information security, but its storage capacity, computing power, and available energy (e.g., electrical power) are often quite limited, making it unsuitable for overly complex computation.
In Fig. 1, the edge device 110 is illustrated as a computer 112, a mobile phone 114, and a television 116. It will be appreciated that this is merely a schematic representation. In practice, the edge device 110 may be any edge device, or combination of edge devices, capable of carrying out the image reconstruction solutions provided by the present disclosure. For example, the edge device 110 may include user terminal devices and Internet of Things (IoT) devices. User terminal devices may include, for example, desktop computers, laptop computers, tablets, mobile phones and other mobile devices, and wearable devices (e.g., smart watches, smart glasses, headsets). IoT devices may include any device capable of participating in and/or communicating with an IoT system or network, for example equipment and devices associated with vehicles (such as navigation systems and autonomous driving systems), equipment, devices, and/or infrastructure associated with industrial manufacturing and production, and various devices in smart entertainment systems (e.g., televisions, audio systems, video game systems), smart home or office systems, security systems, and the like.
The edge device 110 may be used to implement artificial intelligence algorithms, technologies, and products. According to embodiments of the present disclosure, the edge device 110 is configured to perform super-resolution reconstruction on an input image y to obtain an upscaled reconstructed image Y. Illustratively, the input image y may be received by the edge device 110 from another device via a network, or it may be a locally stored image. Illustratively, the input image y may be a picture or a frame of a video, or an image obtained by preprocessing a picture or a video frame. Furthermore, the input image may be a single-channel grayscale image, or an image obtained from one or more channels of a multi-channel image; for example, for a color image containing three YUV channels, the input image may be the Y-channel image.
In the related art, image super-resolution reconstruction can broadly be achieved by two classes of methods: interpolation methods and machine learning methods. The former upsample an image by inserting new elements between pixels of the original image using a suitable interpolation algorithm, such as nearest-neighbor interpolation, bilinear interpolation, mean interpolation, or median interpolation. Such methods cannot introduce any additional information about the original image, so the quality of the super-resolution reconstructed image inevitably suffers. The latter typically upsample an image through a machine learning model and can learn more features of the original image. However, the machine learning models currently in use usually have rather complex structures and therefore consume large amounts of computing resources and energy, which makes real-time performance for such applications unattainable on edge devices.
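For contrast with the learnable approach described below, the following minimal sketch (assuming PyTorch as the implementation library, which the disclosure does not prescribe) shows the interpolation-based upsampling of the related art; note that it only inserts new pixel values between existing ones and adds no information about the original image:

```python
import torch
import torch.nn.functional as F

# Related-art baseline: plain bilinear upsampling of a low-resolution image.
# Toy sizes; a 540x960 input is upscaled by a factor of 2 to 1080x1920.
lr = torch.randn(1, 1, 540, 960)
hr = F.interpolate(lr, scale_factor=2, mode="bilinear", align_corners=False)
print(hr.shape)  # torch.Size([1, 1, 1080, 1920])
```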
Embodiments of the present disclosure provide a technical solution for image super-resolution reconstruction suitable for implementation on various resource-limited edge devices. It employs a simpler learnable algorithm and can leverage the large installed base of edge AI devices to perform image enhancement tasks.
Fig. 2 schematically illustrates the structure of a convolutional neural network 200 as used in AI algorithms. The convolutional neural network 200 includes an input layer 210, a hidden layer 220, and an output layer 230. The input layer 210 receives 4 input images. The hidden layer 220 includes 3 convolution units (corresponding to 3 output feature maps). The output layer 230 includes 2 convolution units and produces 2 output images. Each box with weight $w_{ij}^{k}$ corresponds to one convolution/filter (e.g., a 3×3 or 5×5 kernel), where k indicates the input layer number and i and j indicate the input and output units, respectively. The bias $b_{i}^{k}$ is a scalar added to the output of the convolution. The result of summing several convolutions and the bias then passes through an activation box, which typically corresponds to a rectified linear unit (ReLU), a sigmoid function, or a hyperbolic tangent function. The filter and bias parameters can be obtained through a training process using a set of input/output sample images, and can be adjusted to fit some optimization criterion, which may be determined according to the specific application. After training, the filters and biases may be held fixed during system operation.
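As an illustrative sketch of one such convolution unit (filters with weights, a scalar bias per output channel, and an activation box), the following PyTorch snippet builds a single layer of this kind; the channel counts mirror the input and hidden layers of network 200, while the kernel and image sizes are toy values chosen here, not taken from the patent text:

```python
import torch
import torch.nn as nn

# One convolution unit: 3x3 filters w, scalar biases b, ReLU activation box.
conv_unit = nn.Sequential(
    nn.Conv2d(in_channels=4, out_channels=3, kernel_size=3, padding=1, bias=True),
    nn.ReLU(),
)

x = torch.randn(1, 4, 8, 8)  # 4 input images (toy 8x8 size)
y = conv_unit(x)             # 3 output feature maps of the same spatial size
print(y.shape)               # torch.Size([1, 3, 8, 8])
```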
Fig. 3 schematically illustrates an example flowchart of a method 300 of image reconstruction in an edge device according to some embodiments of the present disclosure. The image reconstruction may be, for example, super-resolution reconstruction from a low-resolution (LR) image (e.g., 1080P: 1920*1080) to a high-resolution (HR) image (e.g., 4K: 3840*2160). In some embodiments, the image reconstruction may upscale the LR image by a required target scaling factor to obtain the reconstructed HR image.
In step 310, low-level features are extracted from an input image having a first scale to generate a plurality of first feature maps. Compared with the input image, the first feature maps have a second scale enlarged by the target scaling factor. In the exemplary scenario of reconstructing a 4K image from a 1080P image, the first scale may be 1920*1080 and the second scale may be 3840*2160; accordingly, the target scaling factor is 2. In some embodiments, the input image may be an LR image obtained by the edge device from another device or from local storage, and it may contain various types of content, such as landscapes, people, or buildings. In some embodiments, the input image may be a single-channel grayscale image. Alternatively, the input image may be the image of a color channel of a color image; for example, for a color image with YUV color channels, the input image may be the Y-channel (i.e., luminance-channel) image. Illustratively, the edge device may extract the luminance channel directly from the multi-channel image to be reconstructed. For example, the luminance channel may be the luminance-representing channel of an image encoded in a color space such as YUV, YCbCr, or Lab, e.g., the Y channel in YUV and YCbCr, or the L channel in Lab. Alternatively, the edge device may derive the luminance channel from multiple channels of the multi-channel image to be reconstructed. For example, an RGB image may be converted, based on its R, G, and B channel values, into a color space such as YUV, YCbCr, or Lab, from which its luminance channel is then extracted. Optionally, the luminance channel of an image may also be obtained in other ways.
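By way of illustration, a luminance channel can be derived from an RGB image as in the following sketch; the BT.601 weighting used here is one common convention, and the disclosure leaves the choice of color space and conversion open:

```python
import torch

# Sketch: derive the Y (luminance) channel from an RGB image using the
# BT.601 weights, one common convention (an assumption, not mandated above).
def rgb_to_luma(rgb: torch.Tensor) -> torch.Tensor:
    # rgb: tensor of shape (3, H, W) with values in [0, 1]
    r, g, b = rgb[0], rgb[1], rgb[2]
    return 0.299 * r + 0.587 * g + 0.114 * b  # shape (H, W)
```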
As used in the present disclosure, low-level features refer to low-level image detail features, as opposed to high-level features such as image semantic features. Such features typically correspond to information perceived by the low-level human visual cortex. In some embodiments, the low-level features may include edge features, texture features, small blob features, and the like. In examples where a convolutional neural network is used to extract image features, the low-level features may be features extracted by a shallow layer (e.g., the first convolution layer) of the network. In one example, the low-level features are the output of a convolutional neural network with a single hidden layer (i.e., features obtained after convolution by a single convolution layer).
In step 320, low-level features are extracted from the input image to generate a plurality of second feature maps. The second feature maps have the second scale enlarged by the same target scaling factor. The plurality of second feature maps correspond to the plurality of first feature maps. In some embodiments, the correspondence is one-to-one.
In step 330, a plurality of mask maps are generated based on the plurality of second feature maps. Each mask map characterizes the feature weights of the corresponding first feature map among the plurality of first feature maps. A feature weight may indicate how important a feature is for image reconstruction: the larger the weight, the more important the feature. Optionally, the feature weight may be related to how sensitive human vision is to that feature in the image: the larger the weight, the higher the visual sensitivity of the feature, and the more it is enhanced during reconstruction.
In some embodiments, the plurality of mask maps are generated by normalizing the plurality of second feature maps. The mask maps may be pixel-level: each mask map has the same second scale as the first feature maps and contains a feature weight for every pixel of its corresponding first feature map. In one example, each mask element in a mask map characterizes the feature weight of the co-located pixel in its corresponding first feature map.
In step 340, a plurality of intermediate feature maps are generated based on the plurality of mask maps and the plurality of first feature maps. The plurality of intermediate feature maps all have the second scale.
In step 350, a reconstructed image having the second scale is synthesized based on the plurality of intermediate feature maps.
In some embodiments, the corresponding first feature maps among the plurality of first feature maps may be weighted by the feature weights characterized by the mask maps to synthesize the reconstructed image having the second scale. The reconstructed image is a super-resolution reconstructed image. In some embodiments, when the mask maps are pixel-level, the image pixel at a particular position in the reconstructed image may be obtained by a weighted sum, over the plurality of mask maps and the plurality of first feature maps, of the mask elements and feature elements at that particular position.
According to the solution of the present disclosure, a simpler learnable algorithm is employed, so the large installed base of edge devices can be leveraged to perform image enhancement tasks. Compared with existing deep learning solutions, SR reconstruction here uses only the low-level features of the image, so the network structure involved is simpler and the required computation smaller, yielding a simpler realization of a learnable system.
Moreover, by assigning different weights to the feature maps of different channels at the feature element level (e.g., pixel level), the feature elements of important channels are enhanced, which safeguards the quality of the reconstructed image.
It should be understood that although Fig. 3 shows the steps in sequence, this is merely exemplary. In practice, the steps may be performed in an order different from that shown; for example, some of them may be performed in reverse order or in parallel. For instance, step 310 may be performed after, or in parallel with, steps 320 and 330.
Fig. 4 illustrates an example structure of an image reconstruction model according to some embodiments of the present disclosure. The image reconstruction model 400 includes a first transposed convolution module 410, a second transposed convolution module 420, a normalization module 430, and a weighted summation module 440. The image reconstruction model may be implemented with a convolutional neural network.
The first transposed convolution module 410 extracts low-level features from the input image y to generate the plurality of first feature maps. As shown in Fig. 4, the first transposed convolution module 410 may include C feature channels F_1, ..., F_C (corresponding to C convolution kernels) that each apply a transposed convolution to the input image y to generate C first feature maps f_1, ..., f_C. Since the applied convolution is a transposed convolution, the scale of the first feature maps f_1, ..., f_C is enlarged relative to the input image y. Optionally, the convolution parameters used by the transposed convolution, such as the kernel scale, are determined at least in part based on the target magnification ratio r. In some embodiments, to reduce computational complexity, the transposed convolution may be implemented by a single transposed convolution layer. In other words, for each feature channel, the input image may be processed by transposed convolution with only a single kernel to obtain the corresponding first feature map; that is, F_1, ..., F_C in Fig. 4 each correspond to a single convolution kernel. Illustratively, the stride of the transposed convolution may be set to s and the kernel size to rk×rk, where r denotes the target magnification ratio of the reconstructed image Y relative to the input image y and k is a positive integer such as 1 or 2. Furthermore, the number of feature channels C may be chosen according to actual needs, e.g., 4, 8, or 16.
The second transposed convolution module 420 extracts low-level features from the input image y to generate the plurality of second feature maps. As shown in Fig. 4, the second transposed convolution module 420 may similarly include C feature channels M_1, ..., M_C that each apply a transposed convolution to the input image y to generate C second feature maps m_1', ..., m_C'. These C feature channels may correspond one-to-one with the C feature channels of the first transposed convolution module 410. Since the applied convolution is a transposed convolution, the scale of the second feature maps m_1', ..., m_C' is enlarged relative to the input image y. Optionally, the convolution parameters used, such as the kernel scale, may be determined in correspondence with the transposed convolution in the first transposed convolution module 410. In some embodiments, to reduce computational complexity, the transposed convolution may be implemented by a single transposed convolution layer. In other words, for each feature channel, the input image may be processed by transposed convolution with only a single kernel to obtain the corresponding second feature map; that is, M_1, ..., M_C in Fig. 4 each correspond to a single convolution kernel.
In some embodiments, the transposed convolutions applied in the feature channels M_1, ..., M_C are bias-enabled transposed convolutions, i.e., a corresponding bias is added to the convolution output of each channel, shown as b_1, ..., b_C in the figure. Enabling the bias increases the number of learnable parameters, which can enhance adaptation to different types of images and may thus help improve the quality of image reconstruction.
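A minimal sketch of the two transposed-convolution branches, as one possible reading of Fig. 4 (PyTorch is an assumption, as are the concrete values r = 2, C = 16, k = 1 and the stride being taken equal to the magnification ratio):

```python
import torch
import torch.nn as nn

r, C, k = 2, 16, 1
# Branch F_1..F_C: a single-layer transposed convolution, one kernel per channel.
features = nn.ConvTranspose2d(1, C, kernel_size=r * k, stride=r, bias=False)
# Branch M_1..M_C: same shape, but with the biases b_1..b_C enabled.
masks_raw = nn.ConvTranspose2d(1, C, kernel_size=r * k, stride=r, bias=True)

y = torch.randn(1, 1, 540, 960)  # single-channel LR input (toy size)
f = features(y)                  # C first feature maps at the enlarged scale
m_raw = masks_raw(y)             # C second feature maps, pre-normalization
print(f.shape, m_raw.shape)      # both torch.Size([1, 16, 1080, 1920])
```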
The normalization module 430 generates the plurality of mask maps based on the plurality of second feature maps. In this example, the plurality of mask maps m_1, ..., m_C may be generated by applying a normalization function (e.g., the SOFTMAX function) to the plurality of second feature maps.
Fig. 5 schematically illustrates an example normalization process according to an embodiment of the present disclosure. For simplicity, Fig. 5 shows only four 2*2 second feature maps m_1', m_2', m_3', m_4'.
As shown in Fig. 5, feature element groups may be formed from the feature elements at corresponding positions in the plurality of second feature maps 510. For example, the four feature elements at row i, column j of the second feature maps (such as the top-left position in the figure), m_1'(i,j)=a1, m_2'(i,j)=a2, m_3'(i,j)=a3, m_4'(i,j)=a4, may form one feature element group (a1, a2, a3, a4) for image position row i, column j, where i=1,2 and j=1,2. Then, normalization may be performed group by group on the feature elements within each group to obtain a normalized feature element group (b1, b2, b3, b4). For example, as shown in Fig. 5, the feature element group a may be fed as a 4-dimensional vector into a SOFTMAX module, and the normalized feature element group b generated by applying the SOFTMAX function to the vector elements. Alternatively, the normalization may be performed with any other applicable normalization function, such as the sigmoid function or the arctangent function.
Each element of the normalized feature element group (b1, b2, b3, b4) for image position row i, column j serves as the element at the corresponding position (i.e., row i, column j) of the mask maps m_1, m_2, m_3, m_4. For example, the four elements m_1(i,j), m_2(i,j), m_3(i,j), m_4(i,j) at row i, column j of the mask maps are set to the normalized feature elements b1, b2, b3, b4, respectively, i.e.: m_1(i,j)=b1, m_2(i,j)=b2, m_3(i,j)=b3, m_4(i,j)=b4.
It should be understood that although Fig. 5 shows four second feature maps of scale 2*2, i.e., corresponding to four feature channels, this is merely exemplary; other numbers of feature channels with other scales may be defined as needed.
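The per-pixel normalization of Fig. 5 reduces to a softmax across the channel dimension, as in the following sketch (toy values, not taken from the figure):

```python
import torch
import torch.nn.functional as F

m_raw = torch.randn(4, 2, 2)     # four 2x2 second feature maps m_1'..m_4'
masks = F.softmax(m_raw, dim=0)  # mask maps m_1..m_4
print(masks.sum(dim=0))          # all ones: the four weights at each pixel sum to 1
```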
Returning to Fig. 4, the weighted summation module 440 weights the corresponding first feature maps among the plurality of first feature maps by the feature weights to synthesize the reconstructed image Y having the second scale. As shown in Fig. 4, the first feature maps f_1, ..., f_C may each be multiplied by the corresponding mask maps m_1, ..., m_C. Multiplication here means that each element f_c(i,j) of a first feature map is multiplied by the co-located element m_c(i,j) of the corresponding mask map; that is, each feature weight in a mask map weights the co-located element of the corresponding first feature map. Here c denotes the first feature map of the c-th feature channel, and i, j denote row i and column j of the map. The per-channel results after multiplication may then be summed, for example by directly adding co-located elements, to obtain the reconstructed image Y, i.e.:

$Y(i,j) = \sum_{c=1}^{C} m_c(i,j) \cdot f_c(i,j)$

In some embodiments, the per-channel results after multiplication may instead be combined into the reconstructed image Y as

$Y(i,j) = \sum_{c=1}^{C} k_c \cdot m_c(i,j) \cdot f_c(i,j)$

where k_c denotes the summation weight of the corresponding feature channel; this weight can be trained jointly with the other model parameters.
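Both summation variants amount to an element-wise multiply followed by a channel-wise sum, for example (the initialization of k_c is hypothetical; in the second variant it would be a learnable parameter):

```python
import torch

C, H, W = 16, 4, 4            # toy sizes
f = torch.randn(C, H, W)      # first feature maps f_1..f_C
m = torch.rand(C, H, W)       # mask maps m_1..m_C
k = torch.ones(C)             # per-channel summation weights k_c

Y = (k.view(C, 1, 1) * m * f).sum(dim=0)  # reconstructed image, shape (H, W)
```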
According to the solution of the present disclosure, the building blocks of deep learning are simplified so that each layer has a simple and well-defined function, which reduces the computational cost and speeds up execution. The disclosed solution can be regarded as performing reconstruction "on the fly."
Fig. 6 illustrates another example structure of an image reconstruction model according to some embodiments of the present disclosure. The image reconstruction model 600 includes a first convolution module 610, a second convolution module 620, a normalization module 630, and a weighted summation module 640.
The first convolution module 610 may also be regarded as comprising C feature channels, but in the process of obtaining the first feature map of each feature channel, each feature channel in turn comprises s² sub-channels. As shown in Fig. 6, the input image y may be fed into C*s² sub-channels, denoted F_1^1, ..., F_C^{s²} in the figure, where each run of s² sub-channels (corresponding to s² convolution kernels, e.g., F_1^1, ..., F_1^{s²}) may form one sub-channel group, which may be regarded as corresponding to one of the C feature channels. Optionally, the sub-channel number s may be determined at least in part based on the target magnification ratio r; for example, s may be set equal to r.
In some embodiments, to reduce computational complexity, the first convolution module 610 may comprise a single convolution layer to implement the convolution processing. In other words, for each sub-channel, the input image may be convolved with only a single kernel to obtain the corresponding first initial feature map; that is, F_1^1, ..., F_C^{s²} in Fig. 6 each correspond to a single convolution kernel. Illustratively, the convolution stride may be set to 1 and the kernel size to k×k. The output of F_1^1, ..., F_C^{s²} is C first initial feature map groups, each group corresponding to one sub-channel group and comprising the s² first initial feature maps output by the s² sub-channels of that group, where each first initial feature map has the same first scale as the input image.
For each first initial feature map group, the plurality of first initial feature maps it contains may be rearranged to generate the corresponding first feature map. For example, in the first convolution module 610, the s² first initial feature maps of each group may be rearranged by a shuffle module into a single first feature map. Specifically, the co-located feature elements of the s² first initial feature maps may be rearranged in a predetermined order to form one patch at the corresponding position of the first feature map.
Fig. 7 schematically illustrates the rearrangement process according to an embodiment of the present disclosure. As shown in Fig. 7, the four elements a1, a2, a3, a4 at row 1, column 1 of the first initial feature map group f_1', f_2', f_3', f_4' are used to form the first patch of the first feature map, the four elements being arranged in order into the four element positions of that patch. Similarly, the elements at row i, column j of the first initial feature maps in the group are used to form the patch at the corresponding position of the first feature map, the elements being arranged in order into the element positions of that patch.
Although Fig. 7 shows only four input maps of scale 2*2 and one output map of scale 4*4, this is merely exemplary; the rearrangement process may also be carried out in other ways as required by the reconstruction scheme.
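One common realization of this rearrangement is the standard pixel-shuffle operation, sketched below for the s = 2 case of Fig. 7 (the disclosure does not prescribe a particular library call, and the exact interleaving order is implementation-defined):

```python
import torch
import torch.nn.functional as F

f_init = torch.arange(16.0).view(1, 4, 2, 2)   # f_1', f_2', f_3', f_4'
f = F.pixel_shuffle(f_init, upscale_factor=2)  # one 4x4 first feature map
print(f.shape)                                 # torch.Size([1, 1, 4, 4])
```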
Returning to Fig. 6, the second convolution module 620 comprises C feature channels in one-to-one correspondence with the feature channels of the first convolution module 610. In obtaining the second feature map of each feature channel, each feature channel again comprises s² sub-channels. In some embodiments, these sub-channels may also be in one-to-one correspondence with the sub-channels of the feature channels of the first convolution module 610. As shown in Fig. 6, the input image y may be fed into C*s² sub-channels, denoted M_1^1, ..., M_C^{s²} in the figure, where each run of s² sub-channels (e.g., M_1^1, ..., M_1^{s²}) may form one sub-channel group. M_1^1, ..., M_C^{s²} may be in one-to-one correspondence with F_1^1, ..., F_C^{s²}. Similarly, the convolution parameters used by the second convolution module 620, such as the number of sub-channels, may likewise be determined at least in part based on the target magnification ratio. In some embodiments, to reduce computational complexity, the second convolution module 620 may be implemented with a single convolution layer. In other words, for each sub-channel, the input image may be convolved with only a single kernel to obtain the corresponding second initial feature map; that is, M_1^1, ..., M_C^{s²} in Fig. 6 each correspond to a single convolution kernel. Illustratively, the convolution stride may be set to 1 and the kernel size to k×k. The output of M_1^1, ..., M_C^{s²} is C second initial feature map groups, each comprising s² second initial feature maps.
For each second initial feature map group, its s² second initial feature maps may be rearranged by a shuffle module into a single second feature map. The rearrangement process may be the same as that employed by the first convolution module 610.
Optionally, in addition to M_1^1, ..., M_C^{s²}, the second convolution module 620 may include one or more additional two-dimensional convolution layers, which makes the mask decisions more accurate and thus helps obtain better reconstructed image quality.
The normalization module 630 and the weighted summation module 640 may operate in the same manner as the normalization module 430 and the weighted summation module 440 in Fig. 4.
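For illustration, one branch of model 600 can be sketched as a single k×k convolution at the input scale followed by the rearrangement, under the same assumptions as before (PyTorch, toy values C = 16, s = 2, k = 3):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, s, k = 16, 2, 3
branch = nn.Conv2d(1, C * s * s, kernel_size=k, padding=k // 2)

y = torch.randn(1, 1, 540, 960)  # LR input at the first scale (toy size)
f_init = branch(y)               # C*s^2 initial feature maps, first scale
f = F.pixel_shuffle(f_init, s)   # C feature maps at the enlarged second scale
print(f.shape)                   # torch.Size([1, 16, 1080, 1920])
```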
It should be understood that the method shown in Fig. 3 may be implemented with the image reconstruction models shown in Figs. 4-7, and may likewise be implemented with any other applicable image reconstruction model.
The image reconstruction model according to embodiments of the present disclosure makes use of a convolutional neural network; it is learning-based and can achieve better reconstruction results through training.
Fig. 8 illustrates a training method 800 for training an image reconstruction model according to an embodiment of the present disclosure. The training method may consider the conventional fidelity (distortion) metrics $L_{L1}(Y,r) = E[|Y-r|]$ or $L_{L2}(Y,r) = E[(Y-r)^2]$, where E denotes the expected value approximated by the average over a set of training samples, r denotes the reference image, and Y denotes the output image obtained when the degraded image y corresponding to the reference image is input. The image reconstruction model may be a model of the form shown in Figs. 4-7.
Degraded images with reduced resolution may be generated from a set of reference images to serve as input images. A degraded image may have the first scale and the reference image the second scale, the ratio of the second scale to the first scale being the target magnification ratio. The target magnification ratio may be chosen according to actual needs, e.g., 2, 3, or 4. Optionally, the degraded images may be generated in various ways, such as downsampling, pooling, or filtering. In one example, the values of several neighboring pixels in the reference image may be averaged to serve as the pixel value at the corresponding position of the degraded image. Optionally, noise may also be added to the degraded image. In some embodiments, the above processing may be performed on the luminance-channel image of the reference image.
Each degraded image may be combined with its corresponding reference image to form one training sample. A training sample set can thus be constructed from the reference images and the generated degraded images.
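A sketch of building one training pair along these lines; the use of average pooling for the neighboring-pixel mean and of Gaussian noise for the optional perturbation are assumed concretizations, since the text above leaves the exact choices open:

```python
import torch
import torch.nn.functional as F

def make_pair(reference: torch.Tensor, r: int = 2, noise_std: float = 0.01):
    # reference: luminance image of shape (1, 1, H, W) at the second scale
    degraded = F.avg_pool2d(reference, kernel_size=r)  # mean over r x r neighbors
    degraded = degraded + noise_std * torch.randn_like(degraded)
    return degraded, reference  # (first-scale input, second-scale target)
```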
In step 810, training data from the training sample set may be input into the image reconstruction model to obtain corresponding output images. The training data includes the degraded images and the reference images.
In step 820, the total loss of the image reconstruction model may be computed from each output image and the corresponding reference image based on a predetermined loss function. For example, the total loss may be the L1 loss or the L2 loss, i.e., the loss function employed is $L_{L1}(Y,r) = E[|Y-r|]$ or $L_{L2}(Y,r) = E[(Y-r)^2]$, where E denotes the expected value approximated by the average over a set of training samples, r denotes the reference image, and Y denotes the output image obtained when the degraded image y corresponding to the reference image is input. L1 and L2 denote the L1 norm and the L2 norm, respectively.
In step 830, the parameters of the image reconstruction model may be adjusted according to the total loss to obtain the trained image reconstruction model. The parameter adjustment drives the total loss toward a minimum during training. The above process may be iterated multiple times, and optionally, training may be considered complete when the total loss falls within a predetermined threshold range or essentially stops changing.
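Steps 810-830 correspond to a conventional supervised loop; the sketch below uses the L1 fidelity loss defined above, while the optimizer choice and hyperparameters are illustrative assumptions (`model` is any reconstruction model of the form in Figs. 4-7, and `pairs` yields (degraded, reference) tensors):

```python
import torch

def train(model, pairs, epochs: int = 10, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    l1 = torch.nn.L1Loss()      # L_L1(Y, r) = E[|Y - r|]
    for _ in range(epochs):
        for y, r in pairs:
            Y = model(y)        # step 810: forward pass on the degraded input
            loss = l1(Y, r)     # step 820: total loss against the reference
            optimizer.zero_grad()
            loss.backward()     # step 830: adjust parameters to reduce the loss
            optimizer.step()
    return model
```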
In some embodiments, during training, a preset mechanism may be used to make different feature channels of the image reconstruction model attend to different types of features, e.g., image features of different gradients, so that in subsequent use the trained image reconstruction model can better extract features from the input image through the different feature channels, thereby helping to improve the image quality of the reconstructed image.
Evaluation of the image reconstruction scheme according to embodiments of the present disclosure on images and video frames from publicly available datasets shows that, considering model parameter count, running speed, and power consumption for 2x super-resolution, the scheme achieves a substantial speedup and lower power consumption compared with existing known schemes. With a small model, it can reach up to 120 FHD/s, where FHD refers to a 1080P image frame. In terms of reconstructed super-resolution image quality versus running speed, at medium image quality (e.g., 30 dB) the scheme likewise achieves a considerable speedup and can deliver 4K output at 60 FPS (60 frames per second).
Fig. 9 schematically illustrates an example structure of an apparatus 900 for image reconstruction in an edge device according to some embodiments of the present disclosure. As shown in Fig. 9, the image reconstruction apparatus 900 includes a first feature extraction module 910, a second feature extraction module 920, a mask map generation module 930, and a synthesis module 940.
Specifically, the first feature extraction module 910 may be configured to extract low-level features from an input image of a first scale to generate a plurality of first feature maps, the first feature maps having, relative to the input image, a second scale enlarged by a target scaling factor; the second feature extraction module 920 may be configured to extract low-level features from the input image to generate a plurality of second feature maps having the second scale; the mask map generation module 930 may be configured to generate a plurality of mask maps based on the plurality of second feature maps, each mask map having the second scale and characterizing the feature weights of the corresponding first feature map among the plurality of first feature maps; and the synthesis module 940 may be configured to weight the corresponding first feature maps among the plurality of first feature maps by the feature weights to synthesize a reconstructed image having the second scale.
The image reconstruction apparatus 900 may be deployed on the edge device 110 shown in Fig. 1. It should be understood that the image reconstruction apparatus 900 may be implemented in software, hardware, or a combination of both. Multiple different modules may be implemented in the same software or hardware structure, or one module may be implemented by multiple different software or hardware structures.
Furthermore, the image reconstruction apparatus 900 may be used to carry out the image reconstruction method described above; the relevant details have been described in detail above and, for brevity, are not repeated here. The image reconstruction apparatus 900 may have the same features and advantages as described for the foregoing image reconstruction method.
Fig. 10 schematically illustrates an example architecture diagram of an electronic device according to some embodiments of the present disclosure. It may represent, for example, the edge device 110 in Fig. 1.
As shown, the example electronic device 1000 includes a processing system 1001, one or more computer-readable media 1002, and one or more I/O interfaces 1003 communicatively coupled to each other. Although not shown, the electronic device 1000 may further include a system bus or other data and command transfer system coupling the various components to each other. The system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus utilizing any of a variety of bus architectures, and may further include control and data lines.
The processing system 1001 represents functionality to perform one or more operations using hardware. Accordingly, the processing system 1001 is illustrated as including hardware elements 1004 that may be configured as processors, functional blocks, and so forth. This may include implementation in hardware as application-specific integrated circuits or other logic devices formed using one or more semiconductors. The hardware elements 1004 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, processors may be composed of semiconductor(s) and/or transistors (e.g., electronic integrated circuits (ICs)). In such a context, processor-executable instructions may be electronically executable instructions.
The computer-readable media 1002 are illustrated as including memory/storage 1005. The memory/storage 1005 represents memory/storage associated with one or more computer-readable media. The memory/storage 1005 may include volatile storage media (such as random access memory (RAM)) and/or non-volatile storage media (such as read-only memory (ROM), flash memory, optical disks, magnetic disks, and so forth). The memory/storage 1005 may include fixed media (e.g., RAM, ROM, a fixed hard drive) as well as removable media (e.g., flash memory, a removable hard drive, an optical disk). Illustratively, the memory/storage 1005 may be used to store the various image data mentioned in the embodiments above. The computer-readable media 1002 may be configured in a variety of other ways as further described below.
The one or more input/output interfaces 1003 represent functionality that allows a user to enter commands and information to the electronic device 1000 and also allows information to be presented to the user and/or sent to other components or devices using various input/output devices. Examples of input devices include a keyboard, a cursor control device (e.g., a mouse), a microphone (e.g., for voice input), a scanner, touch functionality (e.g., capacitive or other sensors configured to detect physical touch), a camera (e.g., one that may employ visible or invisible wavelengths such as infrared frequencies to detect touchless movement as gestures), a network card, a receiver, and so forth. Examples of output devices include a display device (e.g., a monitor or projector), speakers, a printer, a haptic-response device, a network card, a transmitter, and so forth. Illustratively, in the embodiments described above, the input image or the image to be reconstructed may be received through an input device, and the reconstructed image may be presented through an output device.
The electronic device 1000 further includes an image reconstruction strategy 1006. The image reconstruction strategy 1006 may be stored as computer program instructions in the memory/storage 1005. The image reconstruction strategy 1006, together with the processing system 1001 and the like, may implement the full functionality of the various modules of the image reconstruction apparatus 900 described with respect to Fig. 9.
Various techniques may be described herein in the general context of software, hardware, elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, and so forth that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and the like as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
Implementations of the described modules and techniques may be stored on or transmitted across some form of computer-readable media. The computer-readable media may include a variety of media that may be accessed by the electronic device 1000. By way of example and not limitation, computer-readable media may include "computer-readable storage media" and "computer-readable signal media."
"Computer-readable storage media" refers to media and/or devices that enable persistent storage of information, and/or tangible storage, as opposed to mere signal transmission, carrier waves, or signals per se. Thus, computer-readable storage media refers to non-signal-bearing media. Computer-readable storage media include hardware such as volatile and non-volatile, removable and non-removable media and/or storage devices implemented in a method or technology suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits, or other data. Examples of computer-readable storage media may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, hard disks, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture suited to store the desired information and accessible by a computer.
"Computer-readable signal media" refers to signal-bearing media configured to transmit instructions to the hardware of the electronic device 1000, such as via a network. Signal media typically may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. By way of example and not limitation, signal media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
As previously described, the hardware elements 1004 and the computer-readable media 1002 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware form, which in some embodiments may be used to implement at least some aspects of the techniques described herein. Hardware elements may include integrated circuits or systems-on-chip, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and other implementations in silicon or components of other hardware devices. In this context, a hardware element may serve as a processing device that performs program tasks defined by instructions, modules, and/or logic embodied by the hardware element, as well as a hardware device for storing instructions for execution, e.g., the computer-readable storage media described previously.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Accordingly, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer-readable storage media and/or by one or more hardware elements 1004. The electronic device 1000 may be configured to implement particular instructions and/or functions corresponding to the software and/or hardware modules. Thus, implementing a module as a module executable by the electronic device 1000 as software may be achieved at least partially in hardware, for example through use of the computer-readable storage media and/or hardware elements 1004 of the processing system. The instructions and/or functions may be executable/operable by, for example, one or more electronic devices 1000 and/or processing systems 1001 to implement the techniques, modules, and examples described herein.
The techniques described herein may be supported by these various configurations of the electronic device 1000 and are not limited to the specific examples of the techniques described herein.
It should be appreciated that, for clarity, embodiments of the present disclosure have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in multiple units, or as part of other functional units without departing from the present disclosure. For example, functionality illustrated as being performed by a single unit may be performed by multiple different units. Therefore, references to specific functional units are only to be seen as references to suitable units for providing the described functionality, rather than as indicative of a strict logical or physical structure or organization. Thus, the present disclosure may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
Variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed subject matter, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Claims (17)

  1. An image reconstruction method for an edge device, the method comprising:
    extracting low-level features from an input image of a first scale to generate a plurality of first feature maps, each of the plurality of first feature maps having a second scale larger than the first scale of the input image;
    extracting low-level features from the input image to generate a plurality of second feature maps, each of the plurality of second feature maps having the second scale;
    generating a plurality of mask maps based on the plurality of second feature maps;
    generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps, each of the plurality of intermediate feature maps having the second scale; and
    synthesizing a reconstructed image having the second scale based on the plurality of intermediate feature maps.
  2. The method of claim 1, wherein extracting low-level features from the input image to generate the plurality of first feature maps comprises:
    performing transposed convolution processing on the input image to generate the plurality of first feature maps; and
    wherein extracting low-level features from the input image to generate the plurality of second feature maps comprises:
    performing transposed convolution processing on the input image to generate the plurality of second feature maps.
  3. The method of claim 2, wherein performing transposed convolution processing on the input image to generate the plurality of first feature maps comprises: processing the input image using a single transposed convolution layer to generate the plurality of first feature maps; and
    performing transposed convolution processing on the input image to generate the plurality of second feature maps comprises: processing the input image using a single transposed convolution layer to generate the plurality of second feature maps.
  4. The method of claim 1, wherein extracting low-level features from the input image to generate the plurality of first feature maps comprises:
    performing convolution processing on the input image to generate a plurality of first initial feature map groups, each first initial feature map group comprising a plurality of first initial feature maps, each first initial feature map having the first scale; and
    for each first initial feature map group, generating a corresponding first feature map by rearranging the plurality of first initial feature maps contained therein; and
    wherein extracting low-level features from the input image to generate the plurality of second feature maps comprises:
    performing convolution processing on the input image to generate a plurality of second initial feature map groups, each second initial feature map group comprising a plurality of second initial feature maps, each second initial feature map having the first scale; and
    for each second initial feature map group, generating a corresponding second feature map by rearranging the plurality of second initial feature maps contained therein.
  5. The method of claim 4, wherein performing convolution processing on the input image to generate the plurality of first initial feature map groups comprises: convolving the input image using a single convolution layer to generate the plurality of first initial feature map groups; and
    performing convolution processing on the input image to generate the plurality of second initial feature map groups comprises: convolving the input image using a single convolution layer to generate the plurality of second initial feature map groups.
  6. The method of claim 4, wherein performing convolution processing on the input image to generate the plurality of first initial feature map groups comprises: convolving the input image using a single convolution layer to generate the plurality of first initial feature map groups; and
    performing convolution processing on the input image to generate the plurality of second initial feature map groups comprises: convolving the input image using a first convolution layer and a second convolution layer to generate the plurality of second initial feature map groups.
  7. The method of claim 4, wherein the number of first initial feature maps contained in each first initial feature map group depends on the scaling factor between the second scale and the first scale, and the number of second initial feature maps contained in each second initial feature map group depends on the scaling factor between the second scale and the first scale.
  8. The method of claim 1, wherein the plurality of mask maps correspond to the plurality of first feature maps, and each mask map characterizes feature weights of the corresponding first feature map among the plurality of first feature maps; and
    wherein generating a plurality of intermediate feature maps based on the plurality of mask maps and the plurality of first feature maps comprises: weighting the corresponding first feature map by the feature weights characterized by each mask map to generate the plurality of intermediate feature maps.
  9. The method of claim 1, wherein the first feature maps, the second feature maps, and the mask maps are equal in number, and the mask maps have the second scale.
  10. The method of any one of claims 1-7, wherein generating a plurality of mask maps based on the plurality of second feature maps comprises:
    forming a plurality of feature element groups from a plurality of feature elements at corresponding positions in the plurality of second feature maps; and
    performing normalization on the feature elements within each feature element group, and generating the corresponding mask maps based on the normalized feature elements.
  11. The method of any one of claims 1-8, wherein the mask maps are pixel-level, each mask map having the second scale and containing a feature weight for every pixel of its corresponding first feature map.
  12. The method of any one of claims 1-9, wherein the low-level features comprise at least one of texture features, edge features, and blob features of the input image.
  13. The method of any one of claims 1-9, wherein the input image is a luminance-channel image of a color image to be reconstructed.
  14. The method of claim 1, wherein the steps of the method are performed using a trained image reconstruction model to reconstruct the input image into a super-resolution image.
  15. The method of claim 14, wherein the image reconstruction model is trained by the following steps:
    inputting training data from a training sample set composed of reference images and degraded images into the image reconstruction model to obtain output images;
    obtaining a total loss of the image reconstruction model from the output images and the reference images based on a predetermined loss function; and
    adjusting parameters of the image reconstruction model according to the total loss to obtain the trained image reconstruction model.
  16. An electronic device, comprising:
    a memory configured to store computer-executable instructions; and
    a processor configured to perform the method of any one of claims 1-15 when the computer-executable instructions are executed by the processor.
  17. A non-volatile computer-readable storage medium storing computer-executable instructions which, when executed, perform the method of any one of claims 1-15.
PCT/CN2020/129145 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium WO2022099710A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/433,178 US11893710B2 (en) 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium
CN202080002800.0A CN114830168A (zh) 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium
PCT/CN2020/129145 WO2022099710A1 (zh) 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/129145 WO2022099710A1 (zh) 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022099710A1 true WO2022099710A1 (zh) 2022-05-19

Family

ID=81602069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129145 WO2022099710A1 (zh) 2020-11-16 2020-11-16 Image reconstruction method, electronic device and computer-readable storage medium

Country Status (3)

Country Link
US (1) US11893710B2 (zh)
CN (1) CN114830168A (zh)
WO (1) WO2022099710A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116702876B (zh) * 2023-04-27 2024-04-12 贵州大学 一种基于预处理的图像对抗防御方法
CN117788477B (zh) * 2024-02-27 2024-05-24 贵州健易测科技有限公司 一种用于对茶叶卷曲度自动量化的图像重建方法及装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458758A (zh) * 2019-07-29 2019-11-15 武汉工程大学 一种图像超分辨率重建方法、系统及计算机存储介质
US20190378242A1 (en) * 2018-06-06 2019-12-12 Adobe Inc. Super-Resolution With Reference Images
CN110770784A (zh) * 2017-06-21 2020-02-07 佳能株式会社 图像处理装置、成像装置、图像处理方法、程序、以及存储介质

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9154698B2 (en) * 2013-06-19 2015-10-06 Qualcomm Technologies, Inc. System and method for single-frame based super resolution interpolation for digital cameras
CN108133456A (zh) * 2016-11-30 2018-06-08 京东方科技集团股份有限公司 人脸超分辨率重建方法、重建设备以及计算机系统
CN109997168B (zh) * 2017-02-24 2023-09-12 渊慧科技有限公司 用于生成输出图像的方法和系统
KR102017997B1 (ko) * 2018-01-16 2019-09-03 한국과학기술원 특징맵 압축을 이용한 이미지 처리 방법 및 장치
CN109035260A (zh) * 2018-07-27 2018-12-18 京东方科技集团股份有限公司 一种天空区域分割方法、装置和卷积神经网络
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
BR112020022560A2 * 2018-09-30 2021-06-01 Boe Technology Group Co., Ltd. Apparatus and method for image processing, and system for training a neural network
US11037051B2 (en) * 2018-11-28 2021-06-15 Nvidia Corporation 3D plane detection and reconstruction using a monocular image
CN110415170B * 2019-06-24 2022-12-16 Wuhan University Image super-resolution method based on a multi-scale attention convolutional neural network
KR20210051242A * 2019-10-30 2021-05-10 Samsung Electronics Co., Ltd. Apparatus and method for multi-lens image restoration
CN110827200B * 2019-11-04 2023-04-07 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image super-resolution reconstruction method, image super-resolution reconstruction apparatus and mobile terminal
US11501415B2 (en) * 2019-11-15 2022-11-15 Huawei Technologies Co. Ltd. Method and system for high-resolution image inpainting
US11544815B2 (en) * 2019-11-18 2023-01-03 Advanced Micro Devices, Inc. Gaming super resolution
CN111091576B * 2020-03-19 2020-07-28 Tencent Technology (Shenzhen) Co., Ltd. Image segmentation method, apparatus, device and storage medium
WO2021196070A1 (en) * 2020-04-01 2021-10-07 Boe Technology Group Co., Ltd. Computer-implemented method, apparatus, and computer-program product
CN113591872A * 2020-04-30 2021-11-02 Huawei Technologies Co., Ltd. Data processing system, object detection method and apparatus therefor
CN111931781A * 2020-08-10 2020-11-13 SenseTime Group Limited Image processing method and apparatus, electronic device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110770784A (zh) * 2017-06-21 2020-02-07 佳能株式会社 图像处理装置、成像装置、图像处理方法、程序、以及存储介质
US20190378242A1 (en) * 2018-06-06 2019-12-12 Adobe Inc. Super-Resolution With Reference Images
CN110458758A (zh) * 2019-07-29 2019-11-15 武汉工程大学 一种图像超分辨率重建方法、系统及计算机存储介质

Also Published As

Publication number Publication date
CN114830168A (zh) 2022-07-29
US11893710B2 (en) 2024-02-06
US20220351333A1 (en) 2022-11-03

Similar Documents

Publication Publication Date Title
CN111311629B (zh) Image processing method, image processing apparatus and device
Lv et al. Attention guided low-light image enhancement with a large scale low-light simulation dataset
US11537873B2 (en) Processing method and system for convolutional neural network, and storage medium
CN109712203B (zh) Image colorization method based on a self-attention generative adversarial network
CN112233038B (zh) Real-image denoising method based on multi-scale fusion and edge enhancement
US11954822B2 (en) Image processing method and device, training method of neural network, image processing method based on combined neural network model, constructing method of combined neural network model, neural network processor, and storage medium
WO2019120110A1 (zh) Image reconstruction method and device
US20220222776A1 (en) Multi-Stage Multi-Reference Bootstrapping for Video Super-Resolution
CN109934792B (zh) Electronic apparatus and control method thereof
CN110738605A (zh) Image denoising method, system, device and medium based on transfer learning
TWI768323B (zh) Image processing apparatus and image processing method thereof
US10198801B2 (en) Image enhancement using self-examples and external examples
US20210256304A1 (en) Method and apparatus for training machine learning model, apparatus for video style transfer
KR102616700B1 (ko) Image processing apparatus and image processing method thereof
WO2022134971A1 (zh) Training method for a noise-reduction model and related apparatus
TWI769725B (zh) Image processing method, electronic device and computer-readable storage medium
CN110223304B (zh) Image segmentation method and apparatus based on multi-path aggregation, and computer-readable storage medium
WO2022099710A1 (zh) Image reconstruction method, electronic device and computer-readable storage medium
WO2023000895A1 (zh) Image style transfer method and apparatus, electronic device and storage medium
CN113034358A (zh) Super-resolution image processing method and related apparatus
Zhai et al. An effective deep network using target vector update modules for image restoration
CN115035011A (zh) Low-light image enhancement method with adaptive RetinexNet under a fusion strategy
Huang et al. Bootstrap diffusion model curve estimation for high resolution low-light image enhancement
Liu et al. Arbitrary-scale super-resolution via deep learning: A comprehensive survey
Xu et al. Attention‐based multi‐channel feature fusion enhancement network to process low‐light images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961264

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/09/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20961264

Country of ref document: EP

Kind code of ref document: A1