WO2022057868A1 - Image super-resolution method and electronic device - Google Patents

Image super-resolution method and electronic device

Info

Publication number
WO2022057868A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
image
target
weight matrix
feature map
Prior art date
Application number
PCT/CN2021/118901
Other languages
English (en)
French (fr)
Inventor
张璐
林焕
胡康康
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP21868690.5A priority Critical patent/EP4207051A4/en
Publication of WO2022057868A1 publication Critical patent/WO2022057868A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image super-resolution method and an electronic device.
  • Image super-resolution refers to the restoration of a high-resolution (HR) image or image sequence from a low-resolution (LR) image or image sequence.
  • Learning-based SR methods have been a hot research direction in recent years. The basic idea is to compute, from a given training image set, the mapping between the LR images and the HR images in the training set. Since deep convolutional models can capture high-level abstract information of an image, this mapping is usually realized by a deep convolutional model.
  • the embodiments of the present application provide an image super-resolution method for converting a low-resolution image into a high-resolution image for display. Since the weight matrix in this method fuses pixel position offset information and texture features, the image quality loss caused by interpolation that relies only on pixel position offset information can be avoided, and the image quality of high-resolution images can be improved.
  • a first aspect of the embodiments of the present application provides an image super-resolution method, including: a terminal acquires pixel position offset information according to an image of a first resolution and a target resolution, where the target resolution is greater than the first resolution; the terminal performs feature extraction on the image of the first resolution to obtain a feature map containing texture features; the terminal fuses the pixel position offset information and the feature map to obtain a target weight matrix; and the terminal acquires an image of the target resolution according to the target weight matrix.
  • the terminal can obtain an image at the application's original resolution, referred to in this embodiment as the first resolution, and can also obtain the target resolution at which the image is to be displayed.
  • the terminal also extracts the texture features of the image of the first resolution and obtains the target weight matrix by fusing the pixel position offset information with the texture features.
  • the target weight matrix provides the sampling weight of each pixel in the upsampling operation of the super-resolution method. Since the target weight matrix in this scheme fuses the pixel position offset information and the texture features, obtaining the high-resolution image based on this target weight matrix avoids the image quality loss, similar to that of traditional interpolation, that arises when the weight matrix is obtained from pixel position offset information alone. Therefore, the image quality of the high-resolution image can be improved.
  • the pixel position offset information includes an offset matrix and a magnification ratio; the offset matrix is used to indicate the sampling offset of each pixel in the image of the first resolution, and the magnification ratio is the ratio between the target resolution and the first resolution.
  • the pixel position offset information specifically includes an offset matrix and a magnification ratio, where the magnification ratio can be calculated from the first resolution and the target resolution and comprises the ratios of the pixel counts of the target-resolution image to those of the first-resolution image in the length and width directions.
  • the offset matrix consists of the sampling offsets of all pixels in the image of the target resolution. Since the target resolution is the resolution actually required for the image, its value is arbitrary; therefore, the magnification ratio can be any multiple. The pixel position offset information obtained in this way, further fused with the texture features to produce the target weight matrix, can be used to achieve super-resolution at any scale factor.
  • the terminal performing information fusion on the pixel position offset information and the feature map to obtain a target weight matrix includes: the terminal inputs the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
  • through the pre-trained first neural network model, the information of the offset matrix, the magnification ratio, and the feature map can be fused to obtain the target weight matrix. Unlike the weight matrix in existing super-resolution methods, which carries only pixel position offset information, the target weight matrix provided by this method avoids, when a high-resolution image is obtained from it, the image quality loss produced by traditional interpolation; therefore, the image quality of the high-resolution image can be improved.
  • the first neural network model includes a network model composed of a convolutional neural network and a fully connected neural network.
  • the first neural network model may be composed of a convolutional neural network and a fully connected neural network, where the convolutional neural network is used to transform the feature map containing texture features, and the fully connected neural network is used to process the pixel position offset information.
  • the terminal performing information fusion on the pixel position offset information and the feature map to obtain a target weight matrix includes: the terminal obtains an initial weight matrix according to the offset matrix and the magnification ratio; the terminal inputs the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
  • the method of the embodiment of the present application provides another information fusion approach, in which the initial weight matrix obtained from the offset matrix and the magnification ratio is the weight matrix used for upsampling in the prior art. Building on the prior art, the method inputs the obtained initial weight matrix and the feature map into the pre-trained second neural network model for information fusion to obtain the target weight matrix used for image upsampling in this scheme, which improves the flexibility of implementation.
  • the terminal performing feature extraction on the image of the first resolution to obtain a feature map containing texture features includes: the terminal inputs the image of the first resolution into a third neural network model to output the feature map.
  • the feature map may be obtained based on a neural network model, such as an existing convolutional neural network.
  • the acquiring, by the terminal, of the image of the target resolution according to the target weight matrix includes: the terminal upsamples the feature map according to the target weight matrix to obtain an image at the target resolution.
  • the terminal upsamples the feature map according to the target weight matrix to obtain an image of a target resolution, which is of better quality than an image obtained by upsampling an image of a first resolution.
  • a second aspect of the embodiments of the present application provides a model training method, including: obtaining pixel position offset information according to a training image of a first resolution and a target resolution, and obtaining an initial weight map according to the pixel position offset information; performing feature extraction on the image of the first resolution to obtain a feature map containing texture features; inputting the texture features and the initial weight map into a first neural network for training to obtain a first loss value; and updating the weight parameters in the first network according to the first loss value to obtain a target neural network.
  • the trained model can be used for image super-resolution. Pixel position offset information is fused on the basis of the existing initial weight map, and the target network trained by this method can be used for image super-resolution, improving the quality of the acquired high-resolution images.
  • a third aspect of the embodiments of the present application provides a model training method, including: acquiring pixel position offset information according to a training image of a first resolution and a target resolution; performing feature extraction on the image of the first resolution to obtain a feature map containing texture features; inputting the pixel position offset information and the initial weight map into a second neural network for training to obtain a first loss value; and updating the weight parameters in the second network according to the first loss value to obtain a target neural network.
  • the trained model can be used for image super-resolution. Pixel position offset information is fused with the feature map containing texture features, and the target network trained by this method can be used for image super-resolution, improving the quality of the acquired high-resolution images.
  • a fourth aspect of the embodiments of the present application provides an image super-resolution apparatus, including: an acquisition unit configured to acquire pixel position offset information according to an image of a first resolution and a target resolution, where the target resolution is greater than the first resolution; the acquisition unit is further configured to perform feature extraction on the image of the first resolution to obtain a feature map containing texture features; a processing unit configured to perform information fusion on the pixel position offset information and the feature map to obtain a target weight matrix; and the acquisition unit is further configured to acquire an image of the target resolution according to the target weight matrix.
  • the pixel position offset information includes an offset matrix and a magnification ratio; the offset matrix is used to indicate the sampling offset of each pixel in the image of the first resolution, and the magnification ratio is the ratio between the target resolution and the first resolution.
  • the processing unit is specifically configured to: input the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
  • the first neural network model includes a network model composed of a convolutional neural network and a fully connected neural network.
  • the processing unit is specifically configured to: obtain an initial weight matrix according to the offset matrix and the enlargement ratio; and input the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
  • the acquiring unit is specifically configured to: input the image of the first resolution into a third neural network model to output the feature map.
  • the obtaining unit is specifically configured to: upsample the feature map according to the target weight matrix to obtain an image of the target resolution.
  • a fifth aspect of the embodiments of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or any one of its possible implementations.
  • a sixth aspect of the embodiments of the present application provides a computer-readable storage medium including instructions which, when executed on a computer, cause the computer to execute the method described in the first aspect or any one of its possible implementations.
  • a seventh aspect of the embodiments of the present application provides a chip including one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the above aspects.
  • optionally, the chip includes the memory, and the processor is connected to the memory through a circuit or a wire.
  • the chip further includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used for receiving data and/or information to be processed, the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing result through the communication interface.
  • the communication interface may be an input-output interface.
  • some of the one or more processors may also implement some steps in the above method by means of dedicated hardware, for example, the processing involving the neural network model may be performed by a dedicated neural network processor or graphics processor.
  • the method provided by the present application may be implemented by one chip, or may be implemented by multiple chips cooperatively.
  • the embodiments of the present application have the following advantages:
  • in the method of the embodiments of the present application, the terminal obtains the pixel position offset information according to the image of the first resolution and the target resolution.
  • the terminal also extracts the texture features of the image of the first resolution and obtains the target weight matrix by fusing the pixel position offset information with the texture features.
  • the target weight matrix provides the sampling weight of each pixel in the upsampling operation of the super-resolution method. Since the target weight matrix in this scheme fuses pixel position offset information and texture features, obtaining a high-resolution image based on this target weight matrix avoids the image quality loss, similar to that of traditional interpolation, that arises when the weight matrix is obtained from pixel position offset information alone. Therefore, the image quality of high-resolution images can be improved.
  • FIG. 1 is a schematic diagram of an application scenario of an image super-resolution method;
  • FIG. 2 is a schematic diagram of a system architecture of the image super-resolution method in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an embodiment of the image super-resolution method in an embodiment of the present application;
  • FIG. 4a is a schematic diagram of an embodiment of a feature map extraction method in an embodiment of the present application;
  • FIG. 4b is a schematic diagram of an embodiment of a feature fusion module in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of another embodiment of the image super-resolution method in an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a product realization form of the image super-resolution method in an embodiment of the present application;
  • FIG. 7 is a schematic diagram of a training process of the super-resolution module in an embodiment of the present application;
  • FIG. 8a is a schematic diagram of a collection process of a training data set in an embodiment of the present application;
  • FIG. 8b is a schematic diagram of an acquisition process of a high-resolution data set and a low-resolution data set in an embodiment of the present application;
  • FIG. 8c is a schematic structural diagram of a super-resolution model in an embodiment of the present application;
  • FIG. 8d is a schematic diagram of a training process of the super-resolution model in an embodiment of the present application;
  • FIG. 8e is a schematic diagram of an application scenario in which the super-resolution model is deployed in a terminal according to an embodiment of the present application;
  • FIG. 9 is a schematic diagram of an embodiment of an image super-resolution apparatus in an embodiment of the present application;
  • FIG. 10 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application.
  • the embodiments of the present application provide an image super-resolution method for converting a low-resolution image into a high-resolution image for display. Since the target weight matrix fuses pixel position offset information and texture features, the image quality loss caused by interpolation that relies only on pixel position offset information can be avoided, improving the image quality of high-resolution images.
  • Image resolution refers to the amount of information stored in an image, that is, how many pixels there are per inch of the image. It is usually expressed as "number of horizontal pixels × number of vertical pixels", and can also be expressed by a specification code.
  • the image resolution is 640*480, which means that the number of horizontal pixels is 640, the number of vertical pixels is 480, and the resolution is 307,200 pixels, which is often referred to as 300,000 pixels.
  • the specification code P means progressive scan, the resolution corresponding to 720P is 1280*720, and similarly, the resolution corresponding to 1080P is 1920*1080.
  • the resolution can also be described by the specification code K.
  • 4K resolution means that the pixel value of each line in the horizontal direction reaches or is close to 4096.
  • Display resolution refers to the physical resolution of the computer monitor itself. For CRT monitors, it refers to the phosphor dots on the screen; for LCD monitors, it refers to the pixels on the display panel.
  • the display resolution is determined when the display is manufactured; it describes the number of pixels of the display itself and is an inherent, unchangeable value. Display resolution is usually expressed in the form of "horizontal pixels × vertical pixels", such as 800×600, 1024×768, or 1280×1024, and can also be expressed by a specification code. Display resolution is very important for display devices: at the same screen size, a higher resolution means a more delicate picture, that is, details of the picture can be presented more clearly, which greatly improves the user's visual experience.
  • the screen resolution refers to the resolution used when the image is actually displayed, and the screen resolution can be set according to the user's needs.
  • the upper limit of the screen resolution is limited by the size of the display resolution.
  • Image super-resolution refers to the restoration of a high-resolution (HR) image or image sequence from a low-resolution (LR) image or image sequence, often abbreviated as super-resolution (SR).
  • Upsampling: an interpolation process used in digital signal processing. When a digital sequence is upsampled, the output is approximately equal to the sequence obtained by sampling the initial analog signal at a higher sampling rate.
  • Image upsampling: similar to upsampling, it is an interpolation process. An image of a first resolution can be regarded as a two-dimensional digital matrix; when this matrix is upsampled, the output is approximately equal to the digital matrix obtained by sampling the real-world analog-signal image at a higher sampling rate, that is, the image of the target resolution.
  • the image upsampling process can be described as follows: pixels of the target-resolution image sample the pixels of the first-resolution image and compute a weighted average. In detail: the sampling position in the corresponding first-resolution image is calculated from the pixel coordinates of the high-resolution image, and the sampling center and offset are further calculated; the sampling center and the pixels in a fixed-size neighborhood around it are taken as the pixels to be processed; the sampling weights are then calculated from the offset, and the pixels to be processed in the neighborhood are weighted and averaged using the sampling weights to obtain the upsampling result. It should be noted that the number of pixels selected in the neighborhood of the sampling center can be preset, and its specific value is not limited.
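  • As an illustration of the traditional procedure just described, the following is a minimal Python sketch of interpolation-based upsampling for a single-channel image. The half-pixel alignment and the bilinear 2x2 weighting are common conventions assumed here for concreteness; they are not prescribed by the text.

      import numpy as np

      def interp_upsample(lr: np.ndarray, h_out: int, w_out: int) -> np.ndarray:
          """Classic interpolation upsampling: for every output pixel, sample a
          neighborhood of the LR image and average it with weights derived
          purely from the pixel position offset (bilinear weights here)."""
          h_in, w_in = lr.shape
          out = np.empty((h_out, w_out), dtype=np.float64)
          for y_out in range(h_out):
              for x_out in range(w_out):
                  # Map the output pixel back to a sub-pixel sampling position.
                  sx = (x_out + 0.5) * w_in / w_out - 0.5
                  sy = (y_out + 0.5) * h_in / h_out - 0.5
                  x0, y0 = int(np.floor(sx)), int(np.floor(sy))
                  ox, oy = sx - x0, sy - y0  # offsets inside the 2x2 cell
                  acc = 0.0
                  for dy, wy in ((0, 1 - oy), (1, oy)):
                      for dx, wx in ((0, 1 - ox), (1, ox)):
                          yy = min(max(y0 + dy, 0), h_in - 1)
                          xx = min(max(x0 + dx, 0), w_in - 1)
                          acc += wy * wx * lr[yy, xx]
                  out[y_out, x_out] = acc
          return out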
  • Pixel coordinates: in an image, the coordinates of the position of a pixel; pixel coordinates must be integers. Taking the target-resolution image as an example: x_out and y_out respectively represent the coordinates of a pixel of the target-resolution image in the width and length directions, satisfying 0 ≤ x_out < w_out and 0 ≤ y_out < h_out, where w_out represents the width of the target resolution and h_out represents its length.
  • Sampling position: in the process of upsampling the image of the first resolution to obtain the image of the target resolution, the output value of each pixel in the target-resolution image needs to be sampled from the first-resolution image. The position of each pixel of the target-resolution image, after a calculation related to the magnification, corresponds to a position in the image of the first resolution; this is the sampling position.
  • the following takes a pixel (x_out, y_out) of the target-resolution image as an example to introduce the calculation of its sampling position, where:
  • sample_pos_x: the coordinate of the sampling position in the width direction;
  • sample_pos_y: the coordinate of the sampling position in the length direction;
  • w_in: the pixel width of the first-resolution image;
  • h_in: the pixel length of the first-resolution image;
  • w_out: the pixel width of the target-resolution image;
  • h_out: the pixel length of the target-resolution image.
  • Sampling center: the pixel coordinates obtained by rounding the sampling position, where center_x is the coordinate of the sampling center in the width direction and center_y is the coordinate of the sampling center in the length direction.
  • Offset (or sampling offset): the distance between the sampling position and the sampling center; the offset is an important basis for generating sampling weights. It is calculated as offset_x = sample_pos_x - center_x and offset_y = sample_pos_y - center_y, where offset_x represents the offset of the sampling position relative to the sampling center in the width direction, and offset_y represents the offset in the length direction.
  • the matrix composed of the offsets of all pixels of the target-resolution image is the offset matrix.
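  • A NumPy sketch of these definitions (sampling position, sampling center, offset matrix, and scale) follows. The half-pixel alignment used to map output pixels to sampling positions is one common convention and is an assumption here, since the exact mapping formula is given only in the original drawings.

      import numpy as np

      def get_position_info(h_in, w_in, h_out, w_out):
          """Return the per-pixel offset matrix and the magnification scale."""
          scale = (h_out / h_in, w_out / w_in)  # magnification ratio (length, width)
          ys, xs = np.meshgrid(np.arange(h_out), np.arange(w_out), indexing="ij")
          sample_y = (ys + 0.5) * h_in / h_out - 0.5  # sampling positions
          sample_x = (xs + 0.5) * w_in / w_out - 0.5
          center_y = np.rint(sample_y)  # sampling centers (rounded positions)
          center_x = np.rint(sample_x)
          # offset = sampling position - sampling center, per output pixel
          offset = np.stack([sample_y - center_y, sample_x - center_x], axis=-1)
          return offset, scale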
  • the image super-resolution method provided in the embodiments of the present application can be applied to various electronic devices with display functions, such as mobile terminals, tablets, notebooks, computers, TVs, all-in-one computers, or projectors; the specific types of electronic devices are not limited.
  • the output image content includes various media information, real-time operation pictures, and the like, and may be a static image or a dynamic image, that is, a video; the content of the image displayed through super-resolution is not limited in this application.
  • the following uses terminal devices and images as examples for the introduction.
  • screen resolutions have been continuously improving. Images are displayed on electronic devices at higher screen resolutions, while image resources usually have lower native resolutions and often do not fit the screen resolution perfectly.
  • for example, if the original image resolution is 180P, even if the image is enlarged to 540P through 3x super-resolution, the 540P image still needs to be interpolated and stretched to match the screen resolution, because the screen resolution of a mobile phone is 1080P in landscape orientation and 720P in portrait orientation; the picture may then suffer negative effects such as aliasing and blurring.
  • the image super-resolution method provided by the embodiments of the present application is used to realize super-resolution at any scale factor, adapt to the display of the terminal device, reduce the image quality loss caused by interpolation in traditional super-resolution methods, and improve the image quality.
  • Figure 2 shows an example of a super-resolution application scenario of a mobile terminal, including three parts: an application program, a rendering pipeline, and a display module.
  • the application program of the mobile terminal sends the image or video frame to be displayed into the rendering pipeline; the rendering pipeline calls the super-resolution module to enlarge and render the image, and finally a high-resolution, high-quality image is shown on the display device through the display module.
  • the image super-resolution method provided by the embodiments of the present application realizes image super-resolution through an improved super-resolution module.
  • the super-resolution module can be implemented by the super-resolution model to super-resolve low-resolution images to the target resolution:
  • image_out = SR_model(LR_image, (height_LR, width_LR), (height_dst, width_dst)), where:
  • LR_image: the low-resolution image to be super-resolved;
  • image_out: the high-resolution image generated by the super-resolution model;
  • (height_dst, width_dst): the target resolution;
  • (height_LR, width_LR): the original resolution;
  • SR_model: the super-resolution model.
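  • A hypothetical top-level composition of SR_model is sketched below; the helper names (get_position_info, extract_features, fuse, upsample) are illustrative stand-ins for the modules named in this description, not functions defined by the patent.

      def sr_model(lr_image, lr_size, dst_size):
          """Position info -> texture features -> fused weight matrix -> upsample."""
          offset, scale = get_position_info(*lr_size, *dst_size)  # pixel position offsets
          feature_maps = extract_features(lr_image)               # texture features (CNN)
          weight_matrix = fuse(feature_maps, offset, scale)       # feature fusion (SASR)
          return upsample(feature_maps, weight_matrix)            # target-resolution image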
  • in step 301, the terminal obtains pixel position offset information according to the image of the first resolution and the target resolution;
  • an image or video frame in an application program of the terminal needs to be displayed at the target resolution on the display screen or in a display window of the terminal.
  • the target resolution can be preset in the application program of the terminal, or determined according to user operations or the usage scenario; its specific value is not limited. It can be understood that the target resolution is generally higher than the first resolution.
  • the terminal obtains the original image of the application, and the resolution of the original image is the first resolution. Since the first resolution is smaller than the target resolution, the terminal needs to perform super-resolution processing on the image to obtain an image or video frame of the target resolution. Since a video frame consists of a sequence of images, this and the subsequent embodiments take an image as an example to introduce the image super-resolution method provided by the embodiments of the present application.
  • the pixel position offset information includes an offset matrix and an enlargement ratio, where the offset matrix is used to indicate the position offset of each pixel in the image of the first resolution when enlarged to the target resolution.
  • the enlargement ratio (scale) is the ratio between the target resolution and the first resolution, including the ratio of the pixel count of the target-resolution image in the length direction to that of the first-resolution image in the length direction, and the ratio of the pixel count of the target-resolution image in the width direction to that of the first-resolution image in the width direction.
  • the terminal can obtain the offset matrix and the magnification ratio from the image of the first resolution and the target resolution according to existing algorithms.
  • for the calculation method of the offset matrix, refer to the foregoing terminology introduction on the image upsampling process:
  • offset, scale = GET_POSITION_INFO((height_LR, width_LR), (height_HR, width_HR)), where:
  • offset: the position offset information of the target-resolution image mapped onto the low-resolution image;
  • scale: the ratio of the target resolution to the first resolution;
  • (height_LR, width_LR): the resolution of the low-resolution input to the super-resolution model;
  • (height_HR, width_HR): the resolution of the high-resolution image corresponding to the low-resolution image, that is, the super-resolution target resolution;
  • GET_POSITION_INFO: calculates the position offset information offset and the magnification ratio scale according to the target resolution and the original resolution.
  • in step 302, the terminal performs feature extraction on the image of the first resolution and obtains a feature map containing texture features;
  • the terminal extracts the texture features of the image of the first resolution according to a preset neural network model and obtains a feature map, which may specifically be a multi-channel feature map.
  • the preset neural network model is a pre-trained neural network or an existing neural network, which is not specifically limited here.
  • the type of the neural network is a convolutional neural network (CNN): feature_maps = CNN(LR_image), where:
  • LR_image: the low-resolution image input to the super-resolution model;
  • CNN: the convolutional neural network for extracting features;
  • feature maps: the multi-channel feature map output by the convolutional neural network.
  • the feature extraction process usually does not change the resolution of the image, but it can change the number of channels of the image.
  • for example, if the original resolution is (height, width) and the number of channels is channel (for an RGB image the channel value is 3; for a grayscale image the channel value is 1), the image is converted into a feature map of shape (height, width, channel'), where the value of channel' is related to the design of the CNN network.
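  • A minimal PyTorch sketch of such a feature extraction CNN is shown below; the depth and channel count are illustrative assumptions. Stride-1, padded convolutions keep the (height, width) of the input and only change the number of channels, matching the behavior described above.

      import torch.nn as nn

      class FeatureExtractor(nn.Module):
          """Texture-feature CNN: preserves resolution, changes channel count."""
          def __init__(self, in_ch: int = 3, feat_ch: int = 32):
              super().__init__()
              self.body = nn.Sequential(
                  nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
                  nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
              )

          def forward(self, lr_image):    # (N, in_ch, H, W)
              return self.body(lr_image)  # (N, feat_ch, H, W)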
  • it should be noted that the execution order of step 301 and step 302 is not specifically limited.
  • in step 303, the terminal performs information fusion on the pixel position offset information and the feature map to obtain a target weight matrix;
  • the target weight matrix is used to provide the weight information required by the upsampling step in the super-resolution process.
  • the terminal inputs the offset matrix and magnification ratio obtained in step 301 and the feature map obtained in step 302 into the preset first neural network model to output the target weight matrix.
  • the terminal obtains an initial weight matrix according to the offset matrix and the magnification ratio obtained in step 301; then the terminal performs information fusion on the initial weight matrix and the feature map obtained in step 302 to obtain the target weight matrix.
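  • For the second approach, the initial weight matrix can be computed mathematically from the offset matrix alone. The sketch below uses a separable triangular (tent) kernel over a k x k neighborhood of the sampling center; the exact kernel is an assumption, since the text only requires weights derived from the offset and the magnification ratio.

      import numpy as np

      def initial_weight_maps(offset: np.ndarray, k: int = 3) -> np.ndarray:
          """Per-pixel sampling weights from offsets; offset has shape (H, W, 2)."""
          r = k // 2
          taps = np.arange(-r, r + 1, dtype=np.float64)  # neighbor positions
          wy = np.maximum(0.0, 1.0 - np.abs(taps - offset[..., 0:1]))  # (H, W, k)
          wx = np.maximum(0.0, 1.0 - np.abs(taps - offset[..., 1:2]))  # (H, W, k)
          w = wy[..., :, None] * wx[..., None, :]    # separable 2-D weights (H, W, k, k)
          w /= w.sum(axis=(-2, -1), keepdims=True)   # normalize per output pixel
          return w.reshape(*offset.shape[:2], k * k)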
  • Information fusion can be implemented through a neural network, optionally using common channel-fusion approaches in neural networks: the concatenation operator (concat), concatenation plus element-wise addition (concat+add), or an attention mechanism (attention).
  • weight_matrix = FUSION(TRANSFORM(feature_maps), weight_maps), with weight_maps = OFFSET_TRANSFORM(offset, scale), where:
  • weight maps: the weight values generated from the pixel position information;
  • OFFSET_TRANSFORM: the position-information processing algorithm, which can be implemented with a fully connected neural network or another nonlinear mapping algorithm;
  • feature maps: the multi-channel feature map extracted in the feature extraction step;
  • TRANSFORM: performs a feature transformation on the multi-channel feature map;
  • FUSION: fuses the texture feature information and the pixel position offset information to generate the weight matrix;
  • weight matrix: the target weight matrix.
  • in the texture feature processing, a feature transformation is mainly performed on the extracted feature map so that the feature information can be adapted to the pixel position information; the feature transformation does not change the resolution of the feature map.
  • the pixel position information can be represented by the offset matrix and the magnification ratio, and a preliminary weight map is generated through the transformation of the position information; finally, the weight map containing the pixel position offset information and the feature map containing the pixel texture feature information are fused to generate the final target weight matrix.
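  • The following PyTorch sketch shows one plausible realization of this fusion, assuming the concat variant: a fully connected branch (OFFSET_TRANSFORM) maps each pixel's offset and the scale to a preliminary weight map, a 1x1 convolution (TRANSFORM) transforms the feature map, and the two are concatenated and reduced to the target weight matrix (FUSION). All channel sizes are illustrative.

      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class FusionModule(nn.Module):
          def __init__(self, feat_ch: int = 32, n_weights: int = 9):
              super().__init__()
              self.offset_mlp = nn.Sequential(  # OFFSET_TRANSFORM (fully connected)
                  nn.Linear(4, 32), nn.ReLU(inplace=True), nn.Linear(32, n_weights))
              self.transform = nn.Conv2d(feat_ch, n_weights, 1)   # TRANSFORM
              self.fuse = nn.Conv2d(2 * n_weights, n_weights, 1)  # FUSION (concat)

          def forward(self, feats, offset, scale):
              # offset: (H_out, W_out, 2); append the scale to every position.
              h, w, _ = offset.shape
              s = offset.new_tensor(scale).expand(h, w, 2)
              wm = self.offset_mlp(torch.cat([offset, s], dim=-1))  # weight maps
              wm = wm.permute(2, 0, 1).unsqueeze(0).expand(feats.size(0), -1, -1, -1)
              # Bring the transformed features to the output resolution for fusion.
              f = F.interpolate(self.transform(feats), size=(h, w), mode="nearest")
              return self.fuse(torch.cat([f, wm], dim=1))  # target weight matrix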
  • in step 304, the terminal acquires the image of the target resolution according to the target weight matrix;
  • the terminal uses the target weight matrix to upsample the feature map to obtain the super-resolved image of the target resolution. Since the target weight matrix incorporates the information of the required magnification ratio, super-resolution at any scale factor from the first-resolution image to the target-resolution image can be achieved according to the magnification ratio:
  • SR_image = Upsample_Transform(feature_maps, weight_matrix), where:
  • SR_image: the super-resolved image of the target resolution;
  • Upsample_Transform: uses the weight matrix to upsample and channel-transform the feature map to output an image of the target resolution.
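  • A PyTorch sketch of this upsampling step is given below. It assumes the target weight matrix holds k*k weights per output pixel (softmax-normalized here as one possible choice) and that a 1x1 convolution (to_rgb) performs the channel transform back to an image; both choices, and the tensor layout, are assumptions.

      import torch
      import torch.nn.functional as F

      def weighted_upsample(feats, weight_matrix, centers, to_rgb, k=3):
          """feats: (N, C, H_in, W_in); weight_matrix: (N, k*k, H_out, W_out);
          centers: (H_out, W_out, 2) integer sampling centers."""
          n, c, h_in, w_in = feats.shape
          h_out, w_out = centers.shape[:2]
          r = k // 2
          padded = F.pad(feats, (r, r, r, r), mode="replicate")
          w = weight_matrix.softmax(dim=1)  # normalize the k*k sampling weights
          out = feats.new_zeros(n, c, h_out, w_out)
          i = 0
          for dy in range(k):
              for dx in range(k):
                  # Gather one tap of the k x k neighborhood around each center.
                  ys = (centers[..., 0].long() + dy).clamp(0, h_in + 2 * r - 1)
                  xs = (centers[..., 1].long() + dx).clamp(0, w_in + 2 * r - 1)
                  out = out + w[:, i:i + 1] * padded[:, :, ys, xs]
                  i += 1
          return to_rgb(out)  # e.g. nn.Conv2d(C, 3, 1): channel transform to image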
  • FIG. 5 is a schematic diagram of a system in which the image super-resolution method is deployed in an embodiment of the present application.
  • FIG. 5 shows the technique and system, deployed on a terminal device, that adapt to the screen resolution using a super-resolution module capable of super-resolution at any scale factor.
  • the system mainly includes: a pixel position offset information extraction module (or offset generation module) 501, a texture feature extraction module 502, a feature fusion module, which may also be referred to as a scale arbitrary super resolution (SASR) module 503 in the embodiments of the present application, an upsampling module 504, and the like.
  • Fig. 5 shows an example of an arbitrary-scale super-resolution scenario on a terminal device.
  • the training of the feature extraction network and the information fusion network is completed on the PC side.
  • the pixel position offset information extraction module 501, the texture feature extraction module 502, the SASR module 503, and the upsampling module 504 are deployed on the computing chip of the mobile terminal.
  • an application program carried by the terminal device provides an image or video frame that needs to be displayed on the screen or in a window; generally, the resolution of the screen or window is larger than that of the image or video.
  • the pixel position offset information extraction module 501 obtains the resolution of the original image (i.e., the first resolution) and the resolution of the target window or screen (i.e., the target resolution), generates an offset matrix through high-to-low-resolution pixel mapping, and calculates the magnification scale of the image in the length and width directions; from scale and offset, the initial weight matrix (weight maps) can be computed.
  • the texture feature extraction module 502 obtains the image of the first resolution and performs feature extraction; the extracted multi-channel feature map is transformed into new feature maps, and the feature maps and weight maps are fused by the SASR module 503 to generate the target weight matrix.
  • the upsampling module 504 uses the target weight matrix to upsample the multi-channel feature map extracted by the feature extraction network to obtain a high-quality image of the target resolution.
  • the super-resolution model involved in the image super-resolution method needs to be obtained through pre-training before being deployed on the terminal device.
  • the super-resolution model mainly includes four parts: a texture feature extraction module, a pixel position offset information extraction module, a SASR module, and an upsampling module.
  • the texture feature extraction module can be implemented based on a convolutional neural network model, and the SASR module can be implemented based on a model combining a convolutional neural network and a fully connected neural network.
  • in one implementation, the SASR module is composed of a neural network whose parameters participate in the update iterations, the texture feature extraction module is composed of an existing neural network, and the offset generation module and the upsampling module are completed by ordinary mathematical calculation; that is, the training of the super-resolution model involves only the SASR module.
  • in another implementation, the texture feature extraction module and the SASR module are composed of neural networks whose parameters participate in the update iterations, and the offset generation module and the upsampling module are completed by ordinary mathematical calculation; that is, the training of the super-resolution model involves the texture feature extraction module and the SASR module.
  • the following introduction takes as an example the case where the texture feature extraction module and the SASR module are composed of neural networks whose parameters participate in the update iterations, while the offset generation module and the upsampling module are completed by ordinary mathematical calculation.
  • FIG. 6 is a schematic diagram of a product realization form of the image super-resolution method in an embodiment of the present application. The training process of the neural networks involved in the method is completed on the side of a first electronic device, which may be, for example, a server or an electronic device with abundant computing and storage resources such as a desktop computer; this embodiment takes a desktop computer (PC) as an example.
  • the trained neural networks are deployed on the side of a second electronic device to achieve image super-resolution.
  • the second electronic device can be any of various electronic devices with display functions, including mobile terminals, tablet computers, wearable devices, smart screens, and the like; the embodiments take a mobile terminal as an example for description.
  • Training data sets include a high-resolution data set and a low-resolution data set.
  • the sources of high-definition image data include public data sets, web crawling, and self-collection, covering high-quality images of buildings, landscapes, people, and so on. The low-resolution data set is generated from the high-resolution images: the high-definition images are cropped and downsampled at different scales to reduce the resolution.
  • Dataset: the training data set;
  • labelset: the collected high-definition images;
  • dataset: the low-resolution images produced by downsampling the high-definition images;
  • Downsample: downsampling, the operation of shrinking an image; the downsampling factor is not fixed.
  • the implementation process of collecting the data sets in step 1 is as follows.
  • first, high-definition images are collected, including public data sets, data sets found through search engines, and manually collected data sets; then several crop sizes are generated.
  • the sizes range within (90, 400) (pixel*pixel); for example, crop sizes such as (128, 128), (192, 192), (256, 384), and (384, 384) can be generated.
  • the collected high-definition images are evenly cropped into these crop sizes, yielding patches of different crop sizes.
  • the patches of different crop sizes are the image data of the high-resolution image data set. A low-resolution size generator is then used to determine a series of corresponding low-resolution sizes for each crop-size patch; for example, for a patch of crop size (128, 128), the sizes (96, 96), (64, 64), (48, 48), (64, 48), (48, 64), and (32, 32) can be generated, and each cropped patch is downsampled to its corresponding low-resolution sizes, so as to obtain low-resolution images with different reduction ratios as the low-resolution image data set.
  • FIG. 8b shows that the HR data set is formed by generating patches of different sizes from the original high-definition images, and that each patch size then undergoes a series of downsampling operations to generate a series of low-resolution images forming the LR data set.
  • the far left of the figure represents the collected high-definition images, including categories such as person, building, and landscape; the following images are, in turn, the high-resolution data set (HR dataset) and the low-resolution data set (LR dataset).
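  • A compact sketch of this data set generation is shown below. The crop sizes follow the text; the random choice of LR sizes and the use of bicubic downsampling are assumptions for illustration.

      import random
      from PIL import Image

      CROP_SIZES = [(128, 128), (192, 192), (256, 384), (384, 384)]

      def make_pairs(hr: Image.Image, n_lr_sizes: int = 4):
          """Crop an HD image into HR patches, then downsample each patch to
          several smaller sizes, yielding (LR, HR) pairs with varying scales."""
          for cw, ch in CROP_SIZES:
              for top in range(0, hr.height - ch + 1, ch):
                  for left in range(0, hr.width - cw + 1, cw):
                      patch = hr.crop((left, top, left + cw, top + ch))
                      for _ in range(n_lr_sizes):
                          lw = random.randint(cw // 4, cw - 1)  # LR width
                          lh = random.randint(ch // 4, ch - 1)  # LR height
                          yield patch.resize((lw, lh), Image.BICUBIC), patch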
  • the process of super-resolution of a low-resolution image to obtain a high-resolution image mainly includes four parts: a feature extraction module, an offset generation module, a SASR module, and an upsampling module.
  • the feature extraction module is a model based on a convolutional neural network and is used to extract pixel texture features of the low-resolution image.
  • the model structure can be designed and built according to the computing power of the platform.
  • the feature extraction module usually does not change the resolution of the image but can change its number of channels; the offset generation module is used to generate the offset matrix containing the pixel position information and to obtain the magnification ratio of the image; the SASR module is used to fuse the pixel texture features and the pixel position information, that is, to perform information fusion on the multi-channel features, the offset matrix, and the magnification ratio to generate the weight matrix.
  • the information fusion method includes concat, concat+add, or attention.
  • the upsampling module is used to upsample the feature map corresponding to the multi-channel features according to the target weight matrix generated by the SASR module, to obtain the super-resolved image of the target resolution.
  • the specific model structure of the super-resolution model can refer to Figure 8d.
  • low-resolution images 8001 of different resolutions are input into the super-resolution network; the model processes the low-resolution images and outputs super-resolved images 8004; the loss 8005 between the super-resolved images and the high-resolution images, such as a regression loss (Huber loss), is computed; and an optimizer 8006, such as the Adam optimizer, is used to optimize the loss function and update the model weights until the loss function converges, yielding a trained super-resolution model.
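  • A minimal training loop matching this description might look as follows; the model call signature (LR image plus target size) is an assumption.

      import torch
      import torch.nn as nn

      def train(model, pairs, epochs=10, lr=1e-4, device="cuda"):
          """Optimize a Huber loss between super-resolved and HR images with Adam."""
          opt = torch.optim.Adam(model.parameters(), lr=lr)
          criterion = nn.HuberLoss()
          model.to(device).train()
          for _ in range(epochs):
              for lr_img, hr_img in pairs:  # tensors of shape (N, C, H, W)
                  lr_img, hr_img = lr_img.to(device), hr_img.to(device)
                  sr_img = model(lr_img, hr_img.shape[-2:])  # target resolution given
                  loss = criterion(sr_img, hr_img)
                  opt.zero_grad()
                  loss.backward()
                  opt.step()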
  • the trained super-resolution model is deployed on the mobile terminal and implemented in the glsl language.
  • in the original rendering pipeline, an image 8012 in an application (APP) 8011 needs to be displayed; the image is enlarged or reduced to the target resolution through bilinear interpolation in the rendering pipeline 8013 and then displayed on the screen.
  • a pipeline adapted to the arbitrary-scale super-resolution algorithm uses the arbitrary-scale super-resolution model instead of the bilinear interpolation algorithm.
  • the texture feature extraction module and the SASR module of the trained model are converted into the glsl language by a conversion program (shaderNN converter) 8014, while the offset generation module and the upsampling module of the super-resolution model are manually converted into the glsl language, so as to adapt the entire super-resolution model.
  • the image enters the rendering pipeline.
  • the rendering pipeline relies on the OpenGL libraries (Opengl libs) to call each module that has been converted into the OpenGL Shading Language (glsl) to process the image 8012; the pipeline sends the resolution of the target window or screen to the super-resolution model to calculate the offset and other information, and the resolution of the final image generated by the model is the same as that of the target window or screen.
  • a class of existing learning-based image super-resolution methods uses sub-pixel convolution layers to achieve super-resolution. Due to the limitation of the network structure, one network model can only achieve a fixed integer scale factor; if the magnification needs to change, the network has to be redesigned and retrained.
  • the image super-resolution method provided by the embodiments of the present application is based on an innovative upsampling process that uses a target weight matrix fusing pixel position offset information and texture features to perform upsampling, and can achieve super-resolution at any scale factor.
  • in this scheme, the actually required magnification ratio is input into the pre-trained neural network model (i.e., the feature fusion module in this application), and the output target weight matrix is used for upsampling, so that super-resolution at the required magnification ratio, i.e., any scale factor, can be achieved.
  • the target weight matrix in this scheme fuses pixel position offset information and texture features, which avoids the image quality loss, similar to that of interpolation-based super-resolution, caused by upsampling based only on pixel position offset information.
  • the software or firmware includes, but is not limited to, computer program instructions or code, and can be executed by a hardware processor.
  • the hardware includes, but is not limited to, various types of integrated circuits, such as a central processing unit (CPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or an application specific integrated circuit (ASIC).
  • the image super-resolution device includes:
  • an obtaining unit 901 configured to obtain pixel position offset information according to an image of a first resolution and a target resolution, where the target resolution is greater than the first resolution
  • the obtaining unit 901 is further configured to perform feature extraction on the image of the first resolution to obtain a feature map including texture features;
  • a processing unit 902 configured to perform information fusion on the pixel position offset information and the feature map to obtain a target weight matrix
  • the obtaining unit 901 is further configured to obtain an image of the target resolution according to the target weight matrix.
  • the pixel position offset information includes an offset matrix and an enlargement ratio; the offset matrix is used to indicate the sampling offset of each pixel in the image of the first resolution, and the enlargement ratio is the ratio between the target resolution and the first resolution.
  • the processing unit 902 is specifically configured to: input the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
  • the first neural network model includes a network model composed of a convolutional neural network and a fully connected neural network.
  • the processing unit 902 is specifically configured to: obtain an initial weight matrix according to the offset matrix and the enlargement ratio; input the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
  • the obtaining unit 901 is specifically configured to: input the image of the first resolution into a third neural network model to output the feature map.
  • the obtaining unit 901 is specifically configured to: upsample the feature map according to the target weight matrix to obtain an image of the target resolution.
  • FIG. 10 is a schematic diagram of an embodiment of a terminal in an embodiment of the present application.
  • the terminal 1000 provided in this embodiment may be various types of terminals with display functions, such as a mobile phone, a tablet computer, a desktop computer, a smart screen, or a wearable device, and the specific device form is not limited in this embodiment of the present application.
  • the terminal 1000 may vary greatly due to different configurations or performances, and may include one or more processors 1001 and a memory 1002 in which programs or data are stored.
  • the memory 1002 may be volatile storage or non-volatile storage.
  • the processor 1001 is one or more central processing units (CPUs), which can be single-core or multi-core CPUs.
  • the processor 1001 can communicate with the memory 1002 and execute a series of instructions in the memory 1002 on the terminal 1000.
  • the terminal 1000 also includes one or more wired or wireless network interfaces 1003, such as Ethernet interfaces.
  • the terminal 1000 may also include one or more power supplies and one or more input/output interfaces, which may be used to connect a display, a mouse, a keyboard, a touch screen device, a sensing device, or the like; the input/output interfaces are optional components, which may or may not be present, and are not limited here.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation, there may be other division manners. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

An image super-resolution method for converting a low-resolution image into a high-resolution image for display, applicable to various electronic devices with display functions. On the basis of acquiring pixel position offset information, the method also extracts texture features of the low-resolution image, obtains a weight matrix that fuses the pixel position offset information and the texture features, and acquires the high-resolution image based on this weight matrix. Since the target weight matrix fuses pixel position offset information and texture features, the image quality loss caused by interpolation that relies only on pixel position offset information can be avoided, improving the image quality of the high-resolution image.

Description

Image super-resolution method and electronic device
This application claims priority to Chinese Patent Application No. 202010997488.5, filed with the China National Intellectual Property Administration on September 21, 2020 and entitled "Image super-resolution method and electronic device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of image processing, and in particular, to an image super-resolution method and an electronic device.
Background
With the development of electronic display technology, screen resolutions keep increasing. Even mobile phone screens of a few inches commonly reach a resolution of 1080*1920 (pixel*pixel), while physically larger television and computer screens have reached resolutions as high as 4K or even 8K. Higher screen resolutions give images and videos more display space; however, existing image resources such as videos and pictures often cannot perfectly fit the screen resolution, and network bandwidth limits during video playback or video calls also cause devices to receive low-resolution original images.
Image super-resolution (SR) refers to recovering a high-resolution (HR) image or image sequence from a low-resolution (LR) image or image sequence. Learning-based SR methods have been a hot research direction in recent years; their basic idea is to compute, from a given training image set, the mapping between the LR images and the HR images in the training set. Since deep convolutional models can capture high-level abstract information of images, this mapping is usually realized with a deep convolutional model.
In the prior art, the upsampling in most networks capable of non-integer-factor super-resolution still follows the idea of traditional interpolation algorithms, namely three steps: taking a neighborhood, computing weights from pixel position offset information, and weighted averaging. Image quality losses brought by the interpolation operation, such as aliasing and blurring, cannot be avoided during rendering.
Summary
本申请实施例提供了一种图像超分方法,用于将低分辨率的图像转换为高分辨率的图像显示,由于本方法中权重矩阵融合了像素位置偏移信息和纹理特征,可以避免仅依靠像素位置偏移信息进行插值处理带来的画质损失,可以提升高分辨率图像的图像质量。
本申请实施例的第一方面提供了一种图像超分方法,包括:终端根据第一分辨率的图像和目标分辨率获取像素位置偏移信息,所述目标分辨率大于所述第一分辨率;所述终端对所述第一分辨率的图像进行特征提取,以获取包含纹理特征的特征图;所述终端将所述像素位置偏移信息和所述特征图进行信息融合,以获取目标权重矩阵;所述终端根据所述目标权重矩阵,获取所述目标分辨率的图像。
本申请实施例的方法,终端可以获取应用程序的原始分辨率的图像,本实施例中将原始分辨率称为第一分辨率,还可以获取该图像显示时的目标分辨率,根据第一分辨率的图像和目标分辨率可以获取像素位置偏移信息,此外,终端还提取了第一分辨率的图像的纹理特征,通过融合像素位置偏移信息和纹理特征获取目标权重矩阵,目标权重矩阵用于超分方法的上采样操作中提供每个像素点的采样权重,由于本方案中的目标权重矩阵融合了像素位置偏移 信息和纹理特征,基于该目标权重矩阵获取高分辨率的图像,可以避免仅依靠像素位置偏移信息获取权重矩阵时带来的类似传统插值操作产生的画质损失,因此,可以提升高分辨率图像的图像质量。
在第一方面的一种可能的实现方式中,所述像素位置偏移信息包括偏移矩阵和放大比例,所述偏移矩阵用于指示所述第一分辨率的图像中每个像素点的采样偏移,所述放大比例为所述目标分辨率与所述第一分辨率之间的比值。
本申请实施例的方法,像素位置偏移信息具体包括偏移矩阵和放大比例,其中,根据第一分辨率和目标分辨率可以计算得到放大比例,放大比例包括目标分辨率的图像与第一分辨率的图像在长度方向以及宽度方向的像素数量的比值。偏移矩阵由目标分辨率的图像中每个像素点的采样偏移组成。由于目标分辨率为图像实际需要的分辨率,其数值具有任意性,因此,放大比例可以为任意倍数,由此获取像素位置偏移信息,进一步融合纹理特征得到的目标权重矩阵,可以用于实现任意倍数的超分。
在第一方面的一种可能的实现方式中,所述终端将所述像素位置偏移信息和所述特征图进行信息融合,以获取目标权重矩阵包括:所述终端将所述像素位置偏移信息和所述特征图输入第一神经网络模型,以获取所述目标权重矩阵。
本申请实施例的方法,通过预先训练的第一神经网络模型,可以将偏移矩阵、放大比例和特征图的信息进行融合获取目标权重矩阵,与现有超分方法中权重矩阵仅携带像素位置偏移信息不同,根据本方法提供的目标权重矩阵获取高分辨率的图像时,可以避免类似传统插值操作产生的画质损失,因此,可以提升高分辨率图像的图像质量。
在第一方面的一种可能的实现方式中,所述第一神经网络模型包括由卷积神经网络和全连接神经网络组合而成的网络模型。
本申请实施例的方法,第一神经网络模型可以由卷积神经网络和全连接神经网络组合而成,其中,卷积神经网络用于转换包含纹理特征的特征图,全连接神经网络用于处理像素位置偏移信息。
在第一方面的一种可能的实现方式中,所述终端将所述像素位置偏移信息和所述特征图进行信息融合,以获取目标权重矩阵包括:所述终端根据所述偏移矩阵和放大比例获取初始权重矩阵;所述终端将所述初始权重矩阵和所述特征图输入第二神经网络模型,以获取所述目标权重矩阵。
本申请实施例的方法,提供了另一种信息融合方法,其中根据所述偏移矩阵和放大比例获取初始权重矩阵为现有技术中用于上采样的权重矩阵,本方法可以基于现有技术,将获取的初始权重矩阵以及特征图输入预训练的第二神经网络模型中进行信息融合,获取本方案中用于图像上采样的目标权重矩阵,提高了方案实现的灵活性。
在第一方面的一种可能的实现方式中,所述终端对所述第一分辨率的图像进行特征提取,以获取包含纹理特征的特征图包括:所述终端将所述第一分辨率的图像输入第三神经网络模型,以输出所述特征图。
本申请实施例的方法,特征图可以基于神经网络模型获取,例如已有的卷积神经网络等。
在第一方面的一种可能的实现方式中,所述终端根据所述目标权重矩阵,获取所述目标分辨率的图像包括:所述终端根据所述目标权重矩阵对所述特征图进行上采样,以获取所述 目标分辨率的图像。
本申请实施例的方法,终端根据目标权重矩阵对所述特征图进行上采样获取目标分辨率的图像,相较对第一分辨率的图像进行上采样获取的图像的质量更好。
A second aspect of the embodiments of this application provides a model training method, including: obtaining pixel position offset information according to a training image of a first resolution and a target resolution, and obtaining an initial weight map according to the pixel position offset information; performing feature extraction on the image of the first resolution to obtain a feature map containing texture features; inputting the texture features and the initial weight map into a first neural network for training to obtain a first loss value; and updating the weight parameters in the first network according to the first loss value to obtain a target neural network.
With the model training method provided by the embodiments of this application, the trained model can be used for image super-resolution. By fusing the pixel position offset information on the basis of the existing initial weight map, the target network trained by this method can be used for image super-resolution, so that the quality of the obtained high-resolution image is improved.
A third aspect of the embodiments of this application provides a model training method, including: obtaining pixel position offset information according to a training image of a first resolution and a target resolution; performing feature extraction on the image of the first resolution to obtain a feature map containing texture features; inputting the pixel position offset information and the initial weight map into a second neural network for training to obtain a first loss value; and updating the weight parameters in the second network according to the first loss value to obtain a target neural network.
With the model training method provided by the embodiments of this application, the trained model can be used for image super-resolution. By fusing the pixel position offset information with the feature map containing texture features, the target network trained by this method can be used for image super-resolution, so that the quality of the obtained high-resolution image is improved.
A fourth aspect of the embodiments of this application provides an image super-resolution apparatus, including: an obtaining unit, configured to obtain pixel position offset information according to an image of a first resolution and a target resolution, where the target resolution is greater than the first resolution; the obtaining unit is further configured to perform feature extraction on the image of the first resolution to obtain a feature map containing texture features; a processing unit, configured to fuse the pixel position offset information and the feature map to obtain a target weight matrix; and the obtaining unit is further configured to obtain an image of the target resolution according to the target weight matrix.
In a possible implementation of the fourth aspect, the pixel position offset information includes an offset matrix and a scaling ratio, where the offset matrix indicates the sampling offset of each pixel in the image of the first resolution, and the scaling ratio is the ratio between the target resolution and the first resolution.
In a possible implementation of the fourth aspect, the processing unit is specifically configured to: input the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
In a possible implementation of the fourth aspect, the first neural network model includes a network model composed of a convolutional neural network and a fully connected neural network.
In a possible implementation of the fourth aspect, the processing unit is specifically configured to: obtain an initial weight matrix according to the offset matrix and the scaling ratio; and input the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
In a possible implementation of the fourth aspect, the obtaining unit is specifically configured to: input the image of the first resolution into a third neural network model to output the feature map.
In a possible implementation of the fourth aspect, the obtaining unit is specifically configured to: upsample the feature map according to the target weight matrix to obtain the image of the target resolution.
A fifth aspect of the embodiments of this application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any of its possible implementations.
A sixth aspect of the embodiments of this application provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the method according to the first aspect or any of its possible implementations.
A seventh aspect of the embodiments of this application provides a chip including one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, to perform the method in any possible implementation of any of the above aspects. Optionally, the chip includes the memory, and the memory is connected to the processor through a circuit or a wire. Further optionally, the chip also includes a communication interface connected to the processor. The communication interface is configured to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs a processing result through the communication interface. The communication interface may be an input/output interface. In some implementations, some of the one or more processors may implement some steps of the above methods by dedicated hardware; for example, processing involving neural network models may be performed by a dedicated neural network processor or a graphics processor. The method provided by this application may be implemented by one chip, or cooperatively by multiple chips.
For the technical effects of any implementation of the fourth, fifth, sixth, or seventh aspect, refer to the technical effects of the corresponding implementation of the first aspect; details are not repeated here.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
In the method of the embodiments of this application, the terminal obtains pixel position offset information according to the image of the first resolution and the target resolution. In addition, the terminal extracts texture features of the image of the first resolution, and obtains a target weight matrix by fusing the pixel position offset information and the texture features. The target weight matrix provides the sampling weight of each pixel in the upsampling operation of the super-resolution method. Because the target weight matrix in this solution fuses the pixel position offset information and the texture features, obtaining the high-resolution image based on this target weight matrix avoids the image-quality loss, similar to that of traditional interpolation, incurred when the weight matrix is obtained from pixel position offset information alone; therefore, the image quality of the high-resolution image can be improved.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of the image super-resolution method;
FIG. 2 is a schematic diagram of the system architecture of the image super-resolution method in an embodiment of this application;
FIG. 3 is a schematic diagram of an embodiment of the image super-resolution method in an embodiment of this application;
FIG. 4a is a schematic diagram of an embodiment of the feature map extraction method in an embodiment of this application;
FIG. 4b is a schematic diagram of an embodiment of the feature fusion module in an embodiment of this application;
FIG. 5 is a schematic diagram of another embodiment of the image super-resolution method in an embodiment of this application;
FIG. 6 is a schematic diagram of a product implementation form of the image super-resolution method in an embodiment of this application;
FIG. 7 is a schematic diagram of the training process of the super-resolution module in an embodiment of this application;
FIG. 8a is a schematic diagram of the collection process of the training data set in an embodiment of this application;
FIG. 8b is a schematic diagram of the acquisition process of the high-resolution data set and the low-resolution data set in an embodiment of this application;
FIG. 8c is a schematic structural diagram of the super-resolution model in an embodiment of this application;
FIG. 8d is a schematic diagram of the training process of the super-resolution model in an embodiment of this application;
FIG. 8e is a schematic diagram of an application scenario in which the super-resolution model is deployed in a terminal in an embodiment of this application;
FIG. 9 is a schematic diagram of an embodiment of the image super-resolution apparatus in an embodiment of this application;
FIG. 10 is a schematic diagram of an embodiment of the terminal in an embodiment of this application.
Detailed Description
Embodiments of this application provide an image super-resolution method for converting a low-resolution image into a high-resolution image for display. Because the target weight matrix fuses pixel position offset information and texture features, the image-quality loss caused by interpolation that relies only on pixel position offset information can be avoided, improving the image quality of the high-resolution image.
For ease of understanding, some of the technical terms involved in the embodiments of this application are briefly introduced below:
1. To avoid confusion, several resolution-related terms are introduced and distinguished below.
Image resolution: the amount of information stored in an image, i.e., how many pixels there are per inch of image, commonly expressed as "horizontal pixel count * vertical pixel count", or by a format code. For example, an image resolution of 640*480 means 640 horizontal pixels and 480 vertical pixels, i.e., 307,200 pixels, commonly called 0.3 megapixels. The format code P stands for progressive scanning; 720P corresponds to a resolution of 1280*720 and, similarly, 1080P corresponds to 1920*1080. Resolution can also be described with the format code K. Because a digital image consists of a huge number of pixels, K is usually used to express the horizontal pixel count regardless of aspect ratio, where 1K = 2^10 = 1024, 2K = 2^11 = 2048, and 4K = 2^12 = 4096. 4K resolution means that each row of pixels reaches or approaches 4096 pixels.
It can be understood that the higher the image resolution, the more data the image contains and the richer the detail it can present, but it also requires more computer storage resources.
Display resolution: the physical resolution of a computer monitor itself. For a CRT monitor it refers to the phosphor dots on the screen; for an LCD monitor it refers to the pixels of the display panel. The display resolution is fixed during manufacturing; it describes the number of pixels of the display itself and is an inherent, unchangeable value. It is usually expressed in the form "horizontal pixel count * vertical pixel count", such as 800*600, 1024*768, or 1280*1024, or by a format code. Display resolution is very important for a display device: at the same screen size, a higher resolution means a finer screen that can present picture details more clearly, greatly improving the user's visual experience.
Screen resolution: the resolution actually used when displaying an image, which can be set according to the user's needs. The upper limit of the screen resolution is constrained by the display resolution.
2. Image super-resolution (SR) refers to recovering a high-resolution (HR) image or image sequence from a low-resolution (LR) image or image sequence, often abbreviated as super-resolution.
3. The terms involved in the image upsampling process are introduced below:
1) Upsampling: an interpolation process used in digital signal processing. After a digital sequence is upsampled, the output is approximately equal to the sequence obtained by sampling the original analog signal at a higher sampling rate.
2) Image upsampling: similar to upsampling, it is an interpolation process. An image of a first resolution can be regarded as a two-dimensional digital matrix; after this matrix is upsampled, the output is approximately equal to the digital matrix obtained by sampling the real-world analog image at a higher sampling rate, i.e., the image of the target resolution. To facilitate describing the solution of this application, image upsampling is described as follows: each pixel of the target-resolution image samples pixels of the first-resolution image and takes a weighted average. The detailed process is: compute the corresponding sampling position in the first-resolution image from the pixel coordinates of the high-resolution image, further compute the sampling center and the offset, take the pixels in a fixed-size neighborhood around the sampling center as the pixels to be processed, compute the sampling weights from the offset, and use the sampling weights to take a weighted average of the pixels to be processed in the neighborhood, obtaining the upsampling result. It should be noted that the number of pixels taken in the neighborhood around the sampling center can be preset, and its specific value is not limited.
3) Pixel coordinates: the coordinates of a pixel's position in an image; pixel coordinates must be integers. Taking the target-resolution image as an example: x_out and y_out denote the coordinates of a pixel of the target-resolution image in the length direction and the width direction respectively, satisfying 0 <= x_out < w_out and 0 <= y_out < h_out, where w_out denotes the width of the target resolution and h_out denotes the length of the target resolution.
4) Sampling position: when an image of the first resolution is upsampled into an image of the target resolution, the output of each pixel of the target-resolution image needs to be sampled from the first-resolution image. The position of each pixel of the target-resolution image is mapped, through a computation related to the scaling factor, to a position in the first-resolution image; that position is the sampling position. Taking a pixel (x_out, y_out) of the target-resolution image as an example, its sampling position can be computed, for example, with the conventional center-aligned mapping:
sample_pos_x = (x_out + 0.5) * (w_in / w_out) - 0.5
sample_pos_y = (y_out + 0.5) * (h_in / h_out) - 0.5
where sample_pos_x is the coordinate of the sampling position in the length direction; sample_pos_y is the coordinate of the sampling position in the width direction; w_in is the pixel width of the first-resolution image; h_in is the pixel length of the first-resolution image; w_out denotes the pixel width of the target-resolution image; and h_out denotes the length of the target resolution.
5) Sampling center: the pixel coordinates of the sampling center are obtained by rounding the sampling position.
The following is an example of a rounding operation:
center_x = floor(sample_pos_x)
center_y = floor(sample_pos_y)
where center_x is the coordinate of the sampling center in the width direction, center_y is the coordinate of the sampling center in the length direction, and floor(n) denotes rounding n down to the nearest integer.
6) Offset: also called the sampling offset, the distance between the sampling position and the sampling center; the offset is an important basis for generating the sampling weights. Exemplarily, the offset is computed as follows:
offset_x = sample_pos_x - center_x
offset_y = sample_pos_y - center_y
where offset_x denotes the offset distance of the sampling position relative to the sampling center in the width direction, and offset_y denotes the offset distance of the sampling position relative to the sampling center in the length direction.
The matrix composed of the offsets of all pixels of the target-resolution image is the offset matrix.
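By way of illustration only (the following sketch is not part of the claimed method), the mapping above can be written out in NumPy; the function name get_position_info and the array layout are assumptions made for this example:
import numpy as np

def get_position_info(h_in, w_in, h_out, w_out):
    # Coordinates of every pixel of the target-resolution image.
    y_out, x_out = np.meshgrid(np.arange(h_out), np.arange(w_out), indexing="ij")
    # Center-aligned sampling positions in the first-resolution image.
    sample_pos_x = (x_out + 0.5) * (w_in / w_out) - 0.5
    sample_pos_y = (y_out + 0.5) * (h_in / h_out) - 0.5
    # Sampling centers obtained by rounding down.
    center_x = np.floor(sample_pos_x)
    center_y = np.floor(sample_pos_y)
    # Stacking the per-pixel sampling offsets gives the offset matrix.
    offset = np.stack([sample_pos_x - center_x, sample_pos_y - center_y], axis=-1)
    scale = (h_out / h_in, w_out / w_in)
    return offset, scale
Note that when h_out = h_in and w_out = w_in every offset is zero, which matches the intuition that no resampling is needed at a scaling ratio of 1.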
The embodiments of this application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. A person of ordinary skill in the art will appreciate that, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are likewise applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules that are not expressly listed or that are inherent to such a process, method, product, or device. The naming or numbering of steps in this application does not mean that the steps of a method flow must be executed in the temporal or logical order indicated by the naming or numbering; the named or numbered steps may be executed in a different order to achieve the intended technical purpose, as long as the same or a similar technical effect can be achieved.
The image super-resolution method provided by the embodiments of this application can be applied to all kinds of electronic devices with a display function, such as a mobile terminal, tablet, laptop, computer, television, all-in-one machine, or projector; the specific type of electronic device is not limited. The output image content includes various kinds of media information or real-time operation frames, and may be a static image or a dynamic image, i.e., video; the content of the image displayed with super-resolution is not limited in this application. The following description uses a terminal device and an image as an example.
First, the application scenario of the image super-resolution method is introduced with reference to FIG. 1.
With the development of electronic display technology, screen resolutions keep increasing. The screen resolution at which an image is displayed on an electronic device is relatively high, while the original resolution of image resources is usually low and often cannot perfectly match the screen resolution. As shown in FIG. 1, even if an original image of 180P is enlarged to 540P by 3x super-resolution, the screen resolution of a mobile phone is 1080P in landscape mode and 720P in portrait mode; to match the screen resolution, the 540P image still needs to be stretched by interpolation, and the picture may suffer negative interpolation effects such as aliasing and blurring.
The image super-resolution method provided by the embodiments of this application is used to achieve super-resolution at an arbitrary factor, adapting to the display of the terminal device; it can also reduce the image-quality loss caused by interpolation in traditional super-resolution methods and improve the image quality.
The system architecture of the image super-resolution method in the embodiments of this application is introduced below; refer to FIG. 2.
FIG. 2 shows an example super-resolution application scenario of a mobile terminal, including three parts: an application, a rendering pipeline, and a display module. The application on the mobile terminal sends the image or video frame to be displayed into the rendering pipeline; the rendering pipeline calls the super-resolution module to enlarge the image and render it; finally, the display module displays a high-resolution, high-quality image on the display device.
The image super-resolution method provided by the embodiments of this application achieves image super-resolution through an improved super-resolution module.
According to the actual application scenario, the low-resolution image is super-resolved to the target resolution; specifically, the super-resolution module can be implemented by a super-resolution model.
The super-resolution process is formalized as follows:
image_out = SR_model(LR_image, (height_LR, width_LR), (height_dst, width_dst))
where LR_image is the low-resolution image to be super-resolved; image_out is the high-resolution image generated by the super-resolution model; (height_dst, width_dst) is the target resolution; (height_LR, width_LR) is the original resolution; and SR_model is the super-resolution model.
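Purely to illustrate this interface, a minimal top-level wrapper consistent with the formalization above might look as follows; the helper get_position_info is the hypothetical sketch given in the terminology section above, upsample_transform is sketched under step 304 below, and the remaining module names are likewise assumptions of this sketch rather than the claimed implementation (tensor and array conversions are omitted):
def sr_model(lr_image, lr_size, dst_size, extractor, fusion_net, to_rgb):
    # Step 301 below: pixel position offset information.
    offset, scale = get_position_info(*lr_size, *dst_size)
    # Step 302: feature map containing texture features.
    feats = extractor(lr_image)
    # Step 303: target weight matrix fusing both kinds of information.
    weight_matrix = fusion_net(feats, offset, scale)
    # Step 304: upsample the feature map with the target weight matrix.
    return upsample_transform(feats, weight_matrix, to_rgb)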
The image super-resolution method provided by the embodiments of this application is described in detail below; refer to FIG. 3.
301. The terminal obtains pixel position offset information according to the image of the first resolution and the target resolution.
An image or video frame in an application on the terminal needs to be displayed at the target resolution on the terminal's display screen or in a display window. The target resolution may be a resolution preset in the terminal's application, or may be determined by the usage scenario established by user operations; its specific value is not limited. It can be understood that the target resolution is generally higher than the first resolution. The terminal obtains the application's original image, whose resolution is the first resolution; when the first resolution is smaller than the target resolution, the terminal needs to perform super-resolution processing on the image to obtain an image or video frame of the target resolution. Since a video frame consists of a sequence of images, this embodiment and subsequent embodiments use an image as an example to introduce the image super-resolution method provided by the embodiments of this application.
The pixel position offset information includes an offset matrix and a scaling ratio, where the offset matrix indicates the position offset of each pixel of the image of the first resolution when enlarged to the target resolution, and the scaling ratio (scale) is the ratio between the target resolution and the first resolution, including the ratio of the pixel count of the target-resolution image to that of the first-resolution image in the length direction, and the corresponding ratio in the width direction. Considering that the values of the first resolution and the target resolution are arbitrary, the scaling ratio that the super-resolution needs to satisfy must also be allowed to be an arbitrary factor.
The terminal can obtain the offset matrix and the scaling ratio from the image of the first resolution and the target resolution using existing algorithms; for the computation of the offset matrix, refer to the terminology introduction of the image upsampling process above.
This step is formalized as:
offset, scale = GET_POSITION_INFO((height_LR, width_LR), (height_HR, width_HR))
where offset is the position offset information of the target-resolution image mapped onto the low-resolution image; scale is the ratio of the target resolution to the first resolution; (height_LR, width_LR) is the resolution of the low-resolution image input to the super-resolution model; (height_HR, width_HR) is the resolution of the high-resolution image corresponding to the low-resolution image, i.e., the super-resolution target resolution; and GET_POSITION_INFO computes the position offset information offset and the scaling factor scale from the target resolution and the original resolution.
302. The terminal performs feature extraction on the image of the first resolution to obtain a feature map containing texture features.
The terminal extracts texture features of the image of the first resolution with a preset neural network model to obtain a feature map, which may specifically be a multi-channel feature map.
Optionally, the preset neural network model is a pre-trained neural network, or an existing neural network, which is not specifically limited here. Optionally, the type of the neural network is a convolutional neural network (CNN).
This step is formalized as:
feature_maps = CNN(LR_image)
where LR_image is the low-resolution image input to the super-resolution model; CNN is the convolutional neural network that extracts features; and feature_maps is the multi-channel feature map output by the feature extraction of the convolutional neural network.
The feature extraction process usually does not change the resolution of the image, but it can change the number of channels. As shown in FIG. 4a, after processing by the convolutional neural network, an image originally of resolution (height, width) and channel count channel (3 if the image is in RGB format, 1 if it is a grayscale image) is transformed into a feature map of shape (height, width, channel'), where the value of channel' depends on the design of the CNN network.
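By way of example only, a resolution-preserving extractor of this kind could be sketched in PyTorch as follows; the channel widths and layer count are assumptions, not the network design of this application:
import torch.nn as nn

class TextureFeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, out_channels=32):
        super().__init__()
        # Stride-1, padded convolutions keep (height, width) unchanged
        # and only change the channel count to channel'.
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
        )

    def forward(self, lr_image):          # (N, channel, height, width)
        return self.body(lr_image)        # (N, channel', height, width)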
It should be noted that the execution order of step 301 and step 302 is not specifically limited.
303. The terminal fuses the pixel position offset information and the feature map to obtain a target weight matrix.
The target weight matrix provides the weight information required by the upsampling step in the super-resolution process. There are multiple ways for the terminal to perform information fusion on the pixel position offset information and the feature map to obtain the target weight matrix.
Optionally, the terminal inputs the offset matrix and scaling ratio obtained in step 301, together with the feature map obtained in step 302, into a preset first neural network model to output the target weight matrix.
Optionally, the terminal obtains an initial weight matrix according to the offset matrix and scaling ratio obtained in step 301; the terminal then performs information fusion on the initial weight matrix and the feature map obtained in step 302 to obtain the target weight matrix. The information fusion can be implemented by a neural network, optionally using common channel-fusion methods of neural networks: the concatenation operator (concat), concatenation plus addition (concat+add), an attention mechanism (attention), or the like. This step is formalized as:
weight_maps = OFFSET_TRANSFORM(offset, scale)
weight_matrix = FUSION(TRANSFORM(feature_maps), weight_maps)
where weight_maps are the weight values generated from the pixel position information; OFFSET_TRANSFORM is the position information processing algorithm, which can be implemented with a fully connected neural network or another nonlinear mapping algorithm; feature_maps is the multi-channel feature map extracted in the feature extraction step; TRANSFORM performs feature transformation on the multi-channel feature map; FUSION fuses the texture feature information and the pixel position offset information to generate the weight matrix; and weight_matrix is the target weight matrix.
As shown in FIG. 4b, in the texture feature processing flow, the extracted feature map mainly undergoes a feature transformation operation so that the feature information can be adapted to the pixel position information; the feature transformation does not change the resolution of the feature map. In the processing flow of the pixel position offset information, the position information can be represented by the offset matrix and the scaling ratio, and a preliminary weight map is generated through position information transformation. Finally, the weight map containing the pixel position offset information and the feature map containing the pixel texture feature information are fused to generate the final target weight matrix.
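The following PyTorch sketch illustrates one concat-style fusion consistent with FIG. 4b; the hidden sizes, the per-pixel k*k weight layout, the nearest-neighbor alignment of the feature branch, and the module name FusionModule are all assumptions made for illustration:
import torch
import torch.nn as nn

class FusionModule(nn.Module):
    def __init__(self, feat_channels=32, k=3):
        super().__init__()
        # TRANSFORM: adapt the texture features; resolution is unchanged.
        self.transform = nn.Conv2d(feat_channels, feat_channels, 3, padding=1)
        # OFFSET_TRANSFORM: a small fully connected network mapping
        # (offset_x, offset_y, scale_h, scale_w) to a preliminary weight map.
        self.offset_transform = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(inplace=True), nn.Linear(64, feat_channels))
        # FUSION: concatenate both branches and predict k*k sampling weights.
        self.fusion = nn.Conv2d(2 * feat_channels, k * k, 3, padding=1)

    def forward(self, feats, offset, scale):
        # feats: (1, C, H_in, W_in); offset: (H_out, W_out, 2) tensor;
        # scale: (scale_h, scale_w).
        h, w = offset.shape[:2]
        pos = torch.cat([offset, offset.new_tensor(scale).expand(h, w, 2)], dim=-1)
        weight_maps = self.offset_transform(pos).permute(2, 0, 1).unsqueeze(0)
        feats_t = self.transform(feats)
        # Align the feature branch with the target grid before fusing
        # (nearest-neighbor here only to keep the sketch short).
        feats_up = nn.functional.interpolate(feats_t, size=(h, w), mode="nearest")
        weight_matrix = self.fusion(torch.cat([feats_up, weight_maps], dim=1))
        return weight_matrix.softmax(dim=1)   # (1, k*k, H_out, W_out)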
304. The terminal obtains the image of the target resolution according to the target weight matrix.
The feature map is upsampled using weight_matrix to obtain the super-resolved picture at the target resolution. Since the target weight matrix fuses the information of the required scaling ratio, super-resolution from the first-resolution image to the target-resolution image can be achieved at that scaling ratio, i.e., at an arbitrary factor.
This step is formalized as:
SR_image = Upsample_Transform(feature_maps, weight_matrix)
where SR_image is the super-resolved image of the target resolution; and Upsample_Transform uses the weight matrix to upsample the feature map and perform channel conversion, and can output the image of the target resolution.
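For illustration, and under the same assumed per-pixel k*k weight layout as in the previous sketch, Upsample_Transform could be approximated as follows; the final 1x1 channel-conversion convolution (to_rgb) and the odd neighborhood size are assumptions:
import torch
import torch.nn.functional as F

def upsample_transform(feats, weight_matrix, to_rgb):
    # feats: (1, C, H_in, W_in); weight_matrix: (1, k*k, H_out, W_out);
    # to_rgb: an assumed 1x1 convolution performing channel conversion.
    _, c, h_in, w_in = feats.shape
    k2, h_out, w_out = weight_matrix.shape[1:]
    k = int(k2 ** 0.5)                         # odd neighborhood size assumed
    # Recompute the sampling centers with the same mapping as get_position_info.
    ys = torch.floor((torch.arange(h_out) + 0.5) * (h_in / h_out) - 0.5).clamp(0, h_in - 1).long()
    xs = torch.floor((torch.arange(w_out) + 0.5) * (w_in / w_out) - 0.5).clamp(0, w_in - 1).long()
    idx = (ys[:, None] * w_in + xs[None, :]).reshape(-1)
    # Gather the k*k neighborhood of every sampling center.
    patches = F.unfold(feats, kernel_size=k, padding=k // 2)   # (1, C*k*k, H_in*W_in)
    gathered = patches.view(1, c, k2, -1)[..., idx].view(1, c, k2, h_out, w_out)
    # Weighted average of each neighborhood using the target weight matrix.
    out_feats = (gathered * weight_matrix.unsqueeze(1)).sum(dim=2)
    return to_rgb(out_feats)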
Refer to FIG. 5, a schematic diagram of a system in which the image super-resolution method is deployed in an embodiment of this application.
As shown in FIG. 5, the figure presents the technique and system, deployed on a terminal device, that uses a super-resolution module capable of arbitrary-factor super-resolution to adapt to the screen resolution. The system mainly includes: a pixel position offset information extraction module (also called the offset generation module) 501, a texture feature extraction module 502, a feature fusion module, also called the scale arbitrary super resolution (SASR) module 503 in the embodiments of this application, an upsampling module 504, and so on. FIG. 5 gives an example arbitrary-factor super-resolution scenario of a terminal device. Optionally, the training of the feature extraction network and the information fusion network is completed on the PC side; after training, the pixel position offset information extraction module 501, texture feature extraction network 502, SASR module 503, and upsampling module 504 are deployed on the computing chip of the mobile side.
With the terminal device as the main body of the system, the application carried on the terminal device provides an image or video frame that needs to be displayed on the screen or in a window; generally, the resolution of the screen or window is greater than that of the image or video. In this case, the pixel position offset information extraction module 501 obtains the resolution of the original image (i.e., the first resolution) and the resolution of the target window or screen (i.e., the target resolution), generates the offset matrix through high/low resolution pixel mapping, and separately computes the scaling factors scale of the image in length and width; from scale and offset, a preliminary weight matrix (i.e., the initial weight matrix), weight maps, can be obtained through neural network computation. The texture feature extraction module 502 obtains the image of the first resolution and performs feature extraction; the extracted multi-channel feature map undergoes feature transformation to generate new feature maps, and the feature maps and weight maps pass through the information fusion network of the SASR module 503 to generate the target weight matrix. The upsampling module 504 upsamples the multi-channel feature map extracted by the feature extraction network using the weight matrix, obtaining a high-quality image of the target resolution.
According to the introduction of the image super-resolution method in the foregoing embodiments, the super-resolution model involved in the method needs to be obtained through pre-training before being deployed on the terminal device. Specifically, in the embodiments of this application, the super-resolution model mainly includes four parts: the texture feature extraction module, the pixel position offset information extraction module, the SASR module, and the upsampling module. The texture feature extraction module can be implemented based on a convolutional neural network model; the SASR module can be implemented based on a model combining a convolutional neural network and a fully connected neural network. Optionally, the SASR module consists of a neural network whose parameters participate in update iterations, the texture feature extraction module consists of an existing neural network, and the offset generation module and the upsampling module are completed by ordinary mathematical computation; that is, the training of the super-resolution model involves only the SASR module. Optionally, the texture feature extraction module and the SASR module consist of neural networks whose parameters participate in update iterations, while the offset generation module and the upsampling module are completed by ordinary mathematical computation; that is, the training of the super-resolution model involves the texture feature extraction module and the SASR module.
In the embodiments of this application, the description takes as an example the case where the texture feature extraction module and the SASR module consist of neural networks whose parameters participate in update iterations, while the offset generation module and the upsampling module are completed by ordinary mathematical computation.
Refer to FIG. 6, a schematic diagram of a product implementation form of the image super-resolution method in an embodiment of this application. The training process of the neural networks involved in this method is completed on the first electronic device side; the first electronic device may be, for example, an electronic device with abundant computing and storage resources such as a server or a desktop computer, and this embodiment uses a desktop computer (PC) as an example. The trained neural network is deployed on the second electronic device side to perform image super-resolution; the second electronic device may be any kind of electronic device with a display function, including a mobile terminal, tablet computer, wearable device, smart screen, and so on, and this embodiment uses a mobile terminal as an example. As shown in FIG. 6, a training data set is first produced from the collected high-definition data set, the super-resolution model (including the texture feature extraction module and the SASR module) is trained on the PC, and a rendering-pipeline-based conversion program (ShaderNN converter) converts the model into a form that can run on the mobile terminal. The application on the mobile terminal sends the image to be displayed into the rendering pipeline, the rendering pipeline calls the converted super-resolution model for forward inference and rendering, and finally a high-quality, high-resolution image is displayed through the display module.
The training process of the super-resolution model in an embodiment of this application is introduced below; refer to FIG. 7.
701. Obtain a training data set.
First, image pairs with different resolution-ratio relationships need to be created as the training data set.
The training data set includes a high-resolution data set and a low-resolution data set.
High-definition images need to be collected first. Sources of high-definition image data include public data sets, web crawling, and self-collection, covering high-quality images of buildings, landscapes, people, and other categories; the low-resolution data set is generated from the high-resolution images. The high-definition images are cropped and then downsampled at different scales to reduce the resolution.
For this task, a high-definition data set containing various textures and lines needs to be constructed; high-definition pictures with different texture features, covering buildings, people, landscapes, and other content, are collected from the Internet and from published data sets. The high-definition data serve as the labels, and the downsampled data serve as the data.
This step is formalized as:
labelset = {buildings | person | landscape ...}
labelset' = Downsample(labelset)
Dataset = {labelset, dataset | dataset in labelset'}
where Dataset is the training data set; labelset is the collected high-definition images; dataset is the low-resolution images produced by downsampling the high-definition images; and Downsample is the downsampling operation that shrinks an image, with no fixed downsampling factor.
Specifically, FIG. 8a shows the implementation flow of collecting the data set in step one. High-definition images are collected first, including public data sets, data sets found with search engines, and manually collected data sets. Next, several crop sizes for image patches, i.e., resolution sizes, are generated; optionally, the sizes lie in the range of (90, 400) (pixels*pixels), for example crop sizes such as (128,128), (192,192), (256,384), and (384,384) can be generated. The collected high-definition images are uniformly cropped to the resolutions of these crop sizes, yielding patches of different crop sizes. These patches of different crop sizes are the image data of the high-resolution image data set. A low-resolution size generator then determines a series of corresponding low-resolution sizes for the patches of each crop size; for example, for patches of crop size (128,128), the sizes (96,96), (64,64), (48,48), (64,48), (48,64), and (32,32) can be generated. The patches obtained at each crop resolution are downsampled to the corresponding low-resolution sizes, yielding low-resolution images at different reduction ratios, which form the low-resolution image data set.
FIG. 8b shows that patches of different sizes are generated from the original high-definition images to form the HR dataset, and each size of patch then undergoes a series of downsampling operations to generate a series of low-resolution images forming the LR dataset. The leftmost part of the figure shows the collected high-definition images, covering categories such as person, building, and landscape; the following images are the high-resolution data set (HR dataset) and the low-resolution data set (LR dataset).
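By way of illustration only, the crop-and-downsample flow of FIG. 8a and FIG. 8b could be sketched as follows; the crop and low-resolution size tables reuse the example values above, and bicubic downsampling is an assumption:
from PIL import Image

CROP_SIZES = [(128, 128), (192, 192), (256, 384), (384, 384)]
LR_SIZES = {(128, 128): [(96, 96), (64, 64), (48, 48), (64, 48), (48, 64), (32, 32)]}

def make_pairs(hd_image: Image.Image, crop_size):
    w, h = crop_size
    pairs = []
    # Uniformly crop the high-definition image into HR patches.
    for top in range(0, hd_image.height - h + 1, h):
        for left in range(0, hd_image.width - w + 1, w):
            hr_patch = hd_image.crop((left, top, left + w, top + h))
            # Downsample each HR patch to every associated LR size.
            for lr_size in LR_SIZES.get(crop_size, []):
                lr_patch = hr_patch.resize(lr_size, Image.BICUBIC)
                pairs.append((lr_patch, hr_patch))  # (data, label)
    return pairs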
702. Build the super-resolution model.
In the embodiments of this application, the process of super-resolving a low-resolution image to obtain a high-resolution image mainly consists of four parts: the feature extraction module, the offset generation module, the SASR module, and the upsampling module. The feature extraction module is used to extract the pixel texture features of the low-resolution image based on a convolutional neural network model; optionally, pixel texture feature extraction is implemented with a convolutional-neural-network-based model that extracts the multi-channel features of the low-resolution image, and the model structure can be designed and built according to the computing power of the platform. The feature extraction module usually does not change the resolution of the image, but it can change the number of channels. The offset generation module is used to generate the offset matrix containing the pixel position information and to obtain the image scaling ratio. The SASR module is used to fuse the pixel texture features with the pixel position information, i.e., to perform information fusion on the multi-channel features, the offset matrix, and the scaling ratio to generate the weight matrix; optionally, the information fusion method includes concat, concat+add, or attention. The upsampling module is used to upsample the feature map corresponding to the multi-channel features according to the target weight matrix generated by the SASR module, obtaining the super-resolved picture at the target resolution.
As shown in FIG. 8c, the modules are built and connected to form a trainable super-resolution network, model training is performed based on the training data set obtained in step 701, and finally an optimizer is used to optimize the model; optionally, the adaptive moment estimation (Adam) optimizer is used. The model training process itself is existing technology and is not described in detail here. It should be noted that, across the multiple iterations of model training, the training image pairs selected within one iteration have the same scaling ratio, while different iterations may use training image pairs with different scaling ratios; this extends the range of super-resolution scaling ratios to which the super-resolution model is applicable.
The specific model structure of the super-resolution model can be seen in FIG. 8d. The texture feature extraction module and the SASR module are built according to the foregoing introduction, and then the connections between the network data flows are completed based on the flows of the offset generation module 8002 and the upsampling module 8003 given in FIG. 8d, completing a trainable super-resolution network model.
703. Train according to the training data set to obtain the trained super-resolution model.
Finally, as shown in FIG. 8d, low-resolution images 8001 of different resolutions are input into the super-resolution network; the model processes the low-resolution images and outputs super-resolved images 8004; the loss 8005 between the super-resolved image and the high-resolution image, for example a regression loss (huber loss), is computed; an optimizer 8006, for example the Adam optimizer, is used to optimize the loss function and update the model weights until the loss function converges, yielding the trained super-resolution model.
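Purely as an illustrative sketch of this training loop in PyTorch (sr_network, the data loader, and the hyperparameters are assumptions, and each batch is assumed to share one scaling ratio, as noted in step 702):
import torch

def train(sr_network, loader, epochs=100, lr=1e-4):
    criterion = torch.nn.HuberLoss()                              # loss 8005
    optimizer = torch.optim.Adam(sr_network.parameters(), lr=lr)  # optimizer 8006
    for _ in range(epochs):
        for lr_batch, hr_batch in loader:          # one scaling ratio per batch
            sr_batch = sr_network(lr_batch, hr_batch.shape[-2:])  # images 8004
            loss = criterion(sr_batch, hr_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                       # update until convergence
    return sr_network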
As shown in FIG. 8e, after the super-resolution model is trained, the trained super-resolution model is deployed on the mobile terminal, implemented with the glsl language. The original rendering pipeline is: an image 8012 in the application (APP) 8011 needs to be displayed; in the rendering pipeline 8013 the image is enlarged or reduced to the target resolution by bilinear interpolation and then displayed on the screen. The pipeline adapted with the arbitrary-factor super-resolution algorithm replaces the bilinear interpolation algorithm with the arbitrary-factor super-resolution model. The texture feature extraction module and the SASR module in the trained model are converted into the glsl language by the conversion program (shaderNN converter) 8014, while the offset generation module and the upsampling module in the super-resolution model are manually converted into the glsl language, adapting the entire super-resolution model. After the super-resolution model is deployed, the image enters the rendering pipeline; relying on the OpenGL libraries (Opengl libs), the rendering pipeline calls the modules that have been converted into shading language (OpenGL Shading Language, glsl) form to process the image 8012. Meanwhile, the pipeline sends the resolution of the target window or screen into the super-resolution model to compute information such as the offset; the resolution of the image finally generated by the model is the same as that of the target window or screen.
One existing class of learning-based image super-resolution methods achieves super-resolution through a sub-pixel convolution layer; limited by the network structure, one network model can only achieve super-resolution at a fixed integer factor, and if the scaling factor needs to change, the network must be redesigned and retrained. The image super-resolution method provided by the embodiments of this application, based on an innovative upsampling process, performs upsampling with a target weight matrix that fuses pixel position offset information and texture features, and can therefore achieve super-resolution at an arbitrary factor. In the process of obtaining the target weight matrix, this solution inputs the actually required scaling ratio into the pre-trained neural network model (i.e., the feature fusion module of this application), and the output target weight matrix is used for upsampling, so that the actually required scaling ratio (i.e., an arbitrary factor) can be achieved. At the same time, the target weight matrix in this solution fuses pixel position offset information and texture features, avoiding the image-quality loss problem, similar to that of interpolation-based super-resolution techniques, caused by upsampling based only on pixel position offset information.
The image super-resolution method provided by this application has been introduced above; the image super-resolution apparatus implementing the method is introduced below. Refer to FIG. 9, a schematic diagram of an embodiment of the image super-resolution apparatus in an embodiment of this application.
Any one or more of the modules in FIG. 9 can be implemented by software, hardware, firmware, or a combination thereof. The software or firmware includes but is not limited to computer program instructions or code, and can be executed by a hardware processor. The hardware includes but is not limited to various integrated circuits, such as a central processing unit (CPU), digital signal processor (DSP), field-programmable gate array (FPGA), or application-specific integrated circuit (ASIC).
The image super-resolution apparatus includes:
an obtaining unit 901, configured to obtain pixel position offset information according to an image of a first resolution and a target resolution, where the target resolution is greater than the first resolution;
the obtaining unit 901 is further configured to perform feature extraction on the image of the first resolution to obtain a feature map containing texture features;
a processing unit 902, configured to fuse the pixel position offset information and the feature map to obtain a target weight matrix;
the obtaining unit 901 is further configured to obtain an image of the target resolution according to the target weight matrix.
Optionally, the pixel position offset information includes an offset matrix and a scaling ratio, where the offset matrix indicates the sampling offset of each pixel in the image of the first resolution, and the scaling ratio is the ratio between the target resolution and the first resolution.
Optionally, the processing unit 902 is specifically configured to: input the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
Optionally, the first neural network model includes a network model composed of a convolutional neural network and a fully connected neural network.
Optionally, the processing unit 902 is specifically configured to: obtain an initial weight matrix according to the offset matrix and the scaling ratio; and input the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
Optionally, the obtaining unit 901 is specifically configured to: input the image of the first resolution into a third neural network model to output the feature map.
Optionally, the obtaining unit 901 is specifically configured to: upsample the feature map according to the target weight matrix to obtain the image of the target resolution.
Refer to FIG. 10, a schematic diagram of an embodiment of the terminal in an embodiment of this application.
The terminal 1000 provided in this embodiment may be any kind of terminal with a display function, such as a mobile phone, tablet computer, desktop computer, smart screen, or wearable device; its specific device form is not limited in the embodiments of this application.
The terminal 1000 may vary greatly due to differences in configuration or performance, and may include one or more processors 1001 and a memory 1002, where the memory 1002 stores programs or data.
The memory 1002 may be volatile storage or non-volatile storage. Optionally, the processor 1001 is one or more central processing units (CPU); the CPU may be a single-core CPU or a multi-core CPU. The processor 1001 may communicate with the memory 1002 and execute a series of instructions in the memory 1002 on the terminal 1000.
The terminal 1000 also includes one or more wired or wireless network interfaces 1003, such as an Ethernet interface.
Optionally, although not shown in FIG. 10, the terminal 1000 may also include one or more power supplies and one or more input/output interfaces. The input/output interfaces can be used to connect a display, mouse, keyboard, touchscreen device, sensing device, or the like; the input/output interfaces are optional components that may or may not be present, which is not limited here.
For the procedures executed by the processor 1001 of the terminal 1000 in this embodiment, refer to the method procedures described in the foregoing method embodiments; details are not repeated here.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference can be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; for example, the division into units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be implemented through some interfaces; indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of the technical features thereof; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (17)

  1. An image super-resolution method, characterized by comprising:
    a terminal obtaining pixel position offset information according to an image of a first resolution and a target resolution, wherein the target resolution is greater than the first resolution;
    the terminal performing feature extraction on the image of the first resolution to obtain a feature map containing texture features;
    the terminal fusing the pixel position offset information and the feature map to obtain a target weight matrix; and
    the terminal obtaining an image of the target resolution according to the target weight matrix.
  2. The method according to claim 1, characterized in that the pixel position offset information comprises an offset matrix and a scaling ratio, the offset matrix indicates the sampling offset of each pixel in the image of the first resolution, and the scaling ratio is the ratio between the target resolution and the first resolution.
  3. The method according to claim 1 or 2, characterized in that the terminal fusing the pixel position offset information and the feature map to obtain the target weight matrix comprises:
    the terminal inputting the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
  4. The method according to claim 3, characterized in that the first neural network model comprises a network model composed of a convolutional neural network and a fully connected neural network.
  5. The method according to claim 2, characterized in that the terminal fusing the pixel position offset information and the feature map to obtain the target weight matrix comprises:
    the terminal obtaining an initial weight matrix according to the offset matrix and the scaling ratio; and
    the terminal inputting the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
  6. The method according to any one of claims 1 to 5, characterized in that the terminal performing feature extraction on the image of the first resolution to obtain the feature map containing texture features comprises:
    the terminal inputting the image of the first resolution into a third neural network model to output the feature map.
  7. The method according to any one of claims 1 to 6, characterized in that the terminal obtaining the image of the target resolution according to the target weight matrix comprises:
    the terminal upsampling the feature map according to the target weight matrix to obtain the image of the target resolution.
  8. An image super-resolution apparatus, characterized by comprising:
    an obtaining unit, configured to obtain pixel position offset information according to an image of a first resolution and a target resolution, wherein the target resolution is greater than the first resolution;
    the obtaining unit being further configured to perform feature extraction on the image of the first resolution to obtain a feature map containing texture features;
    a processing unit, configured to fuse the pixel position offset information and the feature map to obtain a target weight matrix;
    the obtaining unit being further configured to obtain an image of the target resolution according to the target weight matrix.
  9. The apparatus according to claim 8, characterized in that the pixel position offset information comprises an offset matrix and a scaling ratio, the offset matrix indicates the sampling offset of each pixel in the image of the first resolution, and the scaling ratio is the ratio between the target resolution and the first resolution.
  10. The apparatus according to claim 8 or 9, characterized in that the processing unit is specifically configured to:
    input the pixel position offset information and the feature map into a first neural network model to obtain the target weight matrix.
  11. The apparatus according to claim 10, characterized in that the first neural network model comprises a network model composed of a convolutional neural network and a fully connected neural network.
  12. The apparatus according to claim 9, characterized in that the processing unit is specifically configured to:
    obtain an initial weight matrix according to the offset matrix and the scaling ratio; and
    input the initial weight matrix and the feature map into a second neural network model to obtain the target weight matrix.
  13. The apparatus according to any one of claims 8 to 12, characterized in that the obtaining unit is specifically configured to:
    input the image of the first resolution into a third neural network model to output the feature map.
  14. The apparatus according to any one of claims 8 to 13, characterized in that the obtaining unit is specifically configured to:
    upsample the feature map according to the target weight matrix to obtain the image of the target resolution.
  15. A terminal, characterized by comprising one or more processors and a memory, wherein
    the memory stores computer-readable instructions; and
    the one or more processors read the computer-readable instructions to cause the terminal to implement the method according to any one of claims 1 to 7.
  16. A computer program product, characterized by comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 7.
  17. A computer-readable storage medium, characterized by comprising computer-readable instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 7.
PCT/CN2021/118901 2020-09-21 2021-09-17 Image super-resolution method and electronic device WO2022057868A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21868690.5A EP4207051A4 (en) 2020-09-21 2021-09-17 IMAGE SUPER-RESOLUTION METHOD AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010997488.5 2020-09-21
CN202010997488.5A CN114298900A (zh) Image super-resolution method and electronic device

Publications (1)

Publication Number Publication Date
WO2022057868A1 true WO2022057868A1 (zh) 2022-03-24

Family

ID=80776498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118901 WO2022057868A1 (zh) 2020-09-21 2021-09-17 图像超分方法和电子设备

Country Status (3)

Country Link
EP (1) EP4207051A4 (zh)
CN (1) CN114298900A (zh)
WO (1) WO2022057868A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115564653A (zh) * 2022-09-30 2023-01-03 江苏济远医疗科技有限公司 Multi-factor fusion image super-resolution method
CN116205284A (zh) * 2023-05-05 2023-06-02 北京蔚领时代科技有限公司 Super-resolution network, method, apparatus and device based on a novel re-parameterized structure

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116416134A (zh) * 2023-04-04 2023-07-11 阿里巴巴(中国)有限公司 Image super-resolution processing method, system, device, storage medium, and program product
CN116862769B (zh) * 2023-07-04 2024-05-10 深圳市晶帆光电科技有限公司 Image resolution enhancement method and apparatus


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109685717A (zh) * 2018-12-14 2019-04-26 厦门理工学院 Image super-resolution reconstruction method and apparatus, and electronic device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062744A (zh) * 2017-12-13 2018-05-22 中国科学院大连化学物理研究所 Deep-learning-based super-resolution reconstruction method for mass spectrometry images
CN108765279A (zh) * 2018-03-19 2018-11-06 北京工业大学 Pedestrian face super-resolution reconstruction method for surveillance scenarios
CN109903221A (zh) * 2018-04-04 2019-06-18 华为技术有限公司 Image super-resolution method and apparatus
US20190378242A1 (en) * 2018-06-06 2019-12-12 Adobe Inc. Super-Resolution With Reference Images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4207051A4


Also Published As

Publication number Publication date
EP4207051A1 (en) 2023-07-05
EP4207051A4 (en) 2024-03-27
CN114298900A (zh) 2022-04-08

Similar Documents

Publication Publication Date Title
WO2022057868A1 (zh) Image super-resolution method and electronic device
Sun et al. Learned image downscaling for upscaling using content adaptive resampler
TWI728465B (zh) Image processing method and apparatus, electronic device, and storage medium
CN108022212B (zh) High-resolution picture generation method, generation apparatus, and storage medium
WO2019153671A1 (zh) Image super-resolution method and apparatus, and computer-readable storage medium
Yan et al. Single image superresolution based on gradient profile sharpness
US8655109B2 (en) Regression-based learning model for image upscaling
CN110163237B (zh) Model training and image processing method, apparatus, medium, and electronic device
US8538200B2 (en) Systems and methods for resolution-invariant image representation
WO2022110638A1 (zh) Portrait restoration method and apparatus, electronic device, storage medium, and program product
JP3837575B2 (ja) Method for accelerating super-resolution processing
CN113994366A (zh) Multi-stage multi-reference bootstrapping for video super-resolution
US10657711B2 (en) Surface reconstruction for interactive augmented reality
JP2019067078A (ja) Image processing method and image processing program
CN108876716B (zh) Super-resolution reconstruction method and apparatus
WO2023284401A1 (zh) Image beautification processing method and apparatus, storage medium, and electronic device
CN112991171B (zh) Image processing method and apparatus, electronic device, and storage medium
CN110290285B (zh) Image processing method, image processing apparatus, image processing system, and medium
CN107220934B (zh) Image reconstruction method and apparatus
WO2024032331A9 (zh) Image processing method and apparatus, electronic device, and storage medium
Ning et al. Multi-frame image super-resolution reconstruction using sparse co-occurrence prior and sub-pixel registration
Liu et al. Gradient prior dilated convolution network for remote sensing image super resolution
CN112581363A (zh) Image super-resolution reconstruction method and apparatus, electronic device, and storage medium
CN111857626A (zh) Screen-coordinate-based picture display method, apparatus, device, and storage medium
CN111369425A (zh) Image processing method and apparatus, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21868690

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021868690

Country of ref document: EP

Effective date: 20230330

NENP Non-entry into the national phase

Ref country code: DE