CN115689881A - Image super-resolution method, system, storage medium and terminal device - Google Patents

Image super-resolution method, system, storage medium and terminal device

Publication number: CN115689881A
Authority: CN (China)
Prior art keywords: convolution, image, modules, processed, module
Legal status: Pending
Application number: CN202110865586.8A
Original language: Chinese (zh)
Inventors: 王树朋, 刘阳兴
Applicant and current assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Priority: CN202110865586.8A
Publication: CN115689881A (pending)

Abstract

The embodiment of the invention discloses an image super-resolution method, system, storage medium and terminal device, applied to the technical field of information processing. In the method of the embodiment, the image super-resolution device performs a convolution operation based on the second convolution kernel on the low-dimensional features of the image to be processed through a simplified structure, namely the folded convolution module, to obtain the high-dimensional features of the image to be processed, and then determines the super-resolution image of the image to be processed according to the high-dimensional features. The operation in the whole process is thus simplified, and the method can be widely applied to user terminals with limited resources.

Description

Image super-resolution method, system, storage medium and terminal device
Technical Field
The invention relates to the technical field of information processing, and in particular to an image super-resolution method, an image super-resolution system, a storage medium and a terminal device.
Background
With the development of intelligent devices (such as display devices, shooting devices, and the like) and of image and video processing technologies, multimedia platforms (such as live-streaming platforms or video-providing platforms) have become an indispensable part of daily life, and consequently a large amount of multimedia information is generated.
To reduce the storage space occupied by multimedia, existing multimedia platforms generally process the multimedia they host. For example, a video uploaded by a user to a cloud platform may be compressed, on the premise of no obvious loss of image quality, so that the compressed video is within 10 MB. However, the quality of the compressed video is reduced, which affects the user's experience when watching it.
To enable a user terminal to display the videos of a video platform clearly, in some application scenarios the user terminal can process a video of the platform with a video super-resolution method before displaying it. Video super-resolution optimizes the image quality of the video; the method here does not pursue an extreme super-resolution effect, but rather a balance between the speed and the effect of video super-resolution. Specifically, an existing video super-resolution method mainly adopts a neural network with a complex structure to perform image super-resolution processing on each frame of the video; however, because the resources of a user terminal are relatively limited, the effect of optimizing the image quality of the video with this method is not good.
Disclosure of Invention
The embodiment of the invention provides an image super-resolution method, an image super-resolution system, a storage medium and a terminal device, which implement image super-resolution processing with a simpler structure.
An embodiment of the present invention provides an image super-resolution method, including:
acquiring an image to be processed;
extracting low-dimensional features of the image to be processed, and inputting the low-dimensional features into a folded convolution module;
performing a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed;
and determining a super-resolution image of the image to be processed according to the high-dimensional features.
Another aspect of an embodiment of the present invention provides an image super-resolution device, including:
the image acquisition unit is used for acquiring an image to be processed;
the feature extraction unit is used for extracting low-dimensional features of the image to be processed and inputting the low-dimensional features into a folded convolution module;
the convolution unit is used for performing a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed;
and the super-resolution processing unit is used for determining a super-resolution image of the image to be processed according to the high-dimensional features.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a plurality of computer programs, where the computer programs are suitable for being loaded by a processor to execute the image super-resolution method according to the above aspect of the embodiment of the present invention.
In another aspect, an embodiment of the present invention further provides a terminal device, including a processor and a memory;
the memory is used for storing a plurality of computer programs, which are loaded by the processor to execute the image super-resolution method according to the above aspect of the embodiment of the invention; the processor is configured to run each of the plurality of computer programs.
In the method of the embodiment, the image super-resolution device performs a convolution operation based on the second convolution kernel on the low-dimensional features of the image to be processed through a simplified structure, namely the folded convolution module, to obtain the high-dimensional features of the image to be processed, and then determines the super-resolution image of the image to be processed according to the high-dimensional features. The operation in the whole process is thus simplified, and the method can be widely applied to user terminals with limited resources.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram illustrating an image super-resolution method according to an embodiment of the present invention;
FIG. 2a is a flowchart of an image super-resolution method according to an embodiment of the present invention;
FIG. 2b is a flowchart of a method for obtaining a current image super-resolution model in an embodiment of the present invention;
FIG. 3a is a schematic structural diagram of a pre-trained image super-resolution model according to an embodiment of the present invention;
FIG. 3b is a schematic structural diagram of a current image super-resolution model according to an embodiment of the present invention;
FIG. 3c is a schematic structural diagram of another current image super-resolution model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another pre-trained image super-resolution model and of the corresponding current image super-resolution model in an embodiment of the present invention;
FIG. 5a is a schematic structural diagram of a pre-trained image super-resolution model according to an embodiment of the present invention;
FIG. 5b is a schematic structural diagram of a current image super-resolution model according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a pre-trained image super-resolution model provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of a distributed system to which an information processing method is applied in another embodiment of the present invention;
FIG. 8 is a block diagram illustrating an exemplary block structure according to another embodiment of the present invention;
FIG. 9 is a schematic diagram of a logic structure of an image super-resolution device according to an embodiment of the present invention;
FIG. 10 is a schematic logical structure diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides an image super-resolution method, which is mainly a method for optimizing the image quality of an image. As shown in FIG. 1, the image super-resolution method can be implemented by an image super-resolution device through the following steps:
acquiring an image to be processed;
extracting low-dimensional features of the image to be processed, and inputting the low-dimensional features into a folded convolution module;
performing a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed;
and determining the super-resolution image of the image to be processed according to the high-dimensional features.
The extraction of the low-dimensional features of the image to be processed, the obtaining of the high-dimensional features and the determination of the super-resolution image can all be realized through a current image super-resolution model, which is a machine learning model based on artificial intelligence. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a broad range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and the like. It specializes in studying how a computer can simulate or realize human learning behavior so as to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Therefore, in the method of this embodiment, the image super-resolution device performs a convolution operation based on the second convolution kernel on the low-dimensional features of the image to be processed through a simplified structure, that is, the folded convolution module, to obtain the high-dimensional features of the image to be processed, and then determines the super-resolution image of the image to be processed according to the high-dimensional features. The operation in the whole process is thus simplified, and the method can be widely applied to user terminals with limited resources.
The embodiment of the invention provides an image super-resolution method executed by an image super-resolution device; its flowchart is shown in FIG. 2a, and it comprises the following steps:
step 101, acquiring an image to be processed.
It can be understood that the method in this embodiment may be applied to an image super-resolution device; specifically, the image super-resolution device may be a resource-limited user terminal or another device, and it may initiate the flow of this embodiment for any independent image or for each frame of any video.
In one implementation, a user may operate the image super-resolution device so that it presents an initiation page containing an input interface and an initiation control. The user can input any image or video through the input interface and activate the initiation control; in response, the image super-resolution device takes the input image, or each frame of the input video, as an image to be processed and triggers the image super-resolution flow of this embodiment. In another implementation, when the image super-resolution device receives from another device an image or video with a relatively low resolution (for example, a resolution below a certain threshold), it takes the received image, or each frame of the received video, as an image to be processed and automatically triggers the image super-resolution flow of this embodiment.
And 102, extracting low-dimensional features of the image to be processed, and inputting the low-dimensional features into a folded convolution module, where the folded convolution module is a module that performs a single convolution operation based on a second convolution kernel.
And 103, performing a single convolution operation based on the second convolution kernel on the low-dimensional features through the folded convolution module to obtain the high-dimensional features of the image to be processed.
Here, the low-dimensional features of the image to be processed are mainly local features of an object included in the image. A local feature describes the feature information of the individual elements of the object, specifically the attribute information corresponding to each element, and each element may correspond to several items of attribute information. For example, if the object in the image to be processed is a tree, consisting of elements such as a trunk, leaves and branches, then the attribute information corresponding to the trunk, leaves and branches constitutes local features, where attribute information refers to an abstract depiction of a thing, including its properties and relationships. In general, the low-dimensional features of the image to be processed may include features of edge elements and corner elements of the objects in the image. It should be noted that, in one implementation, the low-dimensional features need not include features of all elements of the objects; for example, if the image to be processed contains a tree, its low-dimensional features may include only features of elements such as the leaves and branches of the tree.
To describe the image to be processed more accurately, its low-dimensional features need to be mapped into a high-dimensional space to obtain the high-dimensional features of the image, which include global features of the objects in the image. A global feature refers to an overall feature of an object composed of its elements; that is, it is feature information describing what the elements present as a whole once they form the object, specifically several items of attribute information of the object. For example, if the object in the image to be processed is a tree, its global features may include feature information such as its height, its shape and the number of flowers blooming on it. Such feature information does not belong to individual elements such as leaves or branches, but to the tree presented as a whole; these are the global features, and the high-dimensional features include them. The folded convolution module is a module that performs a single convolution operation on the low-dimensional features, and its convolution kernel is the second convolution kernel described below.
Specifically, the image super-resolution device inputs the low-dimensional features of the image to be processed into the folded convolution module for convolution processing based on the second convolution kernel, thereby obtaining the high-dimensional features of the image to be processed; alternatively, it inputs the low-dimensional features into the folded convolution module for convolution processing based on the second convolution kernel to obtain second convolved features, and superimposes the second convolved features on the low-dimensional features of the image to be processed to obtain the high-dimensional features.
And step 104, determining a super-resolution image of the image to be processed according to the high-dimensional features.
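The patent text does not spell out how step 104 turns the high-dimensional features into the super-resolution image. A common choice in lightweight super-resolution networks is a depth-to-space (pixel-shuffle) rearrangement; the sketch below is purely illustrative, and the function name, shapes and the use of pixel shuffle itself are assumptions rather than details from the patent:

```python
import numpy as np

def pixel_shuffle(x, s):
    """Depth-to-space: rearrange (C*s*s, H, W) features into a (C, H*s, W*s) image."""
    c2, h, w = x.shape
    c = c2 // (s * s)
    # split the channel axis into (c, s, s), then interleave the two s-axes
    # into the spatial dimensions
    return (x.reshape(c, s, s, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * s, w * s))

# 2x upscaling of a 3-channel feature map: 12 channels -> 3 channels at doubled size
feat = np.arange(12 * 4 * 4, dtype=float).reshape(12, 4, 4)
img = pixel_shuffle(feat, 2)
```

With scale factor s, each group of s*s feature channels fills one s-by-s block of output pixels, so no additional convolution is needed after the rearrangement.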
As can be seen, in the method of this embodiment, the image super-resolution device performs a convolution operation based on the second convolution kernel on the low-dimensional features of the image to be processed through a simplified structure to obtain the high-dimensional features, and then determines the super-resolution image of the image to be processed according to the high-dimensional features. The operation is simplified, and the method can be widely applied to user terminals with limited resources.
In a specific application, steps 102 to 104 above may be implemented by a current image super-resolution model; the image super-resolution device therefore first needs to obtain the current image super-resolution model, and then uses it to output the super-resolution image of the image to be processed. Specifically, as shown in FIG. 2b, the current image super-resolution model may be obtained through the following steps:
step 201, determining a pre-trained image hyper-segmentation model, where the pre-trained image hyper-segmentation model includes a plurality of convolution modules, and one convolution module is a module that performs a convolution operation based on a first convolution kernel.
The pre-trained image hyper-segmentation model is used for extracting low-dimensional features of the image to be processed, converting the low-dimensional features through a plurality of convolution modules to obtain high-dimensional features, and determining the hyper-segmentation image of the image to be processed according to the high-dimensional features.
Step 202, equivalently converting a plurality of convolution modules in the pre-trained hyper-segmentation model of the image into the folded convolution modules, and obtaining the converted hyper-segmentation model of the image as a current hyper-segmentation model of the image, so that the current hyper-segmentation model of the image can perform the steps 102 to 104, wherein the number of the convolution modules is larger than that of the folded convolution modules.
The pre-trained image super-resolution model is trained in advance and is used to convert an image of any resolution into an image of higher resolution, that is, a super-resolution image, thereby improving its image quality. After the pre-trained image super-resolution model has been trained, the current image super-resolution model can be obtained from it, and the operation logic of the current image super-resolution model can then be stored in the image super-resolution device.
Generally, the structure of the pre-trained image super-resolution model is a neural network structure; specifically, it can comprise three parts: a first part, used to extract the low-dimensional features of the image; a second part, used to convert the low-dimensional features extracted by the first part into high-dimensional features, mainly through convolution operations performed by a plurality of convolution modules; and a third part, used to output the super-resolution image according to the high-dimensional features produced by the second part. In the embodiment of the invention, when the current image super-resolution model is determined, the second part of the pre-trained image super-resolution model is simplified into an equivalent new second part, thereby determining the current image super-resolution model.
In a specific implementation, when the image super-resolution device performs step 202, that is, equivalently converts the plurality of convolution modules in the pre-trained image super-resolution model into the folded convolution module, it may proceed in, but is not limited to, the following ways:
(1) The pre-trained image super-resolution model comprises a plurality of convolution modules connected in parallel; the inputs of these convolution modules are the same, and their outputs are superimposed to yield the high-dimensional features. Specifically, the convolution modules each perform convolution processing based on their corresponding first convolution kernels on the low-dimensional features of the image to be processed to obtain a plurality of first convolved features, and these first convolved features are superimposed to obtain the high-dimensional features of the image to be processed.
In this case, when performing the equivalent conversion, the size of the second convolution kernel of the folded convolution module may be obtained from the sizes of the first convolution kernels, and the folded convolution module is determined according to the size of the second convolution kernel.
For example, FIG. 3a shows a plurality of convolution modules, specifically 4 convolution modules, in the pre-trained image super-resolution model, where the sizes of the first convolution kernels are 1 × 1, 1 × K, K × 1 and K × K, respectively; FIG. 3b and FIG. 3c show the folded convolution modules of the two resulting current image super-resolution models, where the size of the second convolution kernel is K × K and K is a natural number greater than 1. A processing module (not shown in FIG. 3b and FIG. 3c), such as a Rectified Linear Unit (ReLU), is generally added in the current image super-resolution model after the convolution computation and before the output, so that the high-dimensional features output by the folded convolution module are rectified before being output, ensuring the nonlinearity of the deep network.
The computation of the plurality of convolution modules is equivalent to the computation of the folded convolution module mainly because of the property expressed in Equation 1 below, where I denotes the input low-dimensional features of the image to be processed; the computation of the plurality of convolution modules corresponds to the left side of the equation, and the computation of the folded convolution module to the right side:
I * K^(1) + I * K^(2) = I * (K^(1) + K^(2))    (1)
In a specific implementation, taking K = 3 as an example: in the pre-trained image super-resolution model, the number of floating-point operations (FLOPs) of this part is 16HWC^2, while in the current image super-resolution model it is 9HWC^2, where H, W and C denote the height, width and number of channels of the feature map, respectively. Moreover, on the processed data set, the current image super-resolution model brings a performance improvement of about 0.05 dB.
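Equation (1) can be checked numerically. The sketch below folds the four parallel branches of FIG. 3a (kernel sizes 1 × 1, 1 × K, K × 1 and K × K) into a single K × K kernel by zero-padding each kernel to K × K and summing; single-channel NumPy code is used for brevity, and all function names are illustrative rather than from the patent:

```python
import numpy as np

def conv_same(x, k):
    # 2-D cross-correlation with zero "same" padding; x: (H, W), k: odd-sized kernel
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    win = np.lib.stride_tricks.sliding_window_view(xp, k.shape)
    return np.einsum('xyhw,hw->xy', win, k)

def pad_to(k, size):
    # embed a small kernel at the centre of a size x size zero kernel
    out = np.zeros((size, size))
    kh, kw = k.shape
    t, l = (size - kh) // 2, (size - kw) // 2
    out[t:t + kh, l:l + kw] = k
    return out

rng = np.random.default_rng(0)
K = 3
x = rng.standard_normal((8, 8))
# one kernel per parallel branch: 1x1, 1xK, Kx1 and KxK, as in FIG. 3a
kernels = [rng.standard_normal(s) for s in [(1, 1), (1, K), (K, 1), (K, K)]]

branch_sum = sum(conv_same(x, k) for k in kernels)         # 1+K+K+K*K = 16 mults/pixel
folded = conv_same(x, sum(pad_to(k, K) for k in kernels))  # K*K = 9 mults/pixel
assert np.allclose(branch_sum, folded)                     # Equation (1)
```

The per-pixel multiply counts in the comments match the 16HWC^2 versus 9HWC^2 FLOPs figures given above for K = 3.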
(2) The plurality of convolution modules are connected in series, so that their effects compose multiplicatively: the output of one convolution module is connected to the input of the next, and the last module in the chain outputs the high-dimensional features. Specifically, one of the convolution modules performs convolution processing based on its corresponding first convolution kernel on the low-dimensional features; the resulting first convolved features are input into the next convolution module for convolution processing based on its corresponding first convolution kernel, and its output is in turn passed to the next convolution module, and so on. The features produced by the final module, equivalent to applying the product of the convolution kernels of all modules to the input, are taken as the high-dimensional features of the image to be processed.
In this case, when performing the equivalent conversion, the size of the second convolution kernel of the folded convolution module may likewise be obtained from the sizes of the first convolution kernels, and the folded convolution module is determined according to the size of the second convolution kernel.
For example, FIG. 4 shows the plurality of convolution modules of a pre-trained image super-resolution model and the folded convolution module of the corresponding current image super-resolution model. The plurality of convolution modules are specifically 3 convolution modules whose first convolution kernels have sizes 1 × 1, K × K and 1 × 1, respectively, and the second convolution kernel of the folded convolution module has size K × K, where K is a natural number greater than 1. Among the 3 convolution modules, the first module, with a 1 × 1 first convolution kernel, has input features of dimension n and output features of dimension rn; the module with a K × K first convolution kernel has input features of dimension rn and output features of dimension rm; and the other module with a 1 × 1 first convolution kernel has input features of dimension rm and output features of dimension m. The folded convolution module has input features of dimension n and output features of dimension m.
The computation of the plurality of convolution modules is equivalent to the computation of the folded convolution module mainly because of the property expressed in Equation 2 below, where W denotes the convolution kernel parameters of a convolution module and n, m and r denote the dimension of the input features, the dimension of the output features and the intermediate expansion ratio, respectively. The computation of the plurality of convolution modules corresponds to the right side of the equation and the computation of the folded convolution module to the left side; the equivalence holds because there is no operation between the convolution modules, such as the activation of a Rectified Linear Unit (ReLU), that would give the model nonlinear capability:
I * (W^(1) * W^(2) * W^(3)) = ((I * W^(1)) * W^(2)) * W^(3)    (2)
In a specific implementation, taking K = 3 and m = n = C as an example: in the pre-trained image super-resolution model, the FLOPs count of this part is 2rHWC^2 + 9r^2HWC^2; when r is 16, the FLOPs of the pre-trained image super-resolution module are about 256 times those of the current image super-resolution model. Moreover, on the processed data set, the current image super-resolution model brings a performance improvement of about 0.15 dB.
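Equation (2) can also be checked numerically: with no nonlinearity between them, the 1 × 1 (n → rn), K × K (rn → rm) and 1 × 1 (rm → m) convolutions of FIG. 4 collapse into one K × K convolution whose kernel is the composition of the three. The NumPy sketch below is illustrative only; all names and shapes are assumptions:

```python
import numpy as np

def conv2d(x, w):
    # multi-channel "valid" cross-correlation; x: (C_in, H, W), w: (C_out, C_in, kh, kw)
    win = np.lib.stride_tricks.sliding_window_view(x, w.shape[2:], axis=(1, 2))
    return np.einsum('cxyhw,ochw->oxy', win, w)

rng = np.random.default_rng(1)
n, m, r, K = 4, 4, 2, 3                         # in dim, out dim, expansion ratio, kernel
x = rng.standard_normal((n, 10, 10))
w1 = rng.standard_normal((r * n, n, 1, 1))      # 1x1: n  -> rn
w2 = rng.standard_normal((r * m, r * n, K, K))  # KxK: rn -> rm
w3 = rng.standard_normal((m, r * m, 1, 1))      # 1x1: rm -> m

serial = conv2d(conv2d(conv2d(x, w1), w2), w3)

# fold the three kernels into one: W[o, i, :, :] = sum_{a,b} w3[o,a] * w2[a,b,:,:] * w1[b,i]
w_fold = np.einsum('oa,abhw,bi->oihw', w3[:, :, 0, 0], w2, w1[:, :, 0, 0])
assert np.allclose(serial, conv2d(x, w_fold))   # Equation (2)
```

The fold works because a 1 × 1 convolution is just a linear mix of channels, which commutes with the spatial convolution when no activation sits between them.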
(3) The structure of the plurality of convolution modules and the folded convolution module combines (1) and (2). Specifically, the plurality of convolution modules are convolution modules of a plurality of parallel branches: the branch convolution modules share the same input, and their outputs are superimposed to output the high-dimensional features. Among the branch convolution modules, the convolution module of a first branch comprises a plurality of first convolution sub-modules connected in series and multiplied, and the convolution module of a second branch comprises a second convolution sub-module. The convolution modules of the branches respectively perform convolution processing based on their corresponding first convolution kernels on the low-dimensional features of the image to be processed to obtain a plurality of first post-convolution features, and these first post-convolution features are superimposed to obtain the high-dimensional features of the image to be processed.
Within the convolution module of the first branch, one first convolution sub-module performs convolution processing based on its corresponding first convolution kernel on the low-dimensional features of the image to be processed to obtain a first post-convolution sub-feature; this sub-feature is input into another first convolution sub-module of the plurality of first convolution sub-modules for convolution processing based on its corresponding first convolution kernel, and the resulting sub-feature is in turn input into yet another first convolution sub-module for convolution processing based on its corresponding first convolution kernel. The product of the first post-convolution sub-features respectively obtained by the plurality of first convolution sub-modules is taken as the first post-convolution feature obtained by the convolution module of the first branch.
In this case, when performing the equivalent transformation, the size of the second convolution kernel of the folded convolution module may be obtained according to the sizes of the first convolution kernels, and the folded convolution module may be determined according to the size of the second convolution kernel.
For example, fig. 5a shows the convolution modules of multiple branches in a pre-trained image hyper-segmentation model, and fig. 5b shows the folded convolution module in the current image hyper-segmentation model. The first convolution kernels of the convolution modules of the two second branches have sizes 1 × 1 and K × K, respectively. The convolution modules of the other two first branches each comprise 3 first convolution sub-modules connected in series and multiplied, and in each of these two branches the first convolution kernels of the 3 first convolution sub-modules have sizes 1 × 1, K × K, and 1 × 1, respectively; the first convolution sub-module whose first convolution kernel has size K × K may specifically be an Average Pooling (AVG) module. The second convolution kernel of the folded convolution module has size K × K, where K is a natural number greater than 1.
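The Average Pooling (AVG) sub-module mentioned above folds like any learned convolution because stride-1 average pooling is itself a linear operator: it equals a fixed K × K convolution whose kernel is 1/K² on the matching channel and 0 elsewhere. A minimal sketch (illustrative only, using NumPy; stride 1 and 'valid' borders are assumed):

```python
import numpy as np

def avg_pool_as_conv(channels, K):
    """Build the fixed K x K convolution kernel that reproduces stride-1 average pooling."""
    w = np.zeros((channels, channels, K, K))
    for c in range(channels):
        w[c, c] = 1.0 / (K * K)   # average the K x K window of the same channel
    return w

def conv2d(x, w):
    """Naive 'valid' cross-correlation: x is (C_in, H, W), w is (C_out, C_in, K, K)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[o, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[o])
    return out

C, K = 2, 3
rng = np.random.default_rng(2)
x = rng.standard_normal((C, 5, 5))
y_conv = conv2d(x, avg_pool_as_conv(C, K))

y_pool = np.zeros((C, 3, 3))       # direct stride-1 average pooling for comparison
for c in range(C):
    for i in range(3):
        for j in range(3):
            y_pool[c, i, j] = x[c, i:i+K, j:j+K].mean()

print(np.allclose(y_conv, y_pool))  # True: pooling equals the fixed convolution
```

Because the AVG module is expressible as a convolution kernel, it participates in the same kernel products and sums as the learned branches during the equivalent transformation.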
In a specific implementation, beyond the performance improvements in cases (1) and (2) above, training the pre-trained image hyper-segmentation model with the combined structure of (3) improves performance further. In the process of training to obtain the pre-trained image hyper-segmentation model, the FLOPs of the pre-trained model can be about twice the FLOPs of the pre-trained model obtained in (2) above, and on the processed data set the current image hyper-segmentation model can bring a performance improvement of about 0.12 to 0.25 dB.
It can be seen that the image hyper-segmentation device determines a current image hyper-segmentation model with a simpler structure from a pre-trained image hyper-segmentation model with a more complex structure to obtain the hyper-segmentation image of the image to be processed: the plurality of convolution modules in the pre-trained model are simplified into a folded convolution module, and the image to be processed is subjected to image hyper-segmentation processing through the current model, so that the calculation of image hyper-segmentation is simplified.
The image super-resolution method in the present invention is described below with a specific application example, which mainly includes the following two parts:
(1) And training to obtain a pre-trained image hyper-resolution model.
Specifically, when training the image hyper-segmentation model, a Fast Super-Resolution Convolutional Neural Network (FSRCNN) or Enhanced Deep Super-Resolution (EDSR) structure may be selected, and the model may generally include the structure shown in fig. 6: a feature extraction module, used for extracting image features, mainly the low-dimensional features of the image; a feature processing module, used for converting the image features extracted by the feature extraction module into high-dimensional features, and specifically comprising a plurality of convolution modules (one convolution module is taken as an example in the figure); and an output module, used for outputting the high-resolution image according to the high-dimensional features.
After the pre-trained image hyper-segmentation model is trained, the current image hyper-segmentation model can be obtained according to the pre-trained image hyper-segmentation model, specifically, a plurality of convolution modules in the feature processing module can be simplified into at least one folded convolution module, and the operation logic of the current image hyper-segmentation model is stored in the image hyper-segmentation device. The image super-resolution device may be a resource-limited device such as a user terminal.
(2) And processing the image to be processed through the current image hyper-resolution model.
The image super-resolution device receives images or videos sent by other devices, and when the resolution of a received image, or of a frame of a received video, is smaller than a certain threshold, the image super-resolution process of the embodiment of the invention can be triggered. Specifically, each received image, or each frame of a received video, may be used as an image to be processed, and the current image hyper-segmentation model is called to perform hyper-segmentation processing on the image to be processed, so as to obtain a hyper-segmentation image with a resolution higher than that of the image to be processed.
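The triggering logic can be sketched as follows (the threshold values and the model interface are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def maybe_super_resolve(frame, model, min_height=720, min_width=1280):
    """Run the hyper-segmentation model only on frames below the resolution threshold."""
    h, w = frame.shape[:2]
    if h < min_height or w < min_width:
        return model(frame)   # low resolution: produce the hyper-segmentation image
    return frame              # already above the threshold: pass through unchanged

# a stand-in "model" that doubles resolution by nearest-neighbour upsampling
double = lambda f: np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

small = np.zeros((480, 640, 3))
large = np.zeros((1080, 1920, 3))
print(maybe_super_resolve(small, double).shape)  # (960, 1280, 3)
print(maybe_super_resolve(large, double).shape)  # (1080, 1920, 3)
```

In a video pipeline the same check would simply be applied to every decoded frame before display.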
In the following, another specific application example is used to describe the image super-resolution method in the present invention. The image super-resolution device in the embodiment of the present invention is mainly a distributed system 100, and the distributed system may include a client 300 and a plurality of nodes 200 (computing devices in any form in the access network, such as servers and user terminals), where the client 300 and the nodes 200 are connected by network communication.
Taking a blockchain system as an example of the distributed system, fig. 7 is an optional structural schematic diagram of the distributed system 100 applied to the blockchain system provided in the embodiment of the present invention. The system is formed by a plurality of nodes 200 (computing devices in any form in the access network, such as servers and user terminals) and a client 300; a Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join and become a node; a node comprises a hardware layer, a middle layer, an operating system layer and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 7, the functions involved include:
1) Routing: a basic function that a node has, supporting communication between nodes.
Besides the routing function, the node may also have the following functions:
2) Application: deployed in a blockchain to implement specific services according to actual service requirements. It records data related to the implemented functions to form recorded data, carries a digital signature in the recorded data to indicate the source of the task data, and sends the recorded data to other nodes in the blockchain system, so that the other nodes add the recorded data to a temporary block after verifying the source and integrity of the recorded data.
For example, the service implemented by the application further includes code implementing an image hyper-segmentation function, which mainly includes:
acquiring an image to be processed; extracting low-dimensional features of the image to be processed, and inputting the low-dimensional features into a folded convolution module; performing a convolution operation based on a second convolution kernel on the low-dimensional features through the folded convolution module to obtain high-dimensional features of the image to be processed; and determining a hyper-resolution image of the image to be processed according to the high-dimensional feature.
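The four steps above can be sketched end to end. This is an illustration only: NumPy stands in for the framework, a 1 × 1 extraction kernel stands in for the real feature-extraction module, and nearest-neighbour upsampling stands in for the output module:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' cross-correlation: x is (C_in, H, W), w is (C_out, C_in, K, K)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[o, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[o])
    return out

def super_resolve(image, w_extract, w_folded, scale=2):
    feats = conv2d(image, w_extract)      # extract low-dimensional features
    high = conv2d(feats, w_folded)        # folded KxK convolution -> high-dimensional features
    # nearest-neighbour upsampling stands in for the output module
    return np.repeat(np.repeat(high, scale, axis=1), scale, axis=2)

rng = np.random.default_rng(3)
img = rng.standard_normal((3, 10, 10))
w_extract = rng.standard_normal((8, 3, 1, 1))   # hypothetical 1x1 feature-extraction kernel
w_folded = rng.standard_normal((8, 8, 3, 3))    # hypothetical folded 3x3 kernel
out = super_resolve(img, w_extract, w_folded)
print(out.shape)  # (8, 16, 16)
```

A real deployment would use the trained folded kernel and a learned upsampling (e.g. pixel-shuffle) output stage in place of these stand-ins.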
3) Blockchain: comprises a series of blocks (Blocks) connected to one another in the chronological order of their generation. Once added to the blockchain, a new block cannot be removed, and the blocks record the recorded data submitted by nodes in the blockchain system.
Referring to fig. 8, an optional Block Structure provided in the embodiment of the present invention is shown: each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by these hash values to form the blockchain. A block may also include information such as a timestamp at the time of block generation. A blockchain is essentially a decentralized database, a chain of data blocks associated by cryptography; each data block contains related information used to verify the validity (anti-counterfeiting) of its information and to generate the next block.
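The hash linkage described above can be illustrated with a minimal sketch (the field names and record contents are illustrative assumptions, not the patent's data layout):

```python
import hashlib
import json
import time

def make_block(records, prev_hash):
    """A block stores a hash over its records and the previous block's hash, plus a timestamp."""
    block_hash = hashlib.sha256(
        (json.dumps(records) + prev_hash).encode()
    ).hexdigest()
    return {"records": records, "prev_hash": prev_hash,
            "timestamp": time.time(), "hash": block_hash}

genesis = make_block(["genesis"], "0" * 64)
block_1 = make_block(["hyper-segmentation result for image 001"], genesis["hash"])

# The chain holds only while the previous block is unmodified:
print(block_1["prev_hash"] == genesis["hash"])  # True
tampered = hashlib.sha256((json.dumps(["forged"]) + "0" * 64).encode()).hexdigest()
print(tampered == block_1["prev_hash"])  # False
```

Changing any record in an earlier block changes its hash, which no longer matches the `prev_hash` stored in the next block; this is the anti-counterfeiting property the specification refers to.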
An embodiment of the present invention further provides an image super-resolution device, a schematic structural diagram of which is shown in fig. 9, and the image super-resolution device specifically includes:
an image acquisition unit 10 for acquiring an image to be processed;
a feature extraction unit 11, configured to extract low-dimensional features of the image to be processed obtained by the image acquisition unit 10, and input the low-dimensional features into the folded convolution module of the convolution unit 12.
And the convolution unit 12 is configured to perform a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed.
The convolution unit 12 is specifically configured to input the low-dimensional feature of the image to be processed to the folded convolution module to perform convolution processing based on a second convolution kernel, so as to obtain a high-dimensional feature of the image to be processed; or inputting the low-dimensional features of the image to be processed into the folded convolution module to perform convolution processing based on a second convolution kernel to obtain second convolved features, and overlapping the second convolved features with the low-dimensional features of the image to be processed to obtain the high-dimensional features of the image to be processed.
And a hyper-segmentation processing unit 13, configured to determine a hyper-segmentation image of the to-be-processed image according to the high-dimensional feature obtained by the convolution unit 12.
Further, the image super-resolution device of the embodiment may further include:
an equivalent transformation unit 14, configured to determine a pre-trained image hyper-segmentation model, where the pre-trained image hyper-segmentation model includes a plurality of convolution modules; the pre-trained image hyper-segmentation model is used for extracting low-dimensional features of an image to be processed, converting the low-dimensional features through the plurality of convolution modules to obtain high-dimensional features, and determining a hyper-segmentation image of the image to be processed according to the high-dimensional features; equivalently converting the plurality of convolution modules in the pre-trained image hyper-segmentation model into folded convolution modules in the convolution unit 12 to obtain a converted image hyper-segmentation model serving as a current image hyper-segmentation model.
In one case, when equivalently converting the plurality of convolution modules in the pre-trained image hyper-resolution model into the folded convolution module, the equivalence conversion unit 14 is specifically configured to, if the plurality of convolution modules respectively perform convolution processing based on corresponding first convolution kernels on the low-dimensional features of the image to be processed, to obtain a plurality of corresponding first post-convolution features, superimpose the plurality of first post-convolution features, to obtain the high-dimensional features of the image to be processed, and obtain the size of the second convolution kernel according to the size of the first convolution kernel;
and determining the folded convolution module according to the size of the second convolution kernel. When the size of the second convolution kernel is obtained according to the size of the first convolution kernel, specifically, if the plurality of convolution modules are 4 convolution modules, the sizes of the first convolution kernels of the 4 convolution modules are respectively: 1 × 1, 1 × K, K × 1, and K × K, the size of the second convolution kernel of the folded convolution module is K × K, and K is a natural number greater than 1.
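The case above, four parallel kernels of sizes 1 × 1, 1 × K, K × 1, and K × K whose outputs are superimposed, can be illustrated numerically. In this sketch (NumPy stands in for a deep-learning framework; 'same' spatial alignment of the branch outputs is assumed), each smaller kernel is zero-padded to K × K around its centre, and the padded kernels are summed into the second (folded) kernel:

```python
import numpy as np

def conv2d(x, w):
    """Naive 'valid' cross-correlation: x is (C_in, H, W), w is (C_out, C_in, K, K)."""
    c_out, _, k, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                out[o, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[o])
    return out

def pad_to_k(w, K):
    """Zero-pad a (C_out, C_in, kh, kw) kernel to K x K, centred."""
    c_out, c_in, kh, kw = w.shape
    out = np.zeros((c_out, c_in, K, K))
    top, left = (K - kh) // 2, (K - kw) // 2
    out[:, :, top:top+kh, left:left+kw] = w
    return out

rng = np.random.default_rng(1)
n, m, K = 3, 5, 3
branches = [rng.standard_normal((m, n, kh, kw))
            for kh, kw in [(1, 1), (1, K), (K, 1), (K, K)]]  # the 4 branch kernels

# Convolution is linear, so summing the branch outputs equals convolving once
# with the sum of the centre-padded kernels: the folded K x K kernel.
w_folded = sum(pad_to_k(w, K) for w in branches)

x = rng.standard_normal((n, 6, 6))
y_branches = sum(conv2d(x, pad_to_k(w, K)) for w in branches)
y_folded = conv2d(x, w_folded)
print(np.allclose(y_branches, y_folded))  # True
```

The same centre-padding trick is what makes a 1 × 1 or asymmetric branch compatible with the single K × K kernel at inference time.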
In another case, when equivalently converting the plurality of convolution modules in the pre-trained image hyper-segmentation model into the folded convolution module, the equivalence conversion unit 14 is specifically configured to: if one convolution module of the plurality of convolution modules performs convolution processing based on its corresponding first convolution kernel on the low-dimensional features to obtain a first post-convolution feature, input that feature into another convolution module of the plurality of convolution modules for convolution processing based on its corresponding first convolution kernel, and input the resulting further first post-convolution feature into yet another convolution module for convolution processing based on its corresponding first convolution kernel; take the product of the first post-convolution features respectively obtained by the plurality of convolution modules as the high-dimensional features of the image to be processed; obtain the size of the second convolution kernel according to the sizes of the first convolution kernels; and determine the folded convolution module according to the size of the second convolution kernel. When the size of the second convolution kernel is obtained according to the sizes of the first convolution kernels, specifically, the plurality of convolution modules are 3 convolution modules whose first convolution kernels have sizes 1 × 1, K × K, and 1 × 1, respectively; the size of the second convolution kernel of the folded convolution module is K × K, and K is a natural number greater than 1.
In another case, when equivalently converting the plurality of convolution modules in the pre-trained image hyper-resolution model into the folded convolution modules, the equivalence conversion unit 14 is specifically configured to, if the plurality of convolution modules are convolution modules of a plurality of branches, perform convolution processing based on corresponding first convolution kernels on the low-dimensional features of the image to be processed by the convolution modules of the plurality of branches, respectively, to obtain a plurality of first convolved features, and superimpose the plurality of first convolved features to obtain the high-dimensional features of the image to be processed; the convolution module of a first branch of the convolution modules of the multiple branches comprises multiple first convolution sub-modules, wherein one first convolution sub-module of the convolution module of the first branch performs convolution processing based on a corresponding first convolution kernel on the low-dimensional feature of the image to be processed to obtain a first post-convolution sub-feature, then inputs the first post-convolution sub-feature into another first convolution sub-module of the multiple first convolution sub-modules to perform convolution processing based on a corresponding first convolution kernel, and then inputs the another first post-convolution sub-feature into another first convolution sub-module to perform convolution processing based on a corresponding first convolution kernel; taking the product of the first convolved sub-features obtained by the plurality of first convolving sub-modules as a first convolved feature obtained by the convolving module of the first branch, and obtaining the size of the second convolving kernel according to the size of the first convolving kernel; and determining the folded convolution module according to the size of the second convolution kernel. 
When the size of the second convolution kernel is obtained according to the size of the first convolution kernel, specifically, if the convolution modules of the multiple branches include convolution modules of 4 branches, the sizes of the first convolution kernels of the convolution modules of two certain second branches are: 1 × 1 and K × K, the convolution modules of the other two first branches respectively include 3 first convolution sub-modules which are connected in series and multiplied, and the sizes of first convolution kernels of the 3 first convolution sub-modules are 1 × 1, K × K and 1 × 1 respectively; the size of a second convolution kernel of the folded convolution module is K multiplied by K, and K is a natural number larger than 1.
Therefore, in the image super-resolution device of this embodiment, the folded convolution module performs a convolution operation based on the second convolution kernel on the low-dimensional features of the image to be processed to obtain the high-dimensional features of the image to be processed, and the super-resolution image of the image to be processed is then determined according to the high-dimensional features, so that the operation in the whole process is simplified and the device can be widely applied to user terminals with limited resources.
The present invention further provides a terminal device, a schematic structural diagram of which is shown in fig. 10. The terminal device may vary considerably depending on its configuration or performance, and may include one or more Central Processing Units (CPUs) 20 (e.g., one or more processors), a memory 21, and one or more storage media 22 (e.g., one or more mass storage devices) storing application programs 221 or data 222. The memory 21 and the storage medium 22 may be transient storage or persistent storage. The program stored in the storage medium 22 may include one or more modules (not shown), and each module may include a series of instruction operations for the terminal device. Still further, the central processor 20 may be arranged to communicate with the storage medium 22 and to execute, on the terminal device, the series of instruction operations in the storage medium 22.
Specifically, the application program 221 stored in the storage medium 22 includes an application program for image super-segmentation, and the program may include the image obtaining unit 10, the feature extracting unit 11, the convolution unit 12, the super-segmentation processing unit 13, and the equivalent transformation unit 14 in the above-described image super-segmentation apparatus, which will not be described in detail herein. Still further, the central processor 20 may be configured to communicate with the storage medium 22 to execute a series of operations corresponding to an application program for hyper-scoring of images stored in the storage medium 22 on the terminal device.
The terminal device may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input/output interfaces 25, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The steps executed by the image super-resolution device in the above-described method embodiment may be based on the structure of the terminal device shown in fig. 10.
In another aspect, an embodiment of the present invention further provides a computer-readable storage medium, which stores a plurality of computer programs, where the computer programs are suitable for being loaded by a processor and executing the image super-resolution method performed by the image super-resolution device.
In another aspect, an embodiment of the present invention further provides a terminal device, including a processor and a memory;
the memory is used for storing a plurality of computer programs, and the computer programs are used for being loaded by the processor and executing the image super-dividing method executed by the image super-dividing device; the processor is configured to implement each of the plurality of computer programs.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the image super-resolution method provided in the various alternative implementations of the above embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by relevant hardware instructed by a program, which may be stored in a computer-readable storage medium, and the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disk, or the like.
The image super-resolution method, system, storage medium and terminal device provided by the embodiments of the present invention are described in detail above, and a specific example is applied in the present disclosure to explain the principle and implementation manner of the present invention, and the description of the above embodiments is only used to help understanding the method and core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. An image super-resolution method, comprising:
acquiring an image to be processed;
extracting low-dimensional features of the image to be processed, and inputting the low-dimensional features into a folded convolution module;
performing a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed;
and determining a hyper-resolution image of the image to be processed according to the high-dimensional feature.
2. The method according to claim 1, wherein the obtaining the high-dimensional features of the image to be processed by performing, by the folded convolution module, a convolution operation based on a second convolution kernel on the low-dimensional features specifically includes: inputting the low-dimensional features of the image to be processed into the folded convolution module to carry out convolution processing based on a second convolution kernel, so as to obtain the high-dimensional features of the image to be processed;
or inputting the low-dimensional features of the image to be processed into the folded convolution module to perform convolution processing based on a second convolution kernel to obtain second convolved features, and overlapping the second convolved features with the low-dimensional features of the image to be processed to obtain the high-dimensional features of the image to be processed.
3. The method of claim 2, wherein the method further comprises:
determining a pre-trained hyper-segmentation model of an image, the pre-trained hyper-segmentation model of the image comprising a plurality of convolution modules;
the pre-trained image hyper-segmentation model is used for extracting low-dimensional features of an image to be processed, converting the low-dimensional features through the plurality of convolution modules to obtain high-dimensional features, and determining a hyper-segmentation image of the image to be processed according to the high-dimensional features;
equivalently converting the plurality of convolution modules in the pre-trained image hyper-resolution model into the folded convolution modules to obtain a converted image hyper-resolution model serving as a current image hyper-resolution model.
4. The method according to claim 3, wherein equivalently converting the plurality of convolution modules in the pre-trained image hyper-segmentation model into the folded convolution module comprises:
if the convolution modules respectively carry out convolution processing based on corresponding first convolution kernels on the low-dimensional features of the image to be processed to obtain a plurality of corresponding first post-convolution features, the plurality of first post-convolution features are overlapped to obtain high-dimensional features of the image to be processed, and the size of the second convolution kernel is obtained according to the size of the first convolution kernel;
and determining the folded convolution module according to the size of the second convolution kernel.
5. The method as claimed in claim 4, wherein said deriving the size of the second convolution kernel based on the size of the first convolution kernel comprises:
if the plurality of convolution modules are 4 convolution modules, the sizes of the first convolution kernels of the 4 convolution modules are respectively as follows: 1 × 1, 1 × K, K × 1, and K × K, the size of the second convolution kernel of the folded convolution module is K × K, and K is a natural number greater than 1.
6. The method according to claim 3, wherein equivalently converting the plurality of convolution modules in the pre-trained image hyper-segmentation model into the folded convolution module comprises:
if one convolution module in the convolution modules performs convolution processing based on a corresponding first convolution kernel on the low-dimensional feature, the obtained first feature after convolution is input to another convolution module in the convolution modules to perform convolution processing based on the corresponding first convolution kernel, and the obtained another first feature after convolution is input to another convolution module to perform convolution processing based on the corresponding first convolution kernel; taking the product of the first convolved features respectively obtained by the plurality of convolving modules as the high-dimensional feature of the image to be processed, and obtaining the size of the second convolving kernel according to the size of the first convolving kernel;
and determining the folded convolution module according to the size of the second convolution kernel.
7. The method of claim 6, wherein the deriving the size of the second convolution kernel based on the size of the first convolution kernel comprises:
the convolution modules are 3 convolution modules, and the sizes of the first convolution kernels of the 4 convolution modules are respectively as follows: 1 × 1, K × K, and 1 × 1, the size of the second convolution kernel of the folded convolution module is K × K, and K is a natural number greater than 1.
8. The method according to claim 3, wherein the equivalently transforming the plurality of convolution modules in the pre-trained hyper-segmentation model into the folded convolution modules comprises:
if the convolution modules are convolution modules of a plurality of branches, the convolution modules of the plurality of branches respectively perform convolution processing based on corresponding first convolution kernels on the low-dimensional features of the image to be processed to obtain a plurality of first post-convolution features, and the plurality of first post-convolution features are overlapped to obtain the high-dimensional features of the image to be processed;
the convolution module of a first branch of the convolution modules of the multiple branches comprises multiple first convolution sub-modules, wherein one first convolution sub-module of the convolution module of the first branch performs convolution processing based on a corresponding first convolution kernel on the low-dimensional feature of the image to be processed to obtain a first post-convolution sub-feature, then the first post-convolution sub-feature is input into another first convolution sub-module of the multiple first convolution sub-modules to perform convolution processing based on the corresponding first convolution kernel, and then another first post-convolution sub-feature is input into another first convolution sub-module to perform convolution processing based on the corresponding first convolution kernel; taking the product of the first convolved sub-features obtained by the plurality of first convolving sub-modules as a first convolved feature obtained by the convolving module of the first branch, and obtaining the size of the second convolving kernel according to the size of the first convolving kernel;
and determining the folded convolution module according to the size of the second convolution kernel.
9. The method of claim 8, wherein the deriving the size of the second convolution kernel based on the size of the first convolution kernel comprises:
if the convolution modules of the plurality of branches include convolution modules of 4 branches, the sizes of the first convolution kernels of the convolution modules of some two second branches are respectively as follows: 1 × 1 and K × K, the convolution modules of the other two first branches respectively include 3 first convolution sub-modules which are connected in series and multiplied, and the sizes of first convolution kernels of the 3 first convolution sub-modules are 1 × 1, K × K and 1 × 1 respectively; the size of a second convolution kernel of the folded convolution module is K multiplied by K, and K is a natural number larger than 1.
10. An image super-resolution device, comprising:
the image acquisition unit is used for acquiring an image to be processed;
the feature extraction unit is used for extracting a low-dimensional feature of the image to be processed and inputting the low-dimensional feature into a folded convolution module;
the convolution unit is used for performing a convolution operation based on a second convolution kernel on the low-dimensional feature through the folded convolution module to obtain a high-dimensional feature of the image to be processed;
and the super-resolution processing unit is used for determining a super-resolution image of the image to be processed according to the high-dimensional feature.
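The four units of claim 10 can be sketched as one end-to-end pipeline. In the sketch below every convolution is reduced to a 1 × 1 stand-in (a per-pixel channel map) and the final sub-pixel (pixel-shuffle) upsampling step is an assumption added for illustration; neither the channel counts nor the upsampling method is specified by the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # A 1x1 convolution is a per-pixel linear map over channels.
    # x: (C_in, H, W); w: (C_out, C_in) -> (C_out, H, W).
    return np.einsum('oc,chw->ohw', w, x)

def pixel_shuffle(x, r):
    # (C*r*r, H, W) -> (C, H*r, W*r): sub-pixel rearrangement.
    Crr, H, W = x.shape
    C = Crr // (r * r)
    return (x.reshape(C, r, r, H, W)
             .transpose(0, 3, 1, 4, 2)
             .reshape(C, H * r, W * r))

C, r = 16, 2                                     # assumed width and scale
w_extract = rng.standard_normal((C, 3))          # feature extraction unit
w_folded  = rng.standard_normal((C, C))          # folded convolution unit (stand-in)
w_up      = rng.standard_normal((3 * r * r, C))  # super-resolution output head

img  = rng.standard_normal((3, 32, 32))          # image acquisition unit
low  = conv1x1(img, w_extract)                   # low-dimensional feature
high = conv1x1(low, w_folded)                    # high-dimensional feature
sr   = pixel_shuffle(conv1x1(high, w_up), r)     # super-resolution image
print(sr.shape)                                  # (3, 64, 64)
```

The point of the sketch is the data flow between the units: a single folded convolution sits between feature extraction and the output head, which is what keeps the inference-time graph small enough for resource-limited terminals.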
11. A computer-readable storage medium, characterized in that it stores a plurality of computer programs adapted to be loaded by a processor and to execute the image super-resolution method according to any one of claims 1 to 9.
12. A terminal device comprising a processor and a memory;
the memory is used for storing a plurality of computer programs, the computer programs being adapted to be loaded by the processor so as to execute the image super-resolution method according to any one of claims 1 to 9; the processor is configured to load and execute each of the plurality of computer programs.
CN202110865586.8A 2021-07-29 2021-07-29 Image super-resolution method, system, storage medium and terminal equipment Pending CN115689881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110865586.8A CN115689881A (en) 2021-07-29 2021-07-29 Image super-resolution method, system, storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
CN115689881A true CN115689881A (en) 2023-02-03

Family

ID=85058106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110865586.8A Pending CN115689881A (en) 2021-07-29 2021-07-29 Image super-resolution method, system, storage medium and terminal equipment

Country Status (1)

Country Link
CN (1) CN115689881A (en)

Similar Documents

Publication Publication Date Title
CN110659573B (en) Face recognition method and device, electronic equipment and storage medium
CN111476708B (en) Model generation method, model acquisition method, device, equipment and storage medium
CN110555527A (en) Method and equipment for generating delayed shooting video
CN114330312A (en) Title text processing method, apparatus, storage medium, and program
CN110852256A (en) Method, device and equipment for generating time sequence action nomination and storage medium
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN112906721B (en) Image processing method, device, equipment and computer readable storage medium
CN113392270A (en) Video processing method, video processing device, computer equipment and storage medium
CN116310667B (en) Self-supervision visual characterization learning method combining contrast loss and reconstruction loss
CN115171199B (en) Image processing method, image processing device, computer equipment and storage medium
JP2022512340A (en) Image generation and neural network training methods, devices, equipment and media
CN115115510A (en) Image processing method, system, storage medium and terminal equipment
CN115424013A (en) Model training method, image processing apparatus, and medium
CN113344794B (en) Image processing method and device, computer equipment and storage medium
CN111402156A (en) Restoration method and device for smear image, storage medium and terminal equipment
CN112990370B (en) Image data processing method and device, storage medium and electronic equipment
CN113569824B (en) Model processing method, related device, storage medium and computer program product
CN115689881A (en) Image super-resolution method, system, storage medium and terminal equipment
CN114332470A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114332561A (en) Super-resolution model training method, device, equipment and medium
CN113569886A (en) Network structure adjusting method and device, storage medium and electronic equipment
CN115861605A (en) Image data processing method, computer equipment and readable storage medium
CN111784787B (en) Image generation method and device
CN113554083A (en) Multi-exposure image sample generation method and device, computer equipment and medium
CN112800278A (en) Video type determination method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication