CN113837941A - Training method and device for image hyper-resolution model and computer readable storage medium - Google Patents


Info

Publication number
CN113837941A
Authority
CN
China
Prior art keywords: model, features, image, resolution, super
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111124188.7A
Other languages: Chinese (zh)
Other versions: CN113837941B (en)
Inventor
梁彦军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202111124188.7A
Publication of CN113837941A
Application granted
Publication of CN113837941B
Legal status: Active

Classifications

    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution (G Physics > G06 Computing; calculating or counting > G06T Image data processing or generation, in general > G06T 3/00 Geometric image transformation in the plane of the image > G06T 3/40 Scaling the whole image or part thereof)
    • G06T 3/4046: Scaling the whole image or part thereof using neural networks
    • G06N 3/045: Combinations of networks (G06N Computing arrangements based on specific computational models > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/08: Learning methods (neural networks)
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management (Y02 Technologies or applications for mitigation or adaptation against climate change > Y02D Climate change mitigation technologies in information and communication technologies, i.e. ICT aiming at the reduction of their own energy use)

Landscapes

  • Engineering & Computer Science; Physics & Mathematics; Theoretical Computer Science; General Physics & Mathematics; Artificial Intelligence; Evolutionary Computation; Biomedical Technology; Life Sciences & Earth Sciences; Health & Medical Sciences; Biophysics; Computational Linguistics; Data Mining & Analysis; General Health & Medical Sciences; Molecular Biology; Computing Systems; General Engineering & Computer Science; Mathematical Physics; Software Systems; Image Analysis

Abstract

The application provides a training method and device for an image super-resolution model, comprising the following steps: the initial model comprises a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, and each information distillation submodule comprises a plurality of residual units connected end to end in sequence; the low-resolution image is input into the initial model, and shallow features of the low-resolution image are extracted; deep features are extracted from the shallow features by the super-resolution module; the shallow features and the deep features are added and fused and then upsampled to obtain the super-resolution image output by the initial model; and the initial model is trained according to a loss function determined by the super-resolution image and the high-resolution image, to obtain the image super-resolution model. With the residual-unit-based information multi-distillation structure, the method and device can extract refined features and superimpose the shallow features back on, which alleviates the degradation and gradient problems that arise when the model network grows deep and greatly reduces the parameter count and amount of computation.

Description

Training method and device for image hyper-resolution model and computer readable storage medium
Technical Field
The application belongs to the field of computer technology, and in particular relates to a training method and device for an image super-resolution model, an image super-resolution method and device, and a computer-readable storage medium.
Background
Functions such as special-effect cameras and prop shooting have become standard features of video applications. However, due to device computing-power limits, an end-to-end model can only run inference at a small picture resolution, so the generated image is blurry; a super-resolution algorithm is therefore needed to improve the overall image quality.
In the related art, a trained convolutional neural network model can be used to perform super-resolution processing on an original image to obtain a super-resolution image with higher definition.
However, in current schemes, the amount of computation, the parameter count, and the memory read/write volume of the convolutional neural network model are large, making deployment on devices with weak computing power difficult, so the response time can hardly meet real-time requirements.
Disclosure of Invention
In view of this, the present application provides a training method and apparatus for an image super-resolution model, and a computer-readable storage medium, to solve the problem that in current schemes the amount of computation, the parameter count, and the memory read/write volume of the convolutional neural network model are large and deployment on devices with weak computing power is difficult, so that the response time can hardly meet real-time requirements.
According to a first aspect of the present application, there is provided a training method for an image super-resolution model, the method comprising:
acquiring a low-resolution image and a high-resolution image with the same content, and an initial model, where the initial model comprises a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, and each information distillation submodule comprises a plurality of residual units connected end to end in sequence;
inputting the low-resolution image into the initial model, and extracting shallow features of the low-resolution image;
extracting, by the super-resolution module, deep features from the shallow features, where each residual unit retains part of the fine features of its output features and feeds the remaining coarse features into the next residual unit, the fine features retained by all residual units of an information distillation submodule are combined and then superimposed with the shallow features to obtain the output features of that submodule, and the output features of the last information distillation submodule are the deep features;
performing upsampling after adding and fusing the shallow features and the deep features, to obtain a super-resolution image output by the initial model; and
training the initial model according to a loss function determined by the super-resolution image and the high-resolution image, to obtain the image super-resolution model.
According to a second aspect of the present application, there is provided an image super-resolution method, the method comprising:
inputting an image to be processed into an image super-resolution model to obtain a super-resolution image output by the image super-resolution model;
where the image super-resolution model is obtained by training with the above training method for the image super-resolution model.
According to a third aspect of the present application, there is provided a training apparatus for an image super-resolution model, the apparatus comprising:
an acquisition module, configured to acquire a low-resolution image and a high-resolution image with the same content, and an initial model, where the initial model comprises a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, and each information distillation submodule comprises a plurality of residual units connected end to end in sequence;
a first extraction module, configured to input the low-resolution image into the initial model and extract shallow features of the low-resolution image;
a second extraction module, configured to extract deep features from the shallow features through the super-resolution module, where each residual unit retains part of the fine features of its output features and feeds the remaining coarse features into the next residual unit, the fine features retained by all residual units of an information distillation submodule are combined and then superimposed with the shallow features to obtain the output features of that submodule, and the output features of the last information distillation submodule are the deep features;
an upsampling module, configured to perform upsampling after adding and fusing the shallow features and the deep features, to obtain a super-resolution image output by the initial model; and
a training module, configured to train the initial model according to a loss function determined by the super-resolution image and the high-resolution image, to obtain the image super-resolution model.
According to a fourth aspect of the present application, there is provided an image super-resolution apparatus, the apparatus comprising:
a super-resolution module, configured to input an image to be processed into an image super-resolution model to obtain a super-resolution image output by the image super-resolution model;
where the image super-resolution model is obtained by training with the above training apparatus for the image super-resolution model.
In a fifth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the training method for the image super-resolution model according to the first aspect.
Compared with the prior art, the present application has the following advantages:
The present application provides a training method for an image super-resolution model in which the model structure is designed as follows: a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, with each information distillation submodule comprising a plurality of residual units connected end to end in sequence.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart illustrating steps of a training method for an image super-resolution model according to an embodiment of the present application;
FIG. 2 is an architecture diagram of an image super-resolution model provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an information distillation submodule provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a super-resolution module according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating steps of an image super-resolution method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating specific steps of a training method for an image super-resolution model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a basic residual unit according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a lightweight residual unit provided in an embodiment of the present application;
FIG. 9 is a block diagram of a training apparatus for an image super-resolution model according to an embodiment of the present application;
FIG. 10 is a block diagram of an image super-resolution apparatus according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a flowchart illustrating steps of a training method for an image super-resolution model according to an embodiment of the present application; as shown in fig. 1, the method may include:
Step 101, acquiring a low-resolution image and a high-resolution image with the same content, and an initial model.
The initial model comprises a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence; each information distillation submodule comprises a plurality of residual units connected end to end in sequence.
In the embodiment of the present application, given the computing-power limits of mobile devices, the requirement for an image super-resolution model deployed on a mobile terminal is to keep the model as lightweight as possible while meeting a baseline output precision. How to design such a lightweight image super-resolution model is therefore the main problem the embodiments of the present application set out to solve.
To this end, the embodiments of the present application first design a lightweight architecture for the initial model and acquire a low-resolution image and a high-resolution image with the same content to form a training data pair, so that the initial model can subsequently be trained on such pairs to obtain the image super-resolution model.
Referring to fig. 2, which shows an architecture diagram of the image super-resolution model provided in an embodiment of the present application: the initial model is the image super-resolution model before training, and the two share the same architecture. The main architecture comprises a first convolution module, a super-resolution module, a second convolution module, and an upsampling module. The first convolution module extracts shallow features of the low-resolution image by convolution; the super-resolution module performs further feature extraction on the shallow features to obtain deep features; the second convolution module reconstructs the deep features by convolution, converting them toward high-frequency image features; and the upsampling module fuses the shallow and deep features to obtain features in which the edge semantics and deep semantics of the low-resolution image are combined, then upsamples based on these features to obtain a super-resolution image that preserves content integrity and accuracy while improving definition and resolution.
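As one concrete reading of this four-part architecture, the following PyTorch sketch mirrors fig. 2; the class and parameter names, the channel width `c`, and the scale factor `s` are illustrative assumptions, not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class ImageSRModel(nn.Module):
    """Sketch of the fig. 2 architecture: conv -> super-resolution module -> conv -> upsample."""
    def __init__(self, sr_module: nn.Module, c: int = 64, s: int = 2):
        super().__init__()
        self.conv1 = nn.Conv2d(3, c, 3, padding=1)   # first convolution module: shallow features
        self.sr_module = sr_module                    # stack of information distillation submodules
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)    # second convolution module: feature reconstruction
        self.upsample = nn.Sequential(                # upsampling module (detailed in step 104)
            nn.Conv2d(c, 3 * s * s, 3, padding=1),
            nn.PixelShuffle(s),
        )

    def forward(self, i_lr: torch.Tensor) -> torch.Tensor:
        f1 = self.conv1(i_lr)           # shallow feature F1
        deep = self.sr_module(f1)       # deep feature
        f3 = self.conv2(deep)           # reconstructed feature F3
        return self.upsample(f1 + f3)   # I_SR = f_up(F1 + F3)
```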
To achieve image super-resolution through the model, the model must first be able to extract accurate deep features from the low-resolution image, since these deep features express the deep semantic information of the low-resolution image. In the subsequent upsampling stage, the model can then perform upsampling reconstruction based on the fusion of the deep and shallow features, obtaining a super-resolution image with high definition and high resolution.
The key point of the structural design of the image super-resolution model in the embodiment of the present application is the lightweight design of the super-resolution module that extracts the deep features of the low-resolution image: on the premise of keeping the extracted deep features accurate, the module's structure is made lighter, so that the image super-resolution model has smaller storage requirements and faster response, which reduces the difficulty of deploying it on low-computing-power devices.
Specifically, the architecture of the super-resolution module in the embodiment of the present application may borrow the idea of the Information Multi-Distillation Network (IMDN); under this idea, the whole super-resolution module can be described as a residual-unit-based information multi-distillation structure. Referring to fig. 3, which shows a schematic structural diagram of the information distillation submodule provided in the embodiment of the present application, the embodiment of the present application may construct an information distillation submodule comprising k residual units connected end to end in sequence. After the shallow features are input, each residual unit in the information distillation submodule splits off a part of useful fine features, and the remaining coarse features are passed on to the next residual unit, which further extracts fine features by convolution (this is where the meaning of "multiple distillation" is embodied). After all residual units have run, the fine features extracted by each residual unit are concatenated (Concat), and finally a 1×1 convolution reduces the dimension so that the channel count meets the requirement, yielding the output features of the information distillation submodule. Through the repeated split-and-extract process, these output features contain features deeper than the shallow features.
Further, to meet the fineness requirement on the deep features, referring to fig. 4, which shows a schematic structural diagram of the super-resolution module provided by the embodiment of the present application: within the super-resolution module, a plurality of information distillation submodules are connected end to end in sequence, so that multiple rounds of multi-distillation further raise the fineness of the deep features.
In the embodiment of the present application, a deeper neural network extracts features that contain more information at different levels and more combinations of information across levels, which is what makes them deep features. However, a deeper network also means more parameters and more computation, and the main problems deep learning encounters with network depth are gradient vanishing and gradient explosion. The conventional remedies are data initialization and regularization, which address the gradient problem and allow further depth, but bring another problem: degradation of network performance, with a rising error rate. A residual-unit-based information multi-distillation structure adds an identity mapping between input and output and learns the residual between them, which eases training and resolves both the degradation and the gradient problems. The plurality of information distillation submodules, together with the residual units they contain, form a residual structure for multi-information distillation. It can fully exploit repeated distillation of the shallow features to extract deep features with high accuracy and superimpose the shallow features back on, alleviating the degradation and gradient problems of deep networks, letting the model use shallow visual information more efficiently, and greatly reducing the parameter count and computation of the lightweight model while maintaining accuracy.
Optionally, step 101 may specifically include:
substep 1011, a high resolution image is acquired.
Sub-step 1012, down-sampling the high resolution image by a preset image scaling algorithm to obtain the low resolution image.
In the embodiment of the present application, the low-resolution and high-resolution images are obtained by first acquiring a high-resolution image from a preset high-definition image library and then downsampling it with an image scaling algorithm to obtain the corresponding low-resolution image, yielding a low-resolution image and a high-resolution image that share the same content at different definitions. The content of the two images may be a specific object (a person, animal, item, etc.); for example, it may be a specific person's face, in which case a face region can be extracted from a high-definition face image database as the high-resolution image, and the low-resolution image is then obtained by downsampling it.
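A minimal sketch of building such a training pair, assuming bicubic resampling as the "preset image scaling algorithm" (the patent does not fix the algorithm) and Pillow as the image library:

```python
from PIL import Image

def make_training_pair(hr_path: str, scale: int = 2):
    """Downsample a high-resolution image to create an (LR, HR) pair with identical content."""
    hr = Image.open(hr_path).convert("RGB")
    w, h = hr.size
    # Crop so dimensions divide evenly by the scale factor.
    hr = hr.crop((0, 0, w - w % scale, h - h % scale))
    lr = hr.resize((hr.width // scale, hr.height // scale), Image.BICUBIC)
    return lr, hr
```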
Step 102, inputting the low-resolution image into the initial model, and extracting shallow features of the low-resolution image.
In the embodiment of the present application, based on the framework of the initial model provided in step 101, the initial model may be specifically trained by using a training data pair composed of a low resolution image and a high resolution image having the same content.
In this step, referring to fig. 2, the low-resolution image input into the initial model is processed by the first convolution module, which performs feature extraction on it by convolution to obtain the shallow features of the low-resolution image: $F_1 = f_1(I_{LR})$, where $f_1$ is the first convolution module, whose kernel size is 3×3 and whose channel count is $c$, and $I_{LR}$ is the low-resolution image. The shallow feature $F_1$ is extracted directly from the low-resolution image by convolution and generally contains the semantic information of edges in the low-resolution image.
Step 103, extracting deep features from the shallow features through the super-resolution module.
Each residual unit retains part of the fine features of its output features and feeds the remaining coarse features into the next residual unit; the fine features retained by all residual units of an information distillation submodule are combined and then superimposed with the shallow features to obtain the output features of that submodule, and the output features of the last information distillation submodule are the deep features.
In this step, referring to fig. 2, the shallow features output by the first convolution module are further input into the super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, as shown in fig. 4. The shallow features first enter information distillation submodule 1 and are processed into output features 1; output features 1 are then input into information distillation submodule 2 to obtain output features 2, and so on, until the output features n of the n-th information distillation submodule serve as the deep features extracted from the shallow features. Mathematically, the features output by the i-th information multi-distillation submodule are

$$F_{IMD}^{i} = f_{IMD}^{i}\left(F_{IMD}^{i-1}\right), \quad i \in \{1, 2, 3, \dots, n\},$$

where $F_{IMD}^{0}$ is the shallow feature $F_1$ and $f_{IMD}^{i}$ is the i-th information distillation submodule.
Further, referring to fig. 3, which shows the structure of an information distillation submodule: the submodule comprises a plurality of residual units connected end to end in sequence. After the input features enter the submodule, each residual unit splits off a part of useful fine features, and the remaining coarse features are passed on to the next residual unit for further refinement by convolution. After all residual units have run, the fine features extracted by each residual unit are concatenated, and a final 1×1 convolution reduces the channel dimension to the required number, yielding the output features of the information distillation submodule; through the repeated split-and-extract process, these output features contain features deeper than the shallow features. Mathematically, the features extracted by the j-th residual unit are split by channel into a fine feature $F_{distilled}^{j}$ and a coarse feature $F_{coarse}^{j}$ such that

$$F_{distilled}^{j},\ F_{coarse}^{j} = \mathrm{Split}_{j}\left(f_{RU}^{j}\left(F_{coarse}^{j-1}\right)\right), \quad j \in \{1, 2, \dots, k-1\},$$

where $F_{coarse}^{0}$ is the output feature of the previous information distillation submodule, $f_{RU}^{j}$ is the j-th residual unit, and $\mathrm{Split}_{j}$ is the j-th channel split operation. The output features of the last residual unit do not need to be split; that is, the fine feature retained by the last residual unit is its entire output feature, whose channel number is $\tfrac{1}{4}c$:

$$F_{distilled}^{k} = f_{RU}^{k}\left(F_{coarse}^{k-1}\right).$$
The retained fine features can then be merged together by a feature-channel concatenation and reduced in channel dimension by a 1×1 convolution, i.e.:

$$F_{distilled} = f_{con}\left(\mathrm{Concat}\left(F_{distilled}^{1}, F_{distilled}^{2}, \dots, F_{distilled}^{k}\right)\right),$$

where $F_{distilled}$ indicates the fine features retained after information distillation, $f_{con}$ represents a 1×1 convolution operation, and $\mathrm{Concat}$ represents the feature-channel concatenation operation.
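Putting the split, concatenation, and 1×1-reduction steps together, the following PyTorch sketch is one hedged reading of fig. 3 and the formulas above. The 1/4-versus-3/4 channel split follows sub-step 3033 below, `make_unit` stands in for a residual-unit constructor (see figs. 7 and 8), and the adaptively weighted fusion with the shallow features (sub-steps 3035 and 3036) is omitted here and sketched separately later.

```python
import torch
import torch.nn as nn

class InformationDistillationSubmodule(nn.Module):
    """One possible reading of fig. 3: k residual units in sequence; each of the
    first k-1 keeps the first c/4 output channels as fine features and passes
    the remaining 3c/4 on; the last unit's whole (c/4-channel) output is kept."""
    def __init__(self, make_unit, c: int = 64, k: int = 4):
        super().__init__()
        fine = c // 4
        # make_unit(in_channels, out_channels) builds a residual unit (assumed interface).
        self.units = nn.ModuleList(
            [make_unit(c, c)]                                  # unit 1: c -> c
            + [make_unit(c - fine, c) for _ in range(k - 2)]   # units 2..k-1: 3c/4 -> c
            + [make_unit(c - fine, fine)]                      # unit k: 3c/4 -> c/4
        )
        self.reduce = nn.Conv2d(k * fine, c, kernel_size=1)    # 1x1 conv: (k/4)c -> c
        self.fine = fine

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        distilled, coarse = [], x
        for unit in self.units[:-1]:
            out = unit(coarse)
            distilled.append(out[:, :self.fine])               # Split_j: keep fine features
            coarse = out[:, self.fine:]                        # pass coarse features on
        distilled.append(self.units[-1](coarse))               # last unit: no split
        return self.reduce(torch.cat(distilled, dim=1))        # Concat + 1x1 reduction
```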
In the embodiment of the present application, the deeper a neural network is, the more likely its performance degrades and its error rate rises. The residual-unit-based information multi-distillation structure can fully exploit repeated distillation of the shallow features to extract the deep features, which resolves the degradation problem and the gradient problem at the same time. On top of improved network performance, the amount of computation is greatly reduced and training is simplified, lowering the storage and computing-power requirements of the image super-resolution model and achieving the goal of a lightweight design.
Step 104, performing upsampling after adding and fusing the shallow features and the deep features, to obtain the super-resolution image output by the initial model.
In this step, referring to fig. 2, the deep features output by the super-resolution module are first reconstructed by the convolution of the second convolution module, converting them toward high-frequency image features; the reconstructed features are then fed into the upsampling module, which adds and fuses the shallow and deep features and applies upsampling to obtain the super-resolution image output by the initial model. Mathematically:

$$I_{SR} = f_{up}\left(F_1 + F_3\right),$$

where $f_{up}$ is the upsampling module and $F_3$ is the reconstructed feature output by the second convolution module.

Specifically, the upsampling module may consist of a convolution layer with a 3×3 kernel followed by a pixel-shuffle (PixelShuffle) layer. The 3×3 convolution changes the channel count of the input features from $c$ to $3s^2$, so its output features have size $H_{LR} \times W_{LR} \times 3s^2$, where $H_{LR}$ and $W_{LR}$ are the height and width of the low-resolution image and $s$ is the resolution ratio between the low-resolution and high-resolution images. The subsequent PixelShuffle operation obtains the super-resolution image by periodic shuffling, so the output super-resolution image has size $sH_{LR} \times sW_{LR} \times 3$.

Furthermore, the PixelShuffle operation turns an $H \times W$ low-resolution feature into an $rH \times rW$ high-resolution image through a sub-pixel operation. Rather than generating the high-resolution image directly by interpolation or the like, it first obtains $r^2$ feature maps per output channel by convolution (each feature map has the same size as the input low-resolution image), and then assembles the high-resolution image by periodic shuffling, where $r$ is the upscaling factor, i.e., the image magnification.
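A minimal sketch of this upsampling tail, assuming PyTorch's built-in `nn.PixelShuffle` implements the periodic-shuffling step:

```python
import torch.nn as nn

def make_upsampler(c: int, s: int) -> nn.Sequential:
    """3x3 conv maps c channels to 3*s^2, then PixelShuffle rearranges
    (H_LR, W_LR, 3*s^2) into the (s*H_LR, s*W_LR, 3) super-resolution image."""
    return nn.Sequential(
        nn.Conv2d(c, 3 * s * s, kernel_size=3, padding=1),
        nn.PixelShuffle(s),
    )
```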
Step 105, training the initial model according to a loss function determined by the super-resolution image and the high-resolution image, to obtain the image super-resolution model.
In the embodiment of the present application, during training, the initial model outputs a super-resolution image $I_{SR}$ for each input low-resolution image. The high-resolution image $I_{HR}$ corresponding to that low-resolution image serves as the ground truth against which $I_{SR}$ is compared, and the loss function is determined from the mean absolute error between the two:

$$L = \left\| I_{SR} - I_{HR} \right\|_{1}.$$

The parameters of the initial model are adjusted via this loss function until, after repeated input-output passes and parameter updates, the initial model reaches the training target and can be used as the image super-resolution model.
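A hedged sketch of this training step, minimizing the mean absolute error; the optimizer choice, learning rate, and epoch count are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 100, lr: float = 2e-4, device: str = "cpu"):
    """Minimize L = ||I_SR - I_HR||_1 over (LR, HR) training pairs."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer is an assumption
    l1 = nn.L1Loss()                                         # mean absolute error
    for _ in range(epochs):
        for i_lr, i_hr in loader:
            i_sr = model(i_lr.to(device))
            loss = l1(i_sr, i_hr.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```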
In summary, in the training method for the image super-resolution model provided in the embodiment of the present application, the structure of the image super-resolution model is designed as follows: a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, with each information distillation submodule comprising a plurality of residual units connected end to end in sequence.
Fig. 5 is a flowchart of steps of an image super-resolution method provided in an embodiment of the present application, and as shown in fig. 5, the method may include:
Step 201, inputting an image to be processed into the image super-resolution model to obtain a super-resolution image output by the image super-resolution model.
The image super-resolution model is obtained by training with the training method of the image super-resolution model shown in fig. 1.
In the embodiment of the present application, because the image super-resolution model is based on the residual-unit-based information multi-distillation structure, it alleviates the degradation and gradient problems that arise in deep networks while keeping the extracted deep features highly accurate, reducing the model's parameter count and amount of computation and achieving a lightweight model. During inference, the image super-resolution model therefore takes the image to be processed as input and outputs the super-resolution image, improving the resolution and definition of the image to be processed. In addition, because the model is lightweight, it has smaller storage requirements and faster response, which lowers the difficulty of deploying the image super-resolution model on low-computing-power devices.
Optionally, the image super-resolution model comprises a target heavyweight model and a target lightweight model, and step 201 may specifically include:
Sub-step 2011: when the computing power of the current device is greater than a preset computing-power threshold, inputting the image to be processed into the target heavyweight model to obtain the super-resolution image output by the target heavyweight model;
Sub-step 2012: when the computing power of the current device is less than or equal to the preset computing-power threshold, inputting the image to be processed into the target lightweight model to obtain the super-resolution image output by the target lightweight model.
In the embodiment of the present application, different devices have different computing-power bases. The embodiment of the present application therefore provides a target heavyweight model and a target lightweight model for high- and low-computing-power devices respectively. The target heavyweight model has higher computing-power requirements and higher output precision, so it can be deployed for use on high-computing-power devices; the target lightweight model, while ensuring a certain output accuracy, has a lightweight model structure and places low computing-power requirements on the deployed device, so it can be deployed for use on low-computing-power devices.
Specifically, both the target heavyweight model and the target lightweight model may be based on the residual-unit-based information multi-distillation structure shown in figs. 2, 3, and 4, and they can be differentiated by the number of information distillation submodules each contains and by the number and structure of the residual units within those submodules. The first number of information distillation submodules in the target heavyweight model is greater than or equal to the second number in the target lightweight model, and the number of residual units in each information distillation submodule of the target heavyweight model is greater than or equal to the number in each information distillation submodule of the target lightweight model.
For example, in one implementation, the target heavyweight model may contain 6 information distillation submodules, each with 6 residual units and 64 base channels, and its residual units may be basic residual units; the target lightweight model may contain 6 information distillation submodules, each with 4 residual units and 32 base channels, and its residual units may be lightweight residual units with a depthwise separable convolution structure.
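These two example configurations might be written down as simple settings records; the field names below are hypothetical and only mirror the numbers given above:

```python
# Hypothetical configuration records mirroring the example above.
heavyweight_cfg = dict(num_submodules=6, units_per_submodule=6,
                       base_channels=64, unit_type="basic_residual")
lightweight_cfg = dict(num_submodules=6, units_per_submodule=4,
                       base_channels=32, unit_type="lightweight_depthwise_separable")
```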
In addition, high- and low-computing-power devices can be distinguished by judging whether the device's computing power exceeds a preset computing-power threshold: when the computing power of the current device is greater than the preset threshold, the current device is treated as a high-computing-power device; when it is less than or equal to the preset threshold, the current device is treated as a low-computing-power device. The preset computing-power threshold can be set according to actual requirements.
In summary, the image super-resolution method provided in the embodiment of the present application performs inference with a model whose structure is designed as described above: a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, with each information distillation submodule comprising a plurality of residual units connected end to end in sequence.
Fig. 6 is a flowchart illustrating specific steps of a training method for an image super-resolution model according to an embodiment of the present application; as shown in fig. 6, the method may include:
step 301, acquiring a low resolution image and a high resolution image with the same content, and an initial model.
The initial model comprises a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence; each information distillation submodule comprises a plurality of residual units connected end to end in sequence.
This step can refer to step 101 described above, and is not described here.
Step 302, inputting the low-resolution image into the initial model, and extracting shallow features of the low-resolution image.
This step can refer to step 102, which is not described herein.
Step 303, extracting deep features from the shallow features through the super-resolution module.
Each residual unit retains part of the fine features of its output features and feeds the remaining coarse features into the next residual unit; the fine features retained by all residual units of an information distillation submodule are combined and then superimposed with the shallow features to obtain the output features of that submodule, and the output features of the last information distillation submodule are the deep features.
This step may refer to step 103 described above and is not repeated here.
Optionally, step 303 may specifically include:
and a substep 3031 of performing convolution processing on the input features through a residual error unit of the super-resolution module to obtain a first convolution result.
And a substep 3032 of superposing the first convolution result with the input characteristic to obtain an output characteristic of the residual error unit.
In the embodiment of the application, the specific operation process of the residual error unit-based information multiple distillation structure in the super-separation module is as follows: each residual error unit calculates input features (the input features of the first residual error unit of the first information distillation submodule are shallow features) through convolution processing, and therefore deeper features of the input features are extracted.
Specifically, referring to fig. 7, which shows a schematic structural diagram of a basic residual error unit provided in the embodiment of the present application, a basic residual error unit may be formed by sequentially connecting a convolution layer 11 having a convolution kernel with a size of 3 × 3, an activation function (ReLU) layer 12, and a convolution layer 13 having a convolution kernel with a size of 3 × 3, so that an input feature of the residual error unit is sequentially subjected to convolution, activation, and convolution processing to obtain a first convolution result, and then the first convolution result is superimposed with the input feature to obtain an output feature of the basic residual error unit, where the output feature is a deeper feature relative to a shallow feature and includes more core semantic information.
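A direct PyTorch rendering of the fig. 7 unit, as a sketch; the channel count `c` is a parameter and the padding is assumed to preserve spatial size:

```python
import torch
import torch.nn as nn

class BasicResidualUnit(nn.Module):
    """Fig. 7: 3x3 conv -> ReLU -> 3x3 conv, output added to the input (identity skip)."""
    def __init__(self, c: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # first convolution result superimposed on the input
```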
Sub-step 3033: retaining part of the fine features of the output features and inputting the remaining coarse features into the next residual unit, where the fine feature retained by the last residual unit is its entire output feature.
In the embodiment of the present application, the residual unit extracts deeper features through its convolution operations. After feature extraction, the extracted features can be further split by a Split operator: part of the fine features of the output are retained, and the remaining coarse features are input into the next residual unit. The retained fine features may be the features of the first $\tfrac{1}{4}c$ channels; the ability to concentrate useful fine features into the first $\tfrac{1}{4}c$ channels is acquired over many training iterations, and the features of the last $\tfrac{3}{4}c$ channels of the residual unit's output are input into the next residual unit as coarse features for deeper extraction. The last residual unit does not split its output again; that is, the fine feature it retains is its entire output feature.
Sub-step 3034: combining the fine features retained by all residual units of the information distillation submodule, superimposing the result with the shallow features to obtain the output features of the information distillation submodule, and taking the output features of the last information distillation submodule as the deep features.
In this step, after the fine features retained by all residual units of an information distillation submodule are combined, the output features of that submodule are obtained; that is, the information distillation submodule extracts and combines fine features of the shallow features through the residual-unit-based information multi-distillation structure, thereby realizing information distillation. In particular, the combination of the fine features retained by all residual units can be realized by a channel concatenation operation.
Optionally, the residual unit is a lightweight residual unit with a depthwise separable convolution structure. The lightweight residual unit includes: a first convolution layer, a first activation layer, a second convolution layer, a second activation layer, and a third convolution layer connected end to end in sequence, where the kernel size of the first and third convolution layers is 1×1 and the kernel size of the second convolution layer is 3×3.
In the embodiment of the present application, relative to the basic residual unit shown in fig. 7, a lightweight residual unit with a depthwise separable convolution structure is further provided; it imposes smaller storage and computing-power requirements and better meets the needs of lightweight deployment.
Referring to fig. 8, which shows a schematic structural diagram of the lightweight residual unit provided in the embodiment of the present application: a lightweight residual unit may be formed by sequentially connecting a convolution layer 21 with a 1×1 kernel, an activation function layer 22, a depthwise convolution layer 23 with a 3×3 kernel, an activation function layer 24, and a convolution layer 25 with a 1×1 kernel. The input features are thus processed by convolution, activation, depthwise convolution, activation, and convolution in turn, and the result is superimposed on the input features to obtain the output features of the lightweight residual unit; relative to the shallow features, these output features are deeper and contain more core semantic information.
Following this lightweight design idea, the embodiment of the present application derives the lightweight residual unit of fig. 8 from the basic residual unit of fig. 7 by replacing its two 3×3 convolution layers with a 1×1 convolution layer, a 3×3 depthwise convolution layer, and another 1×1 convolution layer. The first 1×1 layer reduces the dimension ahead of the 3×3 depthwise convolution to cut the amount of computation, and the second 1×1 layer restores the channel count, preserving precision while reducing computation and thereby achieving the goal of a lightweight model.
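A sketch of the fig. 8 unit under the same assumptions; the patent fixes only the kernel sizes, so the bottleneck reduction ratio here is an assumption:

```python
import torch
import torch.nn as nn

class LightweightResidualUnit(nn.Module):
    """Fig. 8: 1x1 conv -> ReLU -> 3x3 depthwise conv -> ReLU -> 1x1 conv, plus identity skip.
    The 1x1 layers reduce and restore channels around the cheap depthwise 3x3."""
    def __init__(self, c: int, reduced=None):
        super().__init__()
        reduced = reduced or c // 2  # reduction ratio is an assumption
        self.body = nn.Sequential(
            nn.Conv2d(c, reduced, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, groups=reduced),  # depthwise
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, c, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)
```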
It should be noted that, for both the lightweight residual unit and the basic residual unit, a normalization operation follows the convolution processing. It has been shown that batch normalization causes artifacts in super-resolution images and limits the generalization capability of the model, whereas weight normalization brings a better super-resolution effect. Therefore, the normalization layers in both residual units are set to weight normalization, which reduces the probability of artifacts in the super-resolution image and improves the generalization capability of the model.
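In PyTorch, this choice could be realized by wrapping each convolution in the built-in weight-normalization reparameterization instead of inserting batch-normalization layers, for example:

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# Weight-normalized 3x3 convolution: reparameterizes the kernel as
# direction * magnitude rather than normalizing activation batches.
wn_conv = weight_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))
```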
Optionally, step 303 may specifically include:
and a substep 3035 of combining the fine features retained by all residual error units of the information distillation submodule to obtain combined features.
And a substep 3036 of performing weighted fusion on the merged feature and the shallow feature to obtain an output feature of the information distillation submodule.
The respective weights of the merged feature and the shallow feature are adaptive weights respectively, and the initial value is 1.
In the present embodiment, the merging of the fine features retained by all residual units of the information distillation submodule may be performed by the channel concatenation operation Concat, i.e., $\mathrm{Concat}(F_{distilled}^{1}, \dots, F_{distilled}^{k})$ combines the fine features retained by all residual units of the information distillation submodule into the merged features.

Specifically, in order to make full use of the shallow features and to suppress gradient vanishing and gradient explosion during deep network training, the embodiment of the present application may feed the shallow feature $F_1$ into each information distillation submodule through an adaptively weighted skip connection and fuse it by weighting with the output features after information distillation. The weighted fusion can be expressed as:

$$F_{IMD}^{i} = \mu F_{distilled} + \lambda F_1,$$

where $F_{IMD}^{i}$ is the weighted-fusion output feature of the information distillation submodule, $F_{distilled}$ is the merged feature, $\lambda$ is the adaptive weight of the shallow feature $F_1$, and $\mu$ is the adaptive weight of the merged feature $F_{distilled}$. The adaptive weights are initialized to 1 and are learned continuously over the training iterations until they reach their optimal values when training completes.
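A sketch of the adaptively weighted skip connection, with λ and μ as learnable scalars initialized to 1 as described:

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    """Output = mu * F_distilled + lambda * F_1, with both weights learned (init 1)."""
    def __init__(self):
        super().__init__()
        self.lam = nn.Parameter(torch.ones(1))  # adaptive weight of shallow feature F1
        self.mu = nn.Parameter(torch.ones(1))   # adaptive weight of merged feature

    def forward(self, f_distilled: torch.Tensor, f1: torch.Tensor) -> torch.Tensor:
        return self.mu * f_distilled + self.lam * f1
```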
Optionally, after the sub-step 3036, the method further includes:
substep 3037, inputting the merged features into the convolution layer with convolution kernel size of 1 × 1 to obtain a second convolution result.
And a substep 3038 of taking the weighted fusion result of the second convolution result and the shallow feature as the output feature of the information distillation submodule.
The weights of the second convolution result and of the shallow features are each adaptive weights, with an initial value of 1.
In this embodiment, referring to fig. 3, before the weighted fusion, the merged features may be input into the convolution layer with a 1×1 kernel for channel dimension reduction, obtaining the second convolution result; the weighted fusion of the second convolution result and the shallow features is then taken as the output features of the information distillation submodule.
For example, in fig. 3, if the features input into residual unit 1 have $c$ channels, the fine features retained after each split have $\tfrac{1}{4}c$ channels, so the merged features obtained by merging the fine features retained by the $k$ residual units have $\tfrac{1}{4}kc$ channels. For the features finally output by the information distillation submodule to also have $c$ channels, the merged features are passed through the 1×1 convolution layer for channel dimension reduction, so that the reduced merged features have $c$ channels.
In addition, for the residual units other than residual unit 1 and residual unit k in fig. 3, the first processing layer may be a convolution layer with a 1×1 kernel, whose purpose is to align the $\tfrac{3}{4}c$ channels of the input features to $c$ channels.
Step 304, performing feature extraction on the deep features for feature reconstruction, to obtain reconstructed features.
Referring to fig. 2, the deep features output by the super-resolution module may be reconstructed through the convolution processing of the second convolution module, converting them toward high-frequency image features.
and 305, performing upsampling processing after the shallow feature and the reconstruction feature are added and fused to obtain a super-resolution image output by the initial model.
After performing feature reconstruction with reference to fig. 2, in the embodiment of the present application, the reconstructed features may be input to an upsampling module, and the upsampling module may perform upsampling processing after summing and fusing the shallow features and the deep features, so as to obtain a super-resolution image output by an initial model.
Step 306, training the initial model according to the loss function determined by the super-resolution image and the high-resolution image, to obtain the image super-resolution model.
This step can refer to step 105, which is not described herein.
Optionally, the initial model includes: a heavyweight model having a first number of first information distillation submodules, and a lightweight model having a second number of second information distillation submodules; the image super-resolution model comprises a target heavyweight model and a target lightweight model. Step 306 may specifically include:
Sub-step 3061: training the heavyweight model using a first loss function determined by the high-resolution image and the super-resolution image output by the heavyweight model, to obtain the target heavyweight model.
Optionally, the first number is greater than or equal to the second number, and the number of residual units included in each first information distillation submodule is greater than or equal to the number of residual units included in each second information distillation submodule.
Optionally, the residual units contained in the first information distillation submodule are basic residual units, and the second information distillation submodule contains lightweight residual units with a depthwise separable convolution structure. A lightweight residual unit includes: a first convolution layer, a first activation layer, a second convolution layer, a second activation layer, and a third convolution layer connected end to end in sequence, where the kernel size of the first and third convolution layers is 1×1 and the kernel size of the second convolution layer is 3×3. A basic residual unit includes a fourth convolution layer and a fifth convolution layer, both with 3×3 kernels.
As noted above, the target heavyweight model, with its higher output precision and higher computing-power requirements, is intended for deployment on high-computing-power devices, while the target lightweight model, which keeps a lightweight structure while ensuring a certain output accuracy, is intended for low-computing-power devices.
Specifically, for the target heavyweight model, since the computing power of the applicable devices is high, the basic residual unit shown in fig. 7 may be used as its residual unit, improving output precision. For the target lightweight model, since the applicable devices require a lightweight design, the lightweight residual unit shown in fig. 8 may be used, reducing the model's storage and computation requirements and better meeting lightweight deployment needs.
In this step, since the computing power of the devices to which the target weight model applies is high, the weight model can be trained directly with the first loss function determined by the high-resolution image and the super-resolution image output by the weight model, so as to obtain the target weight model. The first loss function is

L1 = ||I_SR - I_HR||,

where ||I_SR - I_HR|| is the mean absolute error between the super-resolution image I_SR and the high-resolution image I_HR.
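As an illustration only, one training step under this first loss function might be sketched as follows; the function name and optimizer handling are placeholders, not the application's prescribed implementation.

```python
import torch.nn.functional as F

def train_weight_model_step(weight_model, optimizer, lr_img, hr_img):
    # One step of training the weight (large) model with the first loss:
    # the mean absolute error ||I_SR - I_HR|| between its super-resolution
    # output and the high-resolution target.
    sr_img = weight_model(lr_img)
    loss = F.l1_loss(sr_img, hr_img)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```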
And a substep 3062 of training the lightweight model by using the target weight model to obtain the target lightweight model.
In the embodiment of the present application, since the target lightweight model is intended for the low-computing-power mobile terminal devices that are widely used at present, the training needs to account for the lightweight design while still maintaining a strong training effect. Therefore, after the training of the target weight model is completed, the embodiment of the present application can further perform knowledge-distillation training on the lightweight model using the target weight model, thereby propagating useful information from the target weight model to the lightweight model.
Optionally, the substep 3062 may specifically include:
and a substep a1, during the process of inputting the low-resolution image into the target weight model for processing, extracting output features of first information distillation sub-modules corresponding to each first level according to a plurality of preset first levels, and calculating visual information of the first level according to the output features corresponding to the first levels and the number of channels of the output features, wherein the visual information is used for representing feature depths of the output features.
A substep a2 of, during the process of inputting the low-resolution image into the lightweight model, extracting output features of second information distillation submodules corresponding to each of a plurality of preset second levels, and calculating visual information of the second levels based on the output features corresponding to the second levels and the number of channels of the output features; the first levels correspond to the second levels one to one.
And a substep a3 of training parameters of the lightweight model according to the visual information of the first level, the visual information of the second level and a loss function determined by the high-resolution image and the super-resolution image output by the target weight model, so as to obtain the target lightweight model.
In the embodiment of the present application, knowledge-distillation training of the lightweight model is performed through the trained target weight model; this requires computing statistics over corresponding features of the target weight model and the lightweight model and constraining the similarity of those statistics.
For example, in one implementation, the target weight model may include 6 information distillation sub-modules, each containing 6 residual units, with a base channel number of 64, and its residual units may be basic residual units; the target lightweight model may include 6 information distillation sub-modules, each containing 4 residual units, with a base channel number of 32, and its residual units may be lightweight residual units with a depth separable convolution structure.
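By way of example only, these two configurations could be instantiated as follows; `SuperResolutionModel` is a hypothetical constructor standing in for the overall network described in this application, and the residual-unit classes are the sketches given earlier.

```python
# Hypothetical constructor; parameter names are illustrative only.
teacher = SuperResolutionModel(num_distill_modules=6, units_per_module=6,
                               base_channels=64, unit_cls=BasicResidualUnit)
student = SuperResolutionModel(num_distill_modules=6, units_per_module=4,
                               base_channels=32, unit_cls=LightweightResidualUnit)
```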
Assuming that three levels of low, medium, and high are set, the 3 first levels may include: a first information distillation submodule (low level), a fourth first information distillation submodule (middle level), and a sixth first information distillation submodule (high level) of the target weight model; the 3 second hierarchies may include: the first second information distillation submodule (low level), the fourth second information distillation submodule (middle level) and the sixth second information distillation submodule (high level) of the lightweight model are arranged, so that the first level corresponds to the second level one by one.
In the process of respectively inputting the low-resolution images into the target weight model and the lightweight model, the output feature a of the first first information distillation submodule (low level), the output feature b of the fourth first information distillation submodule (middle level) and the output feature c of the sixth first information distillation submodule (high level) in the target weight model can be collected in sequence; and the output feature d of the first second information distillation submodule (low level), the output feature e of the fourth second information distillation submodule (middle level) and the output feature f of the sixth second information distillation submodule (high level) in the lightweight model can be acquired.
Based on the collected output features of the corresponding hierarchy, the visual information of that hierarchy can be calculated from F, the output feature of the current information distillation submodule, and C, the number of channels of that output feature. The lower the level of the visual information, the more edge information it covers; the higher the level, the more core semantic information it covers.
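Since the source gives the visual-information formula only as an image, the sketch below assumes one plausible channel aggregation (a mean over the C channels); the actual aggregation in the application may differ.

```python
import torch

def visual_information(feat: torch.Tensor) -> torch.Tensor:
    # feat: (N, C, H, W) output feature F of an information distillation
    # submodule. Aggregate over the C channels to get one map per sample.
    # The channel-mean here is an assumption about the original formula.
    return feat.sum(dim=1) / feat.size(1)  # (N, H, W)
```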
With the above method for calculating visual information, in this example the visual information corresponding to the 3 first levels can be obtained: visual information A (low level), visual information B (middle level) and visual information C (high level); and the visual information corresponding to the 3 second levels: visual information D (low level), visual information E (middle level) and visual information F (high level).
And finally, training parameters of the lightweight model based on the average absolute error between the visual information of each first level and the visual information of the corresponding second level and a loss function determined by the high-resolution image and the super-resolution image output by the target weight model to obtain the target lightweight model.
Optionally, the sub-step a3 may specifically include:
sub-step a31, calculating the difference between the visual information of the same first level and second level.
Substep a32, adding all the differences to a loss function determined by the high resolution image and the super resolution image output by the target weight model to obtain a second loss function.
And a substep A33, training parameters of the lightweight model through the second loss function, and obtaining the target lightweight model.
In the embodiment of the present application, for the above example, after obtaining the visual information corresponding to the 3 first levels (visual information A (low level), visual information B (middle level), visual information C (high level)) and the visual information corresponding to the 3 second levels (visual information D (low level), visual information E (middle level), visual information F (high level)), the finally determined second loss function of the lightweight model is

L2 = ||I_SR - I_HR|| + λ1·||A - D|| + λ2·||B - E|| + λ3·||C - F||,

where ||I_SR - I_HR|| is the mean absolute error between the super-resolution image I_SR and the high-resolution image I_HR; λ1, λ2 and λ3 are the balance coefficients of the loss function and are preset values; and ||A - D||, ||B - E|| and ||C - F|| are the mean absolute errors between the visual information of each first level (A, B, C) and the visual information of the corresponding second level (D, E, F).
Finally, the second loss function L2 may be used to train only the parameters of the lightweight model, so as to obtain the target lightweight model.
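A hedged sketch of one knowledge-distillation step under the second loss function, reusing the `visual_information` sketch above; the feature-recording mechanism (`features` list), the level indices and the lambda values are all assumptions, and the reconstruction term compares the student's output with the HR target, since only the lightweight model's parameters are trained.

```python
import torch
import torch.nn.functional as F

def distill_step(weight_model, light_model, optimizer, lr_img, hr_img,
                 levels=(0, 3, 5), lambdas=(1.0, 1.0, 1.0)):
    # Teacher forward pass without gradients; assumes each model records
    # its submodule outputs in a `features` list during forward().
    with torch.no_grad():
        weight_model(lr_img)
        teacher_vi = [visual_information(weight_model.features[i]) for i in levels]

    sr_img = light_model(lr_img)  # student forward pass
    student_vi = [visual_information(light_model.features[i]) for i in levels]

    # Reconstruction term ||I_SR - I_HR|| plus lambda-weighted MAE between
    # corresponding visual information of teacher and student.
    loss = F.l1_loss(sr_img, hr_img)
    for lam, t_vi, s_vi in zip(lambdas, teacher_vi, student_vi):
        loss = loss + lam * F.l1_loss(s_vi, t_vi)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # updates only the lightweight model's parameters
    return loss.item()
```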
To sum up, in the training method of the image hyper-resolution model provided by the embodiment of the application, the structure of the image hyper-resolution model can be designed as follows: a super-resolution module composed of a plurality of information distillation submodules connected end to end in sequence, where each information distillation submodule comprises a plurality of residual units connected end to end in sequence. In the information multiple-distillation structure based on residual units, an identity mapping can be added between input and output so that the residual between them is learned; the refined features extracted by this structure are superposed with the shallow features, and the shallow features are input into each information distillation submodule through adaptive weighted fusion. In this way, the model can use low-level visual information more efficiently, and the problems of gradient vanishing and gradient explosion are alleviated. In addition, a lightweight network structure is realized on the basis of residual units with depth separable convolution, which greatly reduces the parameter count and computation of the model while maintaining accuracy.
Furthermore, according to the embodiment of the application, two super-resolution models with different sizes and inference performance can be obtained by adjusting the number of residual units included in the information distillation submodule and the base channel number of the whole model, so that models of different sizes and performance can be provided for devices of different computing-power magnitudes.
Furthermore, after the training of the super-resolution large model is completed, the trained large model is used to perform knowledge-distillation training on the super-resolution small model, propagating useful information from the large model to the small model; this can effectively improve the super-resolution accuracy of the small model.
Fig. 9 is a block diagram of an apparatus for training an image hyper-segmentation model according to an embodiment of the present application, and as shown in fig. 9, the apparatus 40 may include:
the acquisition module 401 is configured to acquire a low-resolution image and a high-resolution image with the same content, and an initial model, where the initial model includes a super-resolution module formed by a plurality of information distillation sub-modules connected end to end in sequence; the information distillation submodule comprises a plurality of residual error units which are connected end to end in sequence;
a first extraction module 402, configured to input the low-resolution image into the initial model, and extract shallow features of the low-resolution image;
a second extraction module 403, configured to extract, by the super-segmentation module, deep features from the shallow features; the residual error unit is used for reserving part of fine features of the output features and inputting the remaining coarse features into the next residual error unit, the fine features reserved by all the residual error units of the information distillation submodule are combined and then are superposed with the shallow feature to obtain the output features of the information distillation submodule, and the output feature of the last information distillation submodule is the deep feature;
an upsampling module 404, configured to perform upsampling processing after the shallow feature and the deep feature are added and fused, so as to obtain a super-resolution image output by the initial model;
and the training module 405 is configured to train the initial model according to the loss function determined by the super-resolution image and the high-resolution image, so as to obtain an image hyper-resolution model.
Optionally, the second extracting module 403 includes:
the first convolution submodule is used for performing convolution processing on the input features through a residual error unit of the super-resolution module to obtain a first convolution result;
the superposition submodule is used for superposing the first convolution result and the input characteristic to obtain the output characteristic of the residual error unit;
the segmentation submodule is used for reserving part of fine features of the output features and inputting the remaining coarse features into a next residual error unit; the fine features reserved by the residual error unit at the tail end are output features of the residual error unit;
and the merging submodule is used for merging the fine features retained by all residual error units of the information distillation submodule and then superposing the merged fine features with the shallow features to obtain the output features of the information distillation submodule, and taking the output feature of the last information distillation submodule as the deep features.
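As a sketch of the split performed by each residual unit, assuming the retained "fine" part is a fixed slice of channels; the application states that part of the feature is retained but does not state the split rule, so the channel-wise split is an assumption.

```python
import torch

def split_fine_coarse(feat: torch.Tensor, num_fine: int):
    # Keep `num_fine` channels as the retained fine features; the remaining
    # coarse channels are passed to the next residual unit.
    fine, coarse = torch.split(feat, [num_fine, feat.size(1) - num_fine], dim=1)
    return fine, coarse
```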
Optionally, the merge sub-module includes:
the merging unit is used for merging the fine features reserved by all residual error units of the information distillation submodule to obtain merged features;
the weighting unit is used for weighting and fusing the merging characteristic and the shallow layer characteristic to obtain the output characteristic of the information distillation submodule;
the respective weights of the merged feature and the shallow feature are adaptive weights respectively, and the initial value is 1.
Optionally, the weighting unit includes:
the convolution subunit is used for inputting the merging characteristics into a convolution layer with a convolution kernel size of 1 × 1 to obtain a second convolution result;
the weighting subunit is used for taking the weighted fusion result of the second convolution result and the shallow feature as the output feature of the information distillation submodule;
and the weights of the second convolution result and the shallow feature are self-adaptive weights respectively, and the initial value is 1.
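An illustrative sketch of this weighted fusion, with two learnable (adaptive) weights initialized to 1 as stated; the scalar form of the adaptive weights, as opposed to per-channel weights, is an assumption.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    # Merged fine features pass through a 1x1 convolution, then are fused
    # with the shallow feature under two adaptive weights initialized to 1.
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.w_merged = nn.Parameter(torch.ones(1))   # adaptive weight, init 1
        self.w_shallow = nn.Parameter(torch.ones(1))  # adaptive weight, init 1

    def forward(self, merged, shallow):
        return self.w_merged * self.conv1x1(merged) + self.w_shallow * shallow
```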
Optionally, the residual error unit is a lightweight residual error unit with a depth separable convolution structure;
the lightweight residual unit includes: the first convolution layer, the first active layer, the second convolution layer, the second active layer and the third convolution layer are connected end to end in sequence, and the convolution kernel size of the first convolution layer and the convolution kernel size of the third convolution layer are 1 x 1; the convolution kernel size of the second convolution layer is 3 × 3.
Optionally, the initial model includes: a weight model having a first number of first information distillation sub-modules, and a lightweight model having a second number of second information distillation sub-modules; the image hyper-resolution model comprises a target weight model and a target lightweight model;
the training module 405 includes:
the first training submodule is used for training the weight model by utilizing a first loss function determined by the high-resolution image and the super-resolution image output by the weight model to obtain a target weight model;
and the second training submodule is used for training the light weight model by using the target weight model to obtain the target light weight model.
Optionally, the first number is greater than or equal to the second number, and the number of residual error units included in the first information distillation sub-module is greater than or equal to the number of residual error units included in the second information distillation sub-module.
Optionally, a residual error unit contained in the first information distillation submodule is a basic residual error unit; the residual error unit contained in the second information distillation submodule is a light-weight residual error unit with a depth separable convolution structure;
the lightweight residual unit includes: the first convolution layer, the first active layer, the second convolution layer, the second active layer and the third convolution layer are connected end to end in sequence, and the convolution kernel size of the first convolution layer and the convolution kernel size of the third convolution layer are 1 x 1; the convolution kernel size of the second convolution layer is 3 × 3;
the base residual unit includes: a fourth convolution layer and a fifth convolution layer, and the convolution kernel sizes of the fourth convolution layer and the fifth convolution layer are 3 × 3.
Optionally, the second training submodule includes:
a first dividing unit, configured to, during a process of inputting the low-resolution image into the target weight model for processing, extract output features of a first information distillation sub-module corresponding to each first level according to a plurality of preset first levels, and calculate visual information of the first level according to the output features corresponding to the first levels and the number of channels of the output features, where the visual information is used to represent feature depths of the output features;
a second dividing unit configured to, during input of the low-resolution image into the lightweight model and processing, extract output features of a second information distilling submodule corresponding to each second hierarchy according to a plurality of preset second hierarchies, and calculate visual information of the second hierarchies based on the output features corresponding to the second hierarchies and the number of channels of the output features; the first hierarchy and the second hierarchy correspond one to one;
and the fusion training unit is used for training the parameters of the light weight model according to the visual information of the first level, the visual information of the second level and the loss function determined by the high-resolution image and the super-resolution image output by the target weight model to obtain the target light weight model.
Optionally, the fusion training unit includes:
a calculating subunit for calculating a difference between the visual information of the same first level and second level;
a summation subunit, configured to sum up all the difference values with a loss function determined by the high resolution image and the super resolution image output by the target weight model, so as to obtain a second loss function;
and the training subunit is used for training the parameters of the lightweight model through the second loss function to obtain the target lightweight model.
Optionally, the apparatus further comprises:
the third extraction module is used for performing feature extraction on the deep features to carry out feature reconstruction, obtaining reconstructed features;
the upsampling module 404, comprising:
and the up-sampling sub-module is used for performing up-sampling processing after the shallow feature and the reconstruction feature are added and fused to obtain a super-resolution image output by the initial model.
Optionally, the obtaining module includes:
an acquisition submodule for acquiring a high-resolution image;
and the down-sampling sub-module is used for carrying out down-sampling on the high-resolution image through a preset image scaling algorithm to obtain the low-resolution image.
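A minimal sketch of constructing the training pair by downsampling, assuming bicubic interpolation as the preset image scaling algorithm; the application only says "a preset image scaling algorithm", so the mode is an assumption.

```python
import torch.nn.functional as F

def make_low_resolution(hr_img, scale: int = 4):
    # hr_img: (N, C, H, W). Downsample by the super-resolution factor.
    return F.interpolate(hr_img, scale_factor=1.0 / scale,
                         mode='bicubic', align_corners=False)
```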
To sum up, the training device of the image hyper-resolution model provided by the embodiment of the application can design the structure of the image hyper-resolution model as follows: the super-resolution module is composed of a plurality of information distillation sub-modules connected end to end in sequence, and each information distillation sub-module comprises a plurality of residual error units connected end to end in sequence.
Fig. 10 is a block diagram of an image super-resolution device according to an embodiment of the present application, and as shown in fig. 10, the device 50 may include:
the hyper-resolution module 501 is configured to input an image to be processed into an image hyper-resolution model, so as to obtain a super-resolution image output by the image hyper-resolution model;
the image hyper-score model is obtained by training of a training device of the image hyper-score model.
Optionally, the image hyper-segmentation model comprises a target weight model and a target light weight model; the super-divide module 501 includes:
the first super-resolution module is used for inputting the image to be processed into a target weight model when the calculation force of the current equipment is greater than a preset calculation force threshold value to obtain a super-resolution image output by the target weight model;
and the second super-resolution module is used for inputting the image to be processed into the target lightweight model when the computing power of the current equipment is less than or equal to a preset computing power threshold value, so as to obtain a super-resolution image output by the target lightweight model.
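The device-side selection can be sketched as a simple dispatch; the compute metric and threshold are placeholders, not quantities defined by the application.

```python
def select_super_resolution_model(device_compute, threshold,
                                  target_weight_model, target_light_model):
    # Use the weight model on devices whose compute exceeds the preset
    # threshold, and the lightweight model otherwise.
    if device_compute > threshold:
        return target_weight_model
    return target_light_model
```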
To sum up, the image super-resolution device provided by the embodiment of the application uses an image hyper-resolution model whose structure is designed as follows: a super-resolution module composed of a plurality of information distillation sub-modules connected end to end in sequence, where each information distillation sub-module comprises a plurality of residual units connected end to end in sequence.
For the above device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.
Preferably, an embodiment of the present application further provides a terminal, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, and when the computer program is executed by the processor, the computer program implements each process of the above training method for an image hyper-segmentation model, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above training method for an image hyper-segmentation model, and can achieve the same technical effect, and is not described herein again to avoid repetition. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As those skilled in the art will readily appreciate, any combination of the above embodiments is possible, and any such combination is therefore an embodiment of the present application; for reasons of space, these combinations are not described in detail here.
The training methods for the image hyper-segmentation models provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The structure required to construct a system having the aspects of the present application will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various application aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the training method of the image hyper-segmentation model according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third and so on does not indicate any ordering; these words may be interpreted as names.

Claims (17)

1. A training method of an image hyper-resolution model is characterized by comprising the following steps:
acquiring a low-resolution image and a high-resolution image with the same content and an initial model, wherein the initial model comprises a super-resolution module consisting of a plurality of information distillation sub-modules which are sequentially connected end to end; the information distillation submodule comprises a plurality of residual error units which are connected end to end in sequence;
inputting the low-resolution image into the initial model, and extracting shallow features of the low-resolution image;
extracting, by the super-resolution module, deep features from the shallow features; the residual error unit is used for reserving part of fine features of the output features and inputting the remaining coarse features into the next residual error unit, the fine features reserved by all the residual error units of the information distillation submodule are combined and then are superposed with the shallow feature to obtain the output features of the information distillation submodule, and the output feature of the last information distillation submodule is the deep feature;
performing up-sampling processing after the shallow feature and the deep feature are added and fused to obtain a super-resolution image output by the initial model;
and training an initial model according to a loss function determined by the super-resolution image and the high-resolution image to obtain an image super-resolution model.
2. The method of claim 1, wherein said extracting deep features from said shallow features by said super-resolution module comprises:
performing convolution processing on the input features through a residual error unit of the super-resolution module to obtain a first convolution result;
superposing the first convolution result and the input characteristic to obtain the output characteristic of the residual error unit;
preserving part of the fine features of the output features, and inputting the remaining coarse features into a next residual error unit; the fine features reserved by the residual error unit at the tail end are output features of the residual error unit;
and combining the fine features retained by all residual error units of the information distillation submodule, then superposing the fine features with the shallow features to obtain the output features of the information distillation submodule, and taking the output feature of the last information distillation submodule as the deep feature.
3. The method of claim 2, wherein the merging of the fine features retained by all residual units of the information distillation sub-module and the superposition of the merged fine features with the shallow features to obtain the output features of the information distillation sub-module comprises:
merging the fine features retained by all residual error units of the information distillation submodule to obtain merged features;
carrying out weighted fusion on the merging characteristics and the shallow layer characteristics to obtain output characteristics of the information distillation submodule;
the respective weights of the merged feature and the shallow feature are adaptive weights respectively, and the initial value is 1.
4. The method of claim 3, wherein the weighted fusion of the merged features and the shallow features to obtain the output features of the information distillation submodule comprises:
inputting the merging features into a convolution layer with a convolution kernel size of 1 multiplied by 1 to obtain a second convolution result;
taking the second convolution result and the weighted fusion result of the shallow feature as the output feature of the information distillation submodule;
and the weights of the second convolution result and the shallow feature are self-adaptive weights respectively, and the initial value is 1.
5. The method of any of claims 1-4, wherein the residual units are lightweight residual units having a depth separable convolution structure;
the lightweight residual unit includes: the first convolution layer, the first active layer, the second convolution layer, the second active layer and the third convolution layer are connected end to end in sequence, and the convolution kernel size of the first convolution layer and the convolution kernel size of the third convolution layer are 1 x 1; the convolution kernel size of the second convolution layer is 3 × 3.
6. The method of claim 1, wherein the initial model comprises: a weight model having a first number of first information distillation sub-modules, and a lightweight model having a second number of second information distillation sub-modules; the image hyper-resolution model comprises a target weight model and a target lightweight model;
the training of the initial model according to the loss function determined by the super-resolution image and the high-resolution image to obtain the image super-resolution model comprises the following steps:
training the weight model by using a first loss function determined by the high-resolution image and a super-resolution image output by the weight model to obtain a target weight model;
and training the light weight model by using the target weight model to obtain the target light weight model.
7. The method of claim 6, wherein the first number is greater than or equal to the second number, and wherein the number of residual units comprised in the first information distillation sub-module is greater than or equal to the number of residual units comprised in the second information distillation sub-module.
8. The method according to claim 6 or 7, wherein the residual unit contained in the first information distillation submodule is a base residual unit; the residual error unit contained in the second information distillation submodule is a light-weight residual error unit with a depth separable convolution structure;
the lightweight residual unit includes: the first convolution layer, the first active layer, the second convolution layer, the second active layer and the third convolution layer are connected end to end in sequence, and the convolution kernel size of the first convolution layer and the convolution kernel size of the third convolution layer are 1 x 1; the convolution kernel size of the second convolution layer is 3 × 3;
the base residual unit includes: a fourth convolution layer and a fifth convolution layer, and the convolution kernel sizes of the fourth convolution layer and the fifth convolution layer are 3 × 3.
9. The method of claim 6, wherein training the lightweight model with a target weight model, resulting in a target lightweight model, comprises:
in the process of inputting the low-resolution image into the target weight model for processing, according to a plurality of preset first levels, respectively extracting output features of a first information distillation submodule corresponding to each first level, and calculating visual information of the first levels according to the output features corresponding to the first levels and the number of channels of the output features, wherein the visual information is used for representing feature depths of the output features;
in the process of inputting the low-resolution image into the lightweight model for processing, according to a plurality of preset second levels, respectively extracting the output features of a second information distillation submodule corresponding to each second level, and calculating the visual information of the second levels according to the output features corresponding to the second levels and the number of channels of the output features; the first hierarchy and the second hierarchy correspond one to one;
and training parameters of the lightweight model according to the visual information of the first level, the visual information of the second level and the loss function determined by the high-resolution image and the super-resolution image output by the target weight model, so as to obtain the target lightweight model.
10. The method of claim 9, wherein training the parameters of the lightweight model according to the first level of visual information, the second level of visual information, and the loss function determined by the high resolution image and the super resolution image output by the target weight model to obtain a target lightweight model comprises:
calculating a difference between the visual information of the same first level and second level;
adding all the difference values to a loss function determined by the high-resolution image and the super-resolution image output by the target weight model to obtain a second loss function;
and training the parameters of the lightweight model through the second loss function to obtain the target lightweight model.
11. The method of claim 1, further comprising:
performing feature extraction on the deep features to carry out feature reconstruction, obtaining reconstruction features;
the step of performing upsampling processing after the shallow feature and the deep feature are added and fused to obtain a super-resolution image output by the initial model comprises the following steps:
and performing up-sampling processing after the shallow feature and the reconstruction feature are added and fused to obtain a super-resolution image output by the initial model.
12. The method of claim 1, wherein the obtaining a low resolution image and a high resolution image having the same content comprises:
acquiring a high-resolution image;
and carrying out down-sampling on the high-resolution image through a preset image scaling algorithm to obtain the low-resolution image.
13. An image super-resolution method, characterized in that the method comprises:
inputting an image to be processed into an image hyper-resolution model to obtain a super-resolution image output by the image hyper-resolution model;
wherein the image hyper-resolution model is trained by the training method of the image hyper-resolution model according to any one of claims 1 to 11.
14. The method of claim 13, wherein the image hyper-segmentation model comprises a target weight model and a target lightweight model;
the method for inputting the image to be processed into the image hyper-resolution model to obtain the super-resolution image output by the image hyper-resolution model comprises the following steps:
when the calculation force of the current equipment is larger than a preset calculation force threshold value, inputting the image to be processed into a target weight model to obtain a super-resolution image output by the target weight model;
and when the computing power of the current equipment is less than or equal to a preset computing power threshold value, inputting the image to be processed into a target lightweight model to obtain a super-resolution image output by the target lightweight model.
15. An apparatus for training an image hyper-segmentation model, the apparatus comprising:
the system comprises an acquisition module and an initial model, wherein the acquisition module is used for acquiring a low-resolution image and a high-resolution image with the same content, and the initial model comprises a super-resolution module consisting of a plurality of information distillation sub-modules which are sequentially connected end to end; the information distillation submodule comprises a plurality of residual error units which are connected end to end in sequence;
the first extraction module is used for inputting the low-resolution image into the initial model and extracting shallow features of the low-resolution image;
the second extraction module is used for extracting deep features from the shallow features through the super-resolution module; the residual error unit is used for reserving part of fine features of the output features and inputting the remaining coarse features into the next residual error unit, the fine features reserved by all the residual error units of the information distillation submodule are combined and then are superposed with the shallow feature to obtain the output features of the information distillation submodule, and the output feature of the last information distillation submodule is the deep feature;
the up-sampling module is used for carrying out up-sampling processing after the shallow feature and the deep feature are added and fused to obtain a super-resolution image output by the initial model;
and the training module is used for training the initial model according to the loss function determined by the super-resolution image and the high-resolution image to obtain an image hyper-resolution model.
16. An image super-resolution device, characterized in that the device comprises:
the super-resolution module is used for inputting the image to be processed into an image super-resolution model to obtain a super-resolution image output by the image super-resolution model;
wherein the image hyper-resolution model is trained by the training device of the image hyper-resolution model according to claim 15.
17. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method as set forth in any one of claims 1 to 14.
CN202111124188.7A 2021-09-24 2021-09-24 Training method and device for image superdivision model and computer readable storage medium Active CN113837941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111124188.7A CN113837941B (en) 2021-09-24 2021-09-24 Training method and device for image superdivision model and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111124188.7A CN113837941B (en) 2021-09-24 2021-09-24 Training method and device for image superdivision model and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113837941A true CN113837941A (en) 2021-12-24
CN113837941B CN113837941B (en) 2023-09-01

Family

ID=78969935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111124188.7A Active CN113837941B (en) 2021-09-24 2021-09-24 Training method and device for image superdivision model and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113837941B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US20200311871A1 (en) * 2017-12-20 2020-10-01 Huawei Technologies Co., Ltd. Image reconstruction method and device
CN110111256A (en) * 2019-04-28 2019-08-09 西安电子科技大学 Image Super-resolution Reconstruction method based on residual error distillation network
WO2020238558A1 (en) * 2019-05-24 2020-12-03 鹏城实验室 Image super-resolution method and system
CN111402138A (en) * 2020-03-24 2020-07-10 天津城建大学 Image super-resolution reconstruction method of supervised convolutional neural network based on multi-scale feature extraction fusion
CN112001278A (en) * 2020-08-11 2020-11-27 中山大学 Crowd counting model based on structured knowledge distillation and method thereof
CN112734645A (en) * 2021-01-19 2021-04-30 青岛大学 Light-weight image super-resolution reconstruction method based on characteristic distillation multiplexing
CN112508794A (en) * 2021-02-03 2021-03-16 中南大学 Medical image super-resolution reconstruction method and system
CN113240580A (en) * 2021-04-09 2021-08-10 暨南大学 Lightweight image super-resolution reconstruction method based on multi-dimensional knowledge distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG HUI et al.: "Lightweight Image Super-Resolution with Information Multi-Distillation Network", ACM MM 2019 & ICCV AIM Workshop *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708148A (en) * 2022-04-12 2022-07-05 中国电子技术标准化研究院 Infrared image super-resolution reconstruction method based on transfer learning
CN114782256A (en) * 2022-06-21 2022-07-22 腾讯科技(深圳)有限公司 Image reconstruction method, image reconstruction device, computer equipment and storage medium
CN114782256B (en) * 2022-06-21 2022-09-02 腾讯科技(深圳)有限公司 Image reconstruction method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113837941B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN109101975B (en) Image semantic segmentation method based on full convolution neural network
CN110443842B (en) Depth map prediction method based on visual angle fusion
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
CN112308200B (en) Searching method and device for neural network
CN110599401A (en) Remote sensing image super-resolution reconstruction method, processing device and readable storage medium
Wang et al. A review of image super-resolution approaches based on deep learning and applications in remote sensing
CN112330684B (en) Object segmentation method and device, computer equipment and storage medium
CN107103585B (en) Image super-resolution system
CN113837941A (en) Training method and device for image hyper-resolution model and computer readable storage medium
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
Ye et al. Depth super-resolution with deep edge-inference network and edge-guided depth filling
CN116309648A (en) Medical image segmentation model construction method based on multi-attention fusion
CN112785499A (en) Super-resolution reconstruction model training method and computer equipment
CN110570375B (en) Image processing method, device, electronic device and storage medium
CN115578262A (en) Polarization image super-resolution reconstruction method based on AFAN model
CN116757930A (en) Remote sensing image super-resolution method, system and medium based on residual separation attention mechanism
CN116739899A (en) Image super-resolution reconstruction method based on SAUGAN network
CN111967516B (en) Pixel-by-pixel classification method, storage medium and classification equipment
US20230073175A1 (en) Method and system for processing image based on weighted multiple kernels
CN116091893A (en) Method and system for deconvolution of seismic image based on U-net network
Amirkolaee et al. Convolutional neural network architecture for digital surface model estimation from single remote sensing image
Zou et al. DiffCR: A fast conditional diffusion framework for cloud removal from optical satellite images
Tang et al. Single-frame super-resolution for remote sensing images based on improved deep recursive residual network
CN112634126A (en) Portrait age reduction processing method, portrait age reduction training device, portrait age reduction equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant