CN112308200B - Searching method and device for neural network


Info

Publication number
CN112308200B
Authority
CN
China
Prior art keywords
resolution
feature map
image
network
super
Prior art date
Legal status
Active
Application number
CN201910695706.7A
Other languages
Chinese (zh)
Other versions
CN112308200A (en)
Inventor
宋德华
贾旭
王云鹤
许春景
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910695706.7A
Priority to PCT/CN2020/105369 (published as WO2021018163A1)
Publication of CN112308200A
Application granted
Publication of CN112308200B

Classifications

    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053: Scaling based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • H04N 19/59: Predictive coding of digital video signals involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a neural network searching method and device in the field of computer vision within the field of artificial intelligence. The searching method comprises the following steps: constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; the basic modules comprise a first module, which performs a dimension reduction operation and a residual connection operation on a first input feature map; the dimension reduction operation transforms the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension being smaller than the first dimension, and the residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module; constructing a search space according to the basic unit and network structure parameters; and performing a network structure search in the search space to determine a target image super-resolution network. The method and the device can improve the precision of the super-resolution network under a given computation constraint.

Description

Searching method and device for neural network
Technical Field
The present application relates to the field of artificial intelligence, and more particularly, to a method and apparatus for searching a neural network.
Background
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-machine interaction, recommendation and search, basic AI theory, and the like.
With the rapid development of artificial intelligence technology, neural networks (for example, deep neural networks) have achieved great success in processing and analyzing a variety of media signals such as images, video, and speech. Image super-resolution reconstruction refers to reconstructing a high-resolution image from a low-resolution image, and performing this reconstruction with a deep neural network has obvious advantages; however, as the effect of image super-resolution reconstruction improves, the deep neural network models grow larger and larger. Because the computing performance and storage space of mobile devices are very limited, the application of super-resolution models on mobile devices is greatly restricted. Therefore, people aim to design lightweight super-resolution network models that reduce the network scale as much as possible while ensuring a certain accuracy.
In order to obtain a lightweight super-resolution network model, neural architecture search (NAS) methods have been applied to image super-resolution reconstruction. Currently, the search space in a NAS method is usually constructed from basic convolution units, and may include candidate neural network models built from a plurality of basic units. These basic units apply nonlinear transformations to the input feature map at the same feature size as the input, which makes the number of parameters in the neural network model proportional to its computation: the larger the parameter count, the larger the computation of the network model. On a mobile device with limited computing performance, the computation can then only be reduced by reducing the number of parameters, which limits the super-resolution reconstruction performance of the network model. Therefore, how to improve the performance of a super-resolution neural network under the limited computing performance of mobile devices is a problem to be solved.
Disclosure of Invention
The application provides a searching method and device for a neural network, which can improve the precision of a super-resolution network when performing image super-resolution processing under the condition that the computing performance of a mobile device is limited.
In a first aspect, a method for searching a neural network structure is provided, including: constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for carrying out a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual connection operation is used for carrying out feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map; constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameters comprise the types of basic modules used for constructing the basic unit, and the search space is used for searching for an image super-resolution network structure; and performing an image super-resolution network structure search in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is used for performing super-resolution processing on an image to be processed, the target image super-resolution network at least comprises the first module, and the target image super-resolution network is a network whose computation is smaller than a first preset threshold and whose image super-resolution precision is greater than a second preset threshold.
The basic unit may be a network structure obtained by connecting basic modules through basic operations of a neural network, and the network structure may include basic operations of a predefined convolutional neural network or combinations thereof; these operations and combinations are collectively referred to as basic operations.
For example, the basic operation may refer to a convolution operation, a pooling operation, a residual connection, and the like, and the basic operation may enable connection between the respective basic modules, thereby obtaining a network structure of the basic unit.
The feature addition may refer to adding different channel features for feature maps of the same scale.
In the embodiment of the application, the first module can apply a residual connection to the input feature map, that is, feature addition processing can be performed on the first input feature map and the feature map processed by the first module, so that more local detail information in the first input feature map is passed to the later convolution layers. The first module can thus reduce the dimension of the first input feature map while ensuring that sufficient local detail information of the first input feature map is passed to the following convolution layers in the first module. The dimension reduction operation lowers the dimension of the input feature map and thereby reduces the computation of the model, while the residual connection operation transfers information from earlier layers to later layers, compensating for the information lost by the dimension reduction operation. At the same time, the dimension reduction operation also rapidly enlarges the receptive field of the features, so that the prediction of high-resolution pixels can better take context information into account, improving super-resolution precision.
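For illustration only, the following PyTorch sketch shows how a first module of this kind could combine dimension reduction with a residual connection; the layer widths, the stride-2 convolution, and the nearest-neighbor dimension-up step are assumptions for illustration, not the exact structure disclosed herein.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstModuleSketch(nn.Module):
    """Illustrative first module: dimension reduction followed by a
    residual connection at the original dimension. Layer widths and the
    choice of a stride-2 convolution are assumptions."""

    def __init__(self, channels: int = 32):
        super().__init__()
        # Dimension reduction: a stride-2 convolution halves the spatial
        # size, so the body convolution below runs on ~1/4 of the pixels.
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.body = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = F.relu(self.down(x))   # first dimension -> second dimension
        y = F.relu(self.body(y))   # processing at the reduced dimension
        # Dimension-up: restore the first dimension so that feature
        # addition (the residual connection) uses matching shapes.
        y = F.interpolate(y, size=x.shape[-2:], mode='nearest')
        return x + y               # residual connection preserves detail
```

For a 32-channel 64x64 input, the residual addition happens at 64x64 while the inner convolution runs at 32x32.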
In one possible implementation, the basic unit is a basic module for constructing an image super-resolution network.
With reference to the first aspect, in certain implementations of the first aspect, the dimension reduction operation includes at least one of a pooling operation and a convolution operation with a step size Q, Q being a positive integer greater than 1.
Alternatively, the pooling operation may be an average pooling operation, or the pooling operation may be a maximum pooling operation.
In the embodiment of the application, the dimension of the first input feature map can be reduced through the dimension reduction operation, so that the computation of the target image super-resolution network is reduced while the number of parameters remains unchanged.
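A quick back-of-the-envelope check of this point (the feature sizes are assumed for illustration): halving the spatial dimension leaves a convolution's parameters untouched but divides its multiply-accumulate count by four.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulate count of a k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k

full = conv_macs(64, 64, 32, 32)     # convolution at the first dimension
reduced = conv_macs(32, 32, 32, 32)  # same layer after stride-2 reduction
print(reduced / full)                # 0.25: same parameters, 4x fewer ops
```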
With reference to the first aspect, in some implementations of the first aspect, the feature map processed by the first module is a feature map after a dimension increasing operation, where the dimension increasing operation refers to restoring the dimension of the feature map after the dimension reducing process to the first dimension, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map after the dimension increasing operation.
Alternatively, the dimension-up operation may be an up-sampling operation or a deconvolution operation. The up-sampling operation may be an interpolation method, that is, a suitable interpolation algorithm is used to insert new elements between pixels on the basis of the original image pixels; the deconvolution operation, also known as transposed convolution, may be regarded as the inverse of a convolution operation.
In the embodiment of the application, the dimension of the first input feature map after the dimension reduction operation can be converted from the second dimension to the original first dimension through the dimension increase operation, namely the dimension increase of the feature map after the dimension reduction operation is realized, so that the residual connection operation is realized under the same dimension.
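For illustration, both dimension-up variants mentioned above can be expressed as follows (the channel count and scale factor are assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 32, 16, 16)  # feature map at the reduced (second) dimension

# Dimension-up by interpolation: parameter-free up-sampling.
up_interp = F.interpolate(x, scale_factor=2, mode='bilinear',
                          align_corners=False)

# Dimension-up by deconvolution (transposed convolution): learned up-sampling.
deconv = nn.ConvTranspose2d(32, 32, kernel_size=2, stride=2)
up_deconv = deconv(x)

print(up_interp.shape, up_deconv.shape)  # both: torch.Size([1, 32, 32, 32])
```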
With reference to the first aspect, in some implementations of the first aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-stitching the output feature maps of each of the first i-1 convolution layers together with the first input feature map to form the input feature map of the i-th convolution layer, and i is a positive integer greater than 1.
The feature stitching may refer to stitching M feature maps of the same scale into a feature map with K channels, where K is a positive integer greater than M.
In embodiments of the present application, maximum information flow in the network can be achieved by employing the dense connection operation: each layer is connected to all layers preceding it, i.e. the input to each layer is a concatenation of the outputs of all preceding layers. The dense connection operation better preserves the information of the input feature map throughout the network, and thus better compensates for the information loss caused by the dimension reduction operation.
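As a minimal PyTorch sketch of such a dense connection (the layer count and growth width are assumptions for illustration):

```python
import torch
import torch.nn as nn

class DenseBlockSketch(nn.Module):
    """Each layer consumes the concatenation of the block input and all
    earlier layer outputs; a 3-layer, fixed-growth illustration."""

    def __init__(self, channels: int = 16, growth: int = 16, layers: int = 3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.convs:
            # Input of the i-th layer: feature stitching (channel concat)
            # of the original input with all preceding layer outputs.
            out = torch.relu(conv(torch.cat(feats, dim=1)))
            feats.append(out)
        return torch.cat(feats, dim=1)
```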
With reference to the first aspect, in some implementations of the first aspect, the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature stitching on the first input feature map after the channel compression processing.
In the embodiment of the application, the cyclic operation, i.e. the cyclic dense connection operation, can deepen the target image super-resolution network; for a neural network structure, a deeper structure means more convolution layers, which improves the accuracy with which the target image super-resolution network processes images.
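One plausible reading of the cyclic dense connection, sketched under that assumption and reusing DenseBlockSketch from the previous sketch: the block's concatenated output is compressed back to the input width by a 1x1 convolution and fed through the same (weight-shared) block again, deepening the network without adding parameters.

```python
class CyclicDenseSketch(nn.Module):
    """Assumed cyclic dense connection: compress, then re-apply the same
    dense block for a fixed number of cycles (weight sharing)."""

    def __init__(self, channels: int = 16, cycles: int = 2):
        super().__init__()
        self.block = DenseBlockSketch(channels=channels, growth=channels)
        out_ch = channels + 3 * channels  # 3-layer block output width
        # Channel compression: 1x1 convolution back to the input width.
        self.compress = nn.Conv2d(out_ch, channels, kernel_size=1)
        self.cycles = cycles

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.cycles):
            x = self.compress(self.block(x))
        return x
```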
With reference to the first aspect, in some implementations of the first aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation is to combine the plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
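The described rearrangement resembles the well-known sub-pixel (pixel shuffle) rearrangement, in which r*r low-resolution channel features are interleaved into one channel at r times the resolution; a minimal sketch under that assumption, with an assumed upscale factor of 2:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 4, 8, 8)   # 4 first-channel features, each 8x8
shuffle = nn.PixelShuffle(2)  # assumed preset rule: sub-pixel interleaving
y = shuffle(x)
print(y.shape)                # torch.Size([1, 1, 16, 16]): higher resolution
```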
With reference to the first aspect, in certain implementation manners of the first aspect, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to performing a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
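As an illustrative sketch of the second module's channel compression and the third module's channel switching (the group count and channel widths below are assumptions):

```python
import torch
import torch.nn as nn

# Channel compression: a 1x1 convolution reducing the channel count.
compress = nn.Conv2d(64, 16, kernel_size=1)

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Channel switching sketch: reorder channels so that channel features
    from different sub-feature maps (groups) become adjacent."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)  # split into M sub-feature maps
    x = x.transpose(1, 2).contiguous()        # interleave across the groups
    return x.view(n, c, h, w)

x = torch.randn(1, 8, 4, 4)
y = channel_shuffle(x, groups=2)  # channels 0..3 and 4..7 now alternate
```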
With reference to the first aspect, in certain implementation manners of the first aspect, performing an image super-resolution network structure search in the search space to determine a target image super-resolution network includes:
searching for an image super-resolution network structure in the search space through an evolutionary algorithm to determine a first image super-resolution network;
and performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
It should be appreciated that training images are also required when training the first image super-resolution network with the multi-level weighted joint loss function, where a training image may refer to a sample image, i.e. a low-resolution image, together with its corresponding sample super-resolution image.
In the embodiment of the application, the first image super-resolution network determined by the evolutionary algorithm can be further trained through the multi-level weighted joint loss function, and the parameters of the target image super-resolution network are finally determined to obtain the target image super-resolution network, thereby improving the accuracy with which the target image super-resolution network processes images.
With reference to the first aspect, in certain implementations of the first aspect, the multi-level weighted joint loss function is obtained according to the following equation:

L = \sum_{k=1}^{N} \lambda_{k,t} L_k

wherein L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th basic unit at time t, N represents the number of basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
In embodiments of the present application, the weight of each intermediate-layer image loss in the multi-level weighted joint loss function may vary over time (or with the number of iterations). The loss function combines the predicted-image loss of every intermediate layer and reflects the importance of different layers through weighting, where the weight of each intermediate-layer image loss can change over time, so that the parameters of the bottom-layer basic units are trained more fully, improving the performance of the super-resolution network.
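As a minimal sketch of how such a loss could be assembled (the linear weight schedule and the per-unit L1 image loss below are assumptions for illustration; the text only states that the weights λ_{k,t} vary with time):

```python
import torch
import torch.nn.functional as F

def joint_loss(unit_outputs, target, step, total_steps):
    """Sketch of L = sum_k lambda_{k,t} * L_k, where unit_outputs holds the
    predicted super-resolution image of each basic unit."""
    n = len(unit_outputs)
    progress = step / total_steps
    loss = torch.tensor(0.0)
    for k, pred in enumerate(unit_outputs):
        l_k = F.l1_loss(pred, target)  # assumed per-unit image loss
        # Assumed schedule: early training weights all units equally,
        # shifting weight toward the final output over time.
        lam = (1 - progress) / n + progress * (1.0 if k == n - 1 else 0.0)
        loss = loss + lam * l_k
    return loss
```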
With reference to the first aspect, in certain implementation manners of the first aspect, the determining the first image super-resolution network by performing an image super-resolution network structure search in the search space through an evolutionary algorithm includes:
randomly generating P candidate network structures according to the basic unit, wherein P is an integer greater than 1;
training the P candidate network structures by using the multi-level weighted joint loss function;
evaluating a performance parameter of each of the P candidate network structures after training, the performance parameter including a peak signal-to-noise ratio (PSNR) indicating the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image;
and determining the first image super-resolution network according to the performance parameters of the candidate networks.
It should be understood that when evaluating the performance parameters of the P candidate network structures, the candidate network structures need to be trained with training images and the multi-level weighted joint loss function, where a training image may refer to a sample image, i.e. a low-resolution image, together with its corresponding sample super-resolution image.
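A highly simplified sketch of such an evolutionary search loop follows; every argument is an assumed hook (including the hypothetical candidate.flops() method), and the population size, round count, and computation budget are placeholders, not values disclosed herein.

```python
import random

def evolutionary_search(sample_structure, mutate, train_and_psnr,
                        population=8, rounds=10, flops_budget=1e9):
    """Sketch: random candidates built from the basic unit are trained
    with the multi-level weighted joint loss, scored by PSNR under a
    computation budget, and the best are mutated for the next round."""
    pool = [sample_structure() for _ in range(population)]
    best_score, best = float('-inf'), None
    for _ in range(rounds):
        scored = [(train_and_psnr(c), c) for c in pool
                  if c.flops() <= flops_budget]         # computation threshold
        scored.sort(key=lambda sc: sc[0], reverse=True)  # higher PSNR is better
        if scored and scored[0][0] > best_score:
            best_score, best = scored[0]
        parents = [c for _, c in scored[:max(1, population // 2)]]
        pool = parents + [mutate(random.choice(parents))
                          for _ in range(population - len(parents))]
    return best  # the first image super-resolution network
```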
In a second aspect, there is provided an image processing method, including: acquiring an image to be processed; and performing super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image of the image to be processed, wherein the target image is a super-resolution image corresponding to the image to be processed, the target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space, the search space is constructed from a basic unit and network structure parameters and is used for searching for an image super-resolution network structure, the network structure parameters comprise the types of basic modules used for constructing the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation is used for performing feature addition processing on the first input feature map and the feature map processed by the first module, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the target image super-resolution network at least comprises the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map.
In one possible implementation, the basic unit is a basic module for constructing an image super-resolution network.
With reference to the second aspect, in certain implementations of the second aspect, the dimension reduction operation includes at least one of a pooling operation and a convolution operation with a step size Q, Q being a positive integer greater than 1.
With reference to the second aspect, in some implementations of the second aspect, the feature map processed by the first module is a feature map after a dimension-up operation, where the dimension-up operation refers to restoring the dimension of the feature map after the dimension-down operation to the first dimension, and the residual connection operation refers to performing a feature addition process on the first input feature map and the feature map after the dimension-up operation.
With reference to the second aspect, in some implementations of the second aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-stitching the output feature maps of each of the first i-1 convolution layers together with the first input feature map to form the input feature map of the i-th convolution layer, and i is a positive integer greater than 1.
With reference to the second aspect, in some implementations of the second aspect, the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature stitching on the first input feature map after the channel compression processing.
With reference to the second aspect, in some implementations of the second aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation performs a merging process on a plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
With reference to the second aspect, in certain implementation manners of the second aspect, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to performing a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
With reference to the second aspect, in some implementations of the second aspect, the target image super-resolution network is a network determined by performing back propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to a loss between a predicted super-resolution image corresponding to a feature map output by each of the basic units in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network refers to a network determined by performing image super-resolution network structure search through an evolutionary algorithm in the search space.
With reference to the second aspect, in certain implementations of the second aspect, the multi-level weighted joint loss function is obtained according to the following equation:

L = \sum_{k=1}^{N} \lambda_{k,t} L_k

wherein L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th basic unit at time t, N represents the number of basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
With reference to the second aspect, in certain implementations of the second aspect, the first image super-resolution network is determined according to a performance parameter of each of P candidate network structures, the P candidate network structures being randomly generated according to the basic unit, the performance parameter being a parameter that evaluates the performance of the P candidate network structures trained by using the multi-level weighted joint loss function, the performance parameter including a peak signal-to-noise ratio indicating the difference between the predicted super-resolution image obtained by each candidate network structure and the sample super-resolution image, P being an integer greater than 1.
In a third aspect, an image processing method is provided, applied to an electronic device having a display screen and a camera, the method including: detecting a first operation of a user for turning on the camera; in response to the first operation, displaying a shooting interface on the display screen, wherein the shooting interface comprises a view-finding frame, and a first image is included in the view-finding frame; detecting a second operation of the user indicating the camera; and in response to the second operation, displaying a second image in the view-finding frame, wherein the second image is an image obtained by performing super-resolution processing on the first image acquired by the camera, a target image super-resolution network is applied in the super-resolution processing, the target image super-resolution network is a network determined by performing a network structure search in a search space, the search space is constructed from a basic unit and network structure parameters and is used for searching for an image super-resolution network structure, the network structure parameters comprise the types of basic modules used for constructing the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation is used for performing feature addition processing on the first input feature map and the feature map processed by the first module, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the target image super-resolution network at least comprises the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map.
In one possible implementation, the basic unit is a basic module for constructing an image super-resolution network.
With reference to the third aspect, in some implementations of the third aspect, the dimension reduction operation may include at least one of a pooling operation and a convolution operation with a step size Q, where Q is a positive integer greater than 1.
With reference to the third aspect, in some implementations of the third aspect, the feature map processed by the first module is a feature map after undergoing a dimension-increasing operation, where the dimension-increasing operation refers to restoring the dimension of the feature map after undergoing the dimension-reducing process to the first dimension, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map after undergoing the dimension-increasing operation.
With reference to the third aspect, in some implementations of the third aspect, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-stitching the output feature maps of each of the first i-1 convolution layers together with the first input feature map to form the input feature map of the i-th convolution layer, and i is a positive integer greater than 1.
With reference to the third aspect, in some implementations of the third aspect, the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature stitching on the first input feature map after the channel compression processing.
With reference to the third aspect, in some implementations of the third aspect, the first module is further configured to perform a rearrangement operation, where the rearrangement operation performs a merging process on a plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
With reference to the third aspect, in some implementations of the third aspect, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation refers to performing a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
With reference to the third aspect, in some implementations of the third aspect, the target image super-resolution network is a network determined by performing back propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to a loss between a predicted super-resolution image and a sample super-resolution image corresponding to a feature map output by each of the basic units in the first image super-resolution network, and the first image super-resolution network refers to a network determined by performing image super-resolution network structure search in the search space through an evolutionary algorithm.
With reference to the third aspect, in some implementations of the third aspect, the multi-level weighted joint loss function is obtained according to the following equation:

L = \sum_{k=1}^{N} \lambda_{k,t} L_k

wherein L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th basic unit at time t, N represents the number of basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
With reference to the third aspect, in some implementations of the third aspect, the first image super-resolution network is determined according to a performance parameter of each candidate network structure among P candidate network structures, the P candidate network structures being randomly generated according to the base unit, the performance parameter being a parameter that evaluates performance of the P candidate network structures trained by using the multi-level weighted joint loss function, the performance parameter including a peak signal-to-noise ratio for indicating a difference between a predicted super-resolution image and a sample super-resolution image obtained by the each candidate network structure, P being an integer greater than 1.
It should be appreciated that the extensions, definitions, explanations and illustrations of the relevant content in the first aspect described above also apply to the same content in the second and third aspects.
In a fourth aspect, there is provided a search apparatus for a neural network, the apparatus comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured, when the program is executed, to: construct a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for carrying out a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual connection operation is used for carrying out feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map; construct a search space according to the basic unit and network structure parameters, wherein the network structure parameters comprise the types of basic modules used for constructing the basic unit, and the search space is used for searching for an image super-resolution network structure; and perform an image super-resolution network structure search in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is used for performing super-resolution processing on an image to be processed, the target image super-resolution network at least comprises the first module, and the target image super-resolution network is a network whose computation is smaller than a first preset threshold and whose image super-resolution precision is greater than a second preset threshold.
In a possible implementation manner, the processor included in the searching apparatus of the neural network is further configured to perform the searching method in any implementation manner of the first aspect.
In a fifth aspect, there is provided an image processing apparatus, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured, when the program is executed, to: acquire an image to be processed; and perform super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image of the image to be processed, wherein the target image is a super-resolution image corresponding to the image to be processed, the target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space, the search space is constructed from a basic unit and network structure parameters and is used for searching for an image super-resolution network structure, the network structure parameters comprise the types of basic modules used for constructing the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation is used for performing feature addition processing on the first input feature map and the feature map processed by the first module, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the target image super-resolution network at least comprises the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map.
In a possible implementation manner, the processor included in the image processing apparatus is further configured to perform the method in any implementation manner of the second aspect.
In a sixth aspect, there is provided an image processing apparatus, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, the processor being configured, when the program is executed, to: detect a first operation of a user for turning on a camera; in response to the first operation, display a shooting interface on the display screen, wherein the shooting interface comprises a view-finding frame, and a first image is included in the view-finding frame; detect a second operation of the user indicating the camera; and in response to the second operation, display a second image in the view-finding frame, wherein the second image is an image obtained by performing super-resolution processing on the first image acquired by the camera, a target image super-resolution network is applied in the super-resolution processing, the target image super-resolution network is a network determined by performing an image super-resolution network structure search in a search space, the search space is constructed from a basic unit and network structure parameters and is used for searching for an image super-resolution network structure, the network structure parameters comprise the types of basic modules used for constructing the basic unit, the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map, the residual connection operation is used for performing feature addition processing on the first input feature map and the feature map processed by the first module, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the target image super-resolution network at least comprises the first module, and the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map.
In a possible implementation manner, the processor included in the image processing apparatus is further configured to perform the method in any implementation manner of the third aspect.
In a seventh aspect, a computer readable medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of any one of the above first to third aspects and implementations of the first to third aspects.
In an eighth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first to third aspects and the first to third aspects described above.
In a ninth aspect, a chip is provided, the chip comprising a processor and a data interface, wherein the processor reads instructions stored in a memory through the data interface and performs the method in any one of the first to third aspects and their implementations.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the processor is configured to perform the method in any one of the foregoing first to third aspects and the first to third aspects when the instructions are executed.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence subject framework provided by an embodiment of the present application;
fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 3 is a schematic diagram of another application scenario provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of still another application scenario provided in an embodiment of the present application;
fig. 5 is a schematic diagram of still another application scenario provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a convolutional neural network according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another convolutional neural network structure provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a chip hardware structure according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 11 is a schematic flowchart of a method for searching a neural network provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a target image super-resolution network according to an embodiment of the present application;
FIG. 13 is a schematic view of a first module according to an embodiment of the present application;
FIG. 14 is a schematic view of another first module according to an embodiment of the present application;
FIG. 15 is a schematic view of a first module according to an embodiment of the present application;
FIG. 16 is a schematic diagram of a rearrangement operation provided by an embodiment of the present application;
FIG. 17 is a schematic diagram of a second module according to an embodiment of the present application;
FIG. 18 is a schematic diagram of a third module according to an embodiment of the present application;
FIG. 19 is a schematic diagram of a channel switching process provided by an embodiment of the present application;
FIG. 20 is a schematic diagram of a search image super-resolution network according to an embodiment of the present application;
FIG. 21 is a schematic diagram of network training through a multi-level weighted joint loss function provided by an embodiment of the present application;
FIG. 22 is a schematic diagram of network structure searching based on an evolutionary algorithm provided by an embodiment of the application;
FIG. 23 is a schematic view of the effect of image processing through the target super-resolution network of an embodiment of the present application;
FIG. 24 is a schematic view of the effect of image processing through the target super-resolution network of an embodiment of the present application;
fig. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 26 is a schematic flow chart of an image processing method provided by an embodiment of the present application;
FIG. 27 is a schematic diagram of a set of display interfaces provided by an embodiment of the present application;
FIG. 28 is a schematic diagram of another set of display interfaces provided by an embodiment of the present application;
fig. 29 is a schematic block diagram of a search apparatus of a neural network of an embodiment of the present application;
fig. 30 is a schematic block diagram of an image processing apparatus of an embodiment of the present application.
Detailed Description
The following description of the technical solutions according to the embodiments of the present application will be given with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
FIG. 1 illustrates a schematic diagram of an artificial intelligence framework that describes the overall workflow of an artificial intelligence system, applicable to general artificial intelligence field requirements.
The above-described artificial intelligence topic framework 100 is described in detail below in terms of two dimensions, the "Smart information chain" (horizontal axis) and the "information technology (information technology, IT) value chain" (vertical axis).
The "intelligent information chain" reflects a list of processes from the acquisition of data to the processing. For example, there may be general procedures of intelligent information awareness, intelligent information representation and formation, intelligent reasoning, intelligent decision making, intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" gel process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry from the underlying infrastructure of personal intelligence, information (provisioning and processing technology implementation), to the industrial ecological process of the system.
(1) Infrastructure 110
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform.
The infrastructure may communicate with the outside through sensors, and the computing power of the infrastructure may be provided by the smart chip.
The smart chip may be a hardware acceleration chip such as a central processing unit (CPU), a neural network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
The basic platform of the infrastructure can comprise a distributed computing framework, network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection network and the like.
For example, for an infrastructure, data may be obtained through sensor and external communication and then provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data 120
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to internet of things data of traditional equipment, wherein the data comprise service data of an existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing 130
Such data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) Generic capabilities 140
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application 150
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision making and realizing practical deployment. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, automatic driving, safe city, intelligent terminals, and the like.
Application scenario one: the field of intelligent terminal photography
In one embodiment, as shown in fig. 2, the method for searching a neural network structure according to the embodiment of the present application may be applied to real-time image super-resolution on an intelligent terminal device (for example, a mobile phone). The method for searching the neural network structure can determine a target image super-resolution network suitable for the intelligent terminal photographing field. When a user photographs a distant or tiny object with an intelligent terminal, the captured image has low resolution and unclear details. Through the target image super-resolution network provided by the embodiment of the application, the user can perform image super-resolution processing on the intelligent terminal, so that a low-resolution image can be converted into a high-resolution image and the photographed object becomes clearer.
By way of example, the present application proposes an image processing method applied to an electronic device having a display screen and a camera, the method comprising: detecting a first operation of a user for turning on the camera; in response to the first operation, displaying a shooting interface on the display screen, wherein the shooting interface comprises a viewfinder frame, and a first image is included in the viewfinder frame; detecting a second operation of the user instructing the camera; and in response to the second operation, displaying a second image in the viewfinder frame, wherein the second image is an image obtained by performing super-resolution processing on the first image acquired by the camera, and the target image super-resolution network is applied in the super-resolution processing process.
The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space. The search space is constructed from a basic unit and network structure parameters and is used for searching for the image super-resolution network structure, where the network structure parameters include the types of basic modules used for constructing the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules at least include a first module, and the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, where the second dimension is smaller than the first dimension. The target image super-resolution network at least includes the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map.
Optionally, in a possible implementation, the base unit is a base module for constructing an image super resolution network.
Optionally, in one possible implementation manner, the dimension reduction operation may include at least one of a pooling operation and a convolution operation with a step size Q, where Q is a positive integer greater than 1.
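For ease of understanding, the following is a minimal sketch of these two dimension reduction options written in PyTorch; the tensor sizes and layer parameters are illustrative assumptions rather than part of the embodiment:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 48, 48)  # a hypothetical 64-channel input feature map

# Option 1: a pooling operation halves the spatial scale
pooled = nn.AvgPool2d(kernel_size=2)(x)                              # -> (1, 64, 24, 24)

# Option 2: a convolution with step size Q = 2 reduces the scale the same way
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)(x)   # -> (1, 64, 24, 24)

print(pooled.shape, strided.shape)
```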
Optionally, in one possible implementation manner, the feature map processed by the first module is a feature map subjected to a dimension increasing operation, where the dimension increasing operation refers to restoring the dimension of the feature map subjected to the dimension reduction processing to the first dimension, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map subjected to the dimension increasing operation.
Optionally, in one possible implementation manner, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-stitching the output feature maps of each of the first i−1 convolution layers together with the first input feature map to serve as the input feature map of the i-th convolution layer, where i is a positive integer greater than 1.
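As an illustration only, the dense connection operation may be sketched as follows in PyTorch; the channel count, growth rate, and number of layers are assumptions:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each convolution layer receives the input feature map concatenated
    with the outputs of all preceding layers (the dense connection)."""
    def __init__(self, channels=64, growth=32, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for conv in self.layers:
            # input of the i-th layer: stitching of all earlier feature maps
            out = torch.relu(conv(torch.cat(features, dim=1)))
            features.append(out)
        return torch.cat(features, dim=1)

y = DenseBlock()(torch.randn(1, 64, 24, 24))  # -> (1, 64 + 3*32, 24, 24)
```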
Optionally, in one possible implementation manner, the dense connection operation is a cyclic dense connection operation, and the cyclic dense connection operation refers to feature stitching processing on the first input feature map after channel compression processing.
Optionally, in one possible implementation manner, the first module is further configured to perform a rearrangement operation, where the rearrangement operation is to combine the plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
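This rearrangement operation corresponds to the well-known sub-pixel (pixel shuffle) rearrangement; a minimal sketch follows, with an assumed upscale factor of 2 and assumed tensor sizes:

```python
import torch
import torch.nn as nn

# r*r low-resolution channel features are combined into one channel whose
# resolution is r times higher in each spatial direction.
r = 2
x = torch.randn(1, 4 * 16, 24, 24)        # 64 first channel features
shuffle = nn.PixelShuffle(upscale_factor=r)
y = shuffle(x)                            # -> (1, 16, 48, 48): fewer channels, higher resolution
print(y.shape)
```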
Optionally, in a possible implementation manner, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation is a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual error connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
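A possible sketch of the channel compression and channel switching operations described above; the channel counts and M = 2 sub-feature maps are assumptions, and `channel_shuffle` is a hypothetical helper written only for illustration:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 24, 24)

# Channel compression: a 1x1 convolution reduces the number of channels
compress = nn.Conv2d(64, 32, kernel_size=1)
compressed = compress(x)                  # -> (1, 32, 24, 24)

def channel_shuffle(t, m):
    """Reorder channels so that channels from the M different sub-feature
    maps become adjacent (assumed interpretation of channel switching)."""
    n, c, h, w = t.shape
    return t.view(n, m, c // m, h, w).transpose(1, 2).reshape(n, c, h, w)

switched = channel_shuffle(x, m=2)        # same shape, interleaved channel order
```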
Optionally, in one possible implementation manner, the target image super-resolution network is a network determined by performing back propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to a loss between a predicted super-resolution image corresponding to a feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network is a network determined by performing image super-resolution network structure searching through an evolutionary algorithm in the search space.
In one possible implementation, the multi-level weighted joint loss function may optionally be derived from the following equation:

$$L = \sum_{k=1}^{N} \lambda_{k,t} \, L_k$$

where $L$ represents the multi-level weighted joint loss function, $L_k$ represents the loss value of the $k$-th unit of the first image super-resolution network (the loss value refers to the image loss between the predicted super-resolution image obtained from the output feature map of the $k$-th unit and the sample super-resolution image), $\lambda_{k,t}$ represents the weight of the loss value of the $k$-th unit at time $t$, and $N$ represents the number of basic units included in the first image super-resolution network, $N$ being an integer greater than or equal to 1.
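As an illustration, the joint loss may be computed as sketched below; the use of an L1 image loss for each $L_k$ is an assumption, and `multi_level_joint_loss` is a hypothetical helper:

```python
import torch.nn.functional as F

def multi_level_joint_loss(predicted_sr_images, target_sr, lambdas):
    """Sketch of L = sum_k lambda_{k,t} * L_k, where each L_k is the image
    loss between the SR image predicted from the k-th unit's output feature
    map and the sample SR image."""
    return sum(lam * F.l1_loss(pred, target_sr)
               for lam, pred in zip(lambdas, predicted_sr_images))
```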
Optionally, in one possible implementation manner, the first image super-resolution network is determined according to a performance parameter of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the base unit, and the performance parameter is a parameter for evaluating performance of the P candidate network structures trained by using the multi-level weighted joint loss function, where the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio is used to indicate a difference between a predicted super-resolution image and a sample super-resolution image obtained by the each candidate network structure.
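A minimal sketch of the peak signal-to-noise ratio used as the performance parameter; the `max_val` normalization and the candidate-selection line are assumptions for illustration:

```python
import math
import torch

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted SR image and the
    sample SR image; a higher value indicates a smaller difference."""
    mse = torch.mean((pred - target) ** 2).item()
    return 10.0 * math.log10(max_val ** 2 / mse)

# Choosing among P randomly generated candidate structures by PSNR, where
# evaluate_candidate stands in for short training plus validation:
# best = max(candidates, key=evaluate_candidate)
```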
It should be noted that the expansions, limitations, explanations, and descriptions of the relevant content of the target image super-resolution network in the related embodiments of figs. 10 to 22 are also applicable to the target image super-resolution network applied to the intelligent terminal photographing field provided by the embodiment of the present application, and are not repeated here.
Fig. 2 shows, by way of example, a schematic diagram of applying the target image super-resolution network to an intelligent terminal. A low-resolution image 220 or image 230 is obtained when a user uses the intelligent terminal 210 (e.g., a mobile phone) to photograph a relatively distant object. The super-resolution (SR) 240 shown in fig. 2 may be the target image super-resolution network in the embodiment of the present application, and the target image can be obtained after processing by the target image super-resolution network; for example, super-resolution image 221 can be obtained after super-resolution processing of image 220, and super-resolution image 231 can be obtained after super-resolution processing of image 230.
It should be noted that the intelligent terminal 210 may be an electronic device with a camera. For example, the intelligent terminal may be a mobile phone with an image processing function, a tablet personal computer (tablet personal computer, TPC), a media player, a smart television, a laptop computer (laptop computer, LC), a personal digital assistant (personal digital assistant, PDA), a personal computer (personal computer, PC), a camera, a video camera, a smart watch, a wearable device (wearable device, WD), or a vehicle-mounted terminal in an autonomous vehicle, which is not limited in the embodiments of the present application.
Application scenario two: security field
In one embodiment, as shown in fig. 3, the searching method of the neural network in the embodiment of the application can be applied to the field of security protection. For example, pictures (or videos) collected by monitoring equipment in public places are often affected by factors such as weather and distance, and suffer from problems such as image blurring and low resolution. Super-resolution reconstruction of a collected picture through the target image super-resolution network can recover important information such as license plate numbers and clear faces for public security personnel, providing important clues for case investigation.
By way of example, the present application provides an image processing method, the method including: obtaining a street view picture; performing super-resolution processing on the street view picture according to the target image super-resolution network to obtain a super-resolution image of the street view picture; and identifying information in the super-resolution image according to the super-resolution image of the street view picture.
The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space. The search space is constructed from a basic unit and network structure parameters and is used for searching for the image super-resolution network structure, where the network structure parameters include the types of basic modules used for constructing the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules at least include a first module, and the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, where the second dimension is smaller than the first dimension. The target image super-resolution network at least includes the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map.
It should be noted that the expansions, limitations, explanations, and descriptions of the relevant content of the target image super-resolution network in the related embodiments of figs. 10 to 22 are also applicable to the target image super-resolution network applied to the security field provided by the embodiment of the present application, and are not repeated here.
Application scenario three: medical imaging field
In one embodiment, as shown in fig. 4, the searching method of the neural network according to the embodiment of the present application may be applied to the field of medical imaging. For example, the target image super-resolution network can perform super-resolution reconstruction of medical images, reducing the requirements on the imaging environment without increasing the cost of high-resolution imaging technology; accurate detection of lesion cells is realized through the restored clear medical images, helping doctors make better diagnoses of a patient's condition.
By way of example, the present application provides an image processing method, the method including: acquiring a medical image picture; performing super-resolution processing on the medical image picture according to the target image super-resolution network to obtain a super-resolution image of the medical image picture; and identifying and analyzing information in the super-resolution image according to the super-resolution image of the medical image picture.
The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space. The search space is constructed from a basic unit and network structure parameters and is used for searching for the image super-resolution network structure, where the network structure parameters include the types of basic modules used for constructing the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, and the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, where the second dimension is smaller than the first dimension. The target image super-resolution network at least includes the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map.
It should be noted that the expansions, limitations, explanations, and descriptions of the relevant content of the target image super-resolution network in the related embodiments of figs. 10 to 22 are also applicable to the target image super-resolution network applied to the medical imaging field provided in the embodiment of the present application, and are not repeated here.
Application scenario four: image compression field
In one embodiment, as shown in fig. 5, the searching method of the neural network according to the embodiment of the present application may be applied to the field of image compression. For example, in scenarios with high real-time requirements such as video conferencing, a picture can be compressed in advance before transmission; after transmission is completed, the receiving end decodes the picture and then performs super-resolution reconstruction through the target image super-resolution network to restore the original image sequence, thereby greatly reducing the storage space and transmission bandwidth required.
By way of example, the present application provides an image processing method, the method including: acquiring a compressed image; performing super-resolution processing on the compressed image according to the target image super-resolution network to obtain a super-resolution image of the compressed image; and identifying information in the super-resolution image according to the super-resolution image of the compressed image.
The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space. The search space is constructed from a basic unit and network structure parameters and is used for searching for the image super-resolution network structure, where the network structure parameters include the types of basic modules used for constructing the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules at least include a first module, and the first module is used for performing a residual connection operation and a dimension reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, where the second dimension is smaller than the first dimension. The target image super-resolution network at least includes the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map.
It should be noted that the expansions, limitations, explanations, and descriptions of the relevant content of the target image super-resolution network in the related embodiments of figs. 10 to 22 are also applicable to the target image super-resolution network applied to the image compression field provided in the embodiment of the present application, and are not repeated here.
It should be understood that the foregoing is illustrative of an application scenario, and is not intended to limit the application scenario of the present application in any way.
Since embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, the following description will first discuss the terms and concepts related to neural networks that may be involved in embodiments of the present application.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f\left(W^{T} x\right) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit. $f$ is the activation function (activation function) of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be an area composed of several neural units.
(2) Deep neural network
Deep neural networks (deep neural network, DNN), also known as multi-layer neural networks, can be understood as neural networks with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. Typically, the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
Although DNN appears complex, the work of each layer is not complex; it is simply the following linear relational expression: $\vec{y} = \alpha(W \vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer merely performs this simple operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since the number of DNN layers is large, the numbers of coefficients $W$ and offset vectors $\vec{b}$ are also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W_{24}^{3}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.

In summary, the coefficient from the $k$-th neuron of the $(L-1)$-th layer to the $j$-th neuron of the $L$-th layer is defined as $W_{jk}^{L}$.
It should be noted that the input layer has no parameter $W$. In deep neural networks, more hidden layers enable the network to better characterize complex situations in the real world. Theoretically, a model with more parameters has higher complexity and greater "capacity", meaning that it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; the final objective is to obtain the weight matrices (formed by the vectors $W$ of many layers) of all layers of the trained deep neural network.
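For ease of understanding, a toy forward pass implementing $\vec{y} = \alpha(W \vec{x} + \vec{b})$ layer by layer; the layer sizes and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def forward(x, weights, biases):
    """Minimal sketch of a DNN forward pass: each layer computes
    y = alpha(W @ x + b), with a sigmoid as the activation alpha."""
    for W, b in zip(weights, biases):
        x = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return x

# Toy three-layer network: input of size 4 -> hidden 5 -> output 2
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
y = forward(rng.standard_normal(4), weights, biases)
```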
(3) Convolutional neural network
The convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of a convolutional layer and a sub-sampling layer, which can be regarded as a filter. The convolution layer refers to a neuron layer in the convolution neural network, which performs convolution processing on an input signal. In the convolutional layer of the convolutional neural network, one neuron may be connected with only a part of adjacent layer neurons. A convolutional layer typically contains a number of feature planes, each of which may be composed of a number of neural elements arranged in a rectangular pattern. Neural elements of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights can be understood as the way image information is extracted is independent of location. The convolution kernel can be initialized in the form of a matrix with random size, and reasonable weight can be obtained through learning in the training process of the convolution neural network. In addition, the direct benefit of sharing weights is to reduce the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In training a deep neural network, since the output of the network is expected to be as close as possible to the value actually desired, the predicted value of the current network can be compared with the actually desired target value, and the weight vectors of each layer can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(5) Back propagation algorithm
The neural network can adopt a back propagation (BP) algorithm to correct the parameters in the initial neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward propagation process dominated by the error loss, and aims to obtain the parameters of the optimal neural network model, such as the weight matrix.
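A minimal numerical illustration of one back propagation update on a single linear neuron with a squared-error loss; all values here are assumptions:

```python
# loss = (w*x + b - target)^2; back propagation yields the gradients below.
x, target = 2.0, 10.0
w, b, lr = 0.5, 0.0, 0.01

pred = w * x + b
grad_pred = 2.0 * (pred - target)   # d(loss)/d(pred)
w -= lr * grad_pred * x             # d(loss)/dw = grad_pred * x
b -= lr * grad_pred                 # d(loss)/db = grad_pred
```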
(6) Neural network structure search
Neural network structure search (Neural Architecture Search, NAS) is a technology for automatically designing a neural network, and a high-performance network structure can be automatically designed according to a sample set through an algorithm.
The search space, the search strategy and the performance evaluation strategy are core elements of the NAS algorithm.
The search space may refer to a collection of neural network structures searched, i.e., a space of solutions. In order to improve the search efficiency, the search space is sometimes limited or simplified. In some NAS implementations the network is split into elementary cells, or blocks, and a more complex network is formed by the stack of these cells. The base unit consists of a number of nodes (layers of the neural network) that are repeated multiple times throughout the network but have different weight parameters.
The search strategy may refer to the process of finding the optimal network structure in the search space. The search strategy defines how to find the optimal network structure; it is typically an iterative optimization process and is essentially a hyperparameter optimization problem.
The performance evaluation policy may refer to evaluating the performance of the searched network structures. The goal of the search strategy is to find the neural network structure with the best performance, and the performance evaluation policy provides the means by which the performance of a searched network structure is evaluated.
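The interplay of the three core elements can be sketched as a bare-bones search loop; the use of random sampling as the search strategy and the `evaluate` callback are assumptions for illustration only:

```python
import random

def random_search(search_space, evaluate, budget=100):
    """Bare-bones NAS loop: the search strategy (here, random sampling)
    draws structures from the search space, the performance evaluation
    policy (the evaluate callback) scores them, and the best is kept."""
    best, best_score = None, float("-inf")
    for _ in range(budget):
        arch = random.choice(search_space)   # e.g. a tuple of cell choices
        score = evaluate(arch)               # e.g. validation PSNR
        if score > best_score:
            best, best_score = arch, score
    return best
```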
Fig. 6 illustrates a system architecture 300 provided by an embodiment of the present application. In fig. 6, a data acquisition device 360 is used to acquire training data. For the image processing method according to the embodiment of the present application, after the target image super-resolution network is determined by the searching method of the neural network according to the embodiment of the present application, the target super-resolution network may be further trained with training images; that is, the training data collected by the data acquisition device 360 may be training images, where a training image may include a sample image and the super-resolution image corresponding to the sample image, and the sample image may refer to a low-resolution image, for example, an image with unclear quality and blurred details.
After the training data is collected, the data collection device 360 stores the training data in the database 330 and the training device 320 trains the target model/rule 301 based on the training data maintained in the database 330.
The training device 320 obtains the target model/rule 301 based on the training data, and the training device 320 processes the input original image and compares the output image with the original image until the difference between the output image of the training device 320 and the original image is smaller than a certain threshold value, thereby completing the training of the target model/rule 301.
For example, in the image processing method provided by the application, the target image super-resolution network for performing image super-resolution processing may be obtained by training through loss between a predicted super-resolution image of a sample image and the sample super-resolution image, and the trained network enables a difference between the predicted super-resolution image obtained by inputting the sample image into the target image super-resolution network and the sample super-resolution image to be smaller than a certain threshold value, thereby completing training of the target image super-resolution network.
The above-described object model/rule 301 can be used to implement the image processing method of the embodiment of the present application. The target model/rule 301 in the embodiment of the present application may be specifically a neural network.
It should be noted that, in practical applications, the training data maintained in the database 330 is not necessarily all acquired by the data acquisition device 360, but may be received from other devices. It should be further noted that the training device 320 is not necessarily completely based on the training data maintained by the database 330 to perform training of the target model/rule 301, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 301 obtained by training according to the training device 320 may be applied to different systems or devices, such as the execution device 310 shown in fig. 6, where the execution device 310 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (augmented reality, AR)/Virtual Reality (VR), a vehicle-mounted terminal, or the like, or may be a server, or a cloud, or the like. In fig. 6, the execution device 310 configures an input/output (I/O) interface 312 for data interaction with external devices, and a user may input data to the I/O interface 312 through the client device 340, where the input data may include in an embodiment of the present application: the image to be processed is input by the client device.
The preprocessing module 313 and the preprocessing module 314 are used for preprocessing according to the input data (such as the image to be processed) received by the I/O interface 312, and in the embodiment of the present application, the preprocessing module 313 and the preprocessing module 314 (or only one of the preprocessing modules) may be omitted, and the computing module 311 may be directly used for processing the input data.
In preprocessing input data by the execution device 310, or in performing processing related to computation or the like by the computation module 311 of the execution device 310, the execution device 310 may call data, code or the like in the data storage system 350 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 350.
Finally, the I/O interface 312 returns the processing result, such as the processed image obtained as described above, to the client device 340 for provision to the user.
It should be noted that the training device 320 may generate, based on different training data, a corresponding target model/rule 301 for different targets or different tasks, where the corresponding target model/rule 301 may be used to achieve the targets or to perform the tasks, thereby providing the user with the desired results.
In the case shown in fig. 6, the user may manually give input data, and this manual operation may be performed through the interface provided by the I/O interface 312. In another case, the client device 340 may automatically send input data to the I/O interface 312; if the user's authorization is required for the client device 340 to automatically send input data, the user may set the corresponding permissions in the client device 340. The user may view the result output by the execution device 310 at the client device 340, and the specific presentation may be in the form of display, sound, action, or the like. The client device 340 may also serve as a data collection terminal that collects the input data of the I/O interface 312 and the output result of the I/O interface 312 as new sample data, as shown in the figure, and stores them in the database 330. Of course, the input data of the I/O interface 312 and the output result of the I/O interface 312 shown in the figure may instead be stored directly into the database 330 as new sample data by the I/O interface 312 without being collected by the client device 340.
It should be noted that fig. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 6, the data storage system 350 is an external memory with respect to the execution device 310, and in other cases, the data storage system 350 may be disposed in the execution device 310.
As shown in fig. 6, the training device 320 trains to obtain the target model/rule 301. In the embodiment of the present application, the target model/rule 301 may be the neural network of the present application; specifically, the neural network provided in the embodiment of the present application may be a CNN, a deep convolutional neural network (deep convolutional neural networks, DCNN), or the like.
Since CNN is a very common neural network, the structure of CNN will be described in detail with reference to fig. 7. As described in the basic concept introduction above, a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning (deep learning) architecture; a deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to the image input into it.
The structure of the neural network specifically adopted by the image processing method in the embodiment of the present application may be as shown in fig. 7. In fig. 7, a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430. The input layer 410 may acquire the image to be processed and hand it to the convolutional layer/pooling layer 420 and the following neural network layer 430 for processing, so as to obtain the processing result of the image. The internal layer structure of the CNN 400 in fig. 7 is described in detail below.
Convolution layer/pooling layer 420:
The convolution/pooling layer 420 as shown in fig. 7 may include layers as examples 421-426, for example: in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer; in another implementation 421, 422 are convolutional layers, 423 is a pooling layer, 424, 425 are convolutional layers, 426 is a pooling layer. I.e. the output of the convolution layer may be used as input to a subsequent pooling layer or as input to another convolution layer to continue the convolution operation.
The internal principle of operation of one convolution layer will be described below using convolution layer 421 as an example.
The convolution layer 421 may comprise many convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator may essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually processed pixel by pixel (or two pixels by two pixels, etc., depending on the value of the stride) along the horizontal direction on the input image, thereby completing the extraction of a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension (depth dimension) of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends to the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output of a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices of the same size (rows × columns), i.e., multiple homotypic matrices, are applied. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" described above.
Different weight matrices may be used to extract different features in the image, e.g., one weight matrix is used to extract image edge information, another weight matrix is used to extract a particular color of the image, yet another weight matrix is used to blur unwanted noise in the image, etc. The sizes (rows and columns) of the weight matrixes are the same, the sizes of the convolution feature images extracted by the weight matrixes with the same sizes are the same, and the convolution feature images with the same sizes are combined to form the output of convolution operation.
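As an illustration in PyTorch, several weight matrices of the same size are applied to one input image and their outputs are stacked into the depth dimension; the channel counts and kernel size are assumptions:

```python
import torch
import torch.nn as nn

# 8 kernels of the same size applied to one 3-channel input image:
# each kernel extracts a different feature, and the 8 outputs are
# stacked to form the depth dimension of the convolved feature map.
image = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
features = conv(image)    # -> (1, 8, 32, 32): one feature plane per kernel
```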
The weight values in the weight matrices are required to be obtained through a large amount of training in practical application, and each weight matrix formed by the weight values obtained through training can be used for extracting information from an input image, so that the convolutional neural network 400 can perform correct prediction.
When convolutional neural network 400 has multiple convolutional layers, the initial convolutional layer (e.g., 421) tends to extract more general features, which may also be referred to as low-level features; as the depth of convolutional neural network 400 increases, features extracted by the later convolutional layers (e.g., 426) become more complex, such as features of higher level semantics, which are more suitable for the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, pooling layers often need to be introduced periodically after convolutional layers. In layers 421-426 illustrated at 420 in fig. 7, this may be one convolutional layer followed by one pooling layer, or multiple convolutional layers followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image of smaller size. The average pooling operator may calculate the average of pixel values in the image within a particular range as the result of average pooling. The max pooling operator may take the pixel with the largest value within a particular range as the result of max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the input image.
Neural network layer 430:
After processing by convolutional layer/pooling layer 420, convolutional neural network 400 is not yet sufficient to output the desired output information. Because, as previously described, the convolution/pooling layer 420 will only extract features and reduce the parameters imposed by the input image. However, to generate the final output information (the required class information or other relevant information), convolutional neural network 400 needs to utilize neural network layer 430 to generate an output of one or a set of the required number of classes. Thus, multiple hidden layers (431, 432 to 43n as shown in fig. 7) may be included in the neural network layer 430, and the output layer 440, where parameters included in the multiple hidden layers may be pre-trained according to relevant training data of a specific task type, for example, the task type may include image recognition, image classification, image detection, and image super-resolution reconstruction.
After the multiple hidden layers in the neural network layer 430, the last layer of the overall convolutional neural network 400 is the output layer 440. The output layer 440 has a loss function similar to categorical cross-entropy, specifically used for calculating the prediction error. Once the forward propagation of the entire convolutional neural network 400 (e.g., propagation from 410 to 440 in fig. 7) is completed, the backward propagation (e.g., propagation from 440 to 410 in fig. 7) starts updating the weights and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 400 and the error between the result output by the convolutional neural network 400 through the output layer and the ideal result.
The structure of the neural network specifically adopted by the image processing method in the embodiment of the present application may be as shown in fig. 8. In fig. 8, a convolutional neural network (CNN) 500 may include an input layer 510, a convolutional layer/pooling layer 520 (where the pooling layer is optional), and a neural network layer 530. In contrast to fig. 7, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 520 in fig. 8 are parallel, and the features extracted by each are all input to the neural network layer 530 for processing.
It should be noted that the convolutional neural network shown in fig. 7 and fig. 8 is only an example of two possible convolutional neural networks of the image processing method according to the embodiment of the present application, and in a specific application, the convolutional neural network adopted by the image processing method according to the embodiment of the present application may also exist in the form of other network models.
Fig. 9 is a hardware structure of a chip according to an embodiment of the present application, where the chip includes a neural network processor 600. The chip may be provided in an execution device 310 as shown in fig. 6 to perform the calculation of the calculation module 311. The chip may also be provided in a training device 320 as shown in fig. 6 for completing training work of the training device 320 and outputting the target model/rule 301. The algorithms of the layers in the convolutional neural network as shown in fig. 7 or fig. 8 may be implemented in the chip as shown in fig. 9.
The neural network processor NPU 600 is mounted as a coprocessor on a main central processing unit (central processing unit, CPU) (host CPU), and the host CPU distributes tasks. The core part of the NPU 600 is the arithmetic circuit 603; the controller 604 controls the arithmetic circuit 603 to extract data from memory (a weight memory or an input memory) and perform operations.
In some implementations, the arithmetic circuitry 603 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic circuit 603 is a two-dimensional systolic array. The arithmetic circuitry 603 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 603 is a general purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit 603 takes the data corresponding to the matrix B from the weight memory 602 and buffers the data on each PE in the arithmetic circuit 603. The arithmetic circuit 603 performs matrix operation on the matrix a data and the matrix B data from the input memory 601, and stores the partial or final result of the matrix in the accumulator 608 (accumulator).
The vector calculation unit 607 may further process the output of the operation circuit 603, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 607 may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit 607 can store the vector of processed outputs to the unified memory 606. For example, the vector calculation unit 607 may apply a nonlinear function to an output of the arithmetic circuit 603, for example, a vector of accumulated values, to generate an activation value. In some implementations, the vector calculation unit 607 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuitry 603, for example for use in subsequent layers in a neural network.
The unified memory 606 is used for storing input data and output data.
Data is moved by the storage unit access controller 605 (direct memory access controller, DMAC): input data in the external memory is transferred to the input memory 601 and/or the unified memory 606, weight data in the external memory is stored into the weight memory 602, and data in the unified memory 606 is stored back into the external memory.
A bus interface unit (bus interface unit, BIU) 610 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 609 via the bus.
An instruction fetch memory (instruction fetch buffer) 609 coupled to the controller 604 is used to store instructions for use by the controller 604.
The controller 604 is configured to invoke an instruction cached in the instruction fetch memory 609, so as to control a working process of the operation accelerator.
Typically, the unified memory 606, the input memory 601, the weight memory 602, and the instruction fetch memory 609 are on-chip (On-Chip) memories, while the external memory is a memory external to the NPU; the external memory may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or other readable and writable memory.
The operations of the layers in the convolutional neural network shown in fig. 7 or fig. 8 may be performed by the operation circuit 603 or the vector calculation unit 607.
The execution device 310 in fig. 6 described above is capable of executing the image processing method or the respective steps of the image processing method of the embodiment of the present application, and the CNN model shown in fig. 7 and 8 and the chip shown in fig. 9 may also be used to execute the image processing method or the respective steps of the image processing method of the embodiment of the present application.
The following describes the searching method of the neural network according to the embodiment of the present application in detail with reference to fig. 10 to 24, and it should be noted that the target super-resolution network determined by the searching method of the neural network according to the embodiment of the present application may be used to execute the image processing method according to the embodiment of the present application.
As shown in fig. 10, an embodiment of the present application provides a system architecture 700. The system architecture includes a local device 720, a local device 730, and an execution device 710 and data storage system 750, where the local device 720 and the local device 730 are connected to the execution device 710 through a communication network.
The execution device 710 may be implemented by one or more servers. Alternatively, the execution device 710 may be used with other computing devices, such as: data storage, routers, load balancers, etc. The execution device 710 may be disposed on one physical site or distributed across multiple physical sites. The execution device 710 may use data in the data storage system 750 or invoke program code in the data storage system 750 to implement the method of searching for neural network structures of embodiments of the present application.
It should be noted that, the executing device 710 may also be referred to as a cloud device, and the executing device 710 may be deployed at the cloud.
Specifically, the execution device 710 may perform the following process: constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operation of a neural network, the basic module comprises a first module, the first module is used for performing dimension reduction operation and residual connection operation on a first input feature map, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual connection operation is used for performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map; constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameters comprise the types of basic modules used for constructing the basic unit, the search space is used for searching an image super-resolution network structure, and the basic unit is a basic module used for constructing an image super-resolution network; searching an image super-resolution network structure in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is used for performing super-resolution processing on an image to be processed, the target image super-resolution network at least comprises the first module, and the target image super-resolution network is a network with calculated amount smaller than a first preset threshold value and image super-resolution precision larger than a second preset threshold value.
Through the above process, the execution device 710 can acquire a target neural network by means of network structure search (neural architecture search, NAS), and the target neural network can be used for image super-resolution processing and the like.
In one possible implementation, the method of performing the search for the network structure by the device 710 may be an offline search method performed in the cloud.
The user may operate respective user devices (e.g., local device 720 and local device 730) to interact with the execution device 710. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.
The local device of each user may interact with the performing device 710 via a communication network of any communication mechanism/communication standard, which may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
In one implementation, the local devices 720, 730 may obtain relevant parameters of the target neural network from the executing device 710, deploy the target neural network on the local devices 720, 730, perform image super resolution processing using the target neural network, and so on.
In another implementation, the target neural network may be deployed directly on the execution device 710, where the execution device 710 obtains the image to be processed from the local device 720 and the local device 730, and performs image super-resolution processing on the image to be processed according to the target neural network.
For example, the target neural network may be a target image super-resolution network in the embodiment of the present application.
The following describes the neural network searching method according to the embodiment of the present application in detail with reference to fig. 11. The method shown in fig. 11 may be performed by a neural network search device, which may be a device with sufficient computing power for neural network searching, such as a computer, a server, or the like.
The method 800 shown in fig. 11 includes steps 810 through 830, each of which is described in detail below.
Step 810: the method comprises the steps of constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operation of a neural network, the basic module comprises a first module, the first module is used for carrying out dimension reduction operation and residual error connection operation on a first input feature map, the dimension reduction operation is used for converting the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual error connection operation is used for carrying out feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is identical with that of the first input feature map.
The basic unit may be a network structure obtained by connecting basic modules through basic operations of a neural network, and the network structure may include basic operations or combinations of basic operations in a predetermined convolutional neural network, and these basic operations or combinations of basic operations may be collectively referred to as basic operations.
For example, the basic operation may refer to a convolution operation, a pooling operation, a residual connection, and the like, and the basic operation may enable connection between the respective basic modules, thereby obtaining a network structure of the basic unit.
In one possible implementation manner, the above basic unit may be a basic module for constructing an image super-resolution network. As shown in fig. 12, the target image super-resolution network may include three major parts: a feature extraction part, a nonlinear transformation part, and a reconstruction part. The feature extraction part is used to obtain the image features of the image to be processed; in fig. 12, the image to be processed may be a low-resolution image (LR). The nonlinear transformation part is used to transform the image features of the input image, mapping them from a first feature space to a second feature space, where the first feature space is the feature space in which the features of the image to be processed are extracted, and the second, higher-dimensional feature space usually makes it easier to reconstruct the super-resolution image. The reconstruction part is used to perform upsampling and convolution processing on the image features output by the nonlinear transformation part to obtain the super-resolution image corresponding to the input image. In the embodiment of the application, the network structure of the nonlinear transformation part can be searched in the search space by means of NAS.
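For ease of understanding, a hedged sketch of this three-part layout in PyTorch; the channel counts, the placeholder layers standing in for the searched nonlinear transformation part, and the upscale factor are all assumptions:

```python
import torch
import torch.nn as nn

class SuperResolutionNet(nn.Module):
    """Illustrative three-part layout: feature extraction, a nonlinear
    transformation part (the part searched by NAS in this application),
    and a reconstruction part with upsampling."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.extract = nn.Conv2d(3, channels, 3, padding=1)
        self.mapping = nn.Sequential(                      # placeholder for
            nn.Conv2d(channels, channels, 3, padding=1),   # the searched cells
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.reconstruct = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                        # LR features -> SR image
        )

    def forward(self, lr_image):
        return self.reconstruct(self.mapping(self.extract(lr_image)))

sr = SuperResolutionNet()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 96, 96)
```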
In one possible implementation manner, the first input feature map input to the first module has a first scale; after the dimension reduction operation it is transformed to a second scale, and after the dimension increasing operation the feature map of the second scale is transformed to a third scale, where the third scale lies between the first scale and the second scale. In this case, in order to realize the residual connection operation, i.e., the connection of feature maps at the same scale, the dimension reduction operation may be performed again on the first input feature map of the first scale until its dimension is the same as that of the feature map of the third scale; that is, the dimension of the feature map processed by the first module is the same as the dimension of the first input feature map.
The above basic unit may be a network obtained by connecting basic modules according to basic operations of a neural network. The basic modules may include a first module, where the first module may be a scale module (contextual residual dense block, CRDB); the scale module may be used to perform a dimension reduction operation and a residual connection operation on the first input feature map, that is, the scale module may include a pooling sub-module and a residual connection for processing the first input feature map.
Illustratively, the scale of the first input feature map may be reduced by the dimension reduction operation, where the dimension reduction operation may be a pooling operation on the first input feature map, or a convolution operation with stride Q on the first input feature map, Q being a positive integer greater than 1.
The residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module, where the feature map processed by the first module may be the feature map after a dimension increasing operation; the dimension increasing operation restores the scale of the feature map processed by the dimension reduction operation to the original first scale, and the residual connection operation may then add the first input feature map to the feature map output by the dimension increasing operation.
For example, the dimension increasing operation may be an up-sampling operation, or it may be a deconvolution operation (backwards strided convolution). The up-sampling operation may use an interpolation method, i.e., inserting new elements between pixels on the basis of the original image pixels using a suitable interpolation algorithm; the deconvolution operation is the inverse of a convolution operation and is also known as a transposed convolution.
It should be understood that feature addition may refer to adding information of different channel features for feature maps of the same scale.
In the embodiment of the application, the scale module can apply a residual connection to the input feature map, i.e., perform feature addition processing on the first input feature map and the feature map processed by the first module, so that more of the local detail information in the first input feature map is passed on to later convolution layers. The scale module can therefore perform the dimension reduction operation on the first input feature map while still ensuring that enough local detail information is transferred to subsequent convolution layers. On the one hand, the dimension reduction operation reduces the scale of the input feature map and hence the computation of the model, while the residual connection operation passes front-layer information on to later layers and compensates for the information lost by the dimension reduction operation. On the other hand, the dimension reduction operation also rapidly expands the receptive field of the features, so that the prediction of high-resolution pixels can better take context information into account, improving super-resolution accuracy.
It should be understood that the image super-resolution reconstruction technique reconstructs a high-resolution image from a low-resolution image. Super-resolution processing therefore needs more local information of the image features, and commonly used image super-resolution network models do not use dimension reduction operations, mainly because a dimension reduction operation loses part of the local information of the low-resolution input image. In the embodiment of the application, the information in the input feature map is better preserved throughout the network by the residual connection operation and/or the dense connection operation; front-layer information is passed on to later layers, which compensates for the information loss caused by the dimension reduction operation. The dimension reduction operation in turn reduces the computation of the model, expands the receptive field of the features, and improves the accuracy of the image super-resolution network.
In one possible implementation, a specific form of the network structure of the scale module may be as shown in fig. 13. Three scale modules are shown in fig. 13: the (d-1)-th, the d-th, and the (d+1)-th scale module. The d-th scale module may include a pooling sub-module, and the dimension reduction operation may be used to downsample the input feature map, thereby reducing the feature size.
For example, the dimension reduction operation may refer to a pooling operation, such as average pooling, or maximum pooling.
In the schematic diagram of the scale module network structure shown in fig. 13, the residual connection may refer to feature addition of the output feature map of the (d-1)-th CRDB module and the processed feature map, where the processed feature map is obtained by applying, in turn, the pooling operation, a 3×3 convolution operation, a linear rectification function (rectified linear unit, ReLU), the dimension increasing operation, and a 1×1 convolution operation to the input feature map.
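This pooling, 3×3 convolution, ReLU, dimension increase, 1×1 convolution path with a residual addition can be sketched as follows; the channel count and the use of bilinear interpolation for the dimension increasing step are assumptions for illustration (the patent also allows a rearrangement-based dimension increase), and even spatial dimensions are assumed so the scales match.

```python
import torch.nn as nn
import torch.nn.functional as F

class ScaleModuleSketch(nn.Module):
    """Illustrative CRDB-like path: downscale, transform, upscale,
    1x1 fuse, then add the residual at the original scale."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.pool = nn.AvgPool2d(2)                   # dimension reduction
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(channels, channels, 1)  # 1x1 convolution

    def forward(self, x):
        y = self.pool(x)            # halve the spatial size, shrink compute
        y = F.relu(self.conv3(y))   # 3x3 convolution + ReLU at the small scale
        y = F.interpolate(y, scale_factor=2, mode="bilinear",
                          align_corners=False)        # dimension increase
        y = self.fuse(y)            # 1x1 convolution after up-scaling
        return x + y                # residual connection at the same scale
```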
Further, in order to improve the performance of the target image super-resolution network, the scale module may also be configured to perform a dense connection operation on the first input feature map, where the dense connection operation feature-stitches the output feature map of each of the first i-1 convolution layers with the input feature map to form the input feature map of the i-th convolution layer.
Illustratively, a specific form of the network structure of the scale module may be as shown in fig. 14. The dense connection operation allows maximum information flow in the network: each layer is connected to all layers preceding it, i.e., the input of each layer is a concatenation of the outputs of all preceding layers. Through the dense connection operation, the information (during forward computation) or the gradient (during backward computation) in the input feature map is better preserved throughout the network, which further compensates for the information lost by the dimension reduction operation. In other words, during image super-resolution processing, the residual connection operation and the dense connection operation ensure that the information in the feature map is passed on to later layers in the network structure; the dimension reduction operation can then downsample the input feature map to reduce the feature size, so that the computation of the model is reduced while the accuracy of the image super-resolution processing is maintained.
It should be understood that feature stitching may refer to stitching M feature graphs of the same scale into one feature graph with K channels, where K is a positive integer greater than M.
For example, as shown in fig. 14, the dense connection operation refers to transferring the output feature map of each layer to the following layers, where the input of the following layers is obtained by performing feature map stitching through the output of the preceding layers.
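The dense connection pattern, in which each convolution layer consumes the concatenation of the block input and all earlier outputs, can be sketched as below; the layer count and growth width are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlockSketch(nn.Module):
    """Each 3x3 layer takes the channel-wise concatenation of the block
    input and every earlier layer's output (feature stitching)."""
    def __init__(self, channels: int = 32, growth: int = 16, layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(layers)
        ])

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat(feats, dim=1)  # stitched output of all layers
```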
Generally, the deeper the network structure, i.e., the greater the number of convolution layers in the network structure, the higher the accuracy of the resulting processed image.
In one possible implementation, a specific form of the network structure of the scale module may be as shown in fig. 15. The scale module can perform a residual connection operation, a dimension reduction operation, a convolution operation, and a cyclic dense connection operation on the input feature map; that is, the scale module may comprise a residual connection, a pooling sub-module, a convolution sub-module, and a cyclic dense connection. The cyclic dense connection operation increases the depth of the scale module network structure, thereby improving the accuracy of super-resolution processing.
Note that applying a cyclic (recursive) operation to a feature map at the normal scale increases the computation rapidly, whereas the extra computation is small when the cyclic operation is applied to the feature map after the dimension reduction operation. Combining a certain number of cyclic operations with the dimension reduction operation can therefore improve super-resolution accuracy without increasing the computation and parameter count.
The first module provided by the embodiment of the application, namely the scale module, can reduce computation, reduce parameters, enlarge the receptive field, and decouple the parameter count from the computation. First, the dimension reduction operation in the scale module reduces the computation of the network structure by shrinking the scale of the feature map.
For example, taking 2×2 pooling as the dimension reduction operation, and assuming that the size of the input feature map is C_in × W × H, the convolution kernel is K × K, and the number of output channels is C_out, the computation of a normal convolution is:

FLOPs_ori = 2(C_in·K² + 1)·C_out·H·W;

where the computation of a network model may be measured by the number of floating point operations (FLOPs), and FLOPs_ori denotes the computation of the network model with a normal convolution.
The computation of the convolution after the pooling operation, FLOPs_pool, is:

FLOPs_pool = 2(C_in·K² + 1)·C_out·(H/2)·(W/2) = FLOPs_ori/4;

comparing the computation of the normal convolution with that of the convolution after the pooling operation, the pooling operation reduces the computation by 75%; even if three cyclic passes are added, the total merely returns to the original computation.
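A quick numeric check of the 75% figure, with all sizes assumed for illustration:

```python
def conv_flops(c_in, k, c_out, h, w):
    # FLOPs of a K x K convolution: 2(C_in * K^2 + 1) * C_out * H * W
    return 2 * (c_in * k * k + 1) * c_out * h * w

c_in, k, c_out, h, w = 32, 3, 32, 360, 640       # assumed sizes
flops_ori = conv_flops(c_in, k, c_out, h, w)
# After 2x2 pooling the same convolution runs on a half-size feature map.
flops_pool = conv_flops(c_in, k, c_out, h // 2, w // 2)
print(flops_pool / flops_ori)       # 0.25, i.e. a 75% reduction per pass
print(4 * flops_pool / flops_ori)   # four passes (one plus three cycles) ~= original
```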
Optionally, in an embodiment of the present application, the first module is further configured to perform a rearrangement operation, where the rearrangement operation is to combine the plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
For example, as shown in fig. 16, the rearrangement operation may combine 4 different first channel feature maps, according to a left-to-right, top-to-bottom rule, into one second channel feature map, where the resolution of the second channel feature map is higher than that of the first channel feature maps.
It should be noted that the above description takes 4 feature maps as the plurality of feature maps and a left-to-right, top-to-bottom ordering as the preset rule by way of example only; the present application is not limited in this respect. The dimension increasing sub-modules shown in fig. 13 to 15 may all be used to perform the rearrangement operation.
With the dimension increasing sub-module shown in fig. 13 to 15, i.e., by performing the rearrangement operation before the 1×1 convolution, a certain number of parameters can be saved. The rearrangement operation can be seen as converting several low-scale feature channels into one high-scale feature channel, thereby reducing the number of channels.
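The rearrangement behaves like the sub-pixel (pixel shuffle) operation: r² low-resolution channels interleave into one channel at r times the resolution, dividing the channel count by r². A minimal sketch follows; whether the patent's preset rule matches PixelShuffle's exact interleaving order is an assumption, the channel arithmetic is the point.

```python
import torch

x = torch.randn(1, 4, 8, 8)           # 4 low-resolution first channel features
shuffle = torch.nn.PixelShuffle(2)    # combine 2x2 = 4 channels into one
y = shuffle(x)                        # one higher-resolution second channel feature
print(y.shape)                        # torch.Size([1, 1, 16, 16])
```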
For example, taking the 2×2 case as an example, and assuming that the number of convolution layers is N_conv, the number of output channels of each convolution layer is G, and the number of output channels of the 1×1 convolution layer is C_out, the parameter count of the normal convolution, Param_ori, is:

Param_ori = N_conv·G·C_out
The parameter count of the scale module, Param_up, is:

Param_up = (N_conv·G/4)·C_out = Param_ori/4

Comparing the parameter count of the normal convolution with that of the scale module, up-sampling by the rearrangement operation reduces the parameters of the 1×1 convolution layer in the scale module by 75%.
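The same 75% figure can be checked numerically under assumed layer widths (all values illustrative):

```python
n_conv, growth, c_out = 4, 32, 64   # assumed layer count and channel widths
param_ori = n_conv * growth * c_out             # 1x1 conv without rearrangement
# Rearranging 4 channels into 1 divides the 1x1 conv's input channels by 4.
param_up = (n_conv * growth // 4) * c_out
print(param_up / param_ori)                     # 0.25, i.e. a 75% reduction
```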
In the embodiments of the application, the basic modules for constructing the basic unit include the scale module. On the one hand, the scale module enlarges the receptive field through the dimension reduction operation, which helps the prediction of high-resolution pixels take better account of context information. On the other hand, since common super-resolution methods do not use a dimension reduction operation, the scale of the input feature map is unchanged throughout the nonlinear transformation part, so the parameter count and the computation are linearly related. The scale module provided by the embodiment of the application uses the dimension reduction operation to make the parameter count and the computation relatively independent, giving the search algorithm in NAS more possibilities.
In an embodiment of the application, the basic modules constituting the basic unit may comprise, in addition to the first module described above, i.e., the scale module, a second module and/or a third module. The second module and the third module are described in detail below with reference to fig. 17 to 19.
In one possible implementation, the basic modules may also include a second module, which may be a compact module (shrink residual dense block, SRDB). The compact module performs channel compression on the basis of the residual dense block (RDB), thereby retaining the dense connections while effectively reducing the number of model parameters.
Specifically, the compacting module is configured to perform a channel compression operation, a residual connection operation, and a dense connection operation on the second input feature map, where the channel compression operation may refer to performing a convolution operation with a convolution kernel of 1×1 on the second input feature map.
It should be appreciated that when the second module is the first module in a basic unit, the second input feature map may be the feature map output by the previous basic unit; when the second module is not the first module in the basic unit, the second input feature map may be the feature map output by the preceding module. In the embodiment of the application, the first input feature map, the second input feature map and the third input feature map all correspond to the same image to be processed.
For example, the network structure of the compact module may be as shown in fig. 17; three compact modules are shown in fig. 17: the (d-1)-th, the d-th, and the (d+1)-th compact module. A 1×1 convolution kernel is first used to compress the number of channels of the feature map, after which a 3×3 convolution performs the feature transformation; such a compact residual dense module, called the compact module for short, can greatly reduce the parameter count while keeping the dense links.
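One SRDB-style layer can be sketched as a 1×1 squeeze followed by a 3×3 transformation; channel widths are illustrative assumptions, and the dense concatenation feeding `in_channels` is as in the dense block sketch above.

```python
import torch.nn as nn
import torch.nn.functional as F

class CompactLayerSketch(nn.Module):
    """One compact-module layer: a 1x1 convolution first compresses the
    densely concatenated input channels, then a 3x3 convolution
    performs the feature transformation."""
    def __init__(self, in_channels: int, squeezed: int = 16, growth: int = 16):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, squeezed, 1)  # channel compression
        self.conv3 = nn.Conv2d(squeezed, growth, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv3(self.squeeze(x)))
```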
In one possible implementation, the basic modules may also include a third module, which may be a grouping module (group residual dense block, GRDB). The grouping module splits the convolution operation into several separate computations on the basis of the residual dense block, which helps reduce the model parameters.
Specifically, the grouping module may be configured to perform a channel switching operation, a residual connection operation, and a dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of which includes at least two adjacent channel features. The channel switching process reorders the at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features from different sub-feature maps in the M sub-feature maps become adjacent, M being an integer greater than 1.
For example, the network structure of the grouping module may be as shown in fig. 18; three grouping modules are shown in fig. 18: the (d-1)-th, the d-th, and the (d+1)-th grouping module. The input of a convolution layer is formed by directly stitching the feature maps of the previous layers; if group convolution were applied directly, a single channel feature of the output layer could only receive features from part of the preceding convolution layer, which hinders cooperation among the channel features. Therefore, in the embodiment of the application, a channel switching (channel shuffle) operation is added on the basis of the residual dense module; the resulting grouped residual dense module, called the grouping module for short, effectively reduces the parameter count of the network.
Illustratively, as shown in fig. 19, assuming that the third input feature map includes three sub-feature maps 1, 2, and 3, each containing 3 adjacent channel features, channel switching may reorder the channel features that were originally adjacent within the same sub-feature map, so that channel features from different sub-feature maps become adjacent.
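The channel switching described here matches the usual reshape-transpose-reshape channel shuffle; a sketch under that assumption:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Reorder channels so that features from different groups become
    adjacent, via the standard reshape-transpose-reshape."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

x = torch.arange(9).view(1, 9, 1, 1)   # 3 sub-feature maps of 3 channels
print(channel_shuffle(x, 3).flatten().tolist())
# [0, 3, 6, 1, 4, 7, 2, 5, 8]: channels from different groups now adjacent
```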
In the embodiment of the present application, the basic unit is a network structure obtained by connecting basic modules through the basic operations of a neural network; one cell as shown in fig. 12 may be one basic unit, which serves as a building block of the image super-resolution network. The basic modules are used to construct the basic units, and as shown in fig. 20, each basic unit may be obtained by connecting different basic modules through the basic operations of a neural network, where the basic modules may include one or more of the first module, the second module, and the third module.
Step 820: construct a search space according to the basic unit and network structure parameters, where the network structure parameters include the types of the basic modules used for constructing the basic unit, and the search space is used for searching for an image super-resolution network structure.
The specific form of the basic unit may be any one of the possible implementation manners of the step 810.
Illustratively, in an embodiment of the present application, the network structure parameters may include:

(1) the number of basic units;

(2) the module type selected by each basic unit (cell);

for example, the types of the basic modules may include the three types above: C denotes the first module, i.e., the scale module; S denotes the second module, i.e., the compact module; and G denotes the third module, i.e., the grouping module;

(3) the number of convolution layers in one module of the basic unit;

for example, the number of convolution layers may be {4, 6, 8};

(4) the number of channels output by each convolution layer in a module of the basic unit;

for example, the number of channels may be {16, 24, 32, 48};

(5) the number of channels output by the whole basic unit;

for example, the number of output channels of one basic unit may be {16, 24, 32, 48};

(6) the state of the basic unit: 1 indicates that the current node is connected into the network, and 0 indicates that it is not.
In the embodiment of the application, in the search space constructed from basic units built out of the basic modules, candidate network structures are selected among the given basic module types, which is equivalent to discretizing a continuous search space and can effectively reduce the size of the search space.
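One way to picture the discretized search space is as a tuple of the six choices per basic unit; the encoding below is an illustrative assumption, not the patent's actual representation.

```python
import random

MODULE_TYPES = ["C", "S", "G"]       # scale, compact, grouping modules
CONV_LAYERS = [4, 6, 8]
CONV_CHANNELS = [16, 24, 32, 48]
CELL_CHANNELS = [16, 24, 32, 48]
STATES = [0, 1]                      # 1: connected into the network, 0: not

def random_cell():
    """Sample one basic unit's network structure parameters."""
    return {
        "module": random.choice(MODULE_TYPES),
        "layers": random.choice(CONV_LAYERS),
        "conv_channels": random.choice(CONV_CHANNELS),
        "cell_channels": random.choice(CELL_CHANNELS),
        "state": random.choice(STATES),
    }

def random_network(num_cells: int = 8):
    """A candidate network is a fixed-length list of basic-unit choices."""
    return [random_cell() for _ in range(num_cells)]

# Discrete options per cell: 3 * 3 * 4 * 4 * 2 = 288
print(len(MODULE_TYPES) * len(CONV_LAYERS) * len(CONV_CHANNELS)
      * len(CELL_CHANNELS) * len(STATES))
```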
Step 830: search for an image super-resolution network structure in the search space to determine a target image super-resolution network, where the target image super-resolution network is used for performing super-resolution processing on an image to be processed, includes at least one first module, and is a network whose computation is smaller than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
It should be understood that searching for the image super-resolution network structure in the search space to determine the target image super-resolution network may mean finding a network structure satisfying the constraint conditions through an algorithmic search of the search space, or selecting such a network structure from the search space by manual search.
In one possible implementation manner, the constraint condition may mean that the calculated amount is smaller than a first preset threshold value and the image super-resolution precision is larger than a second preset threshold value, so that the precision of the target image super-resolution network for performing the image super-resolution processing is higher under the condition that the calculation performance of the mobile device is limited.
In one possible implementation, the constraint condition may refer to the calculated amount being smaller than a first preset threshold, the image super-resolution precision being larger than a second preset threshold, and the parameter amount being smaller than a third preset threshold.
By way of example, common search algorithms include, but are not limited to: random search, Bayesian optimization, evolutionary algorithms, reinforcement learning, gradient-based algorithms, and so on. For the specific flow of searching for an image super-resolution network structure in the search space, reference may be made to the prior art; for brevity, the present application does not describe all of the search methods in detail.
In one embodiment, a lightweight, fast, and highly accurate super-resolution network structure can be found with an evolutionary algorithm by taking the parameter count, the computation, and the model quality (PSNR) of the network model as objectives.
For example, the process of performing a network search in the search space to determine the target image super-resolution network may include the following steps: performing a network search in the search space through an evolutionary algorithm to determine a first image super-resolution network; and performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, where the multi-level weighted joint loss function is determined according to the loss between the sample super-resolution image and the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network.
In other words, in the embodiment of the application, the first image super-resolution network determined by the evolutionary algorithm is trained a second time with the multi-level weighted joint loss function, and the parameters of the target image super-resolution network are thereby finally determined, yielding the target image super-resolution network.
Specifically, searching for the target image super-resolution network in the search space through the evolutionary algorithm includes the following steps: randomly generating P candidate network structures according to the basic unit; training the P candidate network structures with the multi-level weighted joint loss function; evaluating the performance parameters of each of the P trained candidate network structures, where the performance parameters include the peak signal-to-noise ratio (PSNR), which indicates the difference between the sample super-resolution image and the predicted super-resolution image obtained through each candidate network structure; and determining the first image super-resolution network according to the performance parameters of the candidate networks.
For example, as shown in fig. 22, the execution flow of the evolutionary algorithm may include the following steps (a code sketch follows the list):

Step 1: randomly generate P individuals (i.e., candidate network structures) as the initial population;

Step 2: evaluate the fitness (i.e., the performance parameters) of each network structure, including the parameter count, the computation, and the accuracy, where accuracy may be measured by the peak signal-to-noise ratio (PSNR);

Step 3: select and update the elite individuals, where an elite individual may be a network structure whose performance parameters satisfy preset conditions;

Step 4: generate the next generation of individuals by crossover and mutation;

Step 5: repeat steps 2 to 4 until the evolutionary algorithm converges, and return the final elite individual (i.e., the first image super-resolution network); the elite individual may refer to the target network structure determined by running the algorithm.
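A skeleton of this five-step loop, reusing the `random_network` sampler sketched earlier; the `evaluate`, `crossover`, and `mutate` callables are assumed to be supplied by the user, with `evaluate` folding PSNR, FLOPs, and parameter count into one fitness score.

```python
import copy
import random

def evolve(population_size, generations, evaluate, crossover, mutate,
           elite_k=4):
    """Steps 1-5: init population, score fitness, keep elites,
    breed by crossover and mutation, repeat, return the best."""
    population = [random_network() for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate, reverse=True)
        elites = scored[:elite_k]                    # step 3: keep the best
        children = []
        while len(children) < population_size - elite_k:
            a, b = random.sample(elites, 2)          # step 4: breed
            children.append(mutate(crossover(copy.deepcopy(a), b)))
        population = elites + children
    return max(population, key=evaluate)             # final elite individual
```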
When the performance parameters of a network structure are evaluated in step 2, the multi-level weighted joint loss function provided by the application can be used to train the network structure to be evaluated, and the peak signal-to-noise ratio of the network structure can be evaluated after this training.
For example, the multi-level weighted joint loss function may be obtained according to the following equation:

L = Σ_{k=1}^{N} λ_{k,t}·L_k
where L may represent the multi-level weighted joint loss function, L_k may represent the loss value of the k-th basic unit of the first image super-resolution network (the image loss between the sample super-resolution image and the predicted super-resolution image corresponding to the output feature map of the k-th unit), λ_{k,t} may represent the weight of the loss value of the k-th unit at time t, and N is the number of basic units.
For example, as shown in fig. 21, the basic units at the bottom of the network may be trained to different degrees because of the number of basic modules stacked above them. To learn the parameters of the bottom basic units more fully and to improve search stability and model performance, the embodiment of the application provides the multi-level weighted joint loss function: during training, a predicted super-resolution image is obtained from the output feature map of each basic unit, the loss value between this predicted super-resolution image and the sample super-resolution image is computed, and the network is trained on the weighted sum of the image loss values of all basic units.
It should be noted that the weight of each intermediate layer's image loss may vary with time (or with the number of iterations). The loss function thus combines the predicted-image loss of every intermediate layer and expresses the importance of different layers through the weights; since each weight can change over time, the parameters of the bottom basic units can be trained more fully, improving the performance of the super-resolution network.
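A minimal sketch of the loss, assuming an L1 reconstruction loss per level (the patent does not fix the per-level loss form) and externally supplied, time-varying weights λ_{k,t}:

```python
import torch
import torch.nn.functional as F

def joint_loss(predicted_sr_per_cell, target_sr, weights_t):
    """Multi-level weighted joint loss: L = sum_k lambda_{k,t} * L_k,
    where L_k compares the k-th cell's predicted SR image with the
    sample SR image and the weights may change over training time t."""
    assert len(predicted_sr_per_cell) == len(weights_t)
    total = predicted_sr_per_cell[0].new_zeros(())
    for pred_k, lam_k in zip(predicted_sr_per_cell, weights_t):
        total = total + lam_k * F.l1_loss(pred_k, target_sr)
    return total
```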
TABLE 1
Table 1 reports the performance of basic units constructed from the basic modules of the present application, tested on standard super-resolution datasets; it shows experimental results of several image super-resolution network models constructed from the basic modules provided by the application. The number of floating point operations (FLOPs) represents the computation of a network model and can be used to evaluate its computational efficiency; the parameter count (parameters) describes the number of parameters the neural network contains and is used to evaluate the size of the model; SET5, SET14, B100, and Urban100 are the names of different datasets, on which the image super-resolution accuracy of a network model, for example its peak signal-to-noise ratio (PSNR), can be evaluated; Baseline denotes a small residual dense network. As can be seen from Table 1, the basic modules of the embodiments of the present application (for example, the networks built from the scale module, the compact module, and the grouping module, denoted CRDN, SRDN, and GRDN respectively) and the target image super-resolution network, i.e., the efficient super-resolution network (ESRN), can effectively improve model accuracy with the parameter count and computation unchanged.
TABLE 2
Table 2 shows the test results for the multi-level weighted joint loss function of the embodiment of the present application: experimental results of a deep convolutional network after applying the multi-level weighted joint loss function, where Joint loss denotes the network model trained with the multi-level weighted loss function proposed by the embodiments of the present application. As can be seen from Table 2, training the image super-resolution network with the multi-level weighted joint loss function provided by the embodiment of the application effectively improves its accuracy.
TABLE 3 Table 3
Table 3 gives statistics of the results of the image super-resolution networks provided in the embodiments of the present application on standard datasets. Type 1 indicates that the running time of the image super-resolution model is Fast; type 2 indicates that the running time is Very Fast. The compared models include a selection-unit super-resolution network (SelNet), a cascading residual network (CARN), a small cascading residual network (CARN-M), and a lightweight fast super-resolution network (fast, accurate and lightweight super-resolution network, FALSR), where FALSR-A and FALSR-B denote different network models; ESRN denotes the target image super-resolution network of the embodiment of the present application, i.e., the efficient super-resolution network, which may be, for example, a fast efficient super-resolution network (ESRN-F) or a small efficient super-resolution network (ESRN-M). As can be seen from Table 3, the computation and the image super-resolution accuracy of the target image super-resolution network and basic modules provided by the embodiment of the application are superior to those of the other network models.
TABLE 4 Table 4
Table 4 shows test results of the target image super-resolution network provided by the embodiment of the application at different super-resolution scales. The factor ×3 indicates a 3× super-resolution test producing a 720p (1280×720) output image; the factor ×4 indicates a 4× super-resolution test producing a 720p (1280×720) output image. The compared models include the super-resolution convolutional neural network (SRCNN), the very deep super-resolution network (VDSR), SelNet, CARN, CARN-M, ESRN, ESRN-F, and ESRN-M.
The FLOPs in Tables 1 to 3 above are computed for ×2-scale image super-resolution processing with a 720p (1280×720) output super-resolution image as the example. As can be seen from the data in Tables 1 to 3, the neural network searching method provided by the embodiment of the application can find models with better super-resolution quality at different parameter counts. In addition, since the dimension reduction operation is introduced into the image super-resolution network of the embodiment of the application, a fast medium-parameter model can be found by constraining the model's FLOPs, cutting the computation by nearly half while keeping the image super-resolution quality above that of the FALSR-A model. Image super-resolution tests were also carried out at the ×3 and ×4 scales; the results, shown in Table 4, indicate that the lightweight networks (ESRN, ESRN-F, ESRN-M) found by the neural network searching method of the application outperform the other network models at all super-resolution scales.
TABLE 5
Model               RDN     CARN    ESRN    ESRN-F    ESRN-M
GPU runtime (ms)    181.5   45.6    52.5    36.2      30.9
Table 5 reports the running time of the target image super-resolution network provided by the embodiment of the present application. As can be seen from Table 5, the super-resolution network obtained by the neural network searching method of the embodiment of the application combines high accuracy with high running efficiency.
Fig. 23 and 24 are effect diagrams of image super-resolution processing performed by the target image super-resolution network determined by the search method of the neural network of the embodiment of the present application.
Fig. 23 and fig. 24 show the image effect after image super-resolution processing by the image super-resolution network constructed from the basic modules of the present application. Taking ×3 super-resolution processing as an example, fig. 23 shows the visual effect after super-resolution processing of an image from the Set14 dataset, and fig. 24 shows the visual effect after super-resolution processing of an image from the Urban100 dataset. The comparisons include the high-resolution ground truth (HR), the deep Laplacian pyramid super-resolution network (LapSRN), bicubic interpolation, CARN-M, CARN, VDSR, ESRN-M, and ESRN. On both the Urban100 and Set14 datasets, sharper images are obtained with the lightweight, efficient deep convolutional networks (e.g., ESRN-M) provided by the embodiments of the present application. The image super-resolution network obtained by the neural network searching method of the application therefore not only reduces the network parameter count and computation, but also effectively improves the visual effect of image super-resolution, making the edges of the super-resolution image clearer.
Fig. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method 900 shown in fig. 25 includes steps 910 and 920, and the details of steps 910 and 920 are described below.
Step 910: and acquiring an image to be processed.
The image to be processed may be an image captured by the electronic device through a camera, or the image to be processed may also be an image obtained from inside the electronic device (for example, an image stored in an album of the electronic device, or a picture obtained by the electronic device from the cloud).
Step 920: and performing super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image, wherein the target image is a super-resolution image corresponding to the image to be processed.
The target image super-resolution network may be obtained according to the method shown in fig. 11.
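A minimal usage sketch, reusing the `SuperResolutionNet` skeleton from earlier with an identity body standing in for the searched nonlinear transformation part; the tensor shapes and value range are assumptions.

```python
import torch
import torch.nn as nn

# Illustration only: a trained, searched body would replace nn.Identity().
model = SuperResolutionNet(body=nn.Identity(), channels=64, scale=2)
model.eval()

lr_image = torch.rand(1, 3, 180, 320)   # stand-in low-resolution input in [0, 1]
with torch.no_grad():
    sr_image = model(lr_image)          # super-resolution output
print(sr_image.shape)                   # torch.Size([1, 3, 360, 640])
```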
The target image super-resolution network is a network determined by searching an image super-resolution network structure in a search space, the search space is constructed by a basic unit and network structure parameters, the search space is used for searching the image super-resolution network structure, the network structure parameters comprise types of basic modules used for constructing the basic unit, the basic unit is a network structure obtained by connecting the basic modules through basic operation of a neural network, the basic module comprises a first module, the first module is used for carrying out residual connection operation and dimension reduction operation on a first input feature map, the residual connection operation is used for carrying out feature addition processing on the first input feature map and the feature map processed by the first module, the dimension reduction operation is used for transforming the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, and the target image super-resolution network at least comprises the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map.
Optionally, in a possible implementation, the base unit is a base module for constructing an image super resolution network.
Optionally, in one possible implementation manner, the dimension reduction operation may include at least one of a pooling operation and a convolution operation with a step size Q, where Q is a positive integer greater than 1.
Optionally, in one possible implementation manner, the feature map processed by the first module is a feature map subjected to a dimension increasing operation, where the dimension increasing operation refers to restoring the scale of the feature map subjected to the dimension reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map subjected to the dimension increasing operation.
Optionally, in one possible implementation manner, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to performing feature stitching on an output feature map of each convolution layer in the i-1 th convolution layer and the first input feature map as an input feature map of the i-th convolution layer, and i is a positive integer greater than 1.
Optionally, in one possible implementation manner, the dense connection operation is a cyclic dense connection operation, and the cyclic dense connection operation refers to feature stitching processing on the first input feature map after channel compression processing.
Optionally, in one possible implementation manner, the first module is further configured to perform a rearrangement operation, where the rearrangement operation is to combine the plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
Optionally, in a possible implementation manner, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation is a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual error connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
Optionally, in one possible implementation manner, the target image super-resolution network is a network determined by performing back propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to a loss between a predicted super-resolution image corresponding to a feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network is a network determined by performing image super-resolution network structure searching through an evolutionary algorithm in the search space.
In one possible implementation, the multi-level weighted joint loss function is optionally obtained according to the following equation:

L = Σ_{k=1}^{N} λ_{k,t}·L_k
where L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network (the image loss between the sample super-resolution image and the predicted super-resolution image corresponding to the output feature map of the k-th unit), λ_{k,t} represents the weight of the loss value of the k-th unit at time t, N represents the number of basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
Optionally, in one possible implementation manner, the first image super-resolution network is determined according to a performance parameter of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the base unit, and the performance parameter is a parameter for evaluating performance of the P candidate network structures trained by using the multi-level weighted joint loss function, where the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio is used to indicate a difference between a predicted super-resolution image and a sample super-resolution image obtained by the each candidate network structure.
Fig. 26 is a schematic flowchart of an image display method provided by an embodiment of the present application. The method 1000 shown in fig. 26 includes steps 1010 through 1040, each of which is described in detail below.
Step 1010, a first operation by a user to turn on the camera is detected.
Step 1020, in response to the first operation, displaying a shooting interface on the display screen, where the shooting interface includes a viewfinder frame, and the viewfinder frame contains a first image.
In one example, the shooting behavior of the user may include a first operation by the user to turn on the camera; and displaying a shooting interface on a display screen in response to the first operation.
Fig. 27 (a) shows a graphical user interface (GUI) of the mobile phone, which is the desktop 1110 of the phone. When the electronic device detects that the user clicks the icon 1120 of the camera application (APP) on the desktop 1110, the camera application may be launched, displaying another GUI as shown in fig. 27 (b), which may be referred to as the photographing interface 1130. A viewfinder frame 1140 may be included on the photographing interface 1130. In the preview state, a preview image can be displayed in real time in the viewfinder frame 1140.
For example, referring to (b) of fig. 27, after the electronic device starts the camera, a first image, which is a color image, may be displayed in the viewfinder 1140. A control 1150 for indicating a photographing mode may also be included on the photographing interface, as well as other photographing controls.
In one example, the shooting behavior of the user may include a first operation of turning on the camera by the user, and a shooting interface is displayed on the display screen in response to the first operation. For example, after detecting the first operation of the user clicking the icon of the camera application (APP) on the desktop, the electronic device may start the camera application and display the photographing interface. A viewfinder frame may be included on the shooting interface; it should be understood that the viewfinder frame may be sized differently in photographing mode and in video mode. For example, the viewfinder frame may be the viewfinder frame of the photographing mode, while in video mode the viewfinder frame may be the entire display screen. In the preview state, after the user turns on the camera and before the photographing/video button is pressed, a preview image can be displayed in real time in the viewfinder frame.
In one example, the preview image may be a color image, and the preview image may be an image displayed with the camera set to an automatic resolution.
Step 1030, detecting that the user indicates a second operation of the camera.
For example, a second operation by which the user indicates a first processing mode may be detected, where the first processing mode may be a professional photographing mode (e.g., a super-resolution photographing mode). Referring to fig. 28 (a), a photographing option 1160 is included on the photographing interface; after the electronic device detects that the user clicks the photographing option 1160, referring to fig. 28 (b), the electronic device displays a photographing mode interface. After the electronic device detects that the user clicks the professional photographing mode 1161 on the photographing mode interface, the mobile phone enters the professional photographing mode.
For example, a second operation by which the user instructs photographing may be detected, for instance when photographing a distant object or a minute object. Referring to fig. 28 (c), the electronic apparatus detects a second operation 1170 by which the user instructs photographing in a low-illuminance environment.
It should be appreciated that the second operation by the user to indicate the photographing behavior may include pressing a photographing button in a camera of the electronic device, may include the user device indicating the photographing behavior through voice, or may further include the user other indicating the photographing behavior by the electronic device. The foregoing is illustrative and not intended to limit the application in any way.
Step 1040, in response to the second operation, displaying a second image in the viewfinder frame, where the second image is an image obtained by performing super-resolution processing, through a target image super-resolution network, on the first image acquired by the camera; the second image is a super-resolution image corresponding to the first image.
The target image super-resolution network may be obtained according to the method shown in fig. 11.
The target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space, where the search space is constructed from a basic unit and network structure parameters and is used for searching for the image super-resolution network structure; the network structure parameters include the types of the basic modules used for constructing the basic unit, and the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network. The basic modules include a first module, which is used for performing a residual connection operation and a dimension reduction operation on a first input feature map: the residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension reduction operation transforms the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale. The target image super-resolution network at least comprises the first module, and the scale of the feature map processed by the first module is the same as that of the first input feature map.
Referring to fig. 28, a second image is displayed in the viewfinder in fig. 28 (d), and a first image is displayed in the viewfinder in fig. 28 (c), the second image and the first image being identical or substantially identical in content, but the second image being superior in image quality to the first image, for example, the second image being higher in resolution than the first image.
Optionally, in a possible implementation, the base unit is a base module for constructing an image super resolution network.
Optionally, in one possible implementation manner, the dimension reduction operation may include at least one of a pooling operation and a convolution operation with a step size Q, where Q is a positive integer greater than 1.
Optionally, in one possible implementation manner, the feature map processed by the first module is a feature map subjected to a dimension increasing operation, where the dimension increasing operation refers to restoring the scale of the feature map subjected to the dimension reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map subjected to the dimension increasing operation.
Optionally, in one possible implementation manner, the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to performing feature stitching on an output feature map of each convolution layer in the i-1 th convolution layer and the first input feature map as an input feature map of the i-th convolution layer, and i is a positive integer greater than 1.
Optionally, in one possible implementation manner, the dense connection operation is a cyclic dense connection operation, and the cyclic dense connection operation refers to feature stitching processing on the first input feature map after channel compression processing.
Optionally, in one possible implementation manner, the first module is further configured to perform a rearrangement operation, where the rearrangement operation is to combine the plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, where a resolution of the second channel feature is higher than a resolution of the first channel feature.
Optionally, in a possible implementation manner, the base module further includes a second module and/or a third module, where the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, where the channel compression operation is a convolution operation with a convolution kernel of 1×1 on the second input feature map; the third module is configured to perform a channel switching operation, the residual error connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps includes at least two adjacent channel features, and the channel switching process refers to reordering at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps in the M sub-feature maps are adjacent, where M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
Optionally, in one possible implementation manner, the target image super-resolution network is a network determined by performing back propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to a loss between a predicted super-resolution image corresponding to a feature map output by each basic unit in the first image super-resolution network and a sample super-resolution image, and the first image super-resolution network is a network determined by performing image super-resolution network structure searching through an evolutionary algorithm in the search space.
In one possible implementation, the multi-level weighted joint loss function is optionally obtained according to the following equation:

L = Σ_{k=1}^{N} λ_{k,t}·L_k
where L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th basic unit of the first image super-resolution network (the image loss between the sample super-resolution image and the predicted super-resolution image corresponding to the output feature map of the k-th unit), λ_{k,t} represents the weight of the loss value of the k-th unit at time t, N represents the number of basic units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
Optionally, in one possible implementation manner, the first image super-resolution network is determined according to a performance parameter of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the base unit, and the performance parameter is a parameter for evaluating performance of the P candidate network structures trained by using the multi-level weighted joint loss function, where the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio is used to indicate a difference between a predicted super-resolution image and a sample super-resolution image obtained by the each candidate network structure.
It should be understood that the above description is intended to aid those skilled in the art in understanding the embodiments of the present application, and is not intended to limit the embodiments of the present application to the specific values or particular scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or variations can be made, and such modifications or variations are intended to be within the scope of the embodiments of the present application.
The searching method and the image processing method for the neural network provided by the embodiment of the present application are described in detail above with reference to fig. 1 to 28, and the device embodiment of the present application will be described in detail below with reference to fig. 29 and 30. It should be understood that the searching device for the neural network in the embodiment of the present application may perform the searching method for the neural network in the foregoing embodiment of the present application, the image processing device may perform the various image processing methods in the foregoing embodiment of the present application, that is, the following specific working processes of various products may refer to the corresponding processes in the foregoing method embodiment.
Fig. 29 is a schematic hardware structure of a search device for a neural network according to an embodiment of the present application. The neural network search apparatus 1200 shown in fig. 29 (the apparatus 1200 may be a computer device in particular) includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204. Wherein the memory 1201, the processor 1202 and the communication interface 1203 are communicatively coupled to each other via a bus 1204.
The memory 1201 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1201 may store a program; when the program stored in the memory 1201 is executed by the processor 1202, the processor 1202 performs the steps of the neural network searching method of the embodiment of the present application, for example, the steps shown in fig. 11.
It should be understood that the search device of the neural network shown in the embodiment of the present application may be a server, for example, may be a cloud server, or may also be a chip configured in the cloud server.
The processor 1202 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the neural network searching method of the present application.
The processor 1202 may also be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the neural network searching method of the present application may be completed by integrated logic circuits of hardware in the processor 1202 or by instructions in the form of software.
The processor 1202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or as being performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1201; the processor 1202 reads the information in the memory 1201 and, in combination with its hardware, performs the functions that need to be performed by the units included in the neural network search device, or performs the neural network searching method shown in fig. 11 of the method embodiments of the present application.
The communication interface 1203 uses a transceiver device, such as, but not limited to, a transceiver, to enable communication between the device 1200 and other devices or communication networks.
The bus 1204 may include a path to transfer information between various components of the device 1200 (e.g., the memory 1201, the processor 1202, the communication interface 1203).
Fig. 30 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 1300 shown in fig. 30 includes a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304, where the memory 1301, the processor 1302, and the communication interface 1303 are communicatively connected to one another via the bus 1304.
The memory 1301 may be a ROM, a static storage device, or a RAM. The memory 1301 may store a program; when the program stored in the memory 1301 is executed by the processor 1302, the processor 1302 and the communication interface 1303 perform the steps of the image processing method of the embodiments of the present application, for example the steps of the image processing methods shown in fig. 25 and 26.
The processor 1302 may be a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits, and is configured to execute related programs to perform the functions required by the units in the image processing apparatus of the embodiments of the present application or to perform the image processing method of the method embodiments of the present application.
The processor 1302 may also be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the image processing method of the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor 1302 or by instructions in the form of software.
The processor 1302 may also be a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or as being performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1301; the processor 1302 reads the information in the memory 1301 and, in combination with its hardware, performs the functions that need to be performed by the units included in the image processing apparatus of the embodiments of the present application, or performs the image processing method of the method embodiments of the present application.
The communication interface 1303 enables communication between the apparatus 1300 and other devices or communication networks using a transceiver apparatus such as, but not limited to, a transceiver. For example, the image to be processed may be acquired through the communication interface 1303.
Bus 1304 may include a path to transfer information between various components of device 1300 (e.g., memory 1301, processor 1302, communication interface 1303).
It should be noted that although only the memory, the processor, and the communication interface are shown for the apparatus 1200 and the apparatus 1300 described above, in a specific implementation process those skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may further include other devices necessary for normal operation. Meanwhile, according to specific needs, the apparatus 1200 and the apparatus 1300 may further include hardware devices implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may alternatively include only the devices necessary to implement the embodiments of the present application, and do not necessarily include all of the devices shown in fig. 29 or fig. 30.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (22)

1. A method for searching a neural network, comprising:
Constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is configured to perform a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation transforms the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map;
Constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameters comprise the types of basic modules used for constructing the basic unit, and the search space is used for searching an image super-resolution network structure;
Searching for an image super-resolution network structure in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is configured to perform super-resolution processing on an image to be processed, the target image super-resolution network comprises at least the first module, and the target image super-resolution network is a network whose computation amount is smaller than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
2. The search method of claim 1, wherein the dimension reduction operation comprises at least one of a pooling operation and a convolution operation with a step size Q, Q being a positive integer greater than 1.
3. The search method according to claim 1 or 2, wherein the feature map processed by the first module is a feature map that has undergone an up-scaling operation, the up-scaling operation restores the dimension of the feature map subjected to the dimension reduction operation to the first dimension, and the residual connection operation performs feature addition processing on the first input feature map and the feature map subjected to the up-scaling operation.
4. The search method of any one of claims 1 to 3, wherein the first module is further configured to perform a dense connection operation on the first input feature map, the dense connection operation being to feature-stitch the output feature map of each of the first i-1 convolution layers with the first input feature map to serve as the input feature map of the i-th convolution layer, where i is a positive integer greater than 1.
5. The search method of claim 4, wherein the dense connection operation is a cyclic dense connection operation, and the cyclic dense connection operation performs the feature stitching processing on the first input feature map after channel compression processing.
6. The search method of any one of claims 1 to 5, wherein the first module is further configured to perform a rearrangement operation, the rearrangement operation being to combine a plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, wherein the resolution of the second channel feature is higher than that of the first channel features.
7. The searching method according to any one of claims 4 to 6, wherein the basic modules further comprise a second module and/or a third module, wherein the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation performed on the second input feature map with a 1×1 convolution kernel; the third module is configured to perform a channel switching operation, the residual connection operation, and the dense connection operation on a third input feature map, wherein the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, the channel switching operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps among the M sub-feature maps are adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
8. The search method of any one of claims 1 to 7, wherein the performing an image super-resolution network structure search in the search space to determine a target image super-resolution network comprises:
searching for an image super-resolution network structure in the search space through an evolutionary algorithm to determine a first image super-resolution network; and
performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
9. The search method of claim 8, wherein the multi-level weighted joint loss function is derived from the equation
$L = \sum_{k=1}^{N} \lambda_{k,t} L_k$,
wherein L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th unit at time t, N represents the number of units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
10. The searching method according to claim 8 or 9, wherein the performing an image super-resolution network structure search in the search space through an evolutionary algorithm to determine a first image super-resolution network comprises:
randomly generating P candidate network structures according to the basic unit, wherein P is an integer greater than 1;
training the P candidate network structures by using the multi-level weighted joint loss function;
evaluating a performance parameter of each of the P candidate network structures after training, wherein the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and
determining the first image super-resolution network according to the performance parameters of the candidate network structures.
11. A search apparatus for a neural network, comprising:
A memory for storing a program;
a processor for executing the program stored in the memory, the processor for performing the following processes when the program stored in the memory is executed:
Constructing a basic unit, wherein the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules comprise a first module, the first module is configured to perform a dimension reduction operation and a residual connection operation on a first input feature map, the dimension reduction operation transforms the dimension of the first input feature map from an original first dimension to a second dimension, the second dimension is smaller than the first dimension, the residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module, and the dimension of the feature map processed by the first module is the same as that of the first input feature map;
Constructing a search space according to the basic unit and network structure parameters, wherein the network structure parameters comprise the types of basic modules used for constructing the basic unit, and the search space is used for searching an image super-resolution network structure;
Searching for an image super-resolution network structure in the search space to determine a target image super-resolution network, wherein the target image super-resolution network is configured to perform super-resolution processing on an image to be processed, the target image super-resolution network comprises at least the first module, and the target image super-resolution network is a network whose computation amount is smaller than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
12. The search apparatus of claim 11, wherein the dimension reduction operation comprises at least one of a pooling operation and a convolution operation with a step size Q, Q being a positive integer greater than 1.
13. The search apparatus according to claim 11 or 12, wherein the feature map processed by the first module is a feature map that has undergone an up-scaling operation, the up-scaling operation restores the dimension of the feature map subjected to the dimension reduction operation to the first dimension, and the residual connection operation performs feature addition processing on the first input feature map and the feature map subjected to the up-scaling operation.
14. The search apparatus of any one of claims 11 to 13, wherein the first module is further configured to perform a dense connection operation on the first input feature map, the dense connection operation being to feature-stitch the output feature map of each of the first i-1 convolution layers with the first input feature map to serve as the input feature map of the i-th convolution layer, where i is a positive integer greater than 1.
15. The search apparatus of claim 14, wherein the dense connection operation is a cyclic dense connection operation, and the cyclic dense connection operation performs the feature stitching processing on the first input feature map after channel compression processing.
16. The search apparatus of any one of claims 11 to 15, wherein the first module is further configured to perform a rearrangement operation, the rearrangement operation being to combine a plurality of first channel features of the first input feature map according to a preset rule to generate a second channel feature, wherein the resolution of the second channel feature is higher than that of the first channel features.
17. The search apparatus of any one of claims 14 to 16, wherein the basic modules further comprise a second module and/or a third module, wherein the second module is configured to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation being a convolution operation performed on the second input feature map with a 1×1 convolution kernel; the third module is configured to perform a channel switching operation, the residual connection operation, and the dense connection operation on a third input feature map, wherein the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, the channel switching operation refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps among the M sub-feature maps are adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
18. The search apparatus of any one of claims 11 to 17, wherein the performing an image super-resolution network structure search in the search space to determine a target image super-resolution network comprises:
searching for an image super-resolution network structure in the search space through an evolutionary algorithm to determine a first image super-resolution network; and
performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
19. The search apparatus of claim 18, wherein the multi-level weighted joint loss function is derived from the equation
$L = \sum_{k=1}^{N} \lambda_{k,t} L_k$,
wherein L represents the multi-level weighted joint loss function, L_k represents the loss value of the k-th unit of the first image super-resolution network, the loss value being the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th unit and the sample super-resolution image, λ_{k,t} represents the weight of the loss value of the k-th unit at time t, N represents the number of units included in the first image super-resolution network, and N is an integer greater than or equal to 1.
20. The search apparatus of claim 18 or 19, wherein the performing an image super-resolution network structure search in the search space through an evolutionary algorithm to determine a first image super-resolution network comprises:
randomly generating P candidate network structures according to the basic unit, wherein P is an integer greater than 1;
training the P candidate network structures by using the multi-level weighted joint loss function;
evaluating a performance parameter of each of the P candidate network structures after training, wherein the performance parameter includes a peak signal-to-noise ratio, and the peak signal-to-noise ratio indicates the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and
determining the first image super-resolution network according to the performance parameters of the candidate network structures.
21. A computer readable storage medium storing program code for device execution, the program code comprising instructions for performing the search method of any one of claims 1 to 10.
22. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the search method of any one of claims 1 to 10.
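For readers who want a concrete picture of the operations recited in the claims above, the sketches below give one possible, non-authoritative reading in PyTorch-style Python; every layer choice, class name, and hyper-parameter is an assumption made for illustration only. First, the first module of claims 1 to 3: a dimension reduction, a processing body, an up-scaling back to the first dimension, and a residual feature addition.

```python
import torch
import torch.nn as nn

class FirstModuleSketch(nn.Module):
    """Illustrative first module: dimension reduction + residual connection."""

    def __init__(self, channels=64, q=2):
        super().__init__()
        self.q = q
        # Dimension reduction: a convolution with stride Q > 1 (claim 2 allows
        # a pooling operation here instead).
        self.reduce = nn.Conv2d(channels, channels, 3, stride=q, padding=1)
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Up-scaling back to the first dimension (claim 3); a sub-pixel
        # convolution is assumed here purely for illustration.
        self.expand = nn.Sequential(
            nn.Conv2d(channels, channels * q * q, 3, padding=1),
            nn.PixelShuffle(q),
        )

    def forward(self, x):
        # Assumes spatial sizes divisible by q so the scales match again.
        assert x.shape[-2] % self.q == 0 and x.shape[-1] % self.q == 0
        y = self.expand(self.body(self.reduce(x)))
        return x + y  # residual connection: feature addition at equal dimension
```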
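Next, a sketch of the dense connection operation of claim 4, in which the input of the i-th convolution layer is the first input feature map stitched with the outputs of the preceding i-1 layers (the cyclic variant of claim 5, which first channel-compresses the first input feature map, is omitted here).

```python
import torch
import torch.nn as nn

class DenseConnectionSketch(nn.Module):
    """Illustrative dense connection: each layer sees the first input feature
    map concatenated with the outputs of all preceding layers (claim 4)."""

    def __init__(self, channels=64, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels + j * growth, growth, 3, padding=1)
            for j in range(layers)
        )

    def forward(self, x):
        features = [x]
        for conv in self.convs:
            # Feature stitching = concatenation along the channel axis.
            features.append(torch.relu(conv(torch.cat(features, dim=1))))
        return torch.cat(features, dim=1)
```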
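A sketch of the rearrangement operation of claim 6; the sub-pixel layout of torch.nn.PixelShuffle is assumed as the "preset rule" for combining groups of first channel features into one higher-resolution second channel feature.

```python
import torch

def rearrangement_sketch(x, r=2):
    """Illustrative rearrangement (claim 6): groups of r*r first channel
    features are combined into one second channel feature whose resolution is
    r times higher in each spatial dimension."""
    b, c, h, w = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    x = x.view(b, c // (r * r), r, r, h, w)
    x = x.permute(0, 1, 4, 2, 5, 3)  # interleave rows and columns
    return x.reshape(b, c // (r * r), h * r, w * r)
```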
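Finally, a sketch of the channel switching operation of claim 7, using the familiar reshape-transpose trick so that channels drawn from different sub-feature maps become adjacent after reordering.

```python
import torch

def channel_switching_sketch(x, m=2):
    """Illustrative channel switching (claim 7): the feature map is treated as
    M sub-feature maps of adjacent channels, and the channels are reordered so
    that channels from different sub-feature maps become adjacent."""
    b, c, h, w = x.shape
    assert c % m == 0, "channel count must be divisible by M"
    x = x.view(b, m, c // m, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)
```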
CN201910695706.7A 2019-07-30 2019-07-30 Searching method and device for neural network Active CN112308200B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910695706.7A CN112308200B (en) 2019-07-30 2019-07-30 Searching method and device for neural network
PCT/CN2020/105369 WO2021018163A1 (en) 2019-07-30 2020-07-29 Neural network search method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910695706.7A CN112308200B (en) 2019-07-30 2019-07-30 Searching method and device for neural network

Publications (2)

Publication Number Publication Date
CN112308200A CN112308200A (en) 2021-02-02
CN112308200B true CN112308200B (en) 2024-04-26

Family

ID=74230275

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910695706.7A Active CN112308200B (en) 2019-07-30 2019-07-30 Searching method and device for neural network

Country Status (2)

Country Link
CN (1) CN112308200B (en)
WO (1) WO2021018163A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580639B * 2021-03-01 2021-08-13 Sichuan University Early gastric cancer image identification method based on evolutionary neural network model compression
CN113065426B * 2021-03-19 2023-10-17 Zhejiang Sci-Tech University Gesture image feature fusion method based on channel perception
CN113033422A * 2021-03-29 2021-06-25 Zhongke Wanxun Intelligent Technology (Suzhou) Co Ltd Face detection method, system, equipment and storage medium based on edge computing
CN112990053B * 2021-03-29 2023-07-25 Tencent Technology (Shenzhen) Co Ltd Image processing method, device, equipment and storage medium
CN113706530A * 2021-10-28 2021-11-26 Beijing Jushi Intelligent Technology Co Ltd Surface defect region segmentation model generation method and device based on network structure
CN115273129B * 2022-02-22 2023-05-05 Zhuhai Digital Power Technology Co Ltd Lightweight human body posture estimation method and device based on neural architecture search
CN114998958B * 2022-05-11 2024-04-16 South China University of Technology Face recognition method based on lightweight convolutional neural network
CN115131727B * 2022-06-12 2024-03-15 Northwestern Polytechnical University Pedestrian re-identification method based on residual unit structure search
CN115841587B * 2022-10-24 2023-11-24 Athena Eyes Technology Co Ltd Feature extraction method, device, equipment and storage medium for image classification task
CN115601792A * 2022-12-14 2023-01-13 Changchun University (CN) Cow face image enhancement method
CN116416468B * 2023-04-11 2023-10-03 Anhui Zhongke Xinglian Information Technology Co Ltd SAR target detection method based on neural architecture search
CN116703729B * 2023-08-09 2023-12-19 Honor Device Co Ltd Image processing method, terminal, storage medium and program product
CN117058000B * 2023-10-10 2024-02-02 Suzhou MetaBrain Intelligent Technology Co Ltd Neural network architecture searching method and device for image super-resolution


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3493149A1 (en) * 2015-02-19 2019-06-05 Magic Pony Technology Limited Super-resolution of visual data using sub-pixel convolution
US11354577B2 (en) * 2017-03-15 2022-06-07 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN109862370A * 2017-11-30 2019-06-07 Peking University Video super-resolution processing method and processing device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629736A * 2017-03-15 2018-10-09 Samsung Electronics Co Ltd System and method for designing super-resolution deep convolutional neural networks
CN108985457A * 2018-08-22 2018-12-11 Peking University Deep neural network architecture design method inspired by optimization algorithms
CN109284820A * 2018-10-26 2019-01-29 Beijing Tusimple Future Technology Co Ltd Structure search method and device for a deep neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Fast and Accurate Single Image Super-Resolution via Information Distillation Network; Zheng Hui et al.; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2018-12-31; pp. 723-731 *
Hierarchical Representations for Efficient Architecture Search; Hanxiao Liu et al.; arXiv:1711.00436 [cs.LG]; 2018-02-22; pp. 1-13 *
Research and Optimization of Neural Networks Based on Evolutionary Algorithms; He Minghui; China Master's Theses Full-text Database, Information Science and Technology Series; 2018-08-15; chapters 2-3 *
A Variable-Structure Convolutional Neural Network Method for Extracting Features from Remote Sensing Images; Wang Huabin; Han Min; Wang Guanghui; Li Yu; Acta Geodaetica et Cartographica Sinica; 2019-05-15 (No. 05); pp. 583-596 *

Also Published As

Publication number Publication date
CN112308200A (en) 2021-02-02
WO2021018163A1 (en) 2021-02-04

Similar Documents

Publication Publication Date Title
CN112308200B (en) Searching method and device for neural network
CN110188795B (en) Image classification method, data processing method and device
CN111667399B (en) Training method of style migration model, video style migration method and device
WO2020177651A1 (en) Image segmentation method and image processing device
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN111402130B (en) Data processing method and data processing device
CN112236779A (en) Image processing method and image processing device based on convolutional neural network
CN109993707B (en) Image denoising method and device
CN110222717B (en) Image processing method and device
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN112418392A (en) Neural network construction method and device
CN112215332B (en) Searching method, image processing method and device for neural network structure
CN113284054A (en) Image enhancement method and image enhancement device
CN111914997B (en) Method for training neural network, image processing method and device
CN112446380A (en) Image processing method and device
WO2022134971A1 (en) Noise reduction model training method and related apparatus
CN112070664B (en) Image processing method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN110222718B (en) Image processing method and device
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN113011562A (en) Model training method and device
CN112561028A (en) Method for training neural network model, and method and device for data processing
CN112581379A (en) Image enhancement method and device
CN111797882A (en) Image classification method and device
CN113066018A (en) Image enhancement method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant