WO2021018163A1 - Neural network search method and apparatus


Info

Publication number
WO2021018163A1
Authority
WO
WIPO (PCT)
Prior art keywords
resolution
feature map
image
super
network
Application number
PCT/CN2020/105369
Other languages
English (en)
Chinese (zh)
Inventor
宋德华
贾旭
王云鹤
许春景
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)

Classifications

    • G06T3/4053: Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06N3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06N3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • H04N19/59: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution

Definitions

  • This application relates to the field of artificial intelligence, and more specifically, to a neural network search method and device.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Image super-resolution reconstruction technology refers to reconstructing low-resolution images to obtain high-resolution images.
  • Image super-resolution reconstruction through deep neural networks has clear advantages.
  • However, as their reconstruction quality improves, deep neural network models also keep growing in size.
  • Because the computing performance and storage space of mobile devices are very limited, this severely restricts the deployment of super-resolution models on mobile devices. Effort has therefore gone into designing lightweight super-resolution network models that reduce the network scale as much as possible while maintaining a given accuracy.
  • The neural architecture search (NAS) method has been applied to image super-resolution reconstruction technology.
  • The search space in the NAS method is usually a search space constructed from basic convolutional units.
  • That is, the search space can include candidate neural network models constructed from multiple basic units.
  • These basic units all process the input feature map at the same scale.
  • Because the scale of the input feature map is not transformed, the amount of computation of the neural network model is proportional to its parameter count; that is, the larger the parameter count, the greater the amount of computation of the network model.
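  • To make this relationship concrete (a standard calculation, not text from the publication): a k×k convolution with C_in input channels and C_out output channels applied to an H×W feature map has

$$\text{Params} = k^2 C_{in} C_{out}, \qquad \text{FLOPs} \approx H \cdot W \cdot k^2 C_{in} C_{out} = H \cdot W \cdot \text{Params}.$$

  • When every basic unit keeps the feature map at the same H×W, computation grows in lockstep with the parameter count, whereas halving H and W cuts the computation by a factor of four with the parameter count unchanged. This is exactly the lever the dimensionality reduction operation described below exploits.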
  • the present application provides a neural network search method and device, which can improve the accuracy of the super-resolution network in image super-resolution processing when the computing performance of the mobile device is limited.
  • In a first aspect, a search method for a neural network structure is provided, including: constructing a basic unit, which is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic modules including a first module;
  • the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map, where the dimensionality reduction operation transforms the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale;
  • the residual connection operation performs feature addition processing on the first input feature map and the feature map processed by the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map;
  • constructing a search space according to the basic unit and network structure parameters, where the network structure parameters include the type of basic module used to construct the basic unit, and the search space is used to search for an image super-resolution network structure; and searching the image super-resolution network structure in the search space to determine a target image super-resolution network, the target image super-resolution network being used to perform super-resolution processing on an image to be processed.
  • the basic unit may be a network structure obtained by connecting basic modules through the basic operations of a neural network.
  • the above-mentioned network structure may include a preset basic operation, or a combination of basic operations, in a convolutional neural network; these basic operations or combinations of basic operations can be collectively referred to as basic operations.
  • basic operations can refer to convolution operations, pooling operations, residual connections, etc.
  • connections between basic modules can be made to obtain the network structure of the basic unit.
  • the above feature addition may refer to adding different channel features for feature maps of the same scale.
  • the first module can perform a residual connection on the input feature map, that is, it can perform feature addition processing on the first input feature map and the feature map processed by the first module, so that more of the local detail information in the first input feature map is passed on to the subsequent convolutional layers.
  • the first module can be used to reduce the dimensionality of the first input feature map.
  • the scale of the input feature map can be reduced to reduce the amount of model calculation.
  • the residual connection operation can transfer the information of the earlier layer to the later layer, which compensates for the information loss caused by the dimensionality reduction operation.
  • the dimensionality reduction operation can also quickly expand the receptive field of features, allowing the prediction of high-resolution pixels to better consider contextual information, thereby improving the accuracy of super-resolution.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • the pooling operation may be an average pooling operation, or the pooling operation may also be a maximum pooling operation.
  • the scale of the first input feature map can be reduced through the dimensionality reduction operation, thereby reducing the calculation amount of the target image super-resolution network under the condition that the parameter amount is unchanged.
  • the feature map processed by the first module is a feature map that has undergone a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map obtained after the dimensionality reduction processing to the first scale;
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • the above-mentioned dimension-raising operation may refer to an up-sampling operation, or it may refer to a deconvolution operation. The up-sampling operation may use an interpolation method, that is, new pixels are inserted between the original image pixels by a suitable interpolation algorithm; the deconvolution operation refers to the inverse process of the convolution operation, also known as transposed convolution.
  • through the dimension-raising operation, the scale of the first input feature map that has undergone the dimensionality reduction operation can be transformed from the second scale back to the original first scale, ensuring that the residual connection operation is performed at the same scale. (A minimal sketch of such a module follows.)
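  • As an illustration, the following is a minimal PyTorch sketch of such a module, assuming a strided 3×3 convolution for the dimensionality reduction and bilinear interpolation for the dimension raising; the class name, layer choices, and channel sizes are illustrative, not taken from the publication.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstModule(nn.Module):
    """Sketch: dimensionality reduction -> processing at the smaller second
    scale -> dimension raising back to the first scale -> residual connection."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        # Dimensionality reduction: a convolution with stride Q = 2
        # (an nn.AvgPool2d or nn.MaxPool2d layer would also fit the claim).
        self.reduce = nn.Conv2d(channels, channels, 3, stride=stride, padding=1)
        # Processing at the second scale is cheap: the feature map is smaller,
        # so computation drops while the parameter count is unchanged.
        self.body = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = F.relu(self.reduce(x))
        y = F.relu(self.body(y))
        # Dimension raising: restore the second scale to the original first
        # scale (a transposed convolution would be the other option named above).
        y = F.interpolate(y, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return x + y  # residual connection: feature addition at equal scales

x = torch.randn(1, 64, 48, 48)
print(FirstModule(64)(x).shape)  # torch.Size([1, 64, 48, 48])
```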
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the first i-1 convolutional layers and the first input feature map are feature-spliced and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • the aforementioned feature splicing may refer to splicing M feature maps of the same scale into a feature map with K channels, where K is a positive integer greater than M.
  • maximum information flow in the network can be achieved by adopting dense connection operations: each layer is connected to all layers before it, that is, the input of each layer is the splice of the outputs of all previous layers.
  • through the dense connection operation, the information in the input feature map is better preserved throughout the network, which further compensates for the information loss caused by the dimensionality reduction operation (see the sketch below).
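  • The sketch below shows one common way to realize such dense connections in PyTorch; the growth-rate layout and layer count are illustrative assumptions, not the publication's exact design.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Sketch of the dense connection operation: the input of the i-th
    convolutional layer is the feature splice (channel concatenation) of the
    block input and the outputs of layers 1..i-1."""

    def __init__(self, channels: int, growth: int = 16, num_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, 3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.layers:
            # Splice everything produced so far, then apply the next layer.
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        # M same-scale feature maps are spliced into one K-channel map, K > M.
        return torch.cat(feats, dim=1)

x = torch.randn(1, 32, 48, 48)
print(DenseBlock(32)(x).shape)  # torch.Size([1, 96, 48, 48])
```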
  • the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to cyclically performing feature splicing on the first input feature map after channel compression processing.
  • the depth of the target image super-resolution network can be deepened by adopting the cyclic operation, that is, the cyclic dense connection operation.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than the resolution of the first channel features. (A sketch follows.)
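  • The described merging of several low-resolution channel features into one higher-resolution channel matches the standard pixel-shuffle (sub-pixel) rearrangement, for which PyTorch has a built-in layer; the upscale factor below is an illustrative assumption.

```python
import torch
import torch.nn as nn

r = 2                                    # illustrative upscale factor
rearrange = nn.PixelShuffle(r)           # merges r*r first channel features per output channel
x = torch.randn(1, 64 * r * r, 24, 24)   # many low-resolution first channel features
y = rearrange(x)                         # fewer channels, higher resolution
print(y.shape)                           # torch.Size([1, 64, 48, 48])
```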
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map,
  • the channel compression operation referring to a convolution operation with a 1×1 convolution kernel on the second input feature map;
  • the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map,
  • where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps including at least two adjacent channel features; the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, M being an integer greater than 1; and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
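  • A minimal sketch of the two operations that distinguish these modules, assuming 64 input channels and M = 4 sub-feature maps; the reshape/transpose trick is the standard way to implement this kind of channel reordering.

```python
import torch
import torch.nn as nn

# Channel compression (second module): a convolution with a 1x1 kernel
# that reduces the number of channels.
compress = nn.Conv2d(64, 32, kernel_size=1)

def channel_exchange(x: torch.Tensor, m: int) -> torch.Tensor:
    """Channel exchange (third module): view the C channels as M sub-feature
    maps of C/M adjacent channels each, then reorder so that channels coming
    from different sub-feature maps become adjacent."""
    b, c, h, w = x.shape
    return x.view(b, m, c // m, h, w).transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 48, 48)
print(compress(x).shape)                # torch.Size([1, 32, 48, 48])
print(channel_exchange(x, m=4).shape)   # torch.Size([1, 64, 48, 48])
```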
  • searching the image super-resolution network structure in the search space to determine the target image super-resolution network includes:
  • determining a first image super-resolution network by searching the image super-resolution network structure in the search space through an evolutionary algorithm, and training the first image super-resolution network with training images and a multi-level weighted joint loss function;
  • the training images may refer to sample image pairs, that is, a low-resolution image and the sample super-resolution image corresponding to that low-resolution image.
  • the first image super-resolution network determined by the evolutionary algorithm can be further trained through the multi-level weighted joint loss function, and finally the parameters of the target image super-resolution network are determined to obtain the target image super-resolution network, thereby improving the accuracy with which the target image super-resolution network processes images.
  • the multi-level weighted joint loss function is obtained according to the following equation:

$$L = \sum_{k=1}^{N} \lambda_{k,t}\, L_k$$

  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • the weight of each intermediate layer image loss in the multi-level weighted joint loss function may change with time (or the number of iterations).
  • the loss function combines the predicted image loss of each intermediate layer and reflects the importance of different layers by weighting; the weight of each intermediate-layer image loss can change over time (or with the number of iterations), which helps to train the parameters of the bottom-layer basic units more fully and thus improve the performance of the super-resolution network (a sketch follows).
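  • A sketch of this loss in PyTorch is shown below; the per-unit L1 image loss and the linear weight schedule are illustrative assumptions, since the publication only specifies that the weights vary with time.

```python
import torch
import torch.nn.functional as F

def joint_loss(unit_sr_preds, target_sr, weights_t):
    """Multi-level weighted joint loss: L = sum_k lambda_{k,t} * L_k, where
    L_k is the image loss between the predicted super-resolution image of
    the k-th basic unit and the sample super-resolution image."""
    losses = [F.l1_loss(pred, target_sr) for pred in unit_sr_preds]
    return sum(w * l for w, l in zip(weights_t, losses))

def weight_schedule(t: int, total_steps: int, n_units: int):
    """Illustrative schedule (an assumption, not the publication's): early in
    training all intermediate outputs share the weight, training the
    bottom-layer units fully; later the weight shifts toward the final
    unit's output. The weights always sum to 1."""
    alpha = t / total_steps
    w = [(1 - alpha) / n_units] * n_units
    w[-1] += alpha
    return w
```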
  • searching the image super-resolution network structure through the evolutionary algorithm in the search space to determine the first image super-resolution network includes:
  • evaluating a performance parameter of each candidate network structure, where the performance parameter includes the peak signal-to-noise ratio (PSNR), used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image;
  • determining the first image super-resolution network according to the performance parameters of the candidate networks.
  • before evaluation, training images and the multi-level weighted joint loss function need to be used to train the candidate network structures, where the training images may refer to sample image pairs, i.e., low-resolution images and the corresponding sample super-resolution images (see the schematic sketch below).
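  • The following schematic sketch combines the two pieces named here: a PSNR score (its standard definition) and a simple evolutionary selection loop. The helpers train, evaluate, and mutate are assumed placeholders; nothing below is the publication's exact algorithm.

```python
import math
import random
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio: PSNR = 10 * log10(MAX^2 / MSE); a higher
    value means a smaller difference between predicted and sample images."""
    mse = torch.mean((pred - target) ** 2).item()
    return 10.0 * math.log10(max_val ** 2 / mse)

def evolve(population, train, evaluate, mutate, generations=10, keep=4):
    """Train each candidate (with the multi-level weighted joint loss inside
    `train`), score it by PSNR via `evaluate`, keep the best, and mutate the
    survivors to refill the population."""
    population = list(population)
    for _ in range(generations):
        scored = sorted(((evaluate(train(c)), c) for c in population),
                        key=lambda pair: pair[0], reverse=True)
        parents = [c for _, c in scored[:keep]]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(len(population) - keep)]
    return population[0]  # best surviving candidate
```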
  • In a second aspect, an image processing method is provided, including: acquiring an image to be processed; and performing super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image of the image to be processed, where the target image is the super-resolution image corresponding to the image to be processed. The target image super-resolution network is a network determined by searching an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operation of the neural network.
  • the basic module includes a first module that is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module,
  • and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale.
  • the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation includes at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map obtained after the dimensionality reduction processing to the first scale;
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • the first module is also used to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the first i-1 convolutional layers and the first input feature map are feature-spliced and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to cyclically performing feature splicing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than the resolution of the first channel features.
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map,
  • the channel compression operation referring to a convolution operation with a 1×1 convolution kernel on the second input feature map;
  • the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map,
  • where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps including at least two adjacent channel features; the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, M being an integer greater than 1; and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
  • the first image super-resolution network refers to a network determined by searching the image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation:

$$L = \sum_{k=1}^{N} \lambda_{k,t}\, L_k$$

  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • the first image super-resolution network is determined by the performance parameters of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit;
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function;
  • the performance parameter includes the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; P is an integer greater than 1.
  • In a third aspect, an image processing method is provided, applied to an electronic device with a display screen and a camera.
  • the method includes: detecting a user's first operation for turning on the camera; in response to the first operation, displaying a photographing interface on the display screen, the photographing interface including a finder frame, the finder frame including a first image; detecting a second operation in which the user instructs the camera; and, in response to the second operation, displaying a second image in the finder frame,
  • the second image being an image obtained after super-resolution processing is performed on the first image collected by the camera, where a target image super-resolution network is applied in the super-resolution processing.
  • the target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space.
  • the search space is constructed from basic units and network structure parameters and is used to search for image super-resolution network structures.
  • the network structure parameters include the type of basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network.
  • the basic module includes a first module that is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the above-mentioned basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map obtained after the dimensionality reduction processing to the first scale;
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • the first module is also used to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the first i-1 convolutional layers and the first input feature map are feature-spliced and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to cyclically performing feature splicing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than the resolution of the first channel features.
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation on a second input feature map,
  • the channel compression operation referring to a convolution operation with a 1×1 convolution kernel on the second input feature map;
  • the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map,
  • where the third input feature map includes M sub-feature maps, each sub-feature map in the M sub-feature maps including at least two adjacent channel features; the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, M being an integer greater than 1; and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
  • the first image super-resolution network refers to a network determined by searching the image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation:

$$L = \sum_{k=1}^{N} \lambda_{k,t}\, L_k$$

  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • the first image super-resolution network is determined by the performance parameters of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit;
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function;
  • the performance parameter includes the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; P is an integer greater than 1.
  • In a fourth aspect, a neural network search device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory.
  • when the program stored in the memory is executed, the processor is configured to: construct a basic unit, which is a network structure obtained by connecting basic modules through the basic operations of a neural network, the basic module including a first module used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map;
  • the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale, and the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, the scale of the feature map processed by the first module being the same as the scale of the first input feature map;
  • construct a search space according to the basic unit and network structure parameters, where the network structure parameters include the type of basic module used to construct the basic unit, the search space being used to search for image super-resolution network structures; and search the image super-resolution network structure in the search space to determine a target image super-resolution network;
  • the target image super-resolution network is used to perform super-resolution processing on an image to be processed, includes at least the first module, and is a network whose amount of computation is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  • the processor included in the aforementioned neural network search device is further configured to execute the search method in any one implementation manner in the first aspect.
  • In a fifth aspect, an image processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: obtain an image to be processed; and perform super-resolution processing on the image to be processed according to a target image super-resolution network to obtain a target image of the image to be processed, where the target image is the super-resolution image corresponding to the image to be processed.
  • the target image super-resolution network is a network determined by searching the image super-resolution network structure in a search space.
  • the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of a neural network.
  • the basic module includes a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the processor included in the foregoing image processing apparatus is further configured to execute the method in any implementation manner in the second aspect.
  • In a sixth aspect, an image processing device is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to: detect a user's first operation for turning on the camera; in response to the first operation, display a photographing interface on the display screen, the photographing interface including a finder frame, the finder frame including a first image; detect a second operation in which the user instructs the camera; and, in response to the second operation, display a second image in the finder frame, the second image being an image obtained after super-resolution processing is performed on the first image collected by the camera, where a target image super-resolution network is used in the super-resolution processing.
  • the target image super-resolution network is a network determined by an image super-resolution network structure search in a search space, and the search space is constructed from basic units and network structure parameters.
  • the search space is used to search the image super-resolution network structure.
  • the network structure parameters include the type of basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network.
  • the basic module includes a first module, the first module being used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map; the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale.
  • the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the processor included in the foregoing image processing apparatus is further configured to execute the method in any one implementation manner in the third aspect.
  • In a seventh aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for executing the method in any one of the implementations of the first aspect to the third aspect.
  • In an eighth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer executes the method in any one of the implementations of the first aspect to the third aspect.
  • In a ninth aspect, a chip is provided, including a processor and a data interface, where the processor reads instructions stored in a memory through the data interface and executes the method in any one of the implementations of the first aspect to the third aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute the instructions stored in the memory.
  • the processor is configured to execute the method in any one of the implementations of the foregoing first aspect to third aspect.
  • FIG. 1 is a schematic diagram of an artificial intelligence main body framework provided by an embodiment of the present application
  • Figure 2 is a schematic diagram of an application scenario provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another application scenario provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of yet another application scenario provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a convolutional neural network structure provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of another convolutional neural network structure provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a neural network search method provided by an embodiment of this application.
  • FIG. 12 is a schematic diagram of a target image super-resolution network provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a first module provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another first module provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of still another first module provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a rearrangement operation provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a second module provided by an embodiment of the present application.
  • FIG. 18 is a schematic structural diagram of a third module provided by an embodiment of the present application.
  • FIG. 19 is a schematic diagram of channel exchange processing provided by an embodiment of the present application.
  • FIG. 20 is a schematic diagram of a search image super-resolution network provided by an embodiment of the present application.
  • FIG. 21 is a schematic diagram of network training through a multi-level weighted joint loss function provided by an embodiment of the present application.
  • FIG. 22 is a schematic diagram of a network structure search based on an evolutionary algorithm provided by an embodiment of the present application.
  • FIG. 23 is a schematic diagram of an effect after image processing is performed through the target super-resolution network of an embodiment of the present application.
  • FIG. 24 is a schematic diagram of an effect after image processing is performed through the target super-resolution network of an embodiment of the present application.
  • FIG. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 26 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 27 is a schematic diagram of a group of display interfaces provided by an embodiment of the present application.
  • FIG. 28 is a schematic diagram of another set of display interfaces provided by an embodiment of the present application.
  • Fig. 29 is a schematic block diagram of a neural network search device according to an embodiment of the present application.
  • FIG. 30 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of the artificial intelligence system and is suitable for general artificial intelligence field requirements.
  • Intelligent Information Chain reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • Infrastructure provides computing power support for artificial intelligence systems, realizes communication with the outside world, and realizes support through the basic platform.
  • the infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.
  • the smart chip here can be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), or a hardware acceleration chip such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • the basic platform of infrastructure can include distributed computing framework and network and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • data can be obtained through sensors and external communication, and then these data can be provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • the data in the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence.
  • This data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • the above-mentioned data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the decision-making process of intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
  • some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, smart terminals, and so on.
  • Application scenario 1: the smart terminal camera field
  • the method for searching the neural network structure of the embodiment of the present application can be applied to a smart terminal device (for example, a mobile phone) for real-time image super-resolution technology.
  • the method for searching the neural network structure of the embodiment of the present application can determine the target image super-resolution network applied to the field of smart terminal shooting.
  • when a user uses a smart terminal to photograph distant objects or small objects, the resolution of the captured image is relatively low and the details are not clear.
  • the user can use the target image super-resolution network provided by the embodiments of the present application to implement image super-resolution processing on the smart terminal, so that low-resolution images can be converted into high-resolution images, so that the photographed objects are clearer.
  • this application proposes an image processing method applied to an electronic device with a display screen and a camera.
  • the method includes: detecting a user's first operation for turning on the camera; in response to the first operation, displaying a photographing interface on the display screen, the photographing interface including a finder frame, the finder frame including a first image; detecting a second operation in which the user instructs the camera; and, in response to the second operation, displaying a second image in the finder frame, the second image being an image obtained after super-resolution processing is performed on the first image collected by the camera, where the target image super-resolution network is applied in the super-resolution processing.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; and the basic module includes at least a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit is a basic module for constructing an image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a stride of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimension-raising operation, where the dimension-raising operation restores the scale of the feature map obtained after the dimensionality reduction processing to the first scale;
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation means that the output feature maps of the first i-1 convolutional layers and the first input feature map are feature-spliced and used as the input feature map of the i-th convolutional layer, i being a positive integer greater than 1.
  • the dense connection operation is a cyclic dense connection operation, where the cyclic dense connection operation refers to cyclically performing feature splicing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, the resolution of the second channel feature being higher than the resolution of the first channel features.
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, the channel compression operation referring to a convolution operation with a 1×1 convolution kernel on the second input feature map; the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, the third input feature map including M sub-feature maps, each sub-feature map in the M sub-feature maps including at least two adjacent channel features; the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent; M is an integer greater than 1; and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on a first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image;
  • the first image super-resolution network refers to a network determined by searching for an image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation:

$$L = \sum_{k=1}^{N} \lambda_{k,t}\, L_k$$

  • where L represents the multi-level weighted joint loss function; L_k represents the loss value of the k-th basic unit of the first image super-resolution network, that is, the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image; λ_{k,t} represents the weight of the loss value of the k-th layer at time t; and N represents the number of basic units included in the first image super-resolution network, N being an integer greater than or equal to 1.
  • the first image super-resolution network is determined by the performance parameters of each candidate network structure among P candidate network structures, where the P candidate network structures are randomly generated according to the basic unit;
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures trained with the multi-level weighted joint loss function;
  • the performance parameter includes the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
  • it should be noted that the target image super-resolution network applied to the smart terminal camera field provided by the embodiments of the present application is also subject to the expansions, limitations, explanations, and descriptions of the related content of the target image super-resolution network in the related embodiments of FIGS. 10 to 22 below, which are not repeated here.
  • FIG. 2 is a schematic diagram of the target image super-resolution network applied to a smart terminal.
  • the smart terminal 210 (for example, a mobile phone) may obtain low-resolution images, such as image 220 and image 230.
  • the super-resolution (SR) network 240 shown in FIG. 2 may be the target image super-resolution network in the embodiment of the present application; after processing by the target image super-resolution network, the target image can be obtained. For example, after image 220 is subjected to super-resolution processing, super-resolution image 221 can be obtained; after image 230 is subjected to super-resolution processing, super-resolution image 231 can be obtained.
  • the smart terminal 210 may be an electronic device with a camera.
  • for example, the smart terminal may be a mobile phone with image processing functions, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer, a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD), or the like.
  • Application scenario 2: the security field
  • the neural network search method of the embodiment of the present application can be applied to the security field.
  • pictures (or videos) collected by monitoring equipment in public places are often affected by factors such as weather and distance, and have problems such as blurred images and low resolution.
  • the target image super-resolution network can perform super-resolution reconstruction on the collected pictures, restoring important information such as license plate numbers and clear faces for public security personnel and providing important clues for case detection.
  • specifically, this application provides an image processing method, the method including: acquiring a street view image; performing super-resolution processing on the street view image according to the target image super-resolution network to obtain a super-resolution image of the street view image; and recognizing the information in the super-resolution image according to the super-resolution image of the street view image.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network; and the basic module includes at least a first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the target image super-resolution network applied to the security field provided by the embodiments of the present application is also applicable to the expansion, limitations, definitions, explanations, and descriptions of the related content of the target image super-resolution network in the related embodiments in FIGS. 10 to 22 below, and is not repeated here.
  • the neural network search method of the embodiment of the present application can be applied to the field of medical imaging.
  • the target image super-resolution network can perform super-resolution reconstruction of medical images, which can reduce the requirements on the imaging environment without increasing the cost of high-resolution imaging technology, and realize accurate detection of cells through the restoration of clear medical images, helping doctors make a better diagnosis of the patient's condition.
  • this application provides an image processing method, the method including: acquiring a medical image frame; performing super-resolution processing on the medical image frame according to the target image super-resolution network to obtain a super-resolution image of the medical image frame; and identifying and analyzing, according to the super-resolution image of the medical image frame, the information in the super-resolution image.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure; the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; the basic module includes a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the target image super-resolution network applied to the medical imaging field provided by the embodiments of the present application is also applicable to the expansion, limitations, explanations, and descriptions of the related content of the target image super-resolution network in the related embodiments in FIGS. 10 to 22 below, and is not repeated here.
  • the neural network search method of the embodiment of the present application can be applied to the field of image compression.
  • the picture can be compressed in advance before transmission; after the transmission is completed, the receiving end decodes it through the super-resolution reconstruction technology of the target image super-resolution network to restore the original image sequence, which greatly reduces the space required for storage and the bandwidth required for transmission.
  • the present application provides an image processing method, which includes: acquiring a compressed image; performing super-resolution processing on the compressed image according to the target image super-resolution network to obtain a super-resolution image of the compressed image; and recognizing, according to the super-resolution image of the compressed image, the information in the super-resolution image.
  • the above-mentioned target image super-resolution network is a network determined by searching for an image super-resolution network structure in a search space; the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure; the network structure parameters include the type of the basic module used to construct the basic unit; the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; the basic module includes at least a first module, and the first module is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the target image super-resolution network applied in the field of image compression provided by the embodiments of this application is also applicable to the expansion, limitations, explanations, and descriptions of the related content of the target image super-resolution network in the related embodiments in FIGS. 10 to 22 below, and is not repeated here.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit can be: $h_{W,b}(x) = f(W^T x + b) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
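  • as a minimal illustration (our example, not part of the embodiments), a single neural unit with a sigmoid activation could be sketched in Python as follows:

```python
import math

def neuron(x, w, b):
    """Single neural unit: weighted sum of inputs plus bias, then sigmoid."""
    s = sum(w_s * x_s for w_s, x_s in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid activation f

# Example: two inputs with weights and a bias
print(neuron([0.5, -1.0], [0.8, 0.2], 0.1))  # ~0.574
```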
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • a deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • according to the positions of different layers, the layers inside the DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected; that is to say, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
  • although the DNN looks complicated, the work of each layer is not complicated; in simple terms, it is the following linear relationship expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function.
  • each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$; because the DNN has many layers, the numbers of coefficients $W$ and offset vectors $\vec{b}$ are also large.
  • these parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W^3_{24}$, where the superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • in summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W^L_{jk}$.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of portraying complex situations in the real world. Theoretically speaking, a model with more parameters is more complex and has a greater "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is also a process of learning a weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (a weight matrix formed by vectors W of many layers).
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolution layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • the neural network can use an error back propagation (BP) algorithm to modify the values of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller; specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • the backpropagation algorithm is a backpropagation motion dominated by error loss, and aims to obtain the optimal neural network model parameters, such as the weight matrix.
  • neural architecture search (NAS): the search space, the search strategy, and the performance evaluation strategy are the core elements of a NAS algorithm.
  • the search space can refer to the set of searched neural network structures, that is, the solution space. In order to improve search efficiency, sometimes the search space is limited or simplified.
  • the network is divided into basic units (cells, or blocks), and a more complex network is formed by stacking these units.
  • the basic unit is composed of multiple nodes (layers of the neural network), which appear repeatedly in the entire network but have different weight parameters.
  • the search strategy can refer to the process of finding the optimal network structure in the search space.
  • the search strategy defines how to find the optimal network structure. It is usually an iterative optimization process, which is essentially a hyperparameter optimization problem.
  • the performance evaluation strategy may refer to evaluating the performance of the searched network structure.
  • the goal of the search strategy is to find a neural network structure, and the performance of the searched network structure can be evaluated through the performance evaluation strategy.
  • Fig. 6 shows a system architecture 300 provided by an embodiment of the present application.
  • the data collection device 360 is used to collect training data.
  • after the target image super-resolution network is determined by the neural network search method of the embodiment of this application, the target super-resolution network can be further trained with training images; that is, the training data collected by the data collection device 360 may be training images.
  • the training images may include sample images and super-resolution images corresponding to the sample images.
  • the sample images may refer to low-resolution images; for example, a low-resolution image may refer to an image with unclear image quality and a blurry picture.
  • the data collection device 360 stores the training data in the database 330, and the training device 320 obtains the target model/rule 301 based on the training data maintained in the database 330.
  • the training device 320 processes the input original image and compares the output image with the original image until the difference between the output image of the training device 320 and the original image is less than a certain threshold, thereby completing the training of the target model/rule 301.
  • the target image super-resolution network used for image super-resolution processing in the image processing method provided in this application can be obtained by training with the loss between the predicted super-resolution image of the sample image and the sample super-resolution image.
  • the trained network makes the difference between the predicted super-resolution image, obtained by inputting the sample image into the target image super-resolution network, and the sample super-resolution image less than a certain threshold, thereby completing the training of the target image super-resolution network.
  • the above-mentioned target model/rule 301 can be used to implement the image processing method of the embodiment of the present application.
  • the target model/rule 301 in the embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 330 may not all come from the collection of the data collection device 360, and may also be received from other devices.
  • the training device 320 does not necessarily train the target model/rule 301 entirely based on the training data maintained by the database 330; it may also obtain training data from the cloud or elsewhere for model training; the above description should not be construed as a limitation on the embodiments of this application.
  • the target model/rule 301 trained according to the training device 320 can be applied to different systems or devices, such as the execution device 310 shown in FIG. 6, which can be a terminal, such as a mobile phone terminal, a tablet computer, notebook computers, augmented reality (AR)/virtual reality (VR), vehicle-mounted terminals, etc., can also be servers, or cloud, etc.
  • the execution device 310 is configured with an input/output (input/output, I/O) interface 312 for data interaction with external devices.
  • the user can input data to the I/O interface 312 through the client device 340.
  • the input data in this embodiment of the application may include: the image to be processed input by the client device.
  • the preprocessing module 313 and the preprocessing module 314 are used for preprocessing according to the input data (such as the image to be processed) received by the I/O interface 312.
  • when the execution device 310 preprocesses the input data, or when the calculation module 311 of the execution device 310 performs calculation and other related processing, the execution device 310 can call data, code, and the like in the data storage system 350 for the corresponding processing, and the data, instructions, and the like obtained by the corresponding processing may also be stored in the data storage system 350.
  • the I/O interface 312 returns the processing result, such as the super-resolution image obtained as described above, to the client device 340 to provide it to the user.
  • the training device 320 can generate corresponding target models/rules 301 based on different training data for different goals or different tasks, and the corresponding target models/rules 301 can be used to achieve the above goals or complete The above tasks provide the user with the desired result.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 312.
  • the client device 340 can automatically send input data to the I/O interface 312; if the client device 340 is required to automatically send the input data subject to the user's authorization, the user can set the corresponding permission in the client device 340.
  • the user can view the result output by the execution device 310 on the client device 340, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 340 can also be used as a data collection terminal to collect the input data of the input I/O interface 312 and the output result of the output I/O interface 312 as new sample data, and store it in the database 330 as shown.
  • the I/O interface 312 may also directly store, as new sample data, the input data input to the I/O interface 312 and the output result output from the I/O interface 312 in the database 330, as shown in the figure.
  • FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the positional relationship between the devices, components, modules, and the like shown in the figure does not constitute any limitation.
  • for example, in FIG. 6, the data storage system 350 is an external memory relative to the execution device 310; in other cases, the data storage system 350 may also be placed in the execution device 310.
  • the target model/rule 301 obtained by training by the training device 320 may be the neural network in the embodiment of this application.
  • specifically, the neural network provided in the embodiment of this application may be a CNN, a deep convolutional neural network (DCNN), or the like.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 7.
  • a convolutional neural network is a deep neural network with a convolutional structure and is a deep learning architecture.
  • a deep learning architecture refers to performing multiple levels of learning at different abstraction levels through machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the input image.
  • a convolutional neural network (CNN) 400 may include an input layer 410, a convolutional layer/pooling layer 420 (where the pooling layer is optional), and a neural network layer 430.
  • the input layer 410 can obtain the image to be processed, and pass the obtained image to be processed to the convolutional layer/pooling layer 420 and the subsequent neural network layer 430 for processing, and the image processing result can be obtained.
  • the following describes the internal layer structure of CNN 400 in Fig. 7 in detail.
  • the convolutional layer/pooling layer 420 may include layers 421-426; for example, in one implementation, layer 421 is a convolutional layer, layer 422 is a pooling layer, layer 423 is a convolutional layer, layer 424 is a pooling layer, layer 425 is a convolutional layer, and layer 426 is a pooling layer; in another implementation, layers 421 and 422 are convolutional layers, layer 423 is a pooling layer, layers 424 and 425 are convolutional layers, and layer 426 is a pooling layer; that is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 421 can include many convolution operators.
  • the convolution operator is also called a kernel. Its role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • the convolution operator is essentially a weight matrix, which is usually predefined; in the process of performing convolution on an image, the weight matrix is usually processed one pixel after another (or two pixels after two pixels, depending on the value of the stride) along the horizontal direction on the input image, to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image; it should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends through the entire depth of the input image.
  • therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension; however, in most cases a single weight matrix is not used, but multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied.
  • the outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where the dimension can be understood as being determined by the "multiple" mentioned above.
  • different weight matrices can be used to extract different features in the image; for example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
  • the multiple weight matrices have the same size (row ⁇ column), the size of the convolution feature maps extracted by the multiple weight matrices of the same size are also the same, and then the multiple extracted convolution feature maps of the same size are combined to form The output of the convolution operation.
  • the weight values in these weight matrices need to be obtained through extensive training in practical applications.
  • each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 400 can make correct predictions.
  • the initial convolutional layer (such as 421) often extracts more general features, which can also be called low-level features;
  • the features extracted by the subsequent convolutional layers (such as 426) become more complex, for example, features such as high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • during image processing, the only purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller size image.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
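  • as a small illustration (our example, assuming 2×2 pooling windows), average and maximum pooling behave as follows:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)
print(F.avg_pool2d(x, 2))  # each output pixel = average of a 2x2 sub-region
print(F.max_pool2d(x, 2))  # each output pixel = maximum of a 2x2 sub-region
```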
  • Neural network layer 430
  • after processing by the convolutional layer/pooling layer 420, the convolutional neural network 400 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 420 only extracts features and reduces the parameters brought by the input image; to generate the final output information (the required class information or other related information), the convolutional neural network 400 needs to use the neural network layer 430 to generate an output of one required class or a group of required classes; therefore, the neural network layer 430 can include multiple hidden layers (431, 432 to 43n as shown in FIG. 7) and an output layer 440, and the parameters contained in the multiple hidden layers can be obtained by pre-training based on relevant training data of a specific task type; for example, the task type can include image recognition, image classification, image detection, and image super-resolution reconstruction.
  • after the multiple hidden layers in the neural network layer 430, the final layer of the entire convolutional neural network 400 is the output layer 440.
  • the output layer 440 has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error.
  • a convolutional neural network (CNN) 500 may include an input layer 510, a convolutional layer/pooling layer 520 (where the pooling layer is optional), and a neural network layer 530.
  • unlike FIG. 7, the multiple convolutional layers/pooling layers in the convolutional layer/pooling layer 520 in FIG. 8 are parallel, and the separately extracted features are all input to the neural network layer 530 for processing.
  • the convolutional neural networks shown in FIG. 7 and FIG. 8 are only two examples of possible convolutional neural networks for the image processing method of the embodiment of this application; in specific applications, the convolutional neural network used in the image processing method of the embodiment of this application can also exist in the form of other network models.
  • FIG. 9 shows a hardware structure of a chip provided by an embodiment of this application.
  • the chip includes a neural network processor 600.
  • the chip can be set in the execution device 310 as shown in FIG. 6 to complete the calculation work of the calculation module 311.
  • the chip can also be set in the training device 320 as shown in FIG. 6 to complete the training work of the training device 320 and output the target model/rule 301.
  • the algorithms of each layer in the convolutional neural network as shown in FIG. 7 or FIG. 8 can all be implemented in the chip as shown in FIG. 9.
  • the neural network processor NPU 600 is mounted as a coprocessor to a main central processing unit (central processing unit, CPU) (host CPU), and the main CPU distributes tasks.
  • the core part of the NPU 600 is the arithmetic circuit 603.
  • the controller 604 controls the arithmetic circuit 603 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 603 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 603 is a two-dimensional systolic array. The arithmetic circuit 603 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 603 is a general-purpose matrix processor.
  • the arithmetic circuit 603 fetches the data corresponding to matrix B from the weight memory 602 and caches it on each PE in the arithmetic circuit 603.
  • the arithmetic circuit 603 fetches matrix A data from the input memory 601, performs matrix operations on it with matrix B, and stores the partial result or final result of the obtained matrix in an accumulator 608.
  • the vector calculation unit 607 can perform further processing on the output of the arithmetic circuit 603, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 607 can be used for network calculations in the non-convolutional/non-FC layer of the neural network, such as pooling, batch normalization, local response normalization, etc. .
  • the vector calculation unit 607 can store the processed output vector to the unified memory 606.
  • the vector calculation unit 607 may apply a nonlinear function to the output of the arithmetic circuit 603, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 607 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 603, for example for use in subsequent layers in a neural network.
  • the unified memory 606 is used to store input data and output data.
  • a direct memory access controller (DMAC) 605 transfers the input data in the external memory to the input memory 601 and/or the unified memory 606, stores the weight data in the external memory into the weight memory 602, and stores the data in the unified memory 606 into the external memory.
  • the bus interface unit (BIU) 610 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 609 through the bus.
  • An instruction fetch buffer 609 connected to the controller 604 is used to store instructions used by the controller 604.
  • the controller 604 is used to call the instructions cached in the instruction fetch memory 609 to control the working process of the computing accelerator.
  • the unified memory 606, the input memory 601, the weight memory 602, and the instruction fetch memory 609 are all on-chip memories.
  • the external memory is a memory external to the NPU.
  • the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • each layer in the convolutional neural network shown in FIG. 7 or FIG. 8 can be executed by the arithmetic circuit 603 or the vector calculation unit 607.
  • the execution device 310 in FIG. 6 introduced above can execute the neural network search method or image processing method of the embodiment of the present application.
  • the CNN models shown in FIG. 7 and FIG. 8 and the chip shown in FIG. 9 can also be used to execute the steps of the neural network search method or the image processing method of the embodiment of the present application.
  • an embodiment of the present application provides a system architecture 700.
  • the system architecture includes a local device 720, a local device 730, an execution device 710, and a data storage system 750.
  • the local device 720 and the local device 730 are connected to the execution device 710 through a communication network.
  • the execution device 710 may be implemented by one or more servers.
  • the execution device 710 can be used in conjunction with other computing devices, such as data storage, routers, load balancers and other devices.
  • the execution device 710 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 710 may use the data in the data storage system 750 or call the program code in the data storage system 750 to implement the method for searching the neural network structure of the embodiment of the present application.
  • execution device 710 may also be referred to as a cloud device, and in this case, the execution device 710 may be deployed in the cloud.
  • the execution device 710 may perform the following process: constructing a basic unit, where the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network, the basic module includes a first module, and the first module is used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module; and the scale of the feature map processed by the first module is the same as the scale of the first input feature map;
  • constructing a search space according to the basic unit and network structure parameters, where the network structure parameters include the type of the basic module used to construct the basic unit, the search space is used to search for the image super-resolution network structure, and the basic unit is a building block used to construct the image super-resolution network;
  • performing an image super-resolution network structure search in the search space to determine a target image super-resolution network, where the target image super-resolution network is used to perform super-resolution processing on an image to be processed, the target image super-resolution network includes at least the first module, and the target image super-resolution network is a network whose amount of calculation is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  • a target neural network can be obtained through a network structure search (neural architecture search, NAS), and the target neural network can be used for image super-resolution processing.
  • the foregoing method for the execution device 710 to search the network structure may be an offline search method executed in the cloud.
  • the user can operate respective user devices (for example, the local device 720 and the local device 730) to interact with the execution device 710.
  • Each local device can represent any computing device, for example, a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other types of cellular phones, a media consumption device, a wearable device, a set-top box, a game console, etc.
  • the local device of each user can interact with the execution device 710 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
  • the local device 720 and the local device 730 may obtain the relevant parameters of the target neural network from the execution device 710, deploy the target neural network on the local device 720 and the local device 730, and use the target neural network to perform image super-resolution processing and the like.
  • the target neural network can be directly deployed on the execution device 710.
  • the execution device 710 obtains the image to be processed from the local device 720 and the local device 730, and performs image super-resolution processing on the image to be processed according to the target neural network.
  • the aforementioned target neural network may be the target image super-resolution network in the embodiment of the present application.
  • the neural network search method of the embodiment of the present application will be described in detail below in conjunction with FIG. 11.
  • the method shown in FIG. 11 can be executed by a neural network search device.
  • the neural network search device can be a computer, a server, and other devices with sufficient computing power for neural network search.
  • the method 800 shown in FIG. 11 includes steps 810 to 830, which will be described in detail below.
  • Step 810 Construct a basic unit.
  • the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; the basic module includes a first module, which is used to perform a dimensionality reduction operation and a residual connection operation on the first input feature map; the dimensionality reduction operation is used to transform the scale of the first input feature map from an original first scale to a second scale, the second scale being smaller than the first scale; and the residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, where the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit may be a network structure obtained by connecting basic modules through the basic operations of a neural network.
  • the above-mentioned network structure may include preset basic operations or combinations of basic operations in a convolutional neural network; these basic operations or combinations of basic operations can be collectively referred to as basic operations.
  • basic operations can refer to convolution operations, pooling operations, residual connections, etc.
  • connections between basic modules can be made to obtain the network structure of the basic unit.
  • the above-mentioned basic unit may be a basic module used to construct an image super-resolution network.
  • the target image super-resolution network may include three major parts: a feature extraction part, a nonlinear transformation part, and a reconstruction part.
  • the feature extraction module is used to obtain the image features of the image to be processed.
  • the image to be processed may be a low-resolution image (LR); the nonlinear transformation part is used to transform the image features of the input image, mapping the image features from a first feature space to a second feature space, where the first feature space refers to the feature space of the features extracted from the image to be processed, and the second, higher-dimensional feature space makes it easier to reconstruct the super-resolution image;
  • the reconstruction part is used to perform up-sampling and convolution processing on the image features output by the non-linear change part to obtain a super-resolution image corresponding to the image to be input.
  • the non-linear transformation part of the network structure can be searched in the search space by means of NAS.
  • the first input feature map input to the first module is at the first scale and is transformed to the second scale after the dimensionality reduction operation; the feature map at the second scale is transformed to a third scale after a dimension-raising operation, where the third scale lies between the first scale and the second scale.
  • the processing continues until the feature map is raised back to the same scale as the first input feature map; that is, the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the above-mentioned basic unit cell may be a network obtained by connecting basic modules according to the basic operation of a neural network.
  • the basic module may include a first module; the first module may be a scale module (contextual residual dense block, CRDB), and the scale module may be used to perform dimensionality reduction and residual connection operations on the first input feature map; that is, the scale module may include a pooling sub-module and a residual connection for processing the first input feature map.
  • the dimensionality reduction operation can reduce the scale of the first input feature map, where the dimensionality reduction operation can refer to a pooling operation on the first input feature map, or it can refer to a convolution operation with a stride of Q on the first input feature map, where Q is a positive integer greater than 1.
  • the above-mentioned residual connection operation is used to perform feature addition processing on the first input feature map and the feature map processed by the first module, where the feature map processed by the first module may refer to the feature map after the dimension-raising operation; the dimension-raising operation refers to restoring the scale of the feature map after the dimensionality reduction processing to the original first scale, and the residual connection operation can refer to performing feature addition processing on the first input feature map and the feature map processed by the dimension-raising operation.
  • the above-mentioned dimension-raising operation may refer to an up-sampling operation, or it may refer to a deconvolution operation (backwards strided convolution); the up-sampling operation may refer to using an interpolation method, that is, inserting new elements between the pixels of the original image using a suitable interpolation algorithm, and the deconvolution operation refers to the inverse process of the convolution operation, also known as transposed convolution.
  • feature addition may refer to adding information of different channel features for feature maps of the same scale.
  • the scale module can perform residual connection on the input feature map, that is, can perform feature addition processing on the first input feature map and the feature map processed by the first module, so as to realize that the first More local details in the input feature map are passed to the subsequent convolutional layer.
  • the scale module can be used to perform a dimensionality reduction operation on the first input feature map.
  • the dimensionality reduction operation can reduce the scale of the input feature map to reduce the amount of model calculation.
  • the residual connection operation can transfer the information of the earlier layers well to the later layers, which compensates for the information loss caused by the dimensionality reduction operation.
  • the dimensionality reduction operation can also quickly expand the receptive field of features, allowing the prediction of high-resolution pixels to better consider contextual information, thereby improving the super-resolution accuracy.
  • the image super-resolution reconstruction technology refers to obtaining a high-resolution image by reconstructing a low-resolution image. Therefore, more local information of image features is needed in image super-resolution processing.
  • commonly used image super-resolution network models do not use dimensionality reduction operations, mainly because dimensionality reduction operations lose part of the local information of the low-resolution input image.
  • in the embodiment of this application, the information in the input feature map is better maintained throughout the network through residual connection operations and/or dense connection operations; that is, the information of the earlier layers can be transmitted well to the later layers, which can compensate for the information loss caused by the dimensionality reduction operation.
  • the use of dimensionality reduction operations can not only reduce the amount of model calculations, but also expand the receptive field of features and improve the accuracy of the image super-resolution recognition network.
  • the specific form of the network structure of the scale module may be as shown in FIG. 13.
  • Three scale modules are shown in FIG. 13, which are the d-1th scale module, the dth scale module, and the d+1th scale module.
  • the d-th scale module may include a pooling sub-module, and the dimensionality reduction operation may be used to downsample the input feature map, thereby reducing the feature size.
  • the aforementioned dimensionality reduction operation may refer to a pooling operation, such as average pooling, or maximum pooling.
  • the residual connection can refer to feature addition of the output feature map of the (d-1)-th CRDB module and the processed feature map, where the processed feature map refers to the feature map obtained after the input feature map undergoes the pooling operation, 3×3 convolution operations, rectified linear unit (ReLU) activations, the dimension-raising operation, and a 1×1 convolution operation.
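  • as an illustration only, a minimal PyTorch-style sketch of the pool/conv/up-sample/residual path described above is given below; the use of 2×2 average pooling and nearest-neighbor up-sampling is our assumption, not a limitation of the embodiments, and the dense connections of the CRDB are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleModule(nn.Module):
    """Sketch of a CRDB-style scale module: pool -> 3x3 conv + ReLU ->
    up-sample -> 1x1 conv, with a residual connection around the block."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = F.avg_pool2d(x, 2)                                # dimensionality reduction: first -> second scale
        h = F.relu(self.conv3(h))                             # feature transformation at the reduced scale
        h = F.interpolate(h, scale_factor=2, mode="nearest")  # dimension-raising back to the input scale
        h = self.conv1(h)
        return x + h                                          # residual connection: feature addition
```

  • because the 3×3 convolution runs on the pooled feature map, it processes only a quarter of the spatial positions of the same convolution applied at the input scale.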
  • the scale module can also be used to perform dense connection operations on the first input feature map.
  • the dense connection operation can refer to splicing the output feature maps of each of the first i-1 convolutional layers together with the input feature map, and using the result as the input feature map of the i-th convolutional layer.
  • the specific form of the network structure of the scale module may be as shown in FIG. 14.
  • the dense connection operation can achieve the largest information flow in the network, through each layer being connected to all layers before that layer, that is, the input of each layer is the splicing of the outputs of all the previous layers.
  • through the dense connection operation, the information in the input feature map (for forward computation) or the gradient (for backward computation) is better maintained throughout the network, which can better compensate for the information loss caused by the dimensionality reduction operation; that is, when performing image super-resolution processing, the residual connection operation and the dense connection operation can ensure that the information in the feature map is transmitted well to the later layers in the network structure.
  • the input feature map is down-sampled by the dimensionality reduction operation to reduce the feature size, so that the amount of model calculation can be reduced while ensuring the accuracy of the image super-resolution processing.
  • feature splicing may refer to splicing M feature maps of the same scale into a feature map with K channels, where K is a positive integer greater than M.
  • the dense connection operation refers to transferring the output feature map of each layer to the subsequent layers, and the input of the latter layer is obtained by splicing the feature maps of the output of the previous layers.
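  • a minimal sketch of such dense connections (an illustration under our own naming, not the patent's exact structure) could look as follows:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    """Sketch of dense connections: the input of the i-th conv layer is the
    splicing (channel concatenation) of the block input and all earlier outputs."""
    def __init__(self, channels: int, growth: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Conv2d(channels + i * growth, growth, kernel_size=3, padding=1)
            for i in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for conv in self.layers:
            feats.append(F.relu(conv(torch.cat(feats, dim=1))))  # feature splicing
        return torch.cat(feats, dim=1)  # K output channels, K greater than the number of inputs M
```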
  • the specific form of the network structure of the scale module may be as shown in FIG. 15.
  • the scale module can be used to perform residual connection operations, dimensionality reduction operations, convolution operations, and cyclic dense connection operations on the input feature map; that is, the scale module can include residual connections, pooling sub-modules, convolution sub-modules, and cyclic dense connections.
  • the cyclic dense connection operation can increase the depth of the scale module network structure, thereby improving the accuracy of super-resolution processing.
  • a recursive operation on a feature map at the normal scale quickly increases the amount of calculation, but a recursive operation on the feature map after the dimensionality reduction operation adds much less calculation.
  • therefore, combining a certain number of recursive operations with dimensionality reduction operations can improve the super-resolution accuracy without increasing the amount of calculation and the number of parameters.
  • the first module proposed in the embodiment of this application, namely the scale module, can reduce the amount of calculation, reduce the number of parameters, expand the receptive field, and decouple the number of parameters from the amount of calculation.
  • the dimensionality reduction operation in the scale module can reduce the calculation amount of the network structure by reducing the scale of the feature map.
  • the amount of calculation of the network model can be expressed by the number of floating-point operations (FLOPs).
  • $\mathrm{FLOPs}_{ori}$ represents the amount of calculation of the network model with normal convolution.
  • the pooling operation can reduce the amount of calculation by 75%, and even if three loop (recursive) operations are added, the amount of calculation is only restored to roughly the original amount.
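  • a back-of-the-envelope check of this claim, assuming the pooling halves each spatial dimension (so the feature map area shrinks by a factor of 4) and that convolution FLOPs scale linearly with the feature map area:

```latex
\mathrm{FLOPs}_{pool} = \tfrac{1}{4}\,\mathrm{FLOPs}_{ori},
\qquad
\mathrm{FLOPs}_{pool} + 3 \times \mathrm{FLOPs}_{pool}
  = 4 \times \tfrac{1}{4}\,\mathrm{FLOPs}_{ori}
  = \mathrm{FLOPs}_{ori}
```

  • that is, one pass at the reduced scale removes 75% of the computation, and three additional recursive passes at that scale bring the total only back to the original amount.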
  • the first module is also used to perform a rearrangement operation.
  • the rearrangement operation refers to combining multiple first channel features of the first input feature map according to preset rules to generate a second channel feature, where the resolution of the second channel feature is higher than the resolution of the first channel features.
  • the rearrangement operation shown in FIG. 16, for example, may refer to merging 4 different first channel feature maps, according to the rules from left to right and top to bottom, into one second channel feature map, where the resolution of the second channel feature map is higher than that of the first channel feature maps.
  • the rearrangement operation can be seen as converting multiple low-scale feature channels into one high-scale feature channel, thereby reducing the number of channels.
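  • this rearrangement corresponds to what deep learning frameworks commonly call a pixel shuffle; a minimal illustration (our example, not the patent's code):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)   # 4 low-resolution channel features
y = F.pixel_shuffle(x, 2)     # merge 4 channels into 1 channel at 2x the resolution
print(y.shape)                # torch.Size([1, 1, 16, 16])
```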
  • the parameter quantity of normal convolution, $\mathrm{param}_{ori}$, is: $\mathrm{param}_{ori} = N_{conv} \times G \times C_{out}$;
  • the parameter quantity of the standard module is $\mathrm{param}_{up}$:
  • the basic modules that construct the basic unit include a scale module.
  • the scale module can expand the receptive field through dimensionality reduction operations, so that the prediction of high-resolution pixels can better take contextual information into account.
  • in addition, because common super-resolution methods do not use dimensionality reduction operations, the scale of the input feature map does not change throughout the nonlinear transformation part, resulting in a linear relationship between the number of parameters and the amount of calculation.
  • the scale module proposed in the embodiment of the present application uses a dimensionality reduction operation to make the parameter amount and the calculation amount relatively independent, giving more possibilities for the search algorithm in the NAS.
  • in addition to the aforementioned first module, that is, the scale module (also called the standard module), the basic modules that construct the basic unit may include a second module and/or a third module.
  • the second module and the third module further included in the basic module will be described in detail below in conjunction with FIGS. 17 to 19.
  • the basic module may further include a second module, and the second module may be a compact module (shrink residual dense block, SRDB).
  • the compact module may refer to channel compression processing on the basis of the residual dense block (RDB), so as to achieve the retention of dense connections and effectively reduce the amount of model parameters.
  • the compaction module is used to perform channel compression operations, residual connection operations, and dense connection operations on the second input feature map.
  • the channel compression operation may refer to performing a convolution operation on the second input feature map with a 1×1 convolution kernel.
  • when the second module is the first module in the basic unit, the second input feature map may refer to the feature map output by the previous basic unit of this basic unit; when the second module is not the first module in the basic unit, the second input feature map may refer to the feature map output after processing by the previous module of this module.
  • the first input feature map, the second input feature map, and the third input feature map all correspond to the same image to be processed.
  • the network structure of the compaction module can be as shown in Figure 17.
  • Figure 17 shows three compaction modules, namely the d-1th compaction module, the dth compaction module and the d+1th compaction module.
  • in a compact module, a 1×1 convolution kernel can be used to compress the number of channels of the feature map, and a 3×3 convolution can then be used for feature transformation; the resulting compact residual dense module, which may be referred to as a compact module, can significantly reduce the number of parameters while retaining the dense connections.
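  • a minimal sketch of one layer of such a compact module (the naming and layer sizes are our assumptions, for illustration only):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompactLayer(nn.Module):
    """Sketch of one SRDB-style layer: a 1x1 convolution compresses the densely
    concatenated input channels before the 3x3 feature transformation."""
    def __init__(self, in_channels: int, mid_channels: int, growth: int):
        super().__init__()
        self.squeeze = nn.Conv2d(in_channels, mid_channels, kernel_size=1)  # channel compression
        self.conv = nn.Conv2d(mid_channels, growth, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.relu(self.conv(self.squeeze(x)))
```

  • compared with applying the 3×3 convolution directly to all input channels, the 1×1 squeeze reduces the 3×3 parameters from 9 × in_channels × growth to 9 × mid_channels × growth, plus the small 1×1 cost.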
  • the basic module may further include a third module, and the third module may refer to a group residual dense block (GRDB).
  • the grouping module may refer to dividing the convolution operation into multiple groups on the basis of the residual intensive module to calculate separately, thereby helping to reduce the model parameters.
  • the grouping module may be a module for performing channel switching operations, residual connection operations, and dense connection operations on the third input feature map.
  • the third input feature map includes M sub-feature maps, and each sub-feature map includes at least two adjacent channel features; the channel shuffle processing may refer to reordering the at least two adjacent channel features corresponding to the M sub-feature maps, so that channel features corresponding to different sub-feature maps become adjacent, where M is an integer greater than 1.
  • the network structure of the grouping module may be as shown in FIG. 18, which shows three grouping modules, namely, the d-1th grouping module, the dth grouping module, and the d+1th grouping module.
  • if group convolution is used directly, a single channel feature of the output layer can only receive features from part of the previous convolutional layer, which is not conducive to the channel features collaborating with each other; therefore, in the embodiment of the present application, a channel shuffle operation is added to the residual dense module, and the resulting grouped residual dense module may be referred to as the grouping module, which effectively reduces the number of network parameters.
  • for example, the third input feature map includes three sub-feature maps 1, 2, and 3, and each sub-feature map includes 3 adjacent channel features; the channel shuffle reorders the originally adjacent channel features within each sub-feature map so that channel features corresponding to different sub-feature maps become adjacent.
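  • a minimal sketch of such a channel shuffle (the grouping into M groups and back is the standard formulation; the function name is ours):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Reorder channels so that features from different groups become adjacent."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)  # split channels into M groups
    x = x.transpose(1, 2).contiguous()        # interleave the groups
    return x.view(n, c, h, w)

# Example: M = 3 sub-feature maps of 3 channels each
x = torch.arange(9.0).view(1, 9, 1, 1)
print(channel_shuffle(x, 3).flatten())  # tensor([0., 3., 6., 1., 4., 7., 2., 5., 8.])
```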
  • the basic unit is a network structure obtained by connecting basic modules through basic operations of a neural network; for example, a cell as shown in FIG. 12 can be a basic unit, and the basic unit serves as a building block for constructing the image super-resolution network.
  • the basic module is used to construct the basic unit.
  • each basic unit (cell) can be obtained by connecting different basic modules through basic operations of the neural network, and the basic modules can include one or more of the above-mentioned first module, second module, and third module.
  • Step 820 Construct a search space according to the basic unit and network structure parameters, where the network structure parameter includes the type of the basic module used to construct the basic unit, and the search space is a search space for searching the image super-resolution network structure.
  • the network structure parameters may include:
  • the type of basic module, which can include three different types: C represents the first module, namely the standard module; S represents the second module, namely the compact module; G represents the third module, namely the grouping module;
  • the number of convolutional layers, which may be {4, 6, 8};
  • the number of channels, which may be {16, 24, 32, 48};
  • the number of output channels of a basic unit, which may be {16, 24, 32, 48};
  • the state of the basic unit, where 1 means that the current node is connected to the network and 0 means that the current node is not connected to the network.
  • the search space constructed from basic units composed of the given basic module types restricts the candidate network structures to those types, which is equivalent to discretizing a continuous search space and can effectively reduce the size of the search space.
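  • purely as a sketch of such a discretized search space, a candidate network structure could be encoded by sampling one value per structure parameter; the encoding and helper name below are assumptions for illustration:

```python
import random

# Discrete choices taken from the network structure parameters listed above.
MODULE_TYPES = ["C", "S", "G"]   # standard / compact / grouping module
CONV_LAYERS  = [4, 6, 8]
CHANNELS     = [16, 24, 32, 48]
STATES       = [0, 1]            # 1: node connected to the network, 0: not connected

def random_gene(num_units=4):
    """Randomly sample one candidate network structure (illustrative encoding)."""
    return [
        {
            "module": random.choice(MODULE_TYPES),
            "layers": random.choice(CONV_LAYERS),
            "channels": random.choice(CHANNELS),
            "out_channels": random.choice(CHANNELS),
            "state": random.choice(STATES),
        }
        for _ in range(num_units)
    ]
```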
  • Step 830 Perform an image super-resolution network structure search in the search space to determine the target image super-resolution network.
  • the target image super-resolution network is used to perform super-resolution processing on the image to be processed; the target image super-resolution network includes at least the first module, and is a network whose calculation amount is less than a first preset threshold and whose image super-resolution accuracy is greater than a second preset threshold.
  • performing an image super-resolution network structure search in the search space to determine the target image super-resolution network may refer to searching the search space with a search algorithm to determine a network structure that meets the constraints, or it may refer to manually selecting a network structure that meets the constraints from the search space.
  • the constraint condition may mean that the calculation amount is less than the first preset threshold and the image super-resolution accuracy is greater than the second preset threshold, so that when the computing performance of a mobile device is limited, the accuracy of the target image super-resolution network used for super-resolution processing remains high.
  • the constraint condition may mean that the amount of calculation is less than the first preset threshold, the image super-resolution accuracy is greater than the second preset threshold, and the parameter amount is less than the third preset threshold.
  • common search algorithms may include but are not limited to the following algorithms: random search, Bayesian optimization, evolutionary algorithm, reinforcement learning, gradient-based algorithm, and so on.
  • for the specific process of searching for the image super-resolution network structure in the search space with, for example, random search, reference may be made to the prior art; for brevity, detailed descriptions of all search methods are omitted in this application.
  • for example, an evolutionary algorithm can be used to search for a lightweight, fast, and high-precision super-resolution network structure by targeting the parameter amount, calculation amount, and model effect (PSNR) of the network model.
  • the process of performing a network search in the search space to determine the target image super-resolution network may include the following steps: performing a network search in the search space through an evolutionary algorithm to determine the first image super-resolution network; and performing back-propagation iterative training on the first image super-resolution network using a multi-level weighted joint loss function to determine the target image super-resolution network, wherein the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image.
  • the first image super-resolution network determined by the evolutionary algorithm can be further trained through the multi-level weighted joint loss function, and the parameters of the target image super-resolution network can finally be determined to obtain the target image super-resolution network.
  • searching for the target image super-resolution network in the search space through the evolutionary algorithm may include the following steps: randomly generating P candidate network structures according to the basic unit; training the P candidate network structures using the multi-level weighted joint loss function; evaluating the performance parameters of each of the P trained candidate network structures, where the performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image; and determining the first image super-resolution network according to the performance parameters of the candidate networks.
  • the evolutionary algorithm execution process may include the following steps:
  • Step 1 Randomly generate P individuals (i.e., candidate network structures); these P candidate network structures form the initial population;
  • Step 2 Evaluate the fitness (i.e., performance parameters) of each network structure, including the parameter amount, calculation amount, and accuracy, where the accuracy can be measured by the peak signal-to-noise ratio (PSNR);
  • Step 3 Select and update the elite individuals, which can be regarded as the network structures whose performance parameters meet preset conditions;
  • Step 4 Generate the next generation of individuals through crossover and mutation;
  • Step 5 Repeat steps 2 to 4 until the evolutionary algorithm converges, and return the elite individual of the last generation (that is, the first image super-resolution network).
  • the above-mentioned elite individuals may refer to target network structures determined by the algorithm.
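  • a minimal sketch of steps 1 to 5, assuming the random_gene encoding above and hypothetical evaluate_fitness, crossover, and mutate helpers (all names are illustrative, not the patented algorithm), might look as follows:

```python
import random

def evolutionary_search(pop_size, generations, elite_k,
                        evaluate_fitness, crossover, mutate):
    """Steps 1-5 above. `evaluate_fitness` should return a comparable tuple
    such as (PSNR, -params, -FLOPs) so that larger is better on every
    objective."""
    population = [random_gene() for _ in range(pop_size)]        # step 1
    elites = []
    for _ in range(generations):
        scored = [(evaluate_fitness(g), g) for g in population]  # step 2
        scored.sort(key=lambda s: s[0], reverse=True)
        elites = [g for _, g in scored[:elite_k]]                # step 3
        children = []
        while len(children) < pop_size - elite_k:                # step 4
            a, b = random.sample(elites, 2)
            children.append(mutate(crossover(a, b)))
        population = elites + children                           # step 5: loop
    return elites[0]  # best individual of the last generation
```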
  • for example, the multi-level weighted joint loss function proposed in this application can be used to train the network structure to be evaluated, and the peak signal-to-noise ratio of the network structure is evaluated after training with the multi-level weighted joint loss function.
  • the multi-level weighted joint loss function can be obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t}·L_k,
  • where L can represent the multi-level weighted joint loss function;
  • L_k can represent the loss value of the k-th layer of the first image super-resolution network, where the loss value can refer to the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th layer and the sample super-resolution image;
  • λ_{k,t} can represent the weight of the loss value of the k-th layer at time t.
  • because the number of layers above them differs, the degree to which the underlying basic units are trained may vary.
  • therefore, this embodiment of the application proposes a multi-level weighted joint loss function: during training, a predicted super-resolution image can be obtained from the output feature map of each basic unit, the loss value between that predicted super-resolution image and the sample super-resolution image is calculated, and the image loss values of all basic units are weighted to train the network.
  • the loss function can thus combine the predicted image loss of each intermediate layer and reflect the importance of different layers through the weights. The weight of each intermediate-layer image loss can change over time (or with the number of iterations), which is conducive to more fully training the parameters of the underlying basic units and improving the performance of the super-resolution network.
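  • a sketch of such a multi-level weighted joint loss, assuming PyTorch and an L1 image loss (the loss type and function name are assumptions), is given below:

```python
import torch

def multilevel_joint_loss(per_level_sr, target_hr, weights_t,
                          criterion=torch.nn.L1Loss()):
    """L = sum_k lambda_{k,t} * L_k: per_level_sr[k] is the predicted
    super-resolution image obtained from the output feature map of the k-th
    basic unit, and weights_t[k] is lambda_{k,t}, which may vary with the
    iteration t."""
    return sum(w * criterion(sr, target_hr)
               for w, sr in zip(weights_t, per_level_sr))
```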
  • Table 1 shows the performance of the basic modules of the basic unit proposed in this application, tested on standard super-resolution data sets.
  • Table 1 shows the experimental results of several image super-resolution network models constructed with the basic modules proposed in this application.
  • in the tables, FLOPs (the number of floating-point operations) represents the calculation amount of the network model and can be used to evaluate the computational efficiency of the neural network; Parameters describes the parameters included in the neural network and is used to evaluate the size of the model; SET5, SET14, B100, and Urban100 are the names of different data sets, on which the peak signal-to-noise ratio (PSNR) of the network model can be evaluated;
  • Baseline represents a small residual dense network.
  • it can be seen from Table 1 that the networks built from the basic modules proposed in the embodiments of this application (for example, the standard module, compact module, and grouping module, giving the networks CRDN, SRDN, and GRDN, respectively) and the target image super-resolution network, that is, the efficient super-resolution network (ESRN), can effectively improve the accuracy of the model without changing the amount of parameters and calculation.
  • Table 2 is the test result of the multi-level weighted joint loss function proposed in the embodiment of the application.
  • Table 2 shows the experimental results of the deep convolutional network after applying the multi-level weighted joint loss function, where Joint loss represents the network model trained by the multi-level weighted loss function proposed in the embodiment of this application. It can be seen from Table 2 that training the image super-resolution network through the multi-level weighted joint loss function provided in the embodiment of the present application can effectively improve the accuracy of the image super-resolution network.
  • Table 3 is the result statistics of the image super-resolution network provided in the embodiment of the present application on the standard data set.
  • in Table 3, Type 1 means that the running time of the image super-resolution model is Fast, and Type 2 means that the running time of the image super-resolution model is Very Fast; the models include the deep network with selection units (SelNet), the cascading residual network (CARN), the mini cascading residual network (CARN-M), and the fast, accurate and lightweight super-resolution network (FALSR), where FALSR-A and FALSR-B represent different network models;
  • ESRN represents the target image super-resolution network in the embodiment of this application, that is, the efficient super-resolution network, and may for example be the fast efficient super-resolution network (ESRN-F) or the small efficient super-resolution network (ESRN-M). It can be seen from Table 3 that the target image super-resolution network provided by the embodiment of the present application is better than the other networks in terms of calculation amount and image super-resolution accuracy.
  • Table 4 is the test results of the target image super-resolution network provided by the embodiments of the present application on different super-resolution scales.
  • a multiple of ⁇ 3 means that the output super-resolution image is 720p (1280 ⁇ 720) based on the super-resolution test of 3 times the scale
  • a multiple of ⁇ 4 means that the output super-resolution image is 720p (1280 ⁇ 720) 4 times the scale of the super-resolution test
  • models include super-resolution convolutional neural network (SRCNN), deep super-resolution network (very deep convolutional super-resolution network, VDSR), SelNet, CARN , CARN-M, ESRN, ESRN-F, ESRN-M.
  • the FLOPs in Tables 1 to 3 above are calculated for the ×2-scale image super-resolution test with an output super-resolution image of 720p (1280×720). From the data in Tables 1 to 3 it can be seen that the neural network search method provided in the embodiment of this application can find models with better super-resolution accuracy at different parameter amounts.
  • since a dimensionality reduction operation is introduced into the image super-resolution network provided by the embodiment of this application, it is also possible to search for a fast medium-parameter model by constraining the calculation amount (FLOPs) of the model; while keeping the image super-resolution effect higher than that of the FALSR-A model, the calculation amount can be reduced by nearly half.
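  • as a back-of-envelope check of such FLOPs figures, the cost of one convolution layer at a 720p output can be estimated as follows; this uses the common multiply-accumulate approximation and is not necessarily the exact accounting used in the tables:

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k, macs_only=True):
    """Rough FLOPs for one convolution layer (bias ignored); one
    multiply-accumulate counted as 1 op (macs_only) or as 2 ops."""
    macs = h_out * w_out * c_in * c_out * k * k
    return macs if macs_only else 2 * macs

# Example: a 3x3 convolution with 32 input and 32 output channels at a
# 720p (1280x720) output resolution.
print(conv2d_flops(720, 1280, 32, 32, 3) / 1e9, "GFLOPs (MACs)")
```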
  • Table 5 is the test result of the running time of the target image super-resolution network provided by the embodiment of the present application. It can be seen from Table 5 that the super-resolution network obtained by the neural network search method in the embodiment of the present application not only has high accuracy, but also has high operating efficiency.
  • FIG. 23 and FIG. 24 are effect diagrams of image super-resolution processing performed by the target image super-resolution network determined by the neural network search method of the embodiment of the present application.
  • FIG. 23 and FIG. 24 show the image effect after the image super-resolution network constructed by the basic module proposed in this application performs image resolution processing.
  • FIG. 23 shows the visual effect diagram of the images in the Set14 data set after the super-resolution processing.
  • Figure 24 shows the visual effect of the images in the Urban100 dataset after super-resolution processing.
  • the methods compared in FIG. 23 and FIG. 24 include HR (the high-resolution ground truth), bicubic interpolation, LapSRN (deep Laplacian pyramid networks for super-resolution), VDSR, CARN, CARN-M, ESRN-M, and ESRN.
  • the image super-resolution network obtained by the neural network search method proposed in this application can not only reduce the amount of network parameters and calculation, but also effectively improve the visual effect of image super-resolution, making the edges of the super-resolved image clearer.
  • FIG. 25 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the method 900 shown in FIG. 25 includes step 910 and step 920, and step 910 and step 920 will be described in detail below.
  • Step 910 Obtain an image to be processed.
  • the image to be processed may be an image captured by the electronic device through a camera, or the image to be processed may be an image obtained from within the electronic device (for example, an image stored in an album of the electronic device, or an image obtained by the electronic device from the cloud).
  • Step 920 Perform super-resolution processing on the image to be processed according to the target image super-resolution network to obtain a target image, where the target image is a super-resolution image corresponding to the image to be processed.
  • the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in the search space.
  • the search space is constructed by basic units and network structure parameters.
  • the search space is used to search for the image super-resolution network structure.
  • the network structure parameters include the type of the basic module used to construct the basic unit.
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operation of the neural network.
  • the basic module includes the first module, which is used to perform a residual connection operation and a dimensionality reduction operation on a first input feature map.
  • the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, where the second scale is smaller than the first scale; the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the basic unit is used as a building block to construct the image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimensionality upgrade operation, where the dimensionality upgrade operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimensionality upgrade operation.
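  • a minimal sketch of this downscale-transform-upscale-add pattern, assuming PyTorch, a stride-Q convolution for the reduction, a transposed convolution for the upgrade (both assumptions), and spatial sizes divisible by Q:

```python
import torch
import torch.nn as nn

class DownUpResidualBlock(nn.Module):
    """Illustrative first-module skeleton: dimensionality reduction with a
    stride-Q convolution, feature transform at the reduced scale,
    dimensionality upgrade back to the original (first) scale, then residual
    addition with the input feature map."""
    def __init__(self, channels=32, q=2):
        super().__init__()
        self.down = nn.Conv2d(channels, channels, 3, stride=q, padding=1)  # reduction
        self.body = nn.Conv2d(channels, channels, 3, padding=1)            # transform
        self.up = nn.ConvTranspose2d(channels, channels, 3, stride=q,
                                     padding=1, output_padding=q - 1)      # upgrade

    def forward(self, x):  # assumes H and W are divisible by q
        y = self.up(self.body(torch.relu(self.down(x))))
        return x + y  # residual connection at the original scale
```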
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-splicing the output feature maps of each of the first i-1 convolutional layers with the first input feature map to serve as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
  • the dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing processing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel feature.
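  • this rearrangement reads like a depth-to-space (sub-pixel) step; under that assumption it can be illustrated with torch.nn.PixelShuffle, where r² low-resolution channel features merge into one channel feature at r times the resolution:

```python
import torch

# Merge r*r low-resolution channel features into one channel feature at r x
# the resolution (depth-to-space); here r = 2, so 16 channels -> 4 channels.
rearrange = torch.nn.PixelShuffle(2)
x = torch.randn(1, 16, 32, 32)     # multiple first channel features
y = rearrange(x)                   # second channel features
print(y.shape)                     # torch.Size([1, 4, 64, 64])
```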
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map; the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that the channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the second module is used to perform channel compression operations on the second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image, and the first image super-resolution network refers to a network determined by searching for an image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t}·L_k,
  • where L represents the multi-level weighted joint loss function;
  • L_k represents the loss value of the k-th basic unit of the first image super-resolution network, where the loss value refers to the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image;
  • λ_{k,t} represents the weight of the loss value of the k-th layer at time t;
  • N represents the number of the basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
  • the first image super-resolution network is determined according to the performance parameters of each candidate network structure among P candidate network structures, and the P candidate network structures are randomly generated based on the basic unit;
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function;
  • the performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
  • FIG. 26 is a schematic flowchart of an image display method provided by an embodiment of the present application.
  • the method 1000 shown in FIG. 26 includes steps 1010 to 1040, and these steps will be described in detail below.
  • Step 1010 The first operation used by the user to turn on the camera is detected.
  • Step 1020 In response to the first operation, display a photographing interface on the display screen.
  • the photographing interface includes a viewfinder frame, and the viewfinder frame includes a first image.
  • the user's shooting behavior may include a first operation of the user to turn on the camera; in response to the first operation, displaying a shooting interface on the display screen.
  • FIG. 27 shows a graphical user interface (GUI) of the mobile phone, and the GUI is the desktop 1110 of the mobile phone.
  • when the electronic device detects that the user has clicked the icon 1120 of the camera application (APP) on the desktop 1110, it can start the camera application and display another GUI as shown in (b) of Figure 27, which may be called the shooting interface 1130.
  • the shooting interface 1130 may include a viewfinder frame 1140; in the preview state, a preview image can be displayed in the viewfinder frame 1140 in real time.
  • a first image may be displayed in the view frame 1140, and the first image is a color image.
  • the shooting interface may also include a control 1150 for indicating the shooting mode, and other shooting controls.
  • the shooting interface may include a viewfinder frame. It is understandable that the size of the viewfinder frame may be different in the photo mode and the video mode.
  • the viewfinder frame may be the viewfinder frame in the photo mode. In video mode, the viewfinder frame can be the entire display screen.
  • in the preview state, that is, after the user turns on the camera but before pressing the photo/video button, the preview image can be displayed in the viewfinder in real time.
  • the preview image may be a color image
  • the preview image may be an image displayed when the camera is set to automatic resolution
  • Step 1030 Detect the second operation of the camera instructed by the user.
  • the first processing mode may be a professional shooting mode (for example, a super-resolution shooting mode).
  • the shooting interface includes a shooting option 1160.
  • the electronic device displays a shooting mode interface.
  • when the electronic device detects that the user clicks the professional shooting mode 1161 on the shooting mode interface, the mobile phone enters the professional shooting mode.
  • the electronic device detects a second operation 1170 used by the user to instruct shooting in a low-light environment.
  • the second operation used by the user to instruct the shooting behavior may include pressing a shooting button in the camera of the electronic device, may include the user instructing the electronic device to shoot through voice, or may include other operations by which the user instructs the electronic device to perform the shooting behavior.
  • Step 1040 In response to the second operation, display a second image in the viewfinder frame, where the second image is an image obtained by performing super-resolution processing on the first image collected by the camera through the target image super-resolution network, that is, a super-resolution image corresponding to the first image.
  • the aforementioned target image super-resolution network may be obtained according to the method shown in FIG. 11.
  • the above-mentioned target image super-resolution network is a network determined by searching the image super-resolution network structure in a search space, where the search space is constructed from basic units and network structure parameters and is used to search for the image super-resolution network structure, and the network structure parameters include the type of the basic module used to construct the basic unit;
  • the basic unit is a network structure obtained by connecting the basic modules through the basic operations of the neural network, and the basic module includes the first module;
  • the first module is used to perform residual connection operations and dimensionality reduction operations on a first input feature map, where the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the first module, and the dimensionality reduction operation is used to transform the scale of the first input feature map from the original first scale to a second scale, the second scale being smaller than the first scale;
  • the target image super-resolution network includes at least the first module, and the scale of the feature map processed by the first module is the same as the scale of the first input feature map.
  • the second image is displayed in the viewfinder frame in Figure 28(d), and the first image is displayed in the viewfinder frame in Figure 28(c).
  • the content of the second image is the same or substantially the same as that of the first image, but the quality of the second image is better than that of the first image; for example, the resolution of the second image is higher than that of the first image.
  • the basic unit is used as a building block to construct the image super-resolution network.
  • the dimensionality reduction operation may include at least one of a pooling operation and a convolution operation with a step size of Q, where Q is a positive integer greater than 1.
  • the feature map processed by the first module is a feature map that has undergone a dimensionality upgrade operation, where the dimensionality upgrade operation refers to restoring the scale of the feature map that has undergone the dimensionality reduction processing to the first scale, and the residual connection operation refers to performing feature addition processing on the first input feature map and the feature map processed by the dimensionality upgrade operation.
  • the first module is further configured to perform a dense connection operation on the first input feature map, where the dense connection operation refers to feature-splicing the output feature maps of each of the first i-1 convolutional layers with the first input feature map to serve as the input feature map of the i-th convolutional layer, where i is a positive integer greater than 1.
  • the dense connection operation may be a cyclic dense connection operation, where the cyclic dense connection operation refers to performing feature splicing processing on the first input feature map after channel compression processing.
  • the first module is also used to perform a rearrangement operation, where the rearrangement operation refers to merging multiple first channel features of the first input feature map according to a preset rule to generate a second channel feature, and the resolution of the second channel feature is higher than the resolution of the first channel feature.
  • the basic module further includes a second module and/or a third module, where the second module is used to perform a channel compression operation, the residual connection operation, and the dense connection operation on a second input feature map, and the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map; the third module is used to perform a channel exchange operation, the residual connection operation, and the dense connection operation on a third input feature map, where the third input feature map includes M sub-feature maps, each of the M sub-feature maps includes at least two adjacent channel features, and the channel exchange processing refers to reordering the at least two adjacent channel features corresponding to the M sub-feature maps so that the channel features corresponding to different sub-feature maps in the M sub-feature maps become adjacent, M is an integer greater than 1, and the first input feature map, the second input feature map, and the third input feature map correspond to the same image.
  • the second module is used to perform channel compression operations on the second input feature map, where the channel compression operation refers to a convolution operation with a 1×1 convolution kernel on the second input feature map.
  • the target image super-resolution network is a network determined by performing back-propagation iterative training on the first image super-resolution network through a multi-level weighted joint loss function, where the multi-level weighted joint loss function is determined according to the loss between the predicted super-resolution image corresponding to the feature map output by each basic unit in the first image super-resolution network and the sample super-resolution image, and the first image super-resolution network refers to a network determined by searching for an image super-resolution network structure in the search space through an evolutionary algorithm.
  • the multi-level weighted joint loss function is obtained according to the following equation: L = Σ_{k=1}^{N} λ_{k,t}·L_k,
  • where L represents the multi-level weighted joint loss function;
  • L_k represents the loss value of the k-th basic unit of the first image super-resolution network, where the loss value refers to the image loss between the predicted super-resolution image corresponding to the output feature map of the k-th basic unit and the sample super-resolution image;
  • λ_{k,t} represents the weight of the loss value of the k-th layer at time t;
  • N represents the number of the basic units included in the first image super-resolution network, where N is an integer greater than or equal to 1.
  • the first image super-resolution network is determined according to the performance parameters of each candidate network structure among P candidate network structures, and the P candidate network structures are randomly generated based on the basic unit;
  • the performance parameter refers to a parameter that evaluates the performance of the P candidate network structures after training with the multi-level weighted joint loss function;
  • the performance parameters include the peak signal-to-noise ratio, which is used to indicate the difference between the predicted super-resolution image obtained through each candidate network structure and the sample super-resolution image.
  • the neural network search device in the embodiment of this application can execute the various neural network search methods of the aforementioned embodiments of this application, and the image processing device can execute the aforementioned various image processing methods of the embodiments of this application; for the specific working processes of these products, reference may be made to the corresponding processes in the foregoing method embodiments.
  • FIG. 29 is a schematic diagram of the hardware structure of a neural network search device provided by an embodiment of the present application.
  • the neural network search device 1200 shown in FIG. 29 (the device 1200 may specifically be a computer device) includes a memory 1201, a processor 1202, a communication interface 1203, and a bus 1204. Among them, the memory 1201, the processor 1202, and the communication interface 1203 implement communication connections between each other through the bus 1204.
  • the memory 1201 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1201 may store a program.
  • the processor 1202 is configured to execute each step of the neural network search method of the embodiment of the present application, for example, execute each step shown in FIG. 11 .
  • the neural network search device shown in the embodiment of the present application may be a server, for example, it may be a cloud server, or may also be a chip configured in a cloud server.
  • the processor 1202 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs to implement the neural network search method in the method embodiment of the present application.
  • the processor 1202 may also be an integrated circuit chip with signal processing capability.
  • the various steps of the neural network search method of the present application can be completed by hardware integrated logic circuits in the processor 1202 or instructions in the form of software.
  • the above-mentioned processor 1202 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1201; the processor 1202 reads the information in the memory 1201 and, in combination with its hardware, completes the functions required by the units included in the neural network search device, or executes the neural network search method shown in FIG. 11 of the method embodiment of this application.
  • the communication interface 1203 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 1200 and other devices or a communication network.
  • the bus 1204 may include a path for transferring information between various components of the device 1200 (for example, the memory 1201, the processor 1202, and the communication interface 1203).
  • FIG. 30 is a schematic diagram of the hardware structure of an image processing apparatus according to an embodiment of the present application.
  • the image processing apparatus 1300 shown in FIG. 30 includes a memory 1301, a processor 1302, a communication interface 1303, and a bus 1304.
  • the memory 1301, the processor 1302, and the communication interface 1303 implement communication connections between each other through the bus 1304.
  • the memory 1301 may be ROM, static storage device and RAM.
  • the memory 1301 may store a program.
  • the processor 1302 and the communication interface 1303 are used to execute each step of the image processing method of the embodiment of the present application, for example, the steps of the image processing methods shown in FIG. 25 and FIG. 26.
  • the processor 1302 may adopt a general-purpose CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits to execute related programs, so as to realize the functions required by the units in the image processing apparatus of the embodiment of the present application or to execute the image processing method in the method embodiment of this application.
  • the processor 1302 may also be an integrated circuit chip with signal processing capability.
  • each step of the image processing method in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 1302 or instructions in the form of software.
  • the aforementioned processor 1302 may also be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1301, and the processor 1302 reads the information in the memory 1301, and combines its hardware to complete the functions required by the units included in the image processing apparatus of the embodiment of the present application, or perform the image processing of the method embodiment of the present application method.
  • the communication interface 1303 uses a transceiving device such as but not limited to a transceiver to implement communication between the device 1300 and other devices or communication networks. For example, the image to be processed can be acquired through the communication interface 1303.
  • the bus 1304 may include a path for transferring information between various components of the device 1300 (for example, the memory 1301, the processor 1302, and the communication interface 1303).
  • although the above-mentioned apparatus 1200 and apparatus 1300 only show a memory, a processor, and a communication interface, in the specific implementation process, those skilled in the art should understand that the apparatus 1200 and the apparatus 1300 may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the above-mentioned apparatus 1200 and apparatus 1300 may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the above-mentioned apparatus 1200 and apparatus 1300 may also include only the components necessary to implement the embodiments of the present application, and need not include all the components shown in FIG. 29 or FIG. 30.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a computer-readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or the part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including Several instructions are used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

Disclosed are a neural network search method and apparatus in the field of computer vision in artificial intelligence. The search method comprises: constructing a basic unit, the basic unit being a network structure obtained by connecting basic modules by means of a basic operation of a neural network, the basic modules comprising a first module, the first module being used to perform a dimensionality reduction operation and a residual connection operation on a first input feature map, the dimensionality reduction operation being used to convert the scale of the first input feature map from an original first scale into a second scale, the second scale being smaller than the first scale, and the residual connection operation being used to perform feature summation processing on the first input feature map and on a feature map processed by the first module; constructing a search space according to the basic unit and network structure parameters; and searching for a network structure in the search space to determine a target image super-resolution network. The present invention can improve the accuracy of a super-resolution network under a given computing performance.
PCT/CN2020/105369 2019-07-30 2020-07-29 Procédé et appareil de recherche de réseau neuronal WO2021018163A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910695706.7A CN112308200B (zh) 2019-07-30 2019-07-30 Neural network search method and apparatus
CN201910695706.7 2019-07-30

Publications (1)

Publication Number Publication Date
WO2021018163A1 true WO2021018163A1 (fr) 2021-02-04

Family

ID=74230275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105369 WO2021018163A1 (fr) 2019-07-30 2020-07-29 Procédé et appareil de recherche de réseau neuronal

Country Status (2)

Country Link
CN (1) CN112308200B (fr)
WO (1) WO2021018163A1 (fr)


Also Published As

Publication number Publication date
CN112308200A (zh) 2021-02-02
CN112308200B (zh) 2024-04-26

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20848622; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 20848622; Country of ref document: EP; Kind code of ref document: A1)