CN117710786A - Image processing method, optimization method of image processing model and related equipment

Publication number: CN117710786A
Application number: CN202310979725.9A
Authority: CN (China)
Legal status: Pending
Prior art keywords: neural network, network model, channels, convolution, model
Other languages: Chinese (zh)
Inventor: 姚万欣
Assignee (current and original): Honor Device Co Ltd
Application filed by Honor Device Co Ltd
Classification: Image Analysis

Abstract

The application provides an image processing method, an optimization method of an image processing model, and related equipment. In the image processing method, after acquiring an image, the electronic device processes it with a first neural network model and outputs the processed image. The first neural network model is obtained by optimizing a second neural network model that is adapted to the hardware of the electronic device: the channels of its convolution layers are first expanded and then screened by structured pruning, so that the structure of the model is unchanged. Because the first neural network model has the same structure as the second neural network model, the image processing effect is improved without lengthening the time the electronic device needs to process an image. In the optimization method of the image processing model, the first neural network model is obtained by processing the second neural network model, and the accuracy and effect of the model can be optimized without degrading its performance.

Description

Image processing method, optimization method of image processing model and related equipment
Technical Field
The present disclosure relates to the field of terminal technologies, and in particular, to an image processing method, an optimization method of an image processing model, and related devices.
Background
Convolutional neural network models are commonly used for image processing. In general, the effect of image processing can be improved by optimizing the accuracy and effect of the convolutional neural network model. However, doing so is likely to increase the complexity of the model, which degrades the performance of the algorithm: the inference speed of the model becomes slower and image processing takes longer. Many scenes that need image processing, such as mobile phone photographing, smoke and fire detection, and face recognition, place requirements on both the effect of image processing and the time it takes.
Disclosure of Invention
The application provides an image processing method, an optimization method of an image processing model, and related equipment. With the image processing method, the image processing effect can be improved without lengthening the time the electronic device needs to process an image. With the optimization method of the image processing model, the accuracy and effect of the model can be optimized without deteriorating the performance of the neural network model.
In a first aspect, an image processing method applied to an electronic device is provided. The method may include: acquiring an image; processing the image based on a first neural network model; and outputting the image processed by the first neural network model. The first neural network model is obtained from a second neural network model; the second neural network model is used for image processing, and its structure matches the hardware structure of the electronic device; the structure of the first neural network model is identical to that of the second neural network model. That the first neural network model is obtained from the second neural network model includes the following: the first neural network model comprises Y third convolution layers and the second neural network model comprises Y first convolution layers; the first neural network model is obtained by pruning the channels of Y second convolution layers in a third neural network model, the Y third convolution layers being in one-to-one correspondence with the Y second convolution layers; the third neural network model is obtained by respectively increasing the number of channels in the Y first convolution layers of the second neural network model, the Y first convolution layers being in one-to-one correspondence with the Y second convolution layers, where Y ≥ 2 and Y is an integer.
In the image processing method provided by the application, after the electronic device acquires an image, it processes the image based on the first neural network model, which is obtained from the second neural network model. Channel expansion is performed on the second neural network model to obtain a third neural network model that is more complex than the second. The third neural network model processes images more effectively than the second neural network model, but its higher complexity makes its inference speed slower. Pruning the third neural network model reduces its complexity without significantly lowering the model accuracy, thereby improving the inference speed of the model. Because the first neural network model finally obtained has the same structure as the second neural network model, this method improves the image processing effect without lengthening the time the electronic device needs to process an image. In addition, since the structure of the second neural network model is adapted to the hardware structure of the electronic device, the structure of the first neural network model is also adapted to that hardware structure. This scheme saves the software-hardware adaptation process and reduces the cost of model optimization, while the utilization rate of the chip is not affected.
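The expand-train-prune-finetune pipeline just described can be summarized in code. The following is a minimal sketch in PyTorch; the function names and the callables passed in are illustrative assumptions, not the patent's terminology.

```python
# Hypothetical sketch of the optimization pipeline; expand_channels,
# train_to_convergence, prune_channels and finetune are placeholders
# for the steps described in this application.
import copy
import torch.nn as nn

def optimize(second_model: nn.Module, expand_channels, train_to_convergence,
             prune_channels, finetune) -> nn.Module:
    fourth = expand_channels(copy.deepcopy(second_model))  # widen the Y first conv layers
    third = train_to_convergence(fourth)                   # third neural network model
    fifth = prune_channels(third)                          # delete the added channels again
    first = finetune(fifth)                                # first neural network model
    return first                                           # same structure as second_model
```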
It will be appreciated that the image processed by the first neural network model may still need to undergo a series of subsequent processes before it is displayed on the interface of the electronic device. These subsequent processes may include highlighting, sharpening, color correction, etc., but their effect on the sharpness and noise of the final output image is almost negligible compared with the effect of model #a or the third neural network model on sharpness and noise.
It is also understood that the first neural network model is identical in structure to the second neural network model: the two models have the same number of convolution layers, and corresponding convolution layers have equal numbers of channels; for example, the number of channels in the ith convolution layer of the first neural network model equals the number of channels in the ith convolution layer of the second neural network model. In other words, if the number of channels of the ith convolution layer of the second neural network model is increased to obtain the third neural network model, then the same number of channels must be deleted from the ith convolution layer of the third neural network model so that the resulting neural network model has the same structure as the second neural network model.
The structure of the first neural network model in the present application is designed according to the characteristics of the hardware platform on which it is deployed; in other words, the structure of the first neural network model is adapted to the hardware structure of the electronic device. This can be understood with the following example. When a chip in the electronic device runs a convolutional neural network model, the chip allocates channel-processing resources to each convolution layer in the model, and the number of channels those resources can serve is an integer multiple of some value (denoted G, where G ≥ 1 and G is an integer). G is determined by the hardware architecture design of the chip; for example, G may be 16, 64, or 48. If the number of channels in a certain convolution layer is G, the chip invokes the resources corresponding to G channels to process that layer. If the number of channels is 2G, the chip invokes the resources corresponding to 2G channels. If, however, the number of channels is G' with G' < G, the chip still invokes the resources corresponding to G channels, and in this case the utilization of the chip is reduced. When the number of channels of each convolution layer in the second neural network model is an integer multiple of G, the structure of the first neural network model can be understood as designed according to the characteristics of the deployed hardware platform, or as matched with the hardware structure of the electronic device, and in this case the utilization rate of the chip can be kept as high as possible.
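As a rough illustration of the per-layer resource allocation just described, the sketch below rounds a layer's channel count up to a multiple of G; G = 16 is one of the example values given above, and the function names are assumptions.

```python
import math

def allocated_channels(num_channels: int, G: int = 16) -> int:
    # The chip reserves channel-processing resources in blocks of G channels.
    return math.ceil(num_channels / G) * G

def chip_utilization(num_channels: int, G: int = 16) -> float:
    # Fraction of the reserved channel resources actually used by the layer.
    return num_channels / allocated_channels(num_channels, G)

print(chip_utilization(32))   # 1.0   -> channel count is a multiple of G
print(chip_utilization(10))   # 0.625 -> G' < G still occupies a full block of G
```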
With reference to the first aspect, in one possible implementation manner, processing the image based on the first neural network model includes: obtaining image data from the image, where the image data indicates all pixels in the image; inputting the image data into the first neural network model; performing convolution calculation on the input by the ith third convolution layer of the Y third convolution layers and outputting the ith calculation result; and inputting the ith calculation result, after calculation by other non-convolution layers, into the (i+1)th third convolution layer, where 1 ≤ i ≤ Y-1 and i is an integer.
It will be understood that each convolution layer in the neural network model performs a convolution calculation on the image data and inputs the result to the next layer, and the model finally outputs the processed image. The network structure also includes other layers; for example, the result of a convolution calculation may first pass through an activation layer before being input to the next convolution layer, or the results of two convolution layers may each pass through an activation layer and then be added before being input to the next convolution layer.
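As a concrete (and deliberately tiny) example of such a structure, the PyTorch model below passes each convolution result through an activation layer, and adds two activated branch outputs before the next convolution; the layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # Toy network mirroring the structure described above: conv -> activation
    # -> next conv, plus two branches whose activated outputs are added.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.branch_a = nn.Conv2d(16, 16, kernel_size=3, padding=1)
        self.branch_b = nn.Conv2d(16, 16, kernel_size=1)
        self.conv_out = nn.Conv2d(16, 3, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.act(self.conv1(x))                                  # ith result -> activation
        y = self.act(self.branch_a(y)) + self.act(self.branch_b(y))  # add, then next conv
        return self.conv_out(y)                                      # processed image data

out = TinyNet()(torch.randn(1, 3, 64, 64))   # e.g. RGB image data, N x C x H x W
```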
As one example, after the electronic apparatus acquires an image, image data (hereinafter referred to as image data #1 for convenience of explanation) is obtained from the image, and the image data #1 is used to indicate all pixels in the image. For example, the image data #1 may be in an original (RAW) format or a red, green, and blue (RGB) format. In one possible implementation, the image processed by the first neural network model may be image data in RAW format or RGB format (hereinafter referred to as image data #3 for convenience of description).
In the mobile phone photographing scene, the pixels in the image processed by the first neural network model are in one-to-one correspondence with the pixels indicated by image data #1; that is, the pixels indicated by image data #3 correspond one-to-one with the pixels indicated by image data #1. For example, the image output after image data #3 undergoes a series of subsequent processing has improved sharpness and reduced noise relative to the image acquired by the electronic device. In the face recognition scene, image data #3 is the coordinate data of the facial features of the person in the image composed of all pixels of image data #1. In the smoke detection scene, image data #3 is the coordinate data of smoke or fire in the image composed of all pixels of image data #1.
With reference to the first aspect, in one possible implementation manner, the third neural network model is obtained by increasing the number of channels in Y first convolution layers in the second neural network model, including: the third neural network model is obtained by training the fourth neural network model to converge based on a data set, wherein the data set is a data set used for training the second neural network model; the fourth neural network model is obtained by increasing the number of channels of each of the Y first convolutional layers, respectively.
With reference to the first aspect, in one possible implementation manner, the first neural network model being obtained by pruning the channels of the Y second convolution layers in the third neural network model includes: the first neural network model is obtained by fine-tuning the fifth neural network model based on a data set, where the data set is the data set used for training the second neural network model; and the fifth neural network model is obtained by respectively pruning the channels of the Y second convolution layers in the third neural network model.
It is understood that, in the present application, within the same scenario the data sets used for training different neural network models are all the same data set. The data set comprises input data and target data. Taking the face recognition scenario as an example, the input data may be image data obtained by processing an image acquired by a camera of the electronic device, for example in RAW or RGB format, and the target data is the coordinate data of the facial features in the image, such as the coordinates of the eyes and the corners of the mouth. Taking the mobile phone photographing scenario as an example, the input data may be image data #1, obtained by processing an image #1 acquired by a camera of the electronic device, for example in RAW or RGB format, where image data #1 indicates all pixels of image #1. The target data is image data #2, which may also be in RAW or RGB format and which indicates all pixels of image #2; image #2 shows substantially the same content as image #1, except that image #2 has higher sharpness and less noise relative to image #1, or contains no noise and has high sharpness.
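A dataset of such input/target pairs could be wrapped as follows. This is a minimal sketch with placeholder tensors, assuming the RAW or RGB data has already been decoded into arrays; the class name is an assumption.

```python
import torch
from torch.utils.data import Dataset

class PairedImageData(Dataset):
    # Sketch of the photographing-scene dataset: `noisy` plays the role of
    # image data #1 and `clean` the role of image data #2; the two tensors
    # must indicate the same pixels (same shape).
    def __init__(self, noisy: torch.Tensor, clean: torch.Tensor):
        assert noisy.shape == clean.shape
        self.noisy, self.clean = noisy, clean

    def __len__(self) -> int:
        return self.noisy.shape[0]

    def __getitem__(self, idx):
        return self.noisy[idx], self.clean[idx]

ds = PairedImageData(torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64))
```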
It can be further understood that, in the present application, the model is trained so that, after the input data is fed to the model, the model's output is as close to the target data as possible; the better the parameters of the model are learned, the better the model's effect in processing images.
It is also understood that although the first neural network model and the second neural network model are identical in structure, the parameters of the convolution layers within the two models differ; the convolution calculation is performed on the image data using the parameters of the convolution layers. In the process of obtaining the first neural network model from the second neural network model, the first neural network model inherits part of its convolution layer parameters from the structurally more complex third neural network model and is then retrained, which provides a better parameter initialization than training the second neural network model directly. As a result, the image processing effect of the first neural network model is clearly better than that of the second neural network model.
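The weight inheritance can be pictured as copying the surviving channels' parameters from the wider (third) model's layer into the corresponding layer of the smaller model before retraining. The sketch below handles a single conv layer's output channels and ignores the matching adjustment needed on the next layer's input channels; it illustrates the idea and is not the patent's exact procedure.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def inherit_weights(wide: nn.Conv2d, narrow: nn.Conv2d, keep: torch.Tensor) -> None:
    # `keep` holds the indices of the output channels that survived pruning.
    narrow.weight.copy_(wide.weight[keep])
    if wide.bias is not None and narrow.bias is not None:
        narrow.bias.copy_(wide.bias[keep])

wide = nn.Conv2d(3, 32, 3)
narrow = nn.Conv2d(3, 16, 3)
inherit_weights(wide, narrow, torch.arange(16))  # keep the first 16 channels here
```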
With reference to the first aspect, in one possible implementation manner, the first neural network model is obtained by performing pruning processing on channels of Y second convolution layers in the third neural network model, which includes: the first neural network model is obtained by calculating importance scores of channels in the Y second convolution layers respectively, and pruning the channels in the Y second convolution layers according to the importance scores respectively.
In this scheme, the channels are pruned according to importance scores, so that unimportant channels can be better identified and deleted and the accuracy of the model is reduced as little as possible.
With reference to the first aspect, in one possible implementation manner, pruning the channels in the Y second convolution layers according to the importance scores includes: sorting the importance scores of the channels of the ith second convolution layer of the Y second convolution layers; and deleting X channels in the ith second convolution layer, where the importance scores of the X channels are lower than those of the other channels in the ith second convolution layer, 1 ≤ i ≤ Y, i is an integer, X ≥ 1, and X is an integer.
In this scheme, for each second convolution layer, the channels in the layer are sorted according to their importance scores, and the unimportant channels are screened out and deleted, so that the loss of model accuracy is kept as small as possible.
With reference to the first aspect, in one possible implementation manner, the ith second convolution layer includes Z channels, and the importance score of the jth of the Z channels is determined according to the L1 norm or the L2 norm of the jth channel's weight parameters, where Z > X, Z is an integer, 1 ≤ j ≤ Z, and j is an integer.
Here, the L1 norm may also be called the L1 value, that is, the sum of the absolute values of all weight parameters in the channel; the L2 norm may also be referred to as the weight decay. This scheme allows the importance of the channels in the convolution layer to be scored more accurately.
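A sketch of this scoring-and-sorting step for one conv layer follows; the per-channel L1 score is the sum of absolute weights, as defined above, and the L2 variant is the Euclidean norm. The function names are assumptions.

```python
import torch
import torch.nn as nn

def channel_importance(conv: nn.Conv2d, norm: str = "l1") -> torch.Tensor:
    # One score per output channel, computed from that channel's weights.
    w = conv.weight.detach().flatten(start_dim=1)        # shape (Z, in*kh*kw)
    return w.abs().sum(dim=1) if norm == "l1" else w.norm(p=2, dim=1)

def surviving_channels(conv: nn.Conv2d, x: int) -> torch.Tensor:
    # Sort by importance and delete the X lowest-scoring channels.
    order = torch.argsort(channel_importance(conv))      # ascending scores
    return torch.sort(order[x:]).values                  # keep the rest, in layer order

conv = nn.Conv2d(16, 64, kernel_size=3)
print(surviving_channels(conv, 16).numel())              # 48 channels remain
```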
With reference to the first aspect, in one possible implementation manner, the ith second convolution layer has y_i more channels than the ith first convolution layer of the Y first convolution layers, and X = y_i.
In this scheme, the number of channels that the Y second convolution layers of the third neural network model gain relative to the Y first convolution layers of the second neural network model equals the number of channels that the Y third convolution layers of the first neural network model lose relative to the Y second convolution layers of the third neural network model, so that the structure of the first neural network model is identical to that of the second neural network model.
With reference to the first aspect, in one possible implementation manner, y_i is determined according to at least one of the following: the importance of the ith first convolution layer among all convolution layers of the second neural network model, and the number of channels within the ith first convolution layer.
With reference to the first aspect, in one possible implementation manner, y_i is greater than or equal to a preset fraction of the number of channels in the ith first convolution layer, and/or y_i is less than or equal to the number of channels in the ith first convolution layer.
In a second aspect, a method for optimizing an image processing model is provided. The method may include: respectively increasing the number of channels in Y first convolution layers of a second neural network model to obtain a third neural network model, where the second neural network model is used for image processing, the third neural network model comprises Y second convolution layers, and the Y first convolution layers correspond one-to-one with the Y second convolution layers; and respectively pruning the channels of the Y second convolution layers in the third neural network model to obtain a first neural network model, where the first neural network model is used for image processing, the structure of the first neural network model is identical to that of the second neural network model, the first neural network model comprises Y third convolution layers, the Y third convolution layers correspond one-to-one with the Y second convolution layers, Y ≥ 1, and Y is an integer.
In the scheme provided by the application, the third neural network model, obtained by adding channels to some or all of the convolution layers in the second neural network model, is more complex than the second neural network model. The third neural network model processes images more effectively than the second neural network model, but its higher complexity makes its inference speed slower. Pruning the third neural network model reduces its complexity without significantly lowering the model accuracy, thereby improving the inference speed of the model. Therefore, by this method, the accuracy and effect of the model can be optimized without deteriorating the performance of the model.
It can be understood that the pruning of channels in convolution layers in the present application is one kind of structured pruning: convolution kernels of certain convolution layers in the neural network model are pruned to reduce the number of channels in those layers, and simplifying the structure of the neural network model reduces its number of parameters, thereby lowering the complexity of the neural network model and improving its inference speed.
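To make the parameter reduction concrete, the snippet below counts one 3x3 conv layer's parameters before and after removing 16 of its 64 output channels; the layer sizes are arbitrary assumptions.

```python
import torch.nn as nn

def param_count(layer: nn.Module) -> int:
    return sum(p.numel() for p in layer.parameters())

before = param_count(nn.Conv2d(32, 64, 3))   # 64*32*3*3 + 64 = 18,496
after = param_count(nn.Conv2d(32, 48, 3))    # 48*32*3*3 + 48 = 13,872
print(before, after)                          # ~25% fewer parameters in this layer
```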
Advantageous effects of possible implementations of the second aspect may be seen from the corresponding description in the first aspect.
With reference to the second aspect, in one possible implementation manner, pruning processing is performed on channels of Y second convolution layers in the third neural network model, including: respectively calculating importance scores of channels in Y second convolution layers; and pruning the channels in the Y second convolution layers respectively according to the importance scores.
With reference to the second aspect, in one possible implementation manner, pruning the channels in the Y second convolution layers according to the importance scores includes: sorting the importance scores of the channels of the ith second convolution layer of the Y second convolution layers; and deleting X channels in the ith second convolution layer, where the importance scores of the X channels are lower than those of the other channels in the ith second convolution layer, 1 ≤ i ≤ Y, i is an integer, X ≥ 1, and X is an integer.
With reference to the second aspect, in one possible implementation manner, the ith second convolution layer includes Z channels, and the importance score of the jth of the Z channels is determined according to the L1 norm or the L2 norm of the jth channel's weight parameters, where Z > X, Z is an integer, 1 ≤ j ≤ Z, and j is an integer.
With reference to the second aspect, in one possible implementation manner, the ith second convolution layer has y_i more channels than the ith first convolution layer of the Y first convolution layers, and X = y_i.
With reference to the second aspect, in a possible implementation manner, y_i is determined according to at least one of the following: the importance of the ith first convolution layer among all convolution layers of the second neural network model, and the number of channels within the ith first convolution layer.
With reference to the second aspect, in a possible implementation manner, y_i is greater than or equal to a preset fraction of the number of channels in the ith first convolution layer, and/or y_i is less than or equal to the number of channels in the ith first convolution layer.
With reference to the second aspect, in one possible implementation manner, increasing the number of channels in the Y first convolution layers of the second neural network model to obtain the third neural network model includes: respectively increasing the number of channels of each of the Y first convolution layers to obtain a fourth neural network model; and training the fourth neural network model to convergence based on a data set to obtain the third neural network model, where the data set is the data set used for training the second neural network model.
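One way to build a widened layer of the fourth model is sketched below. Whether the widened layer reuses the original weights before retraining is not specified here; this sketch keeps them, leaves the new channels at their default initialization, and omits the matching widening of the next layer's input side.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def widen_conv(conv: nn.Conv2d, extra: int) -> nn.Conv2d:
    # New layer with `extra` additional output channels.
    wider = nn.Conv2d(conv.in_channels, conv.out_channels + extra,
                      conv.kernel_size, stride=conv.stride, padding=conv.padding,
                      bias=conv.bias is not None)
    wider.weight[:conv.out_channels] = conv.weight   # keep the existing channels
    if conv.bias is not None:
        wider.bias[:conv.out_channels] = conv.bias
    return wider

print(widen_conv(nn.Conv2d(3, 64, 3), extra=16).out_channels)  # 80
```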
With reference to the second aspect, in one possible implementation manner, pruning the channels of the Y second convolution layers in the third neural network model to obtain the first neural network model includes: respectively pruning the channels of the Y second convolution layers in the third neural network model to obtain a fifth neural network model; and fine-tuning the fifth neural network model based on a data set to obtain the first neural network model, where the data set is the data set used for training the second neural network model.
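A minimal fine-tuning loop for the fifth model on the same dataset could look like the following; the optimizer, learning rate, epoch count, and loss function are all assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def finetune(model: nn.Module, loader: DataLoader, epochs: int = 5) -> nn.Module:
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # small lr for fine-tuning
    loss_fn = nn.L1Loss()                                # assumed reconstruction loss
    model.train()
    for _ in range(epochs):
        for noisy, clean in loader:
            opt.zero_grad()
            loss_fn(model(noisy), clean).backward()
            opt.step()
    return model
```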
It will be appreciated that the first neural network model is structurally the same as the second neural network model, but the parameters of the models are different.
With reference to the second aspect, in one possible implementation manner, in a case that the structure of the second neural network model is adapted to the hardware structure of the electronic device, the structure of the first neural network model is adapted to the hardware structure of the electronic device.
In the scheme provided by the application, the first neural network model obtained is still adapted to the hardware platform of the mobile phone, which saves the step of further optimizing the structure of the first neural network model or evaluating its performance according to the characteristics of the mobile phone's hardware platform. The image processing method provided by the application can therefore be applied to more electronic devices; the software-hardware adaptation process is saved, the cost of model optimization is reduced, and the utilization rate of the chip is not affected.
It can be appreciated that the structure of the second neural network model is designed according to the characteristics of the hardware platform deployed by the second neural network model in order to fully exploit the performance of the hardware platform of the electronic device. The second neural network model may be understood as the structure of the neural network model for image processing that is currently adapted by the hardware platform of the electronic device. In view of the fact that the first neural network model is structurally identical to the second neural network model, the current hardware platform of the electronic device still adapts to the first neural network model.
In a third aspect, the present application provides an electronic device comprising one or more processors and one or more memories; wherein the one or more memories are coupled to the one or more processors, the one or more memories being operable to store computer program code comprising computer instructions that, when executed by the one or more processors, cause the electronic device to perform the method as described in the first or second aspect and any possible implementation of the first or second aspect.
In a fourth aspect, embodiments of the present application provide a chip system applied to an electronic device, the chip system including one or more processors configured to invoke computer instructions to cause the electronic device to perform a method as described in the first aspect or the second aspect and any possible implementation of the first aspect or the second aspect.
In a fifth aspect, a computer readable storage medium is provided, in which instructions are stored which, when run on a computer, cause the computer to perform the method of the first or second aspect described above.
In a sixth aspect, a computer program product comprising instructions is provided which, when run on a computer, cause the computer to perform the method of the first or second aspect described above.
The technical effects obtained by the third, fourth, fifth and sixth aspects are similar to the technical effects obtained by the corresponding technical means in the first or second aspects, and are not described in detail herein.
Drawings
FIG. 1 is a schematic diagram showing an example of the neural network channel pruning technique according to the present application;
FIG. 2 is a schematic diagram showing a further example of the neural network channel pruning technique according to the present application;
FIG. 3A is a schematic diagram showing a change curve of a loss function in a model training process provided by the present application;
fig. 3B and fig. 3C respectively show an output image of model #a and an output image of the first neural network model provided in the present application, in a scene where the mobile phone photographs in a night environment;
FIG. 3D shows a partial contrast diagram of FIGS. 3C and 3B;
FIG. 4 is a schematic diagram of an optimization method of an image processing model provided in the present application;
FIGS. 5A and 5B are schematic diagrams illustrating another method of optimizing an image processing model provided herein;
FIG. 6 is a schematic diagram of an image processing method provided herein;
fig. 7 shows a schematic hardware structure of an electronic device according to an embodiment of the present application;
fig. 8 shows a block diagram of a software system of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that reference herein to "a plurality" means two or more. In the description of the present application, "/" means "or" unless otherwise indicated; for example, A/B may represent A or B. "And/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, to facilitate a clear description of the technical solutions of the present application, the words "first", "second", etc. are used to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", etc. do not limit quantity or order of execution, and that items so labeled are not necessarily different.
It should be noted that, the image processing method and the optimization method of the image processing model provided in the embodiments of the present application are applicable to any electronic device having a shooting function, for example, a mobile phone, a tablet computer, a camera, an intelligent wearable device, etc., which is not limited in this embodiment of the present application. In addition, the image processing method and the optimization method of the image processing model provided by the embodiment of the application can be applied to various shooting scenes, such as mobile phone shooting, smoke detection, face recognition and the like, and can be applied to other shooting scenes, and the embodiment of the application is not limited to the above.
Taking the mobile phone photographing scene as an example, the image may be captured under extremely low illumination intensity, for example at an illumination intensity of 5 lux or less, or in a night environment. Compared with an image captured under normal light, when the electronic device captures an image at night or under poor light, it adjusts the brightness of the captured image by automatically adjusting the exposure time and the sensitivity (also called the ISO value), i.e., auto exposure, which makes the image noise more severe. Therefore, when shooting a night scene, the electronic device generally performs noise reduction and brightness adjustment on the image. Color noise is particularly likely to occur when shooting a night scene, so handling color noise is an important part of the noise reduction processing.
Image noise (image noise) is a random variation of brightness or color information in an image that is not present in the photographed object itself, and is usually a manifestation of electronic noise. Color noise (chroma noise) is one kind of image noise and refers to the phenomenon in digital image processing in which some color channels in an image fluctuate randomly due to the nonlinear characteristics of the sensor, noise, and other factors; it often appears as colored noise or speckles. Image noise degrades the sharpness of a photograph, and color noise in particular also degrades color purity. As an example, in a photograph taken by a mobile phone at night, an area not lit by a light source, or an area with an illumination intensity of 5 lux or less, may show color patches or dots in colors other than those seen by the naked eye; these are color noise patches or dots. For example, blue or red color noise may appear in the night sky in a photograph, and purple or green color noise may appear on very poorly lit ground. The more color noise or noise spots in the photograph, the lower its sharpness.
At present, image noise, especially color noise, is reduced mainly by means of a convolutional neural network model. In general, the noise reduction effect can be enhanced by optimizing the accuracy and effect of the convolutional neural network model. However, doing so may require increasing the complexity of the model, which generally degrades the performance of the algorithm: the inference speed of the model becomes slower, the electronic device takes longer to produce a photograph, and the user's photographing experience deteriorates.
Similarly, convolutional neural networks can be used in other scenarios requiring image processing, such as face recognition and security tasks. Taking the face recognition scene as an example, the accuracy of face detection and recognition can be improved by optimizing the accuracy and effect of the convolutional neural network model; however, doing so may require increasing the complexity of the model, which generally degrades the performance of the algorithm: the inference speed of the model becomes slower, face recognition takes longer, and the user's face recognition experience deteriorates. Taking smoke and fire detection in security tasks as an example, the accuracy of detecting and identifying smoke and fire can likewise be improved by optimizing the accuracy and effect of the convolutional neural network model; however, doing so may again require increasing the complexity of the model and degrade the performance of the algorithm: the inference speed of the model becomes slower, smoke and fire detection takes longer, and the timeliness requirements of security tasks may not be met.
In the optimization method of the image processing model provided by the application, the number of channels of some or all convolution layers in the second neural network model used for image processing is expanded to obtain a model more complex than the second neural network model, and that model is then trained to convergence to obtain a third neural network model. Structured pruning is then performed on the third neural network model to obtain a first neural network model, and the first neural network model has the same structure as the second neural network model.
It can be appreciated that the structure of the second neural network model is designed according to the characteristics of the hardware platform deployed by the second neural network model in order to fully exploit the performance of the hardware platform of the electronic device. The second neural network model may be understood as the structure of the neural network model for image processing that is currently adapted by the hardware platform of the electronic device. In view of the same structure of the first neural network model and the second neural network model, the current hardware platform of the electronic device still adapts to the first neural network model.
It can be further understood that the neural network channel pruning technique involved in the present application is one kind of structured pruning: convolution kernels of certain convolution layers in the neural network model are pruned to reduce the number of channels of those layers, and simplifying the structure of the neural network model reduces its number of parameters, thereby lowering the complexity of the neural network model and improving its inference speed. The number of channels of a convolution layer equals the number of its convolution kernels, so channel pruning in the present application can be understood as pruning the channels of a convolution layer or pruning its convolution kernels. To perform neural network channel pruning, the importance of each channel in the convolution layer to be pruned is first evaluated, and channels are then pruned in order of importance from lowest to highest. Evaluating the importance of each channel in the convolution layer to be pruned can be understood as judging the importance of each channel according to certain criteria; the importance of a channel may be determined, for example, from its L1 norm or L2 norm. The L1 norm may also be called the L1 value, that is, the sum of the absolute values of all weight parameters in the channel; the L2 norm may also be referred to as the weight decay. The neural network channel pruning technique involved in the present application is described in further detail below with reference to fig. 1 and 2.
Fig. 1 is a schematic diagram showing an example of the neural network channel pruning technique according to the present application. The convolution layer shown in FIG. 1 comprises n convolution kernels, namely convolution kernels #1 to #n, where n ≥ 3 and n is an integer; each convolution kernel has several weight parameters. Taking importance evaluation according to the L1 value of each convolution kernel as an example, the L1 values of the n convolution kernels are sorted in ascending order and the first m convolution kernels are pruned, where 1 ≤ m < n and m is an integer. Assuming m = 2, if convolution kernels #1 and #n have the 2 smallest L1 values after sorting, they need to be pruned.
Fig. 2 is a schematic diagram illustrating a further example of the neural network channel pruning technique according to the present application. As shown in fig. 2, channel pruning is performed on neural network model #1 to obtain neural network model #2. Suppose the channels of convolution layers #2 through #4 in neural network model #1 are pruned. Specifically, convolution layers #2, #3, and #4 of neural network model #1 have 64, 96, and 128 channels, respectively; after 16, 32, and 48 channels are trimmed from convolution layers #2, #3, and #4, respectively, convolution layers #2, #3, and #4 of neural network model #2 have 48, 64, and 80 channels, respectively. In neural network model #1 and neural network model #2, the numbers of channels of the untrimmed convolution layers are the same. Compared with neural network model #1, the channel-pruned neural network model #2 has lower structural complexity and better performance.
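The channel arithmetic of Fig. 2 can be checked directly; the snippet below simply reproduces the counts stated above.

```python
# Channel counts from Fig. 2: convolution layers #2-#4 of neural network
# model #1 before pruning, the number of channels trimmed from each, and
# the resulting counts in neural network model #2.
before = [64, 96, 128]
trimmed = [16, 32, 48]
after = [b - t for b, t in zip(before, trimmed)]
assert after == [48, 64, 80]
```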
In a possible related art, the second neural network model for image processing is directly trained to obtain a model with a better image processing effect (hereinafter referred to as model #a for convenience of explanation). The optimization method of the image processing model provided by the application instead first expands the channels of the second neural network model and trains the resulting larger model to convergence, obtaining the third neural network model. Then, by pruning, a fifth neural network model inherits part of the weight parameters from the converged third neural network model, and the fifth neural network model is fine-tuned to obtain a first neural network model with the same structure as the second neural network model. Compared with the related art, the optimization method provided by the application lets the parameters of the neural network model benefit from weight inheritance, and the accuracy and effect of the finally obtained model are superior, as described below with reference to fig. 3A to 3D.
FIG. 3A is a schematic diagram showing a change curve of a loss function in a model training process provided by the application. As shown in fig. 3A, the horizontal axis is training round (epoch) and the vertical axis is the value of the loss function (loss); the value of the corresponding loss function for each training round is the value of the average loss function for all iterations within a single training round. The loss function #2 is a loss function in the training process of directly training the second neural network model to obtain the model # a in the related art, and the curve pointed by the loss function #2 in fig. 3A is a change curve of the numerical value of the loss function # 2. The loss function #1 is a loss function in the training process of obtaining the first neural network model based on the optimization method of the image processing model provided by the application, and a curve pointed by the loss function #1 in fig. 3A is a change curve of the numerical value of the loss function # 1.
It will be appreciated that the loss function is used to measure the accuracy of the model. The smaller the value of a model's loss function, the better the model converges on the dataset, indicating a smaller difference between the model's output and the true value, i.e., higher model accuracy. As can be seen from fig. 3A, the value of loss function #1 is about 10% lower than that of loss function #2. Therefore, the accuracy and effect of the first neural network model obtained by the optimization method of the image processing model are superior to those of model #a.
In the image processing method provided by the application, after the electronic equipment acquires the image, the image is processed based on the first neural network model provided by the application, and then the processed image is output. For a specific implementation, reference may be made to the description of method 300 below.
Fig. 3B and fig. 3C respectively show an output image of model #a and an output image of the first neural network model provided by the application, in a scene where the mobile phone photographs in a night environment. Specifically, after the electronic device acquires the image, the image is processed by model #a and by the first neural network model provided by the application, the two output images undergo the same subsequent processing, and the finally output images are shown in fig. 3B and fig. 3C, respectively. The subsequent processing here may include highlighting, sharpening, color correction, etc., but the effect of these steps on the sharpness and noise of the final output image is almost negligible compared with the effect of model #a or the third neural network model. Note that only grayscale images are given here to assist the explanation; in practice the contrast is more intuitive in color. As shown in fig. 3B and 3C, the output images include the night sky, a building, stones, soil, a stone road, and trees. Compared with fig. 3B, both the color noise and the sharpness in fig. 3C are significantly improved. Fig. 3D shows a partial comparison of fig. 3C and 3B, in which the upper-right regions of fig. 3C and 3B are selected for comparison. As shown in the two solid circles in fig. 3D, many noise points of different gray levels remain in the solid circle corresponding to fig. 3B, while the gray levels in the solid circle corresponding to fig. 3C are almost uniform. This shows that, when fig. 3B and 3C are presented as color images, the color noise in the sky is significantly reduced and the color of the dark area is more uniform in fig. 3C: the color patches are clearly fewer and the night sky is cleaner, so the color noise is significantly reduced. Likewise, in the area circled by the dotted line in fig. 3D, the fine branches and the texture on the trunk are clearer in fig. 3C than in fig. 3B, which shows that, presented as color images, fig. 3C is significantly sharper than fig. 3B. Therefore, the optimization method of the image processing model provided by the application can reduce the noise of night-scene photographs and improve their sharpness while keeping the structure of the model unchanged; that is, the noise and sharpness of the night-scene photographing algorithm are optimized without deteriorating the performance of the model.
Taking the face recognition scene as an example, compared with the output of model #a, the output of the first neural network model of the application recognizes and locates the parts of the face, and the micro-expressions corresponding to those parts, more accurately. The face recognition success rate can thus be improved without deteriorating model performance or increasing the time needed for face recognition.
Taking smoke and fire detection in a security task scene as an example, compared with the output of model #a, the output of the first neural network model of the application determines more accurately whether smoke or a fire source is present, and locates the smoke or fire source more accurately when one is present. The smoke and fire recognition success rate can thus be improved without deteriorating model performance or delaying the discovery of dangerous situations.
The optimization method of the image processing model provided by the application can be contrasted with an alternative scheme: train a sixth neural network model with a more complex structure than the second neural network model until convergence, and then perform structured pruning to obtain a seventh neural network model with a simpler structure. The sixth neural network model is not obtained by channel expansion of the second neural network model, so its structure differs from that of the second neural network model; the sixth neural network model may be a model for image processing obtained by other means. Consequently, the seventh neural network model is also likely to differ in structure from the second neural network model. This alternative can improve the image processing effect to some extent. However, taking a mobile phone photographing at night as a practical example: as described above, the structure of the second neural network model is designed according to the characteristics of the hardware platform on which it is deployed, in order to fully exploit the performance of the hardware platform of the electronic device. That the structure of the first neural network model in the present application is designed according to the characteristics of the deployed hardware platform, or is adapted to the hardware structure of the electronic device, can be understood with the following example. When a chip in the electronic device runs a convolutional neural network model, the chip allocates channel-processing resources to each convolution layer in the model, and the number of channels those resources can serve is an integer multiple of some value (denoted G, where G ≥ 1 and G is an integer). G is determined by the hardware architecture design of the chip; for example, G may be 16, 64, or 48. If the number of channels in a certain convolution layer is G, the chip invokes the resources corresponding to G channels to process that layer; if the number of channels is 2G, the chip invokes the resources corresponding to 2G channels; if, however, the number of channels is G' with G' < G, the chip still invokes the resources corresponding to G channels, and in this case the utilization of the chip is reduced. When the number of channels of each convolution layer in the second neural network model is an integer multiple of G, the structure of the first neural network model can be understood as designed according to the characteristics of the deployed hardware platform, or as matched with the hardware structure of the electronic device, and in this case the utilization rate of the chip can be kept as high as possible.
Therefore, if the seventh neural network model differs in structure from the second neural network model, that is, if the structure of the model used for image processing has been changed, then even if the complexity of the seventh neural network model is comparable to that of the second neural network model, the performance benefit of the second neural network model may not be obtained, and the structure of the seventh neural network model would need to be further optimized, or its performance evaluated, according to the characteristics of the mobile phone's hardware platform. Compared with this alternative, the first neural network model obtained by the optimization method of the image processing model is still adapted to the hardware platform of the mobile phone, saving the step of further optimizing the structure of the first neural network model or evaluating its performance according to the hardware platform characteristics of the mobile phone; the method is therefore applicable to more electronic devices, and the software-hardware adaptation process is saved.
Thus, in the image processing method provided by the application, after the electronic device acquires the image, the electronic device processes the image based on the first neural network model obtained according to the second neural network model. Under the condition that the structure of the second neural network model is matched with the hardware structure of the electronic equipment, the software and hardware adaptation process can be saved, the model optimization cost is reduced, and meanwhile, the utilization rate of the chip is not influenced.
As can be seen from the description of fig. 3A, in the optimization method of the image processing model provided by the application, the improvement of the first neural network model's image processing effect over that of the second neural network model is not achieved merely by training a neural network model obtained during optimization; it is achieved by the technical means of first expanding the number of channels and then trimming, through structured pruning, the same number of channels as were added. At the same time, the fact that the first neural network model's image processing effect improves while its performance does not deteriorate is also brought about by this technical means. In other words, the optimization method of the image processing model provided by the application is neither an abstract model algorithm in itself nor a mere matter of improving an algorithm by training a model; it is a technical means that optimizes the already-trained second neural network model, and the technical effects of the optimization method follow from that means. Furthermore, in the image processing method provided by the application, applying the first neural network model in the electronic device, on the one hand, improves the electronic device's image processing effect without lengthening the image processing time, owing to the technical means corresponding to the optimization method of the first neural network model. On the other hand, because the optimization method is combined with the hardware characteristics of the electronic device, the technical means of first expanding the number of channels and then pruning the same number through structured pruning brings further technical effects, such as saving the software-hardware adaptation process, reducing the cost of model optimization, and leaving the utilization rate of the chip unaffected.
The following describes in detail the optimization method of the image processing model provided in the present application with reference to fig. 4, 5A and 5B. Fig. 4 shows a schematic diagram of an optimization method 100 of an image processing model provided in the present application. As shown in fig. 4, the method 100 includes S101 to S102.
S101, respectively increasing the number of channels in Y first convolution layers in the second neural network model to obtain a third neural network model.
The second neural network model is used for image processing, the third neural network model comprises Y second convolution layers, and the Y first convolution layers and the Y second convolution layers are in one-to-one correspondence. It is understood that the Y first convolution layers are some or all of the convolution layers in the second neural network model.
In one possible example of S101, increasing the number of channels of each of the Y first convolution layers, respectively, to obtain a fourth neural network model; training the fourth neural network model to converge based on a data set to obtain a third neural network model, wherein the data set is a data set for training the second neural network model.
It is understood that, in the present application, the data sets used for training different neural network models in the same scenario are the same data set. The data set comprises input data and target data. Taking the face recognition scenario as an example, the input data may be image data obtained by processing an image acquired by a camera of the electronic device, for example in raw (RAW) format or red, green and blue (RGB) format, and the target data are the coordinate data of the facial features in the image, such as the coordinates of the eyes and the corners of the mouth. Taking the mobile phone photographing scenario as an example, the input data may be image data #1 obtained by processing an image #1 acquired by a camera of the electronic device, for example in RAW or RGB format, where image data #1 is used to indicate all pixels of image #1. The target data is image data #2, where image data #2 is used to indicate all pixels of image #2; image #2 displays substantially the same content as image #1, except that image #2 has higher sharpness and lower noise than image #1.
It can be further understood that training the model in the present application aims to make the result output by the model, after the input data is fed in, as close as possible to the target data; the better the parameters in the model are learned, the better the effect of the model in processing images.
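To make this training objective concrete, the following is a minimal sketch in PyTorch, assuming a paired data set of input and target image tensors; the stand-in model, the random data, the L1 loss and all hyperparameters are illustrative assumptions, not values taken from the present application.

```python
# Minimal training sketch (illustrative; model, data and hyperparameters
# are assumptions, not taken from the present application).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical paired data: inputs stand in for image data #1,
# targets stand in for image data #2 (e.g., cleaner, less noisy frames).
inputs = torch.randn(64, 3, 32, 32)
targets = torch.randn(64, 3, 32, 32)
loader = DataLoader(TensorDataset(inputs, targets), batch_size=8)

# Stand-in for the channel-expanded (fourth) neural network model.
model = nn.Sequential(
    nn.Conv2d(3, 48, 3, padding=1), nn.ReLU(),
    nn.Conv2d(48, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()

for epoch in range(10):               # in practice: until convergence
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # push outputs toward the target data
        loss.backward()
        optimizer.step()
```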
Illustratively, the ith second convolution layer of the Y second convolution layers has y_i more channels than the ith first convolution layer of the Y first convolution layers.
Specifically, the factors determining y_i include, but are not limited to, any one or more of the following: the importance of the ith first convolution layer among all convolution layers of the second neural network model, and the number of channels within the ith first convolution layer.
Illustratively, y_i is greater than or equal to half of the number of channels in the ith first convolution layer, and/or y_i is less than or equal to the number of channels in the ith first convolution layer. Illustratively, for convolution layers of higher importance in the second neural network model, the number of channels added may be suitably increased; for convolution layers of lower importance, the number of channels added may be suitably reduced.
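To make the channel-expansion step of S101 concrete, the sketch below widens one convolution layer by y_i output channels and widens the input side of the following convolution layer to match, since every added output channel also becomes an extra input channel downstream. The helper name and the copy-then-randomly-initialize scheme are assumptions for illustration, not a procedure prescribed by the present application.

```python
import torch
import torch.nn as nn

def widen_conv_pair(conv, next_conv, y_i):
    """Return copies of (conv, next_conv) in which conv gains y_i output
    channels and next_conv gains y_i input channels to match. Existing
    weights are copied; the new channels keep their random initialization.
    (Hypothetical helper for illustration.)"""
    wide = nn.Conv2d(conv.in_channels, conv.out_channels + y_i,
                     conv.kernel_size, conv.stride, conv.padding)
    wide_next = nn.Conv2d(next_conv.in_channels + y_i, next_conv.out_channels,
                          next_conv.kernel_size, next_conv.stride,
                          next_conv.padding)
    with torch.no_grad():
        wide.weight[:conv.out_channels] = conv.weight
        wide.bias[:conv.out_channels] = conv.bias
        wide_next.weight[:, :next_conv.in_channels] = next_conv.weight
        wide_next.bias[:] = next_conv.bias
    return wide, wide_next

# Example matching fig. 5B: L_1 grows from 32 to 48 kernels (y_1 = 16).
l1 = nn.Conv2d(3, 32, 3, padding=1)
l2 = nn.Conv2d(32, 64, 3, padding=1)
l1_wide, l2_wide = widen_conv_pair(l1, l2, y_i=16)
print(l1_wide.out_channels, l2_wide.in_channels)  # 48 48
```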
S102, pruning is carried out on channels of Y second convolution layers in the third neural network model respectively, and a first neural network model is obtained.
The structure of the first neural network model is the same as that of the second neural network model, the first neural network model comprises Y third convolution layers, the Y third convolution layers and the Y second convolution layers are in one-to-one correspondence, Y is more than or equal to 1, and Y is an integer.
Pruning is carried out on the channels of the Y second convolution layers in the third neural network model respectively, which can be understood as calculating importance scores of the channels of the Y second convolution layers respectively; and pruning the channels in the Y second convolution layers respectively according to the importance scores.
Illustratively, the importance scores of the channels of the ith second convolution layer of the Y second convolution layers are sorted, and X channels in the ith second convolution layer whose importance scores are lower than those of the other channels in the ith second convolution layer are pruned, where 1 ≤ i ≤ Y, i is an integer, X ≥ 1, and X is an integer.
Illustratively, X = y_i; that is, pruning is performed on the convolution layers whose number of channels was increased in S101, and not on the convolution layers whose number of channels was not increased in S101.
The ith second convolution layer comprises Z channels; the importance score of the jth channel of the Z channels is determined according to the L1 norm or the L2 norm of the jth channel's weight parameters, where Z > X, Z is an integer, 1 ≤ j ≤ Z, and j is an integer.
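A minimal sketch of this scoring-and-pruning step follows, assuming PyTorch Conv2d layers and the L1-norm variant of the importance score (the L2 norm would work the same way); the helper is a hypothetical illustration, not the application's exact procedure.

```python
import torch
import torch.nn as nn

def prune_lowest_channels(conv, x):
    """Score each output channel by the L1 norm of its weights and return
    a new Conv2d with the x lowest-scoring channels removed.
    (Hypothetical helper for illustration.)"""
    # weight shape: (out_channels, in_channels, kH, kW)
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # L1 per channel
    keep = torch.argsort(scores, descending=True)[:conv.out_channels - x]
    keep, _ = torch.sort(keep)        # keep the original channel order
    pruned = nn.Conv2d(conv.in_channels, conv.out_channels - x,
                       conv.kernel_size, conv.stride, conv.padding)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        pruned.bias.copy_(conv.bias[keep])
    return pruned

# Example: prune the 16 least important of 48 channels, back down to 32.
conv = nn.Conv2d(3, 48, 3, padding=1)
print(prune_lowest_channels(conv, 16).out_channels)  # 32
```

In a complete implementation, the input side of the following convolution layer would be pruned with the same kept indices, mirroring the widening sketch above.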
In one possible example of S102, pruning is performed on the channels of the Y second convolution layers in the third neural network model to obtain a fifth neural network model; the fifth neural network model is then fine-tuned based on a data set to obtain the first neural network model, where the data set is the data set used for training the second neural network model.
According to the embodiments of the application, the channels of some or all of the convolution layers in the second neural network model used for image processing are expanded, that is, the number of channels of those convolution layers is increased, yielding a large model with more channels and a more complex structure. The large model is then trained to convergence to obtain the third neural network model. Channel-structured pruning is then performed on the third neural network model: according to an importance evaluation, the channels of lower importance in those convolution layers are pruned, with the number of pruned channels equal to the number of added channels, yielding the first neural network model, whose structure is identical to that of the second neural network model. Through this scheme, the second neural network model is optimized to obtain the first neural network model, and the accuracy and effect of the model are markedly improved. Moreover, compared with directly training a second neural network model of more complex structure, the first neural network model keeps the structure of the second neural network model unchanged, so the accuracy and effect of the model can be optimized on the premise that the performance of the model does not deteriorate.
Fig. 5A and 5B are schematic diagrams illustrating a method 200 for optimizing an image processing model provided in the present application. Method 200 may be understood as a specific example of method 100. As shown in fig. 5A, the method 200 includes S201 to S205.
S201 and S202 may be taken as one possible example of S101.
S201, for the neural network model m to be processed, select some of the convolution layers and add a certain number of convolution kernels to them, obtaining a model M with more parameters and a more complex structure.
Here, assume that the convolution layers in the neural network model m to be processed are L = [L_1, L_2, L_3, ..., L_{K-1}, L_K], and that the numbers of convolution kernels (i.e., the numbers of channels) are N_m = [N_1, N_2, N_3, ..., N_{K-1}, N_K]. It can be understood that the neural network model m has K first convolution layers, K ≥ 1 and K is an integer, and the number of convolution kernels of the ith first convolution layer is the ith entry N_i of N_m, where 1 ≤ i ≤ K and i is an integer. The numbers of convolution kernels added to the K first convolution layers are n = [n_1, n_2, n_3, ..., n_{K-1}, n_K], i.e., the ith first convolution layer gains n_i convolution kernels. It can be understood that Y of the K first convolution layers undergo channel expansion, 1 ≤ Y ≤ K and Y is an integer. When Y = K, n_1 to n_K are all greater than 0; when Y < K, n_i = 0 for the first convolution layers that do not undergo channel expansion.
Thus, the convolution layers of the neural network model M are L = [L_1, L_2, L_3, ..., L_{K-1}, L_K], and the numbers of convolution kernels of the neural network model M are N_M = [N_1+n_1, N_2+n_2, N_3+n_3, ..., N_{K-1}+n_{K-1}, N_K+n_K]. It can be understood that the neural network model M has K second convolution layers, K ≥ 1 and K is an integer, and the number of convolution kernels of the ith second convolution layer is the ith entry of N_M, i.e., N_i + n_i, where 1 ≤ i ≤ K and i is an integer. It will be appreciated that the K second convolution layers are in one-to-one correspondence with the K first convolution layers, and the Y second convolution layers are likewise in one-to-one correspondence with the Y first convolution layers.
Illustratively, as shown in fig. 5B: L_1 of model m has 32 convolution kernels, n_1 = 16, and L_1 of model M has 48 convolution kernels; L_2 of model m has 64 convolution kernels, n_2 = 32, and L_2 of model M has 96 convolution kernels; L_3 of model m has 64 convolution kernels, n_3 = 32, and L_3 of model M has 96 convolution kernels; L_4 of model m has 96 convolution kernels, n_4 = 48, and L_4 of model M has 144 convolution kernels; ...; L_{K-1} of model m has 32 convolution kernels, n_{K-1} = 0, and L_{K-1} of model M has 32 convolution kernels; L_K of model m has 3 convolution kernels, n_K = 0, and L_K of model M has 3 convolution kernels.
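Restated as plain arithmetic over the channel-count vectors, a sketch using the fig. 5B values (middle layers elided):

```python
# Channel counts from fig. 5B (middle layers elided for brevity).
N_m = [32, 64, 64, 96, 32, 3]   # convolution kernels per layer in model m
n   = [16, 32, 32, 48,  0, 0]   # kernels added per layer (0 = no expansion)
N_M = [a + b for a, b in zip(N_m, n)]
print(N_M)                      # [48, 96, 96, 144, 32, 3]
```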
S202, train the neural network model M on the data set until convergence; the converged neural network model is denoted M'.
Wherein the data sets used in training the model in method 200 are all the same data set.
Illustratively, as shown in fig. 5B: L_1 of model M' has 48 convolution kernels; L_2 of model M' has 96 convolution kernels; L_3 of model M' has 96 convolution kernels; L_4 of model M' has 144 convolution kernels; ...; L_{K-1} of model M' has 32 convolution kernels; L_K of model M' has 3 convolution kernels.
It is understood that the neural network models m, M and M' in S201 and S202 correspond to the second neural network model, the fourth neural network model and the third neural network model in S101, respectively.
S203 to S205 may be one possible example of S102.
S203, for all channels in the second convolution layers of the neural network model M' whose channels were added, perform importance assessment, that is, calculate the L1 value of each channel in L_i.
Wherein the second convolution layers with added channels are the second convolution layers L_i with n_i > 0, that is, the Y second convolution layers, that is, the channel-expanded second convolution layers.
For a second convolution layer L_i with n_i > 0, let the importance of its channels be P_i = [p_1, p_2, ..., p_{N_i+n_i}]. It will be appreciated that the ith second convolution layer L_i contains N_i + n_i channels; among these N_i + n_i channels, the jth channel has importance p_j, where 1 ≤ j ≤ N_i + n_i and j is an integer.
S204, for each second convolution layer L_i with n_i > 0 in the neural network model M', sort the importance P_i of the channels within L_i, and trim the corresponding channels in order of importance p_j from small to large until the number of trimmed channels reaches n_i. After channel pruning is completed for all second convolution layers L_i with n_i > 0, the neural network model m' is obtained.
Illustratively, as shown in fig. 5B: for L_1 of model M', trim the first 16 channels in order of importance from small to large; for L_2 of model M', trim the first 32 channels in order of importance from small to large; for L_3 of model M', trim the first 32 channels in order of importance from small to large; for L_4 of model M', trim the first 48 channels in order of importance from small to large; ...; L_{K-1} of model M' needs no channel trimming; L_K of model M' needs no channel trimming.
It is understood that the structure of the model m' is the same as that of the model m: the number of convolution layers of both models is K, and the number of channels of the ith convolution layer of model m' is the same as that of the ith convolution layer of model m. Through the above processing procedure, however, the channel parameters of each convolution layer of model m' differ from those of model m.
Illustratively, as shown in fig. 5B: L_1 of model m' has 32 convolution kernels; L_2 of model m' has 64 convolution kernels; L_3 of model m' has 64 convolution kernels; L_4 of model m' has 96 convolution kernels; ...; L_{K-1} of model m' has 32 convolution kernels; L_K of model m' has 3 convolution kernels.
S205, model m' is fine tuned (trained) on the dataset, i.e. trained to converge.
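A minimal sketch of this fine-tuning step, again in PyTorch; the stand-in model, the random paired data and the reduced learning rate are all assumptions, since the application only specifies that m' is trained on the same data set until convergence.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the pruned model m' and the same data set used for m and M.
model_m_prime = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
loader = DataLoader(TensorDataset(torch.randn(64, 3, 32, 32),
                                  torch.randn(64, 3, 32, 32)), batch_size=8)
loss_fn = nn.L1Loss()

# Fine-tuning: same objective, smaller learning rate (an assumption).
optimizer = torch.optim.Adam(model_m_prime.parameters(), lr=1e-4)
for epoch in range(5):                 # in practice: until convergence
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model_m_prime(x), y)
        loss.backward()
        optimizer.step()
```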
It is understood that the neural network models M' and m' in S203 to S205 correspond to the third neural network model and the first neural network model in S102, respectively.
Subsequently, the electronic device performs image processing using the neural network model m'; the effects that can be achieved can be seen in the related description of fig. 3A to 3D.
Fig. 6 shows a schematic diagram of an image processing method 300 provided herein. As shown in fig. 6, the method 300 includes S301 to S303. The method 300 is performed by an electronic device.
S301, acquiring an image.
It may be appreciated that the electronic device may acquire the image to be processed by the first neural network model in a plurality of manners. For example, the electronic device may capture the image by an image capturing device (such as a camera); as another example, the electronic device may download the image; as another example, the electronic device may exchange information such as images with other electronic devices through certain applications. The present application is not limited in this regard.
S302, processing the image based on a first neural network model.
The first neural network model is obtained according to a second neural network model, the structure of the second neural network model is matched with the hardware structure of the electronic equipment, and the structure of the first neural network model is identical with that of the second neural network model.
S303, outputting the image processed by the first neural network model.
It will be appreciated that the image processed by the first neural network model may also need to undergo a series of subsequent processes before being displayed on the interface of the electronic device; the subsequent processes may include brightening, sharpening, color correction, etc., but the effects of these subsequent processes on the sharpness and noise of the final output image are almost negligible compared with the effects of the model #A or the third neural network model on sharpness and noise.
As one example of the method 300, after the electronic device acquires the image, image data (hereinafter referred to as image data #1 for convenience of explanation) is obtained from the image, and image data #1 is used to indicate all pixels in the image. For example, image data #1 may be in raw (RAW) format or red, green and blue (RGB) format. Image data #1 is input into the first neural network model. Specifically, the ith third convolution layer of the Y third convolution layers performs convolution calculation on its input and outputs the ith calculation result, and the ith calculation result, after calculation by other non-convolution layers, is input into the (i+1)th third convolution layer, where 1 ≤ i ≤ Y-1 and i is an integer. It can be understood that after each convolution layer in the neural network model performs convolution calculation on the feature map input into it, the calculation result is calculated by other non-convolution layers and then input into the next convolution layer, and the model finally outputs the processed image. In one possible implementation, the image processed by the first neural network model may be image data in RAW format or RGB format (hereinafter referred to as image data #2 for convenience of description).
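As a sketch of this inference flow, assuming a PyTorch model in which the third convolution layers are interleaved with non-convolution layers (here, ReLU activations); the layer sizes and the input resolution are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative first neural network model: convolution layers interleaved
# with non-convolution layers (ReLU activations).
first_model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),   # 1st third convolution layer
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # 2nd third convolution layer
    nn.Conv2d(64, 3, 3, padding=1),              # final layer emits image data #2
)

image_data_1 = torch.rand(1, 3, 256, 256)  # stand-in for RGB image data #1
with torch.no_grad():
    # Each convolution result passes through the following non-convolution
    # layer before entering the next convolution layer, as described above.
    image_data_2 = first_model(image_data_1)
print(image_data_2.shape)  # torch.Size([1, 3, 256, 256]); pixels correspond 1:1
```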
In the mobile phone photographing scene, the pixels in the image processed by the first neural network model are in one-to-one correspondence with the pixels indicated by image data #1; in other words, the pixels indicated by image data #2 are in one-to-one correspondence with the pixels indicated by image data #1, and the image output after image data #2 undergoes a series of subsequent processing has higher sharpness and lower noise than the image acquired by the electronic device. In the face recognition scene, image data #2 is the coordinate data of the facial features of the person in the image composed of all pixels of image data #1. In the smoke and fire detection scene, image data #2 is the coordinate data of smoke and fire in the image composed of all pixels of image data #1.
In particular, the method of deriving the first neural network model from the second neural network model may be found in the relevant description of the optimization methods 100 and 200 of the image processing model hereinabove.
In the image processing method provided by the application, after the electronic device acquires the image, the electronic device processes the image based on the first neural network model obtained according to the second neural network model. Channel expansion is performed on the second neural network model to obtain a third neural network model that is more complex than the second neural network model. The third neural network model processes images with better effect than the second neural network model, but its higher complexity makes its inference speed slower. Pruning the third neural network model then reduces its complexity without significantly reducing the model accuracy, so the inference speed of the model is improved. Therefore, by the method, the image processing effect can be improved without prolonging the time the electronic device takes to process an image. In addition, since the structure of the second neural network model is adapted to the hardware structure of the electronic device, and the structure of the first neural network model is the same as the structure of the second neural network model, the structure of the first neural network model is also adapted to the hardware structure of the electronic device. Through the scheme, the software-hardware adaptation process can be saved and the cost of model optimization reduced, while the utilization rate of the chip is not affected.
Referring to fig. 7, fig. 7 shows a schematic hardware structure of an electronic device 1000 according to an embodiment of the present application. Referring to fig. 7, the electronic device 1000 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a user identification module (subscriber identification module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 1000. In other embodiments of the present application, electronic device 1000 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processor (neural-network processing unit, NPU), etc.
The controller may be a neural hub and a command center of the electronic device 1000, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces, such as may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 1000. The processor 110 and the display 194 communicate via the DSI interface to implement the display functionality of the electronic device 1000.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and does not limit the structure of the electronic device 1000. In other embodiments of the present application, the electronic device 1000 may also employ different interfacing manners in the foregoing embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 1000 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 1000 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. Such as: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 1000. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area networks (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 1000. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module.
The electronic device 1000 implements display functions through a GPU, a display screen 194, and an application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a mini-LED, a micro-LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 1000 may include 1 or N display screens 194, N being an integer greater than 1.
The electronic device 1000 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 1000 may include 1 or N cameras 193, N being an integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 1000 is selecting a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, etc.
Video codecs are used to compress or decompress digital video. The electronic device 1000 may support one or more video codecs. Thus, the electronic device 1000 may play or record video in a variety of encoding formats, such as: moving picture experts group (moving picture experts group, MPEG)1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, such as referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent cognition of the electronic device 1000 can be realized through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 1000. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. Such as storing files of music, video, etc. in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 executes various functional applications of the electronic device 1000 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created by the electronic device 1000 during use (e.g., audio data, phonebook, etc.), and so forth. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 1000 may implement audio functions such as music playing, recording, etc. through the audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone interface 170D, and application processor, etc. The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 1000 may receive key inputs, producing key signal inputs related to user settings of the electronic device 1000 as well as function controls. The motor 191 may generate a vibration cue. The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc. The SIM card interface 195 is used to connect a SIM card.
The sensor module 180 may include 1 or more sensors, which may be of the same type or different types. It is to be understood that the sensor module 180 shown in fig. 7 is only an exemplary division, and other divisions are possible, which are not limited in this application.
The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. When a touch operation is applied to the display screen 194, the electronic apparatus detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions.
The gyro sensor 180B may be used to determine a motion gesture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (i.e., x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for photographing anti-shake.
The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device in various directions (typically three axes). The magnitude and direction of gravity can be detected when the electronic device is stationary. The electronic equipment gesture recognition method can also be used for recognizing the gesture of the electronic equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
A distance sensor 180F for measuring a distance. The electronic device may measure the distance by infrared or laser. In some embodiments, the scene is photographed and the electronic device can range using the distance sensor 180F to achieve quick focus.
The touch sensor 180K, also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is for detecting a touch operation acting thereon or thereabout. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device at a different location than the display 194.
The air pressure sensor 180C is used to measure air pressure. The magnetic sensor 180D includes a hall sensor. The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The electronic device uses a photodiode to detect infrared reflected light from nearby objects. The ambient light sensor 180L is used to sense ambient light level. The fingerprint sensor 180H is used to acquire a fingerprint. The temperature sensor 180J is for detecting temperature. The bone conduction sensor 180M may acquire a vibration signal.
Next, a software system of the electronic apparatus 1000 will be described.
By way of example, the electronic device 1000 may be a cell phone. The software system of the electronic device 1000 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In this embodiment, a software system of the electronic device 1000 is exemplarily described by taking an Android (Android) system with a hierarchical architecture as an example.
Fig. 8 shows a block diagram of a software system of an electronic device 1000 according to an embodiment of the present application. Referring to fig. 8, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided, from top to bottom, into an application layer, an application framework layer, an Android runtime (Android runtime) and system library, a kernel layer, and a hardware abstraction layer (hardware abstraction layer, HAL).
The application layer may include a series of application packages. As shown in fig. 8, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (application programminginterface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions. As shown in fig. 8, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like. The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data, which may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc., and make such data accessible to the application. The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to construct a display interface for an application, which may be comprised of one or more views, such as a view that includes displaying a text notification icon, a view that includes displaying text, and a view that includes displaying a picture. The telephony manager is used to provide communication functions of the electronic device 1000, such as management of call status (including on, off, etc.). The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like. The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. For example, a notification manager is used to inform that the download is complete, a message alert, etc. The notification manager may also be a notification that appears in the system top status bar in the form of a chart or a scroll bar text, such as a notification of a background running application. The notification manager may also be a notification that appears on the screen in the form of a dialog window, such as a text message being prompted in a status bar, a notification sound being emitted, the electronic device vibrating, a flashing indicator light, etc.
The Android runtime includes a core library and a virtual machine. The Android runtime is responsible for scheduling and management of the Android system. The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used for performing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, such as: surface manager (surface manager), media libraries (Media Libraries), three-dimensional graphics processing library (e.g., OpenGL ES), 2D graphics engine (e.g., SGL), etc. The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications. Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media libraries may support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc. The three-dimensional graphics processing library is used for realizing three-dimensional graphics drawing, image rendering, synthesis, layer processing and the like. The 2D graphics engine is a drawing engine for 2D drawing.
A Hardware Abstraction Layer (HAL) is an interface layer located between the operating system kernel and upper layer software, which aims at abstracting the hardware. The hardware abstraction layer is a device kernel driven abstraction interface for enabling application programming interfaces that provide higher level Java API frameworks with access to the underlying devices. HAL contains a plurality of library modules such as cameras, display screens, bluetooth, audio, etc. Wherein each library module implements an interface for a particular type of hardware component. When the system framework layer API requires access to the hardware of the portable device, the Android operating system will load the library module for that hardware component.
The kernel layer is a layer between hardware and software. The kernel layer at least comprises camera drivers, processor drivers, display drivers, audio drivers and other device drivers. The device driver is an interface between the I/O system and related hardware for driving the corresponding hardware device.
It should be noted that, the software structure schematic diagram of the electronic device shown in fig. 8 provided in the present application is only an example, and is not limited to specific module division in different layers of the Android operating system, and the description of the software structure of the Android operating system in the conventional technology may be referred to specifically.
The workflow of the software and hardware of the electronic device 1000 is illustrated below in conjunction with a mobile phone taking a photo of a scene in a night environment.
As one possible example of S301: when the touch sensor receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the time stamp of the touch operation). The original input event is stored at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the original input event. Taking the touch operation as a clicking operation and the control corresponding to the clicking operation as the control of a camera application icon as an example, the camera application calls an interface of the application framework layer to start the camera application, then calls the kernel layer to start the camera driver, and captures an image through the camera. As one possible example of S302, after the camera captures the image, the camera application may also call the processor through the HAL layer and perform image processing on the captured image through the neural network processor in the processor. The neural network processor performs image processing using the first neural network model provided by the application. As one possible example of S303, the image processed by the first neural network model is input to another neural network model in the processor for subsequent processing.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (digital versatile disc, DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
The above embodiments are not intended to limit the present application, and any modifications, equivalent substitutions, improvements, etc. within the technical scope of the present disclosure should be included in the protection scope of the present application.

Claims (22)

1. An image processing method applied to an electronic device, comprising:
acquiring an image;
processing the image based on a first neural network model;
outputting an image processed by the first neural network model;
wherein the first neural network model is derived from a second neural network model for image processing, the second neural network model having a structure that is adapted to a hardware structure of the electronic device, the first neural network model having a structure that is identical to the structure of the second neural network model, wherein,
the first neural network model is derived from a second neural network model, comprising:
the first neural network model comprises Y third convolution layers, the second neural network model comprises Y first convolution layers, the first neural network model is obtained by pruning channels of Y second convolution layers in the third neural network model, and the Y third convolution layers are in one-to-one correspondence with the Y second convolution layers;
The third neural network model is obtained by respectively increasing the number of channels in Y first convolution layers in the second neural network model, the Y first convolution layers and the Y second convolution layers are in one-to-one correspondence, Y is more than or equal to 2, and Y is an integer.
2. The method of claim 1, wherein the processing the image based on the first neural network model comprises:
obtaining image data according to the image, wherein the image data is used for indicating all pixels in the image;
inputting the image data into the first neural network model;
performing convolution calculation on the image data by an ith third convolution layer in the Y third convolution layers, and outputting an ith calculation result;
and inputting the ith calculation result, after calculation by other non-convolution layers, into an (i+1)th third convolution layer, wherein i is more than or equal to 1 and less than or equal to Y-1, and i is an integer.
3. The method of claim 1, wherein the first neural network model is obtained by pruning channels of Y second convolutional layers in a third neural network model, respectively, comprising:
the first neural network model is obtained by respectively calculating importance scores of channels in the Y second convolution layers and respectively pruning the channels in the Y second convolution layers according to the importance scores.
4. The method of claim 3, wherein pruning the channels in the Y second convolution layers, respectively, based on the importance scores, comprises:
sorting importance scores of channels of an ith second convolution layer of the Y second convolution layers;
deleting X channels in the ith second convolution layer, wherein the importance scores of the X channels are lower than the importance scores of other channels in the ith second convolution layer, i is not less than 1 and not more than Y, i is an integer, X is not less than 1, and X is an integer.
5. The method of claim 4, wherein,
the ith second convolution layer comprises Z channels, and the importance score of the jth channel of the Z channels is determined according to the L1 norm or the L2 norm of the jth channel's weight parameters, wherein Z is larger than X, Z is an integer, j is larger than or equal to 1 and smaller than or equal to Z, and j is an integer.
6. The method of claim 4 or 5, wherein,
the ith second convolution layer increases the number of channels by Y relative to the ith first convolution layer of the Y first convolution layers i And x=y i
7. The method of claim 6, wherein,
y_i is determined according to at least one of the following: the importance of the ith first convolution layer among all convolution layers of the second neural network model, and the number of channels within the ith first convolution layer.
8. The method of claim 7, wherein,
y_i is greater than or equal to half of the number of channels in the ith first convolution layer, and/or y_i is less than or equal to the number of channels in the ith first convolution layer.
9. The method of claim 2, wherein the third neural network model is derived by increasing the number of channels within Y first convolution layers in the second neural network model, respectively, comprising:
the third neural network model is obtained by training a fourth neural network model to converge based on a dataset, wherein the dataset is a dataset used to train the second neural network model;
the fourth neural network model is obtained by increasing the number of channels of each of the Y first convolutional layers, respectively.
10. The method of claim 2 or 9, wherein the first neural network model is obtained by pruning channels of Y second convolution layers in a third neural network model, respectively, comprising:
the first neural network model is obtained by fine-tuning a fifth neural network model based on a dataset, wherein the dataset is the dataset used to train the second neural network model;
The fifth neural network model is obtained by pruning the channels of the Y second convolution layers in the third neural network model respectively.
11. A method for optimizing an image processing model, comprising:
respectively increasing the number of channels in Y first convolution layers in a second neural network model to obtain a third neural network model, wherein the second neural network model is used for image processing, the third neural network model comprises Y second convolution layers, and the Y first convolution layers are in one-to-one correspondence with the Y second convolution layers;
pruning is carried out on channels of the Y second convolution layers in the third neural network model respectively to obtain a first neural network model, the structure of the first neural network model is identical to that of the second neural network model, the first neural network model is used for image processing, the first neural network model comprises Y third convolution layers, the Y third convolution layers are in one-to-one correspondence with the Y second convolution layers, Y is larger than or equal to 1, and Y is an integer.
12. The method of claim 11, wherein pruning the channels of the Y second convolutional layers in the third neural network model, respectively, comprises:
Respectively calculating importance scores of channels in the Y second convolution layers;
and pruning the channels in the Y second convolution layers according to the importance scores.
13. The method of claim 12, wherein pruning the channels in the Y second convolution layers, respectively, based on the importance scores comprises:
sorting importance scores of channels of an ith second convolution layer of the Y second convolution layers;
deleting X channels in the ith second convolution layer, wherein the importance scores of the X channels are lower than the importance scores of other channels in the ith second convolution layer, i is not less than 1 and not more than Y, i is an integer, X is not less than 1, and X is an integer.
14. The method of claim 13, wherein,
the ith second convolution layer comprises Z channels, and the importance score of the jth channel of the Z channels is determined according to the L1 norm or the L2 norm of the jth channel's weight parameters, wherein Z is larger than X, Z is an integer, j is larger than or equal to 1 and smaller than or equal to Z, and j is an integer.
15. The method of claim 13 or 14, wherein,
the ith second convolution layer increases the number of channels by Y relative to the ith first convolution layer of the Y first convolution layers i And x=y i
16. The method of claim 15, wherein,
y_i is determined according to at least one of the following: the importance of the ith first convolution layer among all convolution layers of the second neural network model, and the number of channels within the ith first convolution layer.
17. The method of claim 16, wherein,
y_i is greater than or equal to half of the number of channels in the ith first convolution layer, and/or y_i is less than or equal to the number of channels in the ith first convolution layer.
18. The method of claim 11, wherein the increasing the number of channels in Y first convolution layers in the second neural network model, respectively, results in a third neural network model, comprising:
respectively increasing the number of channels of each first convolution layer in the Y first convolution layers to obtain a fourth neural network model;
training the fourth neural network model to converge based on a data set, wherein the data set is a data set used to train the second neural network model, to obtain the third neural network model.
19. The method of claim 11 or 18, wherein pruning the channels of the Y second convolutional layers in the third neural network model to obtain a first neural network model includes:
Pruning the channels of the Y second convolution layers in the third neural network model to obtain a fifth neural network model;
and fine-tuning the fifth neural network model based on a data set to obtain the first neural network model, wherein the data set is the data set used for training the second neural network model.
20. The method of claim 11, wherein the structure of the first neural network model is adapted to a hardware structure of an electronic device in the case that the structure of the second neural network model is adapted to the hardware structure of the electronic device.
21. An electronic device comprising a memory, and one or more processors, wherein the memory is configured to store a computer program; the processor is configured to invoke the computer program to cause the electronic device to perform the method of any of claims 1 to 10 or 11 to 20.
22. A computer storage medium, comprising: computer instructions; the computer instructions, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1 to 10 or 11 to 20.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination