CN115908149A - Image shooting method, image shooting device, electronic equipment and computer readable storage medium


Info

Publication number: CN115908149A
Application number: CN202111163487.1A
Authority: CN (China)
Prior art keywords: RAW image, image, convolution, decoding, processing
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 李逸群, 邢连萍, 凌健, 俞大海
Current Assignee: TCL Technology Group Co Ltd
Original Assignee: TCL Technology Group Co Ltd
Application filed by TCL Technology Group Co Ltd

Landscapes

  • Image Processing (AREA)

Abstract

The embodiments of the present application disclose an image shooting method, an image shooting device, an electronic device, and a computer-readable storage medium. The method includes: acquiring a first captured RAW image corresponding to a target shooting scene based on a shooting instruction; and inputting the first captured RAW image into an image noise reduction model for processing and outputting a noise-reduced RAW image, the noise-reduced RAW image being the result image of the shooting instruction. The method can improve the quality of images captured by the electronic device.

Description

Image shooting method, image shooting device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image capturing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, electronic devices such as mobile phones and tablet computers are generally configured with a shooting component, providing users with a shooting function so that they can record, anytime and anywhere, the things happening around them and the scenes they see. However, due to hardware limitations of the shooting component, some noise is often present in images captured by such electronic devices.
Disclosure of Invention
The embodiment of the application provides an image shooting method and device, electronic equipment and a computer readable storage medium, which can improve the quality of images shot by the electronic equipment.
In a first aspect, an embodiment of the present application provides an image capturing method, including:
acquiring a first captured RAW image corresponding to a target shooting scene based on a shooting instruction;
and inputting the first captured RAW image into an image noise reduction model for processing and outputting a noise-reduced RAW image, the noise-reduced RAW image being the result image of the shooting instruction.
In a second aspect, an embodiment of the present application further provides an image capturing apparatus, including:
the shooting module is used for acquiring a first captured RAW image corresponding to a target shooting scene based on a shooting instruction;
and the noise reduction module is used for inputting the first captured RAW image into an image noise reduction model for processing and outputting a noise-reduced RAW image, wherein the noise-reduced RAW image is the result image of the shooting instruction.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the image capturing method provided in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, where the electronic device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the steps in the image capturing method provided in any embodiment of the present application.
As can be seen from the above, once training of the image noise reduction model is completed, the electronic device may, in subsequent shooting, shoot a shooting scene based on a shooting instruction and input the captured RAW image into the image noise reduction model for processing, thereby reducing the noise in the captured image, outputting a noise-reduced RAW image, and using the noise-reduced RAW image as the result image of the shooting instruction. In this way, the hardware limitation of the shooting component of the electronic device can be overcome, and the quality of images captured by the electronic device improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of an image capturing system according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of an image capturing method according to an embodiment of the present application.
Fig. 3 is an exemplary diagram of a noise-reduced RAW image obtained by performing noise reduction processing on a first captured RAW image through an image noise reduction model according to the embodiment of the present application.
Fig. 4 is a schematic structural diagram of a student model in an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a first downsampling submodule in an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a first convolution sub-module in an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a second convolution sub-module in the embodiment of the present application.
Fig. 8 is a schematic structural diagram of an image capturing apparatus according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It is to be appreciated that the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Relational terms such as first and second, and the like may be used solely to distinguish one object or operation from another object or operation without necessarily limiting the actual sequential relationship between the objects or operations. In the description of the embodiments of the present application, "a plurality" means two or more unless specifically defined otherwise.
Artificial Intelligence (AI) is a theory, method, technique and application system that utilizes a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. The basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly includes Machine Learning (ML), within which Deep Learning (DL) is a newer research direction, introduced to bring machine learning closer to its original goal, artificial intelligence. At present, deep learning is mainly applied in fields such as computer vision and natural language processing.
Deep learning learns the intrinsic regularities and hierarchical representations of sample data, and the information obtained in this learning process is of great help in interpreting data such as text, images, and sound. Using deep learning techniques and corresponding training data sets, network models realizing different functions can be obtained through training; for example, a gender classification model for gender classification can be trained on one training data set, an image optimization model for image optimization can be trained on another training data set, and so on.
In order to reduce noise in an image shot by electronic equipment and improve the image quality of the image shot by the electronic equipment, the application introduces a deep learning technology into the image shooting of the electronic equipment, and correspondingly provides an image shooting method, an image shooting device, the electronic equipment and a computer readable storage medium. Wherein the image capturing method may be performed by an electronic device.
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, the present application further provides an image capturing system, which includes an electronic device 100 as shown in fig. 1. For example, when the electronic device 100 receives an input shooting instruction, it responds to the shooting instruction by shooting a shooting scene through a shooting component to obtain a first captured RAW image. The first captured RAW image is then input into a trained image noise reduction model for noise reduction; the image noise reduction model processes the first captured RAW image and correspondingly outputs a noise-reduced RAW image, which is used as the result image of the shooting instruction. The electronic device 100 may be any device equipped with a shooting component and having shooting capability, such as a mobile electronic device equipped with a shooting component (e.g., a smartphone, tablet computer, palmtop computer, or notebook computer) or a stationary electronic device equipped with a shooting component (e.g., a desktop computer, television, or advertising machine).
In addition, as shown in fig. 1, the image capturing system may further include a storage device 200, and the storage device 200 is used for storing data, for example, the electronic device 100 stores the captured first captured RAW image and the noise-reduced RAW image processed from the first captured RAW image in the storage device 200.
It should be noted that the scene schematic diagram of the image capturing system shown in fig. 1 is only an example. The image capturing system and scene described in the embodiments of the present application are intended to illustrate the technical solutions of these embodiments more clearly and do not limit them; a person skilled in the art will appreciate that as image capturing systems evolve and new service scenarios appear, the technical solutions provided in the embodiments of the present application remain applicable to similar technical problems.
Referring to fig. 2, fig. 2 is a schematic flowchart of an image capturing method according to an embodiment of the present disclosure, where the image capturing method is applied to an electronic device, and a process may be as follows:
s210, a first captured RAW image corresponding to the target captured scene is acquired based on the capture instruction.
In this embodiment, the electronic device is configured with a shooting component used to collect images. The shooting component includes at least a lens and an image sensor: the lens projects external optical signals onto the image sensor, and the image sensor performs photoelectric conversion on the optical signals projected by the lens, converting them into usable electrical signals to obtain a digitized image. After the shooting component is enabled, it can capture the shooting scene in real time. The shooting scene can be understood as the real-world area at which the enabled shooting component is aimed, that is, the area whose optical signals the shooting component can convert into a corresponding image. For example, after the electronic device enables the shooting component according to a user operation, if the user points the shooting component of the electronic device at an area including a certain object, the area including that object is the shooting scene of the shooting component.
In this embodiment, the shooting instruction may be directly input by the user, including but not limited to input by touching, voice, and physical keys. For example, after a user operates the electronic device to start a photographing application (e.g., a system application "camera" of the electronic device), the user may input a photographing instruction to the electronic device by clicking a "photographing" button provided in a "camera" preview interface, or may input a photographing instruction to the electronic device by speaking a voice instruction "photograph".
In the embodiment of the application, after receiving an input shooting instruction, the electronic device controls the shooting component to shoot a target shooting scene according to configured shooting parameters in response to the shooting instruction, so that an unprocessed RAW image is obtained and recorded as a first shooting RAW image. The shooting parameters include, but are not limited to, exposure parameters, white balance parameters, whether to turn on a flash, and the like.
S220, the first captured RAW image is input into the image noise reduction model for processing, and a noise-reduced RAW image is output, the noise-reduced RAW image being the result image of the shooting instruction.
In this embodiment, the image noise reduction model is configured to perform noise reduction processing on the input RAW image, and the noise reduction processing may be understood in a popular way to reduce noise existing in the RAW image, so as to obtain a RAW image with lower noise intensity than the RAW image. The image noise reduction model is obtained by training based on a first sample RAW image and a second sample RAW image, the first sample RAW image is obtained by shooting of an external device, the second sample RAW image is obtained by adding noise to the first sample RAW image according to the noise intensity of the image shot by the electronic device, and the noise intensity of the image shot by the external device is smaller than that of the image shot by the electronic device. The model structure and the training mode of the image noise reduction model are not specifically limited, and can be selected by those skilled in the art according to actual needs. For example, the image noise reduction model can be obtained by training using an auto-encoder as a base model.
It should be noted that saying the noise intensity of an image captured by the external device is less than that of an image captured by the electronic device means that, when the external device and the electronic device capture the same shooting scene, the image captured by the external device has the lower noise intensity. The physical form of the external device is not particularly limited: it may be any other electronic device configured with a shooting component whose shooting capability is stronger than that of the electronic device's shooting component, so as to ensure that the noise intensity of the image it captures is lower. For example, the electronic device may be a smartphone, and the external device may be a single-lens reflex camera.
The evaluation method of the noise intensity can be selected by those skilled in the art according to actual needs, and is not limited in particular here.
Illustratively, the noise evaluation is performed in the present embodiment as follows.
It should be noted that additive noise generally refers to thermal noise, shot noise, and the like, which are additive with respect to the signal; such noise is present regardless of whether the signal is present. Multiplicative noise is generally caused by channel imperfections; it multiplies the signal, so it is present when the signal is present and absent when the signal is absent. For an image captured by an electronic device, multiplicative noise and additive noise generally exist at the same time. Therefore, the noise intensity can be divided into a multiplicative noise intensity and an additive noise intensity.
In this embodiment, the multiplicative noise intensity and the additive noise intensity are evaluated at the same time.
Taking the electronic device as an example, its multiplicative and additive noise intensities can be evaluated as follows. The electronic device controls the shooting component to capture a preset number of gray-scale images at a set sensitivity. First, the multi-frame average value of each pixel is calculated; then, the variance across all frames is calculated for pixel positions sharing the same average value; finally, linear fitting is performed on the average values and variances, the slope of the fitted linear function is taken as the multiplicative noise intensity, and its intercept is taken as the additive noise intensity. Accordingly, the statement that the noise intensity of an image captured by the external device is smaller than that of an image captured by the electronic device can be understood as follows: when the external device and the electronic device capture the same shooting scene, the multiplicative and additive noise intensities of the image captured by the external device are respectively smaller than those of the image captured by the electronic device.
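For illustration only, the following numpy sketch implements this mean-variance calibration under stated assumptions: the frames are passed in as a float array, and pixel means are quantized by rounding in order to group positions with the same average value (the quantization step and the function name are assumptions, not from the patent).

```python
import numpy as np

def estimate_noise_intensity(frames: np.ndarray):
    """Estimate multiplicative/additive noise from repeated gray-scale frames.

    frames: float array of shape (num_frames, H, W), same scene, same sensitivity.
    Returns (multiplicative_intensity, additive_intensity) = (slope, intercept)
    of a linear fit of per-pixel temporal variance against per-pixel mean.
    """
    mean = frames.mean(axis=0)   # multi-frame average of each pixel
    var = frames.var(axis=0)     # temporal variance of each pixel
    # Group pixel positions with (approximately) the same mean value and
    # average their variances, as described in the text.
    bins = np.round(mean).astype(np.int64).ravel()
    var = var.ravel()
    levels = np.unique(bins)
    mean_per_level = levels.astype(np.float64)
    var_per_level = np.array([var[bins == lv].mean() for lv in levels])
    # Linear fit: slope -> multiplicative intensity, intercept -> additive.
    slope, intercept = np.polyfit(mean_per_level, var_per_level, deg=1)
    return slope, intercept
```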
In this embodiment, after the first captured RAW image of the target captured scene is captured, the electronic device further inputs the captured first captured RAW image into a trained image noise reduction model, and performs noise reduction processing on the first captured RAW image through the image noise reduction model to obtain a noise-reduced RAW image of the first captured RAW image.
For example, referring to fig. 3, the left side of fig. 3 shows a first captured RAW image obtained by capturing a target captured scene by the electronic device, as shown in fig. 3, the first captured RAW image has some noise, and the electronic device inputs the first captured RAW image into an image noise reduction model for noise reduction processing to obtain a noise-free noise-reduced RAW image, as shown in the right side of fig. 3.
In this embodiment, after the electronic device performs noise reduction processing on the first captured RAW image obtained by actual capturing through the image noise reduction model and accordingly obtains the noise-reduced RAW image, the electronic device replaces the first captured RAW image with the noise-reduced RAW image as a result image of the capturing instruction.
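As a concrete illustration of step S220, a minimal PyTorch-style inference sketch is shown below; `denoise_model` and `raw_frame` are hypothetical names, and the packing of the RAW mosaic into channels is an assumption rather than something the patent specifies.

```python
import torch

@torch.no_grad()
def denoise_raw(denoise_model: torch.nn.Module, raw_frame: torch.Tensor) -> torch.Tensor:
    """Run the image noise reduction model on one captured RAW frame.

    raw_frame: tensor of shape (1, C, H, W) holding the first captured RAW
    image (e.g. Bayer channels packed along C). The returned noise-reduced
    RAW image replaces the input as the result image of the shooting
    instruction.
    """
    denoise_model.eval()
    return denoise_model(raw_frame)
```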
In an optional embodiment, before acquiring the first captured RAW image corresponding to the target captured scene based on the capturing instruction, the image capturing method provided by the present application may further include:
acquiring a trained teacher model for noise reduction;
acquiring a student model, wherein the network parameter number of the student model is smaller than that of the teacher model;
acquiring a first sample RAW image and a second sample RAW image;
and carrying out distillation training on the student model according to the teacher model, the first sample RAW image and the second sample RAW image to obtain an image noise reduction model.
It can be understood that ensuring the noise reduction performance of the image noise reduction model generally requires a relatively complex structure, which gives the model more network parameters and in turn reduces its noise reduction efficiency. In order to strike an effective balance between noise reduction performance and noise reduction efficiency in the trained image noise reduction model, this embodiment provides an optional training mode for the image noise reduction model.
The electronic equipment acquires a trained teacher model which is also used for carrying out noise reduction on the image and acquires a student model, wherein the structure of the student model is similar to that of the teacher model, but the network parameter quantity of the student model is smaller than that of the teacher model. For example, the obtained teacher model and the obtained student model are both U-shaped residual error network models.
In addition, the electronic equipment also acquires the first sample RAW image and the second sample RAW image, carries out distillation training on the student model according to the trained teacher model and the first sample RAW image and the second sample RAW image, and takes the trained student model as an image noise reduction model.
In this embodiment, a knowledge distillation approach is adopted: a complex model trained in advance serves as the teacher model to guide the training of the small model, i.e., the student model, so that the feature expression capability of the student model can be improved in a targeted manner and the trained image noise reduction model achieves both high noise reduction performance and high noise reduction efficiency.
It should be noted that this embodiment guides the training of the student model through the feature expression of the teacher model, mainly through the design of the loss function. Specifically: the prediction loss of the student model is determined according to the first sample RAW image and the second sample RAW image; the feature matching loss between the student model and the teacher model is determined according to the second sample RAW image; the prediction loss and the feature matching loss are fused to obtain the distillation loss of the student model; and the network parameters of the student model are adjusted according to the distillation loss to obtain the image noise reduction model.
For the prediction loss, the second sample RAW image is denoised through the student model to obtain a sample noise-reduced RAW image, and the prediction loss of the student model is determined according to the difference between the sample noise-reduced RAW image and the first sample RAW image.
For the feature matching loss, it can be determined from the difference of the features of the second sample RAW image extracted by the student model and the teacher model.
It should be noted that the embodiment does not specifically limit how to fuse the prediction loss and the feature matching loss, and can be configured by those skilled in the art according to actual needs. For example, in this embodiment, the distillation loss of the student model obtained by fusing the prediction loss and the feature matching loss can be expressed as:
$\mathrm{Loss} = w_{\mathrm{stu}} L_{\mathrm{stu}} + w_{\mathrm{tea}} L_{\mathrm{tea}}$
where $\mathrm{Loss}$ denotes the distillation loss of the student model, $L_{\mathrm{stu}}$ the prediction loss of the student model, $L_{\mathrm{tea}}$ the feature matching loss between the student model and the teacher model, and $w_{\mathrm{stu}}$ and $w_{\mathrm{tea}}$ the weights assigned to $L_{\mathrm{stu}}$ and $L_{\mathrm{tea}}$, respectively. Subject to the constraint that the weighted prediction loss and the weighted feature matching loss have the same order of magnitude, the values of $w_{\mathrm{stu}}$ and $w_{\mathrm{tea}}$ can be chosen by those skilled in the art according to actual needs.
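A minimal PyTorch sketch of this fused loss, assuming the prediction loss is an L1 distance between the student output and the first sample RAW image (the L1 choice and all names are assumptions; the patent leaves the exact metrics open):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, clean_target, feat_match_loss,
                      w_stu: float = 1.0, w_tea: float = 1.0):
    """Loss = w_stu * L_stu + w_tea * L_tea.

    student_out:     sample noise-reduced RAW image predicted by the student.
    clean_target:    first sample RAW image (low-noise ground truth).
    feat_match_loss: scalar feature matching loss between student and teacher.
    w_stu / w_tea should be chosen so both weighted terms have the same
    order of magnitude, as noted in the text.
    """
    l_stu = F.l1_loss(student_out, clean_target)   # prediction loss
    return w_stu * l_stu + w_tea * feat_match_loss
```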
In an optional embodiment, referring to fig. 4, the student model is composed of three parts: a first coding network, a first decoding network, and a first convolution output network. The first coding network is configured to perform feature coding processing on an input RAW image to obtain a feature map; the first decoding network is configured to perform feature decoding processing on the feature map obtained by the first coding network to obtain a feature decoded map; and the first convolution output network is configured to perform convolution processing on the feature decoded map to obtain the noise-reduced RAW image.
Correspondingly, when the second sample RAW image is denoised by the student model to obtain the sample noise-reduced RAW image, feature coding processing can be performed on the second sample RAW image through the first coding network to obtain a first feature map; feature decoding processing is performed on the first feature map through the first decoding network to obtain a first feature decoded map; and convolution processing is performed on the first feature decoded map through the first convolution output network, after which the result is added to the second sample RAW image to obtain the sample noise-reduced RAW image.
In an optional embodiment, the first coding network includes 1 convolution layer and N first coding modules, and each first coding module includes a first downsampling submodule and a first convolution submodule. When the first coding network performs feature coding processing on the second sample RAW image to obtain the first feature map, the convolution layer in the first coding network may perform convolution processing on the second sample RAW image to obtain an initial feature map; downsampling processing and convolution processing are then performed on the initial feature map sequentially through the first downsampling submodules and first convolution submodules of the N first coding modules to obtain the first feature map.
It should be noted that N is a positive integer greater than 1, and can be taken by those skilled in the art according to actual needs. For example, in this embodiment, N takes a value of 4, that is, the first encoding network includes 4 first encoding modules. In addition, the present embodiment does not specifically limit the structures of the first downsampling submodule and the first convolution submodule, and may be set by a person skilled in the art according to actual needs.
Illustratively, assume that the size of the feature map input to the first downsampling submodule is HxWxC, where H denotes the height, W the width, and C the number of channels. Referring to fig. 5, the first downsampling submodule includes two branches. One branch includes a depth-separable convolution layer with a stride of 2 and a convolution kernel of 5x5x(C/2), which outputs an (H/2)x(W/2)x(C/4) feature map; an activation function layer (the activation function, such as a ReLU function, can be configured by those skilled in the art according to actual needs); and a depth-separable convolution layer with a stride of 1 and a convolution kernel of 5x5xC, which outputs an (H/2)x(W/2)xC feature map. The other branch includes a depth-separable convolution layer with a stride of 2 and a convolution kernel of 3x3x(2C), which outputs an (H/2)x(W/2)xC feature map. The two branches are joined by an addition unit (marked in fig. 5) that adds the output results of the two branches to produce the output result of the first downsampling submodule, a feature map of size (H/2)x(W/2)xC.
Referring to fig. 6, the first convolution submodule includes a depth-separable convolution layer with a stride of 1 and a convolution kernel of 5x5x(C/4), which outputs an (H/2)x(W/2)x(C/4) feature map; an activation function layer (the activation function, such as a ReLU function, can be configured by those skilled in the art according to actual needs); and a depth-separable convolution layer with a stride of 1 and a convolution kernel of 5x5xC, which outputs an (H/2)x(W/2)xC feature map. This output is added by an addition unit (marked in fig. 6) to the feature map output by the first downsampling submodule to obtain the output result of the first convolution submodule, a feature map of size (H/2)x(W/2)xC.
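The two submodules above can be sketched in PyTorch as follows. This is an illustrative reading of figs. 5 and 6, not the patent's reference implementation: the padding values and the assumption that C is divisible by 4 are mine.

```python
import torch
import torch.nn as nn

def sep_conv(in_ch, out_ch, k, stride):
    """Depth-separable convolution: depthwise conv followed by 1x1 pointwise."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, stride=stride, padding=k // 2, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

class FirstDownsample(nn.Module):
    """Two-branch downsampling submodule (fig. 5): both branches halve H and W,
    and their outputs are summed. Channel counts follow the text."""
    def __init__(self, c):
        super().__init__()
        self.branch_a = nn.Sequential(
            sep_conv(c, c // 4, k=5, stride=2),   # -> (H/2, W/2, C/4)
            nn.ReLU(inplace=True),
            sep_conv(c // 4, c, k=5, stride=1),   # -> (H/2, W/2, C)
        )
        self.branch_b = sep_conv(c, c, k=3, stride=2)  # -> (H/2, W/2, C)

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

class FirstConvBlock(nn.Module):
    """Residual convolution submodule (fig. 6): two 5x5 depth-separable convs
    with a ReLU in between, added back to the input feature map."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            sep_conv(c, c // 4, k=5, stride=1),
            nn.ReLU(inplace=True),
            sep_conv(c // 4, c, k=5, stride=1),
        )

    def forward(self, x):
        return self.body(x) + x
```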
In addition, the number of the first convolution sub-modules in each first encoding module is not particularly limited in this embodiment, and may be configured by those skilled in the art according to actual needs. For example, in other embodiments, the other first encoding modules may include only the first downsampling sub-module, except for the last first encoding module.
In an optional embodiment, the first decoding network includes N first decoding modules and N depth-separable convolution layers, and each first decoding module includes a second convolution submodule and a first upsampling submodule. When the first feature map is subjected to feature decoding processing by the first decoding network to obtain the first feature decoded map, the first feature map may first be subjected to convolution processing and upsampling processing by the second convolution submodule and the first upsampling submodule in the 1st first decoding module to obtain the 1st first upsampled feature map. Then, for i ∈ [1, N-1], the process feature map output by the (N-i)th first coding module is subjected to convolution processing by the i-th depth-separable convolution layer and added to the i-th first upsampled feature map to obtain the i-th first sum feature map, and the i-th first sum feature map is subjected to convolution processing and upsampling processing by the second convolution submodule and the first upsampling submodule in the (i+1)th first decoding module to obtain the (i+1)th first upsampled feature map. Finally, after the N-th first upsampled feature map is obtained, the initial feature map is subjected to convolution processing by the N-th depth-separable convolution layer and added to the N-th first upsampled feature map to obtain the first feature decoded map.
It should be noted that the present embodiment does not specifically limit the structures of the first upsampling sub-module and the second convolution sub-module, and can be set by those skilled in the art according to actual needs.
Illustratively, assume that the size of the feature map input to the second convolution submodule is HxWxC, where H denotes the height, W the width, and C the number of channels. Referring to fig. 7, the second convolution submodule may include a depth-separable convolution layer with a stride of 1 and a convolution kernel of 3x3xC, which outputs an HxWxC feature map; an activation function layer (the activation function, such as a ReLU function, can be configured by those skilled in the art according to actual needs); and another depth-separable convolution layer with a stride of 1 and a convolution kernel of 3x3xC, which outputs an HxWxC feature map. This output is added by an addition unit (marked in fig. 7) to the feature map input to the second convolution submodule to obtain the output result of the second convolution submodule, a feature map of size HxWxC.
The first upsampling sub-module may comprise a transposed convolutional layer with a convolution kernel of 2x2, outputting an upsampled feature map with a size of (2H) x (2W) xC.
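A comparable sketch of these decoder-side building blocks — the residual second convolution submodule of fig. 7 and the 2x2 transposed-convolution upsampler. The padding and the choice to keep the channel count unchanged in the transposed convolution are assumptions.

```python
import torch
import torch.nn as nn

class SecondConvBlock(nn.Module):
    """Residual convolution submodule of the decoder (fig. 7): two 3x3
    depth-separable convolutions with a ReLU in between, added to the input."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1),
        )

    def forward(self, x):
        return self.body(x) + x

class FirstUpsample(nn.Module):
    """Upsampling submodule: a 2x2 transposed convolution doubling H and W."""
    def __init__(self, c):
        super().__init__()
        self.up = nn.ConvTranspose2d(c, c, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(x)
```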
In an alternative embodiment, the first convolutional output network may include 1 second convolutional submodule and 1 common convolutional layer, for example, the common convolutional layer may be a common convolutional layer with a convolutional kernel of 3 × 4.
In an optional embodiment, the teacher model is also composed of three parts: a second coding network, a second decoding network, and a second convolution output network. The second coding network is configured to perform feature coding processing on an input RAW image to obtain a feature map; the second decoding network is configured to perform feature decoding processing on the feature map obtained by the second coding network to obtain a feature decoded map; and the second convolution output network is configured to perform convolution processing on the feature decoded map and add the result to the input RAW image to obtain the noise-reduced RAW image. Correspondingly, when the feature matching loss between the student model and the teacher model is determined according to the second sample RAW image, feature coding processing can be performed on the second sample RAW image through the second coding network to obtain a second feature map; feature decoding processing is performed on the second feature map through the second decoding network to obtain a second feature decoded map; a first difference between the coding processing performed on the second sample RAW image by the second coding network and by the first coding network is acquired; a second difference between the feature decoding processing performed on the second feature map by the second decoding network and the feature decoding processing performed on the first feature map by the first decoding network is acquired; and the feature matching loss between the student model and the teacher model is determined according to the first difference and the second difference.
The second coding network may include M second coding modules and 1 third convolution sub-module, M is a positive integer greater than 1, the second coding module includes a fourth convolution sub-module and a pooling sub-module, and when the second coding network performs feature coding on the RAW image of the second sample to obtain a second feature map, the second coding network may sequentially perform convolution processing and pooling processing on the RAW image of the second sample through the fourth convolution sub-module and the pooling sub-module of the M second coding modules to obtain an intermediate feature map; and carrying out convolution processing on the intermediate characteristic diagram through a third convolution submodule to obtain a second characteristic diagram.
It should be noted that the values of M and N may be the same or different, for example, the values of M and N are the same in this embodiment. In addition, the present embodiment does not specifically limit the structures of the third convolution sub-module, the fourth convolution sub-module, and the pooling sub-module, and can be configured by those skilled in the art according to actual needs.
Illustratively, the third convolution submodule and the fourth convolution submodule may have the same structure, and each convolution submodule includes two general convolution layers with a step size of 1 and a convolution kernel of 3 × 3. The pooling sub-module may include a 2x2 pooling layer including, but not limited to, a maximum pooling layer or an average pooling layer, etc.
In addition, the structure of the second convolution output network can also be configured by those skilled in the art according to actual needs, and this embodiment does not specifically limit this, for example, the configuration of the second convolution output network in this embodiment may include a common convolution layer (without an activation function) with a convolution kernel of 3 × 4.
In an optional embodiment, the second decoding network includes M second decoding modules, and each second decoding module includes a second upsampling submodule and a fifth convolution submodule. When the second feature map is subjected to feature decoding processing by the second decoding network to obtain the second feature decoded map, the second feature map may first be upsampled by the second upsampling submodule in the 1st second decoding module to obtain the 1st second upsampled feature map; the 1st second upsampled feature map is added to the process feature map output by the M-th second coding module to obtain the 1st second sum feature map, and the 1st second sum feature map is subjected to convolution processing by the fifth convolution submodule of the 1st second decoding module to obtain the 1st convolution feature map. Then, for j ∈ [2, M], the (j-1)th convolution feature map is upsampled by the second upsampling submodule of the j-th second decoding module to obtain the j-th second upsampled feature map; the j-th second upsampled feature map is added to the process feature map output by the (M-(j-1))th second coding module to obtain the j-th second sum feature map, and the j-th second sum feature map is subjected to convolution processing by the fifth convolution submodule of the j-th second decoding module to obtain the j-th convolution feature map. Finally, the M-th convolution feature map is obtained and taken as the second feature decoded map.
In this embodiment, when a first difference that the second coding network and the first coding network perform coding processing on the second sample RAW image is obtained, differences between feature maps of the second sample RAW image output by the first coding module and the second coding module in all the same levels in the second coding network and the first coding network may be accumulated as the first difference. It should be noted that, when the output heights and widths of the first encoding module and the second encoding module in the same hierarchy are the same, but the channel numbers are not the same, the feature maps output by the first encoding module and the second encoding module may be averaged or maximized along the channel direction, respectively, to obtain a feature map of a single channel, and then the difference between the two feature maps is calculated.
When a second difference between the feature decoding processing performed on the second feature map by the second decoding network and the feature decoding processing performed on the first feature map by the first decoding network is obtained, a difference between the process feature map of the second feature map output by each second decoding module in the second decoding network and the process feature map of the first feature map output by the first decoding module in the same level in the first decoding network may be accumulated as the second difference. It should be noted that, when the output heights and widths of the first decoding module and the second decoding module at the same level are the same, but the channel numbers are not the same, the feature maps output by the first decoding module and the second decoding module may be averaged or maximized along the channel direction, respectively, to obtain a feature map of a single channel, and then the difference between the two feature maps is calculated.
When the feature matching loss is determined based on the first difference and the second difference, the difference sum of the first difference and the second difference may be directly determined as the feature matching loss.
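The accumulation of these per-level differences can be sketched as follows, assuming the intermediate feature maps have already been collected into per-level lists; the L1 metric and the use of the channel mean (rather than the maximum, which the text also allows) are assumptions.

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(student_feats, teacher_feats):
    """Accumulate per-level differences between student and teacher feature maps.

    student_feats / teacher_feats: lists of tensors of shape (B, C, H, W) taken
    from modules at the same level; H and W match, channel counts may differ.
    """
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        if fs.shape[1] != ft.shape[1]:
            # Reduce along the channel direction to single-channel maps
            # before comparing, as described in the text.
            fs = fs.mean(dim=1, keepdim=True)
            ft = ft.mean(dim=1, keepdim=True)
        loss = loss + F.l1_loss(fs, ft)   # L1 difference is an assumption
    return loss
```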
In an optional embodiment, a second captured RAW image captured by an external device is acquired, and the second captured RAW image is cropped to obtain a first sample RAW image; determining a target sensitivity from a sensitivity range of the electronic device, and determining a target noise intensity corresponding to the target sensitivity according to a correspondence between the sensitivity and a noise intensity of an image captured by the electronic device; and adding noise to the first sample RAW image according to the target noise intensity to obtain a second sample RAW image.
In this embodiment, the electronic device may acquire in advance a second captured RAW image set made up of a plurality of second captured RAW images captured by the external device, and then use the second captured RAW images in this set to obtain pairs of first sample RAW images and second sample RAW images.
Illustratively, at each epoch of training, the electronic device rearranges the order of the second captured RAW images in the second captured RAW image set and sets the batch size to 1. Correspondingly, the electronic device sequentially acquires a second captured RAW image from the reordered set, performs cropping processing on it, and takes the cropped sub RAW image as the first sample RAW image. The cropping process is not particularly limited and may be configured by those skilled in the art according to actual needs.
For example, under the restriction that the cropped RAW image still conforms to the Bayer array, the second captured RAW image may be randomly cropped to obtain a 1024x1024 sub RAW image as the first sample RAW image.
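A sketch of such a Bayer-aligned random crop; forcing the crop origin onto even coordinates keeps the 2x2 Bayer phase, and the even-origin convention is an assumption about how the mosaic is anchored.

```python
import numpy as np

def random_bayer_crop(raw: np.ndarray, size: int = 1024, rng=None) -> np.ndarray:
    """Randomly crop a sub RAW image while preserving the Bayer pattern.

    raw: single-channel RAW mosaic of shape (H, W), H and W assumed even
    and at least `size`.
    """
    rng = rng or np.random.default_rng()
    h, w = raw.shape
    top = rng.integers(0, (h - size) // 2 + 1) * 2   # even row offset
    left = rng.integers(0, (w - size) // 2 + 1) * 2  # even column offset
    return raw[top:top + size, left:left + size]
```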
It should be noted that the present embodiment also establishes in advance a correspondence between the sensitivity of the electronic apparatus and the noise intensity of an image captured by the electronic apparatus, where the correspondence describes the noise intensity of an image captured by a capturing component of the electronic apparatus at different sensitivities.
In this embodiment, the electronic device further determines a target sensitivity from a sensitivity range of the electronic device (that is, a sensitivity range of a camera module of the electronic device), wherein different sensitivities may be sequentially selected from the sensitivity range of the electronic device as the target sensitivity, or the target sensitivities may be randomly selected from the sensitivity range of the electronic device.
For example, assuming that the electronic apparatus has a sensitivity range of 100 to 6400, the electronic apparatus may randomly select one sensitivity as the target sensitivity in a uniform distribution of 100 to 6400.
As described above, after determining the target sensitivity, the electronic apparatus further determines the noise intensity corresponding to the target sensitivity, which is regarded as the target noise intensity, based on the correspondence between the sensitivity of the electronic apparatus and the noise intensity of the image captured by the electronic apparatus.
And then, the electronic equipment further adds noise to the first sample RAW image according to the determined target noise intensity to obtain a second sample RAW image, so that the acquired first sample RAW image and the acquired second sample RAW image are used as training pairs to realize distillation training of the student model until a preset training stopping condition is met. The configuration of the preset training stopping condition is not particularly limited, and may be configured by a person skilled in the art according to actual needs, for example, the preset training stopping condition may be configured to be student model convergence, or the number of epochs of training reaches a preset number, and the like.
In an optional embodiment, the target noise strength includes a target multiplicative noise strength and a target additive noise strength, and the noise adding is performed on the first sample RAW image according to the target noise strength to obtain the second sample RAW image, including:
adding multiplicative noise to the first sample RAW image according to the target multiplicative noise intensity, and adding additive noise to the first sample RAW image according to the target additive noise intensity to obtain a second sample RAW image.
Accordingly, in this embodiment, establishing a correspondence between the sensitivity of the electronic device and the noise intensity of the image captured by the electronic device includes: the first correspondence between the sensitivity of the electronic device and the multiplicative noise intensity of the image captured by the electronic device, and the second correspondence between the sensitivity of the electronic device and the additive noise intensity of the image captured by the electronic device.
Illustratively, the first correspondence and the second correspondence as described above may be generated as follows:
Assume the sensitivity range of the shooting component of the electronic device is 100 to 6400. The electronic device controls the shooting component to capture 60 frames of gray-scale images at each of several sensitivities (e.g., sensitivity = 100, 200, 400, 800, 1200, 1600, 2400, 3200, 4000, 4800, 5600, 6400). For each sensitivity, the multi-frame average value of each pixel is first calculated; the variance across all frames is then calculated for pixel positions sharing the same average value; linear fitting is performed on the average values and variances, the slope of the fitted linear function is taken as the multiplicative noise intensity, and its intercept is taken as the additive noise intensity. The electronic device then performs a linear fit of the obtained multiplicative noise intensities k against sensitivity, and the resulting linear function of k with respect to sensitivity is taken as the first correspondence. The electronic device further performs a quadratic fit of the obtained additive noise intensities against sensitivity, and the resulting quadratic function of additive noise intensity with respect to sensitivity is taken as the second correspondence.
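These two fits can be reproduced with numpy as follows; the function name and the use of np.polyfit are illustrative assumptions.

```python
import numpy as np

def fit_iso_noise_models(iso_values, k_values, sigma2_values):
    """Fit the two ISO-to-noise correspondences described above.

    iso_values:    sensitivities used for calibration, e.g. [100, 200, ..., 6400]
    k_values:      multiplicative noise intensity measured at each sensitivity
    sigma2_values: additive noise intensity measured at each sensitivity
    Returns two callables: iso -> k (linear fit) and iso -> sigma'^2 (quadratic fit).
    """
    k_fit = np.polyfit(iso_values, k_values, deg=1)        # first correspondence
    s_fit = np.polyfit(iso_values, sigma2_values, deg=2)   # second correspondence
    return np.poly1d(k_fit), np.poly1d(s_fit)

# Usage sketch: pick a random target sensitivity and look up its noise intensity.
# iso_to_k, iso_to_sigma2 = fit_iso_noise_models(isos, ks, sigma2s)
# target_iso = np.random.default_rng().uniform(100, 6400)
# target_k, target_sigma2 = iso_to_k(target_iso), iso_to_sigma2(target_iso)
```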
Accordingly, in the present embodiment, based on the above first and second correspondence relationships, the determined target noise strength includes a target multiplicative noise strength and a target additive noise strength, and when the first sample RAW image is subjected to noise processing according to the target noise strength, the electronic device adds multiplicative noise to the first sample RAW image according to the target multiplicative noise strength, and adds additive noise to the first sample RAW image according to the target additive noise strength, thereby obtaining the second sample RAW image.
For example, take poisson noise and gaussian noise as examples:
$x^{*} \sim k \, \mathcal{P}(x / k) + \mathcal{N}(0, \sigma'^{2})$
where $x$ denotes the first sample RAW image, $x^{*}$ denotes the second sample RAW image, $k$ denotes the target multiplicative noise intensity, $\sigma'^{2}$ denotes the target additive noise intensity, $\mathcal{P}(\cdot)$ denotes the Poisson distribution, and $\mathcal{N}(\cdot)$ denotes the Gaussian distribution.
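A direct numpy rendering of this noise model; it assumes the clean RAW values are non-negative and expressed in units where the Poisson approximation applies.

```python
import numpy as np

def add_poisson_gaussian_noise(clean_raw: np.ndarray, k: float, sigma2: float,
                               rng=None) -> np.ndarray:
    """Synthesize the second sample RAW image from the first sample RAW image:
    x* ~ k * Poisson(x / k) + N(0, sigma'^2). clean_raw must be >= 0."""
    rng = rng or np.random.default_rng()
    shot = k * rng.poisson(clean_raw / k)                     # multiplicative part
    read = rng.normal(0.0, np.sqrt(sigma2), clean_raw.shape)  # additive part
    return shot + read
```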
In an optional embodiment, when the electronic device is currently in a stable state, a first shooting RAW image corresponding to a target shooting scene is acquired based on a shooting instruction.
In this embodiment, when receiving an input shooting instruction, the electronic device does not directly respond to the input shooting instruction, but first determines whether the electronic device is currently in a stable state, and when the electronic device is currently in the stable state, the electronic device responds to the input shooting instruction to shoot a target shooting scene, so as to obtain a first shooting RAW image correspondingly.
The electronic device may determine the stable state in a plurality of different manners, for example, the electronic device may determine whether the current speeds in all directions are less than a preset speed, if so, determine that the electronic device is currently in the stable state, otherwise, determine that the electronic device is currently in the unstable state (or in the jittering state); for another example, the electronic device may determine whether the current displacement in each direction is smaller than a preset displacement, if so, determine that the current displacement is in a stable state, otherwise, determine that the current displacement is in an unstable state (or a jitter state).
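For illustration, a trivial sketch of such a stability check, combining both criteria; the sensor-reading interface and the threshold values are placeholders, not from the patent.

```python
def is_device_stable(speeds, displacements,
                     max_speed: float = 0.05, max_displacement: float = 0.001):
    """Return True when the device can be considered stable.

    speeds / displacements: per-axis readings (e.g. derived from the IMU).
    The device is treated as stable only when every reading is below its
    preset threshold, as described in the text.
    """
    return (all(abs(v) < max_speed for v in speeds) and
            all(abs(d) < max_displacement for d in displacements))
```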
In an optional embodiment, when the electronic device is currently in a stable state and the target shooting scene is in a static state, the first shooting RAW image corresponding to the target shooting scene is acquired based on the shooting instruction.
In this embodiment, even when the electronic device is in a stable state, it does not directly respond to the input shooting instruction; it first determines whether the target shooting scene is in a static state, and only when the target shooting scene is in the static state does it respond to the input shooting instruction and shoot the target shooting scene, correspondingly obtaining the first captured RAW image.
How to determine whether the target shooting scene is in a static state is not specifically limited in this embodiment; a person skilled in the art can select a suitable determination method according to actual needs. For example, an optical flow method, a residual method, and the like can be used to determine whether the target shooting scene is in a static state.
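As one possible realization of the optical flow approach, the following OpenCV sketch compares two consecutive preview frames; the Farneback parameters and the motion threshold are illustrative defaults, not values from the patent.

```python
import cv2
import numpy as np

def scene_is_static(prev_gray: np.ndarray, curr_gray: np.ndarray,
                    motion_threshold: float = 0.5) -> bool:
    """Rough optical-flow test of whether the target shooting scene is static.

    prev_gray / curr_gray: consecutive preview frames as 8-bit grayscale images.
    Returns True when the mean flow magnitude is below the threshold.
    """
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mean_motion = np.linalg.norm(flow, axis=2).mean()
    return mean_motion < motion_threshold
```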
Therefore, when the image shooting method provided by the application is applied to the electronic equipment, the image shot by the electronic equipment is subjected to noise reduction processing through the image noise reduction model to be used as a result image, noise points in the image shot by the electronic equipment can be reduced, and the image quality of the image shot by the electronic equipment is improved. In addition, the second sample RAW image used for training the image noise reduction model is obtained by adding noise to the first sample RAW image, and the image details of the second sample RAW image and the image details of the first sample RAW image are consistent, so that the image details can be ensured not to be lost when the image noise reduction model obtained by training is subjected to noise reduction processing.
In order to better implement the image capturing method in the embodiment of the present application, on the basis of the image capturing method, the present application further provides an image capturing apparatus, as shown in fig. 8, where the image capturing apparatus 300 includes:
a shooting module 310, configured to obtain a first shooting RAW image corresponding to a target shooting scene based on a shooting instruction;
and the noise reduction module 320 is configured to input the first captured RAW image into an image noise reduction model for processing, and output a noise-reduced RAW image, where the noise-reduced RAW image is a result image of the capture instruction.
In an optional embodiment, the image noise reduction model is trained based on a first sample RAW image and a second sample RAW image, the first sample RAW image is captured by an external device, the second sample RAW image is obtained by adding noise to the first sample RAW image according to the noise intensity of the captured image of the electronic device, and the noise intensity of the captured image of the external device is smaller than the noise intensity of the captured image of the electronic device.
In an optional embodiment, the image capturing apparatus 300 provided in the present application may further include a training module, configured to:
acquiring a trained teacher model for noise reduction;
acquiring a student model, wherein the network parameter number of the student model is smaller than that of the teacher model;
acquiring a first sample RAW image and a second sample RAW image;
and carrying out distillation training on the student model according to the teacher model, the first sample RAW image and the second sample RAW image to obtain an image noise reduction model.
In an optional embodiment, the training module is configured to:
determining the prediction loss of the student model according to the first sample RAW image and the second sample RAW image;
determining the characteristic matching loss of the student model and the teacher model according to the RAW image of the second sample;
fusing the prediction loss and the characteristic matching loss to obtain the distillation loss of the student model;
and adjusting the network parameters of the student model according to the distillation loss to obtain the image noise reduction model.
In an optional embodiment, the training module is configured to:
denoising the second sample RAW image through a student model to obtain a sample denoising RAW image;
and determining the prediction loss of the student model according to the difference between the sample de-noising RAW image and the first sample RAW image.
In an optional embodiment, the student model comprises a first encoding network, a first decoding network and a first convolution output network, and the training module is configured to:
performing feature coding processing on the RAW image of the second sample through a first coding network to obtain a first feature map;
performing feature decoding processing on the first feature map through a first decoding network to obtain a first feature decoding map;
and performing convolution processing on the first feature decoded image through a first convolution output network, and then adding the convolution processed image and the second sample RAW image to obtain a sample noise-reduced RAW image.
In an optional embodiment, the first coding network includes 1 convolutional layer and N first coding modules, where N is a positive integer greater than 1, the first coding module includes a first downsampling submodule and a first convolution submodule, and the training module is configured to:
performing convolution processing on the RAW image of the second sample through the convolution layer to obtain an initial characteristic diagram;
and performing downsampling processing and convolution processing on the initial feature map sequentially through a first downsampling submodule and a first convolution submodule in the N first coding modules to obtain a first feature map.
In an optional embodiment, the first decoding network comprises N first decoding modules and N depthwise separable convolution layers, the first decoding modules comprise a second convolution submodule and a first upsampling submodule, and the training module is configured to:
performing convolution processing and upsampling processing on the first feature map through the second convolution submodule and the first upsampling submodule in the 1st first decoding module to obtain a 1st first upsampling feature map;
performing convolution processing on the process feature map output by the (N-i)-th first coding module through the i-th depthwise separable convolution layer, adding the result to the i-th first upsampling feature map to obtain an i-th first sum feature map, and performing convolution processing and upsampling processing on the i-th first sum feature map through the second convolution submodule and the first upsampling submodule in the (i+1)-th first decoding module to obtain an (i+1)-th first upsampling feature map, where i ∈ [1, N-1];
and acquiring the N-th first upsampling feature map, performing convolution processing on the initial feature map through the N-th depthwise separable convolution layer, and adding the result to the N-th first upsampling feature map to obtain a first feature decoding map.
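Correspondingly, the decoding network upsamples back while mixing in the process feature maps through depthwise separable convolutions, which keeps the student's parameter count low. A sketch under the same assumptions:

```python
import torch.nn as nn

def depthwise_separable(ch):
    """Depthwise separable convolution: per-channel 3x3 conv + 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch),
        nn.Conv2d(ch, ch, kernel_size=1))

class FirstDecodingModule(nn.Module):
    """Second convolution submodule followed by a first upsampling submodule."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.up(self.conv(x))

class FirstDecodingNetwork(nn.Module):
    """N first decoding modules and N depthwise separable convolution layers."""
    def __init__(self, channels=16, n=3):
        super().__init__()
        self.decoders = nn.ModuleList(
            FirstDecodingModule(channels * 2 ** (n - i), channels * 2 ** (n - i - 1))
            for i in range(n))
        self.skip_convs = nn.ModuleList(
            depthwise_separable(channels * 2 ** (n - i)) for i in range(1, n + 1))

    def forward(self, first_feature_map, maps):
        # maps = [initial feature map, 1st ... (N-1)-th coding module outputs]
        x = self.decoders[0](first_feature_map)        # 1st first upsampling feature map
        for i in range(1, len(self.decoders)):
            x = x + self.skip_convs[i - 1](maps[-i])   # i-th first sum feature map
            x = self.decoders[i](x)                    # (i+1)-th first upsampling map
        return x + self.skip_convs[-1](maps[0])        # first feature decoding map
```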
In an alternative embodiment, the teacher model includes a second coding network, a second decoding network, and a second convolution output network, and the training module is configured to:
performing feature coding processing on the second sample RAW image through the second coding network to obtain a second feature map;
performing feature decoding processing on the second feature map through a second decoding network to obtain a second feature decoding map;
acquiring a first difference between the coding processing of the second sample RAW image by the second coding network and that by the first coding network;
acquiring a second difference between the feature decoding processing of the second feature map by the second decoding network and the feature decoding processing of the first feature map by the first decoding network;
and determining the feature matching loss of the student model and the teacher model according to the first difference and the second difference.
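In other words, the first difference compares encoder-side features and the second difference compares decoder-side features. A minimal sketch of the resulting feature matching loss (MSE is an assumed distance; if teacher and student feature channels differ, a learned 1x1 projection would be needed and is omitted here):

```python
import torch.nn.functional as F

def feature_matching_loss(student_enc_feat, teacher_enc_feat,
                          student_dec_feat, teacher_dec_feat):
    first_difference = F.mse_loss(student_enc_feat, teacher_enc_feat)    # coding stage
    second_difference = F.mse_loss(student_dec_feat, teacher_dec_feat)   # decoding stage
    return first_difference + second_difference
```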
In an optional embodiment, the second coding network includes M second coding modules and 1 third convolution submodule, M is a positive integer greater than 1, the second coding module includes a fourth convolution submodule and a pooling submodule, and the training module is configured to:
performing convolution processing and pooling processing on the second sample RAW image sequentially through the fourth convolution submodule and the pooling submodule in the M second coding modules to obtain an intermediate feature map;
and performing convolution processing on the intermediate feature map through the third convolution submodule to obtain a second feature map.
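The teacher's coding network is thus a conventional convolve-then-pool pyramid. A sketch under the same illustrative assumptions (max pooling and the channel widths are choices of this sketch, not of the patent):

```python
import torch.nn as nn

class SecondCodingModule(nn.Module):
    """Fourth convolution submodule followed by a pooling submodule."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.conv(x)
        return self.pool(x), x          # pooled map + pre-pool process feature map

class SecondCodingNetwork(nn.Module):
    """M second coding modules followed by 1 third convolution submodule."""
    def __init__(self, channels=32, m=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            SecondCodingModule(4 if i == 0 else channels * 2 ** (i - 1),
                               channels * 2 ** i)
            for i in range(m))
        self.third_conv = nn.Sequential(
            nn.Conv2d(channels * 2 ** (m - 1), channels * 2 ** m,
                      kernel_size=3, padding=1), nn.ReLU())

    def forward(self, second_raw):
        maps, x = [], second_raw
        for block in self.blocks:
            x, pre_pool = block(x)       # x is the intermediate feature map after the last block
            maps.append(pre_pool)        # process feature maps for the decoder
        return self.third_conv(x), maps  # second feature map + process feature maps
```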
In an optional embodiment, the second decoding network includes M second decoding modules, the second decoding modules include a second upsampling submodule and a fifth convolution submodule, and the training module is configured to:
performing upsampling processing on the second feature map through the second upsampling submodule in the 1st second decoding module to obtain a 1st second upsampling feature map, adding the 1st second upsampling feature map and the process feature map output by the M-th second coding module to obtain a 1st second sum feature map, and performing convolution processing on the 1st second sum feature map through the fifth convolution submodule of the 1st second decoding module to obtain a 1st convolution feature map;
performing upsampling processing on the (j-1)-th convolution feature map through the second upsampling submodule of the j-th second decoding module to obtain a j-th second upsampling feature map, adding the j-th second upsampling feature map and the process feature map output by the (M-(j-1))-th second coding module to obtain a j-th second sum feature map, and performing convolution processing on the j-th second sum feature map through the fifth convolution submodule of the j-th second decoding module to obtain a j-th convolution feature map, where j ∈ [2, M];
and acquiring the M-th convolution feature map and taking the M-th convolution feature map as a second feature decoding map.
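The decoding side mirrors this: each second decoding module upsamples, adds the matching process feature map, and convolves. Completing the teacher sketch under the same assumptions (the second convolution output network, analogous to the student's, is omitted):

```python
import torch.nn as nn

class SecondDecodingModule(nn.Module):
    """Second upsampling submodule + fifth convolution submodule, with the
    skip addition between them."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU())

    def forward(self, x, skip):
        x = self.up(x) + skip            # second sum feature map
        return self.conv(x)              # convolution feature map

class SecondDecodingNetwork(nn.Module):
    """M second decoding modules consuming the process feature maps in reverse."""
    def __init__(self, channels=32, m=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            SecondDecodingModule(channels * 2 ** (m - j), channels * 2 ** (m - j - 1))
            for j in range(m))

    def forward(self, second_feature_map, maps):
        x = second_feature_map
        for j, block in enumerate(self.blocks):
            x = block(x, maps[-(j + 1)])  # (M-(j-1))-th module's process feature map
        return x                          # second feature decoding map
```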
It should be noted that the image capturing apparatus provided in the embodiment of the present application and the image capturing method in the foregoing embodiment are based on the same inventive concept; the specific implementation process is described in detail in the embodiment of the image capturing method and is not repeated here.
The embodiment of the present application further provides an electronic device, which may be a mobile electronic device configured with a shooting component, such as a smartphone, a tablet computer, a palmtop computer, or a notebook computer, or a fixed electronic device configured with a shooting component, such as a desktop computer, a television, or an advertisement player. Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application. The electronic device 100 includes a processor 110 having one or more processing cores, a memory 120 having one or more computer-readable storage media, and a computer program stored in the memory 120 and executable on the processor. The processor 110 is electrically connected to the memory 120. Those skilled in the art will appreciate that the electronic device 100 may include more or fewer components than illustrated, may combine some components, or may have a different arrangement of components.
The processor 110 is the control center of the electronic device 100. It connects the various parts of the electronic device 100 through various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or loading software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby monitoring the electronic device 100 as a whole.
In this embodiment, the processor 110 in the electronic device 100 loads instructions corresponding to processes of one or more application programs into the memory 120, and the processor 110 runs the application programs stored in the memory 120, so as to implement the image capturing method provided by this application, for example:
acquiring a first shooting RAW image corresponding to a target shooting scene based on a shooting instruction;
and inputting the first shooting RAW image into an image noise reduction model for processing, and outputting a noise reduction RAW image, where the noise reduction RAW image is a result image of the shooting instruction.
Details of the above operations have been described in the foregoing embodiments and are not repeated here.
Optionally, as shown in fig. 9, the electronic device 100 may further include: a touch display screen 130, a radio frequency circuit 140, a shooting component 150, an input unit 160, and a power supply 170. The processor 110 is electrically connected to the touch display screen 130, the radio frequency circuit 140, the shooting component 150, the input unit 160, and the power supply 170, respectively.
The touch display screen 130 can be used for displaying a graphical user interface and receiving operation instructions generated by a user acting on the graphical user interface. The touch display screen 130 may include a display panel and a touch panel. The display panel may be used to display information entered by or provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. Alternatively, the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. The touch panel may be used to collect touch operations of the user (for example, operations performed by the user on or near the touch panel with a finger, a stylus, or any other suitable object or accessory) and to generate corresponding operation instructions that trigger the corresponding programs. Alternatively, the touch panel may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 110; it can also receive and execute commands sent by the processor 110. The touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel according to the type of the touch event. In the embodiment of the present application, the touch panel and the display panel may be integrated into the touch display screen 130 to realize the input and output functions. However, in some embodiments, the touch panel and the display panel may be implemented as two separate components to perform the input and output functions. That is, the touch display screen 130 can also be used as a part of the input unit 160 to implement an input function.
The radio frequency circuit 140 may be used to transmit and receive radio frequency signals so as to establish wireless communication with a network device or other electronic devices, and to exchange signals with the network device or the other electronic devices.
The shooting component 150 is configured to collect images and includes at least a lens and an image sensor. The lens projects an external optical signal onto the image sensor, and the image sensor performs photoelectric conversion on the optical signal projected by the lens, converting it into a usable electrical signal so as to obtain a digitized image.
The input unit 160 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint, iris, or facial information), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 170 is used to supply power to the various components of the electronic device 100. Optionally, the power supply 170 may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 170 may also include one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
In the foregoing embodiments, the descriptions of the respective embodiments each have their own emphasis; for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be completed by instructions, or by instructions controlling associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any one of the image capturing methods provided by the embodiments of the present application.
Details of the above operations have been described in the foregoing embodiments and are not repeated here.
Wherein the storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
Since the computer program stored in the storage medium can execute the steps of any image capturing method provided in the embodiments of the present application, the beneficial effects that can be achieved by any image capturing method provided in the embodiments of the present application can likewise be achieved; for details, reference may be made to the foregoing embodiments, which are not repeated here.
The image capturing method, the image capturing apparatus, the electronic device, and the computer-readable storage medium provided in the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (14)

1. An image capturing method, characterized by comprising:
acquiring a first shooting RAW image corresponding to a target shooting scene based on a shooting instruction;
and inputting the first shooting RAW image into an image noise reduction model for processing, and outputting a noise reduction RAW image, wherein the noise reduction RAW image is a result image of the shooting instruction.
2. The method of claim 1, wherein the image noise reduction model is trained based on a first sample RAW image and a second sample RAW image, the first sample RAW image is captured by an external device, the second sample RAW image is obtained by performing noise addition processing on the first sample RAW image according to a noise intensity of images captured by an electronic device, and the noise intensity of images captured by the external device is smaller than the noise intensity of images captured by the electronic device.
3. The method according to claim 2, wherein before the acquiring of the first shooting RAW image corresponding to the target shooting scene based on the shooting instruction, the method further comprises:
acquiring a trained teacher model for noise reduction;
acquiring a student model, wherein the number of network parameters of the student model is smaller than that of the teacher model;
acquiring the first sample RAW image and the second sample RAW image;
and carrying out distillation training on the student model according to the teacher model, the first sample RAW image and the second sample RAW image to obtain the image noise reduction model.
4. The method of claim 3, wherein said performing distillation training on said student model based on said teacher model, said first sample RAW image, and said second sample RAW image to obtain said image noise reduction model comprises:
determining a prediction loss of the student model according to the first sample RAW image and the second sample RAW image;
determining a feature matching loss of the student model and the teacher model according to the second sample RAW image;
fusing the prediction loss and the feature matching loss to obtain a distillation loss of the student model;
and adjusting network parameters of the student model according to the distillation loss to obtain the image noise reduction model.
5. The method of claim 4, wherein said determining a prediction loss of the student model from the first sample RAW image and the second sample RAW image comprises:
performing noise reduction processing on the second sample RAW image through the student model to obtain a sample noise-reduced RAW image;
and determining the prediction loss of the student model according to the sample noise-reduced RAW image and the first sample RAW image.
6. The method of claim 5, wherein the student model includes a first coding network, a first decoding network, and a first convolution output network, and the performing noise reduction processing on the second sample RAW image through the student model to obtain a sample noise-reduced RAW image comprises:
performing feature coding processing on the second sample RAW image through the first coding network to obtain a first feature map;
performing feature decoding processing on the first feature map through the first decoding network to obtain a first feature decoding map;
and performing convolution processing on the first feature decoding map through the first convolution output network, and then adding the convolution-processed map and the second sample RAW image to obtain a sample noise-reduced RAW image.
7. The method of claim 6, wherein the first coding network includes 1 convolution layer and N first coding modules, N is a positive integer greater than 1, the first coding modules include a first downsampling submodule and a first convolution submodule, and the performing feature coding processing on the second sample RAW image through the first coding network to obtain a first feature map includes:
performing convolution processing on the second sample RAW image through the convolution layer to obtain an initial feature map;
and performing downsampling processing and convolution processing on the initial feature map sequentially through the first downsampling submodule and the first convolution submodule in the N first coding modules to obtain a first feature map.
8. The method of claim 7, wherein the first decoding network comprises N first decoding modules and N depthwise separable convolution layers, the first decoding modules comprise a second convolution submodule and a first upsampling submodule, and the performing feature decoding processing on the first feature map through the first decoding network to obtain a first feature decoding map comprises:
performing convolution processing and upsampling processing on the first feature map through the second convolution submodule and the first upsampling submodule in the 1st first decoding module to obtain a 1st first upsampling feature map;
performing convolution processing on the process feature map output by the (N-i)-th first coding module through the i-th depthwise separable convolution layer, adding the result to the i-th first upsampling feature map to obtain an i-th first sum feature map, and performing convolution processing and upsampling processing on the i-th first sum feature map through the second convolution submodule and the first upsampling submodule in the (i+1)-th first decoding module to obtain an (i+1)-th first upsampling feature map, where i ∈ [1, N-1];
and acquiring the N-th first upsampling feature map, performing convolution processing on the initial feature map through the N-th depthwise separable convolution layer, and then adding the result to the N-th first upsampling feature map to obtain the first feature decoding map.
9. The method of any of claims 6-8, wherein the teacher model includes a second coding network, a second decoding network, and a second convolution output network, and wherein the determining the feature matching loss of the student model and the teacher model according to the second sample RAW image comprises:
performing feature coding processing on the second sample RAW image through the second coding network to obtain a second feature map;
performing feature decoding processing on the second feature map through the second decoding network to obtain a second feature decoding map;
acquiring a first difference between the coding processing of the second sample RAW image by the second coding network and that by the first coding network;
acquiring a second difference between the feature decoding processing of the second feature map by the second decoding network and the feature decoding processing of the first feature map by the first decoding network;
determining feature matching losses of the student model and the teacher model according to the first difference and the second difference.
10. The method of claim 9, wherein the second coding network includes M second coding modules and 1 third convolution submodule, M being a positive integer greater than 1, the second coding modules include a fourth convolution submodule and a pooling submodule, and the performing feature coding processing on the second sample RAW image through the second coding network to obtain a second feature map includes:
performing convolution processing and pooling processing on the second sample RAW image sequentially through the fourth convolution submodule and the pooling submodule in the M second coding modules to obtain an intermediate feature map;
and performing convolution processing on the intermediate feature map through the third convolution submodule to obtain a second feature map.
11. The method of claim 10, wherein the second decoding network includes M second decoding modules, the second decoding modules include a second upsampling submodule and a fifth convolution submodule, and the performing feature decoding processing on the second feature map through the second decoding network to obtain a second feature decoding map includes:
performing upsampling processing on the second feature map through the second upsampling submodule in the 1st second decoding module to obtain a 1st second upsampling feature map, adding the 1st second upsampling feature map and the process feature map output by the M-th second coding module to obtain a 1st second sum feature map, and performing convolution processing on the 1st second sum feature map through the fifth convolution submodule of the 1st second decoding module to obtain a 1st convolution feature map;
performing upsampling processing on the (j-1)-th convolution feature map through the second upsampling submodule of the j-th second decoding module to obtain a j-th second upsampling feature map, adding the j-th second upsampling feature map and the process feature map output by the (M-(j-1))-th second coding module to obtain a j-th second sum feature map, and performing convolution processing on the j-th second sum feature map through the fifth convolution submodule of the j-th second decoding module to obtain a j-th convolution feature map, where j ∈ [2, M];
and acquiring the M-th convolution feature map and taking the M-th convolution feature map as the second feature decoding map.
12. An image capturing apparatus, characterized by comprising:
the shooting module is used for acquiring a first shooting RAW image corresponding to a target shooting scene based on a shooting instruction;
and the noise reduction module is used for inputting the first shooting RAW image into an image noise reduction model for processing and outputting a noise reduction RAW image, wherein the noise reduction RAW image is a result image of the shooting instruction.
13. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the steps in the image capturing method according to any one of claims 1 to 11.
14. An electronic device, characterized in that the electronic device comprises a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the steps in the image capturing method according to any one of claims 1 to 11 when executing the computer program.