CN115359105A - Depth-of-field extended image generation method, depth-of-field extended image generation device, and storage medium

Info

Publication number
CN115359105A
Authority
CN
China
Prior art keywords: image, sample, depth, camera, field
Prior art date
Legal status
Granted
Application number
CN202210917940.1A
Other languages
Chinese (zh)
Other versions
CN115359105B (granted publication)
Inventor
杨建权
杨永兴
周茂森
吴日辉
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210917940.1A
Publication of CN115359105A
Application granted
Publication of CN115359105B
Legal status: Active

Classifications

    • G06T 7/55: Depth or shape recovery from multiple images (Image analysis)
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10024: Image acquisition modality: color image
    • G06T 2207/10028: Image acquisition modality: range image; depth image; 3D point clouds
    • Y02T 10/40: Engine management systems

Abstract

The application provides a depth-of-field extended image generation method, device, and storage medium. In the method, a left pixel map and a right pixel map are obtained by using the dual-pixel sensor already present in the camera of an electronic device, and the two maps are then processed by a pre-trained depth-of-field extension model to obtain a depth-of-field extended image. This preserves image sharpness and image detail, greatly improves the photo effect, and thereby improves the user's photographing experience.

Description

Depth-of-field extended image generation method, depth-of-field extended image generation device, and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a storage medium for generating a depth-of-field extended image.
Background
At present, the photographing function of electronic devices such as mobile phones is increasingly complete, and users' expectations for the photographing experience are correspondingly high. However, because the aperture of the camera used in electronic devices such as mobile phones is fixed, the depth-of-field range of the optical imaging model used by these devices is fixed and limited. When such a device takes a picture, objects beyond the depth-of-field range form visibly blurred images on the imaging plane, which results in a poor visual experience for the user.
Therefore, how to extend the depth of field of photos taken by electronic devices and improve the photo effect, so as to capture photos that satisfy users and improve their photographing experience, is a technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, the present application provides a depth-of-field extended image generation method, device, and storage medium. In the method, a left pixel map and a right pixel map are obtained by using the dual-pixel sensor already present in the camera of the electronic device, and the two maps are then processed by a pre-trained depth-of-field extension model to obtain a depth-of-field extended image. This preserves image sharpness and image detail, greatly improves the photo effect, and thereby improves the user's photographing experience.
In a first aspect, the present application provides a method for generating a depth-of-field extended image. The method is applied to a first electronic device whose camera integrates a dual-pixel sensor, and includes: acquiring a left pixel map and a right pixel map of a target object by using the dual-pixel sensor, where both the left pixel map and the right pixel map are RAW images; and performing depth-of-field processing on the left pixel map and the right pixel map based on a depth-of-field extension model to obtain a target depth-of-field extended image corresponding to the target object.
The first electronic device is, for example, a mobile phone, a tablet computer, or the like.
In this way, the left pixel map and the right pixel map are obtained by using the dual-pixel sensor already present in the camera of the electronic device, and are then processed by the pre-trained depth-of-field extension model to obtain the depth-of-field extended image, which ensures image sharpness and detail and improves the photo effect.
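As a minimal sketch of this inference flow (not the patented implementation), the snippet below assumes a pre-trained depth-of-field extension model and a dual-pixel RAW frame that has already been split into left and right pixel maps; the function name, tensor shapes, and normalization are hypothetical.

```python
import torch

def generate_edof_image(model: torch.nn.Module,
                        left_raw: torch.Tensor,
                        right_raw: torch.Tensor) -> torch.Tensor:
    """Run a pre-trained depth-of-field extension model on a dual-pixel image pair.

    left_raw / right_raw: single-channel RAW pixel maps from the dual-pixel
    sensor, shape (H, W), assumed normalized to [0, 1]. Returns the predicted
    depth-of-field extended image as an RGB tensor of shape (3, H, W).
    """
    # Stack the two RAW maps into a 2-channel input and add a batch dimension.
    x = torch.stack([left_raw, right_raw], dim=0).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        edof = model(x)                      # (1, 3, H, W)
    return edof.squeeze(0).clamp(0.0, 1.0)
```

In a real pipeline the result would still pass through the remaining ISP steps before being displayed or saved.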
According to the first aspect, before acquiring the left pixel map and the right pixel map of the target object using the dual-pixel sensor, the method further includes: constructing the dataset required for training the depth-of-field extension model, where the dataset includes a plurality of data groups and each data group includes at least a sample left pixel map and a sample right pixel map; inputting the sample left pixel map and the sample right pixel map of each data group into a network model as input images and performing iterative training until a set requirement is met; and taking the network model that meets the set requirement as the depth-of-field extension model.
According to the first aspect, or any implementation of the first aspect, constructing the dataset required for training the depth-of-field extension model includes: calibrating the blur kernels of the dual-pixel sensor at different distances to obtain a correspondence between information at different depths and different blur kernels; acquiring sample color-mode (RGB) images and the sample depth images corresponding to the sample RGB images from a public dataset, where the sample RGB images and the sample depth images in the public dataset are provided by a second electronic device with a multi-camera module and the sample RGB images are RGB images taken in an in-focus scene; degrading each acquired sample RGB image into a sample RAW image according to the imaging path corresponding to the dual-pixel sensor; for each sample RAW image, selecting the blur kernel corresponding to the sample RAW image from the correspondence according to the corresponding sample depth image; for each sample RAW image, processing the sample RAW image with the selected blur kernel to obtain the sample left pixel map and the sample right pixel map corresponding to the sample RAW image, and taking the sample left pixel map, the sample right pixel map, and the sample RGB image as one data group; and aggregating the data groups corresponding to all the sample RGB images to obtain the dataset required for training the depth-of-field extension model.
The second electronic device providing the sample RGB image and the sample depth image is, for example, a mobile phone, a tablet computer, or the like.
In some implementations, the multi-camera module may be, for example, a dual-camera module; in other implementations, it may include three cameras or even more, which is not enumerated here, and the application is not limited in this respect. In this way, the depth image can be obtained through the multi-camera module, which ensures the accuracy of the depth information in the depth image.
In addition, in other implementations, the second electronic device that provides the sample RGB image and the depth image may also be an electronic device that integrates a D-RGB camera. Thus, the sample RGB image and the depth image can be shot directly.
In addition, the camera module of the second electronic device that provides the sample RGB images and sample depth images supports a large depth of field; that is, the sample RGB images and depth images captured by that camera module are obtained in an in-focus scene, and their contents are clear and visible. This guarantees the quality of the RAW images degraded from the sample RGB images in the public dataset, so that the resulting left and right pixel maps can be used to train an accurate depth-of-field extension model.
In this way, the dataset required for training is constructed by taking the RGB images and depth images in a public dataset generated on the electronic-device side (such as a mobile phone) as raw data and applying degradation and blur-kernel calibration, and iterative training is then performed on the constructed dataset to obtain the depth-of-field extension model. Because the degradation process and the blur-kernel calibration are carried out with respect to the electronic device, the constructed dataset matches the electronic device, so the finally obtained depth-of-field extension model is better suited to the actual use scenarios of electronic devices such as mobile phones.
In addition, by using the phase information and the implicit scene depth contained in the left and right pixel maps of the DP sensor, the depth-of-field extension model trained on the left and right pixel maps can capture blur kernels at different depths, thereby realizing blur-kernel prior processing. This ensures the stability of the depth-of-field extended images produced by the model and effectively overcomes the shortcomings of the existing schemes.
The specific implementation details of constructing the dataset and training the depth-of-field extension model are described below and are not repeated here.
According to the first aspect or any of the foregoing implementations of the first aspect, the blur kernels of the dual-pixel sensor include a left blur kernel, a right blur kernel, and a joint blur kernel, where the joint blur kernel is the blur kernel corresponding to the RAW image formed by combining the left pixel map and the right pixel map, the left blur kernel is the blur kernel corresponding to the left pixel map in the RAW image, and the right blur kernel is the blur kernel corresponding to the right pixel map in the RAW image. Degrading each acquired sample RGB image into a sample RAW image according to the imaging path corresponding to the dual-pixel sensor includes: for each sample RGB image, selecting the corresponding joint blur kernel from the correspondence according to the corresponding sample depth image; and degrading each acquired sample RGB image into a sample RAW image according to the imaging path corresponding to the dual-pixel sensor and the joint blur kernel corresponding to the sample RGB image. For each sample RAW image, selecting the blur kernel corresponding to the sample RAW image from the correspondence according to the corresponding sample depth image includes: selecting the left blur kernel and the right blur kernel corresponding to the sample RAW image from the correspondence according to the corresponding sample depth image. For each sample RAW image, processing the sample RAW image with the selected blur kernels to obtain the sample left pixel map and the sample right pixel map corresponding to the sample RAW image includes: processing the sample RAW image with the selected left blur kernel to obtain the sample left pixel map, and processing the sample RAW image with the selected right blur kernel to obtain the sample right pixel map.
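The dataset construction described in the two preceding implementations can be pictured with the following sketch. It only illustrates the described ordering (joint kernel for the RAW degradation, left/right kernels for the two pixel maps) under the assumption that the depth-to-kernel correspondence has already been calibrated into a lookup table; build_data_group, kernels_by_depth, and the simplified grayscale stand-in for the RAW mosaic are all hypothetical.

```python
import numpy as np
from scipy.ndimage import convolve

def build_data_group(rgb, depth, kernels_by_depth):
    """Build one (left, right, RGB[, depth]) data group from a public-dataset pair.

    rgb:    in-focus sample RGB image, float array of shape (H, W, 3)
    depth:  corresponding sample depth image, shape (H, W)
    kernels_by_depth: calibrated correspondence, depth_bin -> (joint, left, right) kernels
    """
    # Select the calibrated kernels for the dominant depth of this sample
    # (a per-pixel or per-region selection would follow the same correspondence).
    depth_bin = int(np.median(depth))
    joint_k, left_k, right_k = kernels_by_depth[depth_bin]

    # Degrade the RGB image into a single-channel sample RAW image along the
    # dual-pixel imaging path; a grayscale mosaic stands in for the real
    # inverse-ISP step here, blurred with the joint kernel.
    mosaic = rgb.mean(axis=2)
    raw = convolve(mosaic, joint_k, mode="nearest")

    # Apply the left and right blur kernels to obtain the two sample pixel maps.
    left = convolve(raw, left_k, mode="nearest")
    right = convolve(raw, right_k, mode="nearest")

    # The sharp RGB image (and optionally the depth image) is the training target.
    return {"left": left, "right": right, "rgb": rgb, "depth": depth}
```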
According to the first aspect, or any of the above implementations of the first aspect, the set requirement is that the loss of the predicted depth-of-field extended image output by the network model reaches the set loss value, where the predicted depth-of-field extended image is predicted by the network model from the sample left pixel map and the sample right pixel map.
The predicted depth-of-field extended image is, for example, the depth-of-field extended image output by the depth-of-field extension model described below, and its loss is denoted L_EDOF below.
According to the first aspect, or any implementation of the first aspect, the loss of the predicted depth-of-field extended image is determined according to the predicted depth-of-field extended image and the X-direction and Y-direction gradients of the sample RGB image. For a specific implementation of determining this loss, see formulas (2) to (4) below.
According to the first aspect, or any implementation of the first aspect, each data group further includes a sample depth image. The method further includes: for each sample RGB image, taking the sample left pixel map, the sample right pixel map, the sample RGB image, and the sample depth image corresponding to the sample RGB image as one data group. In this way, the depth-of-field extension model trained on this type of dataset can output not only the depth-of-field extended image but also a depth image.
According to the first aspect, or any of the above implementations of the first aspect, the set requirement is that the loss of the predicted depth-of-field extended image and the loss of the predicted depth image output by the network model reach the set loss value, where both the predicted depth-of-field extended image and the predicted depth image are predicted by the network model from the sample left pixel map and the sample right pixel map.
The predicted depth-of-field extended image is, for example, the depth-of-field extended image output by the depth-of-field extension model described below, and its loss is denoted L_EDOF below.
The predicted depth image is, for example, the depth image output by the depth-of-field extension model described below, and its loss is denoted L_Depth below.
According to the first aspect, or any implementation of the first aspect, the loss of the predicted depth-of-field extended image is determined according to the predicted depth-of-field extended image and the X-direction and Y-direction gradients of the sample RGB image, and the loss of the predicted depth image is determined according to the sample depth image and the predicted depth image. For the specific implementation of determining the loss of the predicted depth-of-field extended image, see formulas (2) to (4) below; for that of the predicted depth image, see formula (6) below.
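Formulas (2) to (4) and (6) are not reproduced in this excerpt, so the sketch below only shows the general shape such losses can take: an image term plus X- and Y-gradient terms for the predicted depth-of-field extended image, and a pixel-wise term for the predicted depth image. The L1 norms and the weighting are assumptions, not the patent's exact formulas.

```python
import torch
import torch.nn.functional as F

def gradients_xy(img: torch.Tensor):
    """Finite-difference gradients of an (N, C, H, W) tensor in the X and Y directions."""
    gx = img[..., :, 1:] - img[..., :, :-1]   # X direction (width)
    gy = img[..., 1:, :] - img[..., :-1, :]   # Y direction (height)
    return gx, gy

def edof_loss(pred_edof, sample_rgb):
    """Illustrative L_EDOF: image term plus X/Y gradient terms against the sample RGB image."""
    gx_p, gy_p = gradients_xy(pred_edof)
    gx_t, gy_t = gradients_xy(sample_rgb)
    return (F.l1_loss(pred_edof, sample_rgb)
            + F.l1_loss(gx_p, gx_t)
            + F.l1_loss(gy_p, gy_t))

def depth_loss(pred_depth, sample_depth):
    """Illustrative L_Depth: pixel-wise loss between predicted and sample depth images."""
    return F.l1_loss(pred_depth, sample_depth)

def total_loss(pred_edof, pred_depth, sample_rgb, sample_depth, depth_weight=1.0):
    # Combined objective when each data group also carries a sample depth image.
    return edof_loss(pred_edof, sample_rgb) + depth_weight * depth_loss(pred_depth, sample_depth)
```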
According to the first aspect, or any implementation manner of the first aspect above, the depth-of-field extension model is a convolutional neural network model.
According to the first aspect or any of the above implementations of the first aspect, in the convolutional neural network model, the convolutional layers on the input side and the convolutional layers on the output side are connected by skip connections; max pooling is used between the convolutional layers on the input side, and up-convolution is used between the convolutional layers on the output side.
For the specific structure of the convolutional neural network, reference is made below, and details are not described here.
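The structure described above (input-side convolutions with max pooling, output-side convolutions with up-convolution, and skip connections between corresponding layers) matches a U-Net-style encoder-decoder. The sketch below is one such network under those assumptions; the channel counts, depth, and optional second head that also predicts a depth image are illustrative rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class EDOFNet(nn.Module):
    """U-Net-style sketch: 2-channel (left, right) input -> RGB EDOF image (+ optional depth)."""
    def __init__(self, predict_depth: bool = True):
        super().__init__()
        self.enc1, self.enc2 = conv_block(2, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)                           # max pooling on the input side
        self.bottleneck = conv_block(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)   # up-convolution on the output side
        self.dec2 = conv_block(128, 64)                       # 128 = 64 (upsampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = conv_block(64, 32)
        self.to_rgb = nn.Conv2d(32, 3, 1)
        self.to_depth = nn.Conv2d(32, 1, 1) if predict_depth else None

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        rgb = self.to_rgb(d1)
        return (rgb, self.to_depth(d1)) if self.to_depth is not None else rgb
```

With predict_depth=True, the network returns both the extended-depth-of-field RGB image and a depth map, matching the variant in which each data group also includes a sample depth image.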
According to the first aspect, or any implementation manner of the first aspect, the target depth-of-field extended image is an RGB image.
In a second aspect, the present application provides an electronic device. The electronic device includes: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the instructions of the first aspect or any possible implementation of the first aspect.
Any one implementation manner of the second aspect and the second aspect corresponds to any one implementation manner of the first aspect and the first aspect, respectively. For technical effects corresponding to any one of the implementation manners of the second aspect and the second aspect, reference may be made to the technical effects corresponding to any one of the implementation manners of the first aspect and the first aspect, and details are not described here.
In a third aspect, the present application provides a computer readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
Any one implementation manner of the third aspect corresponds to any one implementation manner of the first aspect. For technical effects corresponding to any one implementation manner of the third aspect and the third aspect, reference may be made to the technical effects corresponding to any one implementation manner of the first aspect and the first aspect, and details are not repeated here.
In a fourth aspect, the present application provides a computer program comprising instructions for carrying out the method of the first aspect or any possible implementation manner of the first aspect.
Any one implementation manner of the fourth aspect and the fourth aspect corresponds to any one implementation manner of the first aspect and the first aspect, respectively. For technical effects corresponding to any one implementation manner of the fourth aspect and the fourth aspect, reference may be made to the technical effects corresponding to any one implementation manner of the first aspect and the first aspect, and details are not repeated here.
In a fifth aspect, the present application provides a chip, which includes a processing circuit and a transceiver pin. Wherein the transceiver pin and the processing circuit are in communication with each other via an internal connection path, and the processing circuit is configured to perform the method of the first aspect or any one of the possible implementations of the first aspect to control the receiving pin to receive signals and to control the sending pin to send signals.
Any one implementation manner of the fifth aspect and the fifth aspect corresponds to any one implementation manner of the first aspect and the first aspect, respectively. For technical effects corresponding to any one of the implementation manners of the fifth aspect and the fifth aspect, reference may be made to the technical effects corresponding to any one of the implementation manners of the first aspect and the first aspect, and details are not repeated here.
Drawings
FIGS. 1a and 1b are schematic diagrams illustrating an application scenario;
FIG. 2a shows an exemplary embodiment of a shooting scene;
fig. 2b is a schematic diagram illustrating depth-of-field expansion of an image area in front of an object plane based on the depth-of-field expansion image generation method provided in the embodiment of the present application;
fig. 2c is a schematic diagram illustrating depth-of-field expansion of an image area behind an object plane based on the depth-of-field expansion image generation method provided in the embodiment of the present application;
fig. 2d is a schematic diagram exemplarily illustrating depth-of-field expansion of the shooting scene shown in fig. 2a based on the depth-of-field expansion image generation method provided in the embodiment of the present application;
fig. 3 is a schematic diagram of a hardware structure of an exemplary electronic device;
FIG. 4 is a schematic diagram of a software architecture of an exemplary illustrated electronic device;
FIG. 5a is a schematic diagram illustrating how a light point on the object plane is imaged at the image plane in an in-focus scene;
FIG. 5b is a schematic diagram illustrating how a light point on the object plane is imaged at the image plane in an out-of-focus scene;
FIG. 5c is a schematic diagram illustrating how a light point on the object plane is imaged at the image plane in another out-of-focus scene;
FIG. 6a is a diagram illustrating an exemplary processing of a left pixel map and a right pixel map via a depth-extended model;
FIG. 6b is a diagram illustrating another exemplary processing of the left and right pixel maps via a depth-extended model;
fig. 7 is a schematic diagram illustrating a longitudinal interaction of functional modules involved in implementing a depth-of-field extended image generation method provided by an embodiment of the present application;
fig. 8 is a schematic diagram illustrating lateral interaction of functional modules involved in implementing the extended depth of field image generation method according to the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The terms "first" and "second," and the like, in the description and in the claims of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first target object and the second target object, etc. are specific sequences for distinguishing different target objects, rather than describing target objects.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the description of the embodiments of the present application, the meaning of "a plurality" means two or more unless otherwise specified. For example, a plurality of processing units refers to two or more processing units; a plurality of systems refers to two or more systems.
The shooting scene is one of common application scenes of electronic equipment such as a mobile phone. With the higher and higher requirements of the user on the mobile phone photographing experience, the mobile phone photographing function is more and more perfect. The following explains the photographing function of the electronic device by taking a mobile phone as an example.
FIG. 1a illustrates an exemplary cell phone interface. Referring to fig. 1a, icons of a plurality of application programs, such as an icon 101 of a camera application, an icon 102 of a gallery application, and the like, are displayed on the mobile phone interface. In some embodiments, the cell phone interface shown in FIG. 1a may be referred to as a home interface. When a user clicks the camera application icon 101 in the interface, a shooting function can be realized by using the camera application; when the user clicks the gallery application icon 102 in the interface, the gallery application can be used to view pictures (or photos), videos, etc. stored in the cell phone.
Continuing to refer to fig. 1a, for example, when a user clicks an icon 101 of a camera application, the mobile phone, in response to the user operation, recognizes that a control corresponding to the user click operation is a control of the camera application, and then invokes a corresponding interface in the application framework layer to start the camera application, and starts a camera driver by invoking the kernel layer, and acquires an image by using a camera. At this time, the cell phone displays an interface of a camera application, such as the cell phone interface shown in fig. 1 b.
With the improvement of the photographing function of the mobile phone, more and more photographing modes are supported by the camera application. Illustratively, the shooting modes include at least an aperture mode, a night view mode, a portrait mode, a photographing mode, a recording mode, a professional mode, and the like, and may be referred to as mode options displayed in the shooting mode list 103 in fig. 1 b. And when the user clicks an icon option corresponding to a certain shooting mode, the mobile phone displays a camera application interface in the corresponding shooting mode. For example, if the user clicks the icon option 1031 of the photographing mode, the mobile phone displays an interface when the camera application adopts the photographing mode, as shown in fig. 1 b.
It should be noted that, in some embodiments, after the mobile phone starts the camera application, the camera application starts the photographing mode by default, that is, the photographing mode is the default photographing mode of the camera application.
With continued reference to FIG. 1b, for example, one or more controls can be included in the display interface corresponding to the photography mode. Included controls include, but are not limited to: a shooting mode list 103, a preview window 104, a function option list 105, a shutter control 106, a front and rear camera switching control 107, and a picture viewing control 108.
The preview window 104 can display the image captured by the camera in real time. The shutter control 106 may monitor a user operation for triggering photographing, that is, when the mobile phone detects the user operation acting on the shutter control 106, photographing may be performed in response to the user operation, and an image obtained by photographing may be stored in the gallery application.
For example, in some implementations, when a user wants to view a captured image, the gallery application can be opened by clicking on control 108 to view the captured image.
For example, in other implementations, when the user wants to view the captured image, the user may open the gallery application by clicking on the icon 102 of the gallery application in the interface shown in fig. 1a, so as to view the captured image.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Continuing to refer to fig. 1b, illustratively, shown in the function option list 105 are function options supported in the shooting mode, such as a flash function, an AI (Artificial Intelligence) function, a setting function, and the like.
Continuing with fig. 1b, for example, when the user clicks on the shutter control 106, the mobile phone detects a user operation on the shutter control 106, and takes a picture in response to the user operation.
In some existing implementations, to ensure the visual quality of captured images, blurred image data and sharp image data are first acquired by varying the aperture of a single-lens reflex (SLR) camera, and a depth-of-field extension model is trained from the blurred image data corresponding to different aperture sizes together with the sharp image data; the model trained on this SLR image data (blurred and sharp) is then preset in the mobile phone. In this way, when the phone captures an image, the image sensor in the camera collects an original RAW-domain image (hereinafter, RAW image), the depth-of-field extension model extends the depth of field of the RAW image, and the result is converted into a color-mode (Red Green Blue, RGB) image that can be displayed on the screen. Here, a RAW image may refer to an image in Bayer format, and a RAW-domain image may also be called a Bayer-format RAW image.
However, there are many problems in the method (the depth-of-field extension method) in which a depth-of-field extension model is trained from blurred image data and sharp image data captured by a single lens reflex camera at different aperture sizes and then transplanted into a mobile phone.
For example, there is a domain difference between image data captured by an SLR and image data captured by a mobile phone, so a depth-of-field extension model trained on SLR-captured image data is likely to be unusable when applied to the phone, and depth-of-field extension processing cannot be realized.
For example, because the amount of captured image data is limited and accurate depth information is difficult to recover from a single image, a depth-of-field extension model trained on SLR-captured image data does not take the depth information of the image into account, and its blur kernel is fixed; that is, all RAW images fed into the model for depth-of-field extension use the same blur kernel. As a result, the depth-of-field extension effect is unstable and the final extended image quality is poor.
For example, when an SLR is used to acquire image data, the user has to adjust the aperture manually, which is time-consuming and labor-intensive; moreover, adjusting the aperture disturbs the position and pose of the SLR, so the image data of the same object cannot be aligned.
Therefore, how to extend the depth of field of photos taken by electronic devices and improve the photo effect, so as to capture photos that satisfy users and improve their photographing experience, is a technical problem to be solved.
In order to solve the above problem, an embodiment of the present application provides a depth-of-field extended image generation method. In the method, a left pixel map and a right pixel map are obtained by using the dual-pixel sensor already present in the camera of the electronic device, and the two maps are then processed by a pre-trained depth-of-field extension model to obtain a depth-of-field extended image. This preserves image sharpness and image detail, greatly improves the photo effect, and thereby improves the user's photographing experience.
Fig. 2a shows an exemplary shooting scene. In this scene there are regions beyond the depth of field, such as region 201 (object light-emitting points in front of the object plane) and region 202 (object light-emitting points behind the object plane) in fig. 2a. With the existing depth-of-field extension approach, the processed image still looks like fig. 2a. With the depth-of-field extended image generation method provided in this embodiment of the application, however, the left pixel map and the right pixel map of the RAW image corresponding to region 201 shown in (1) of fig. 2b are extracted and processed by the depth-of-field extension model constructed in this embodiment, yielding region 201' shown in (2) of fig. 2b. It is easy to see that in region 201', the image sharpness and the display of image detail are significantly improved.
Similarly, based on the depth-of-field extended image generation method provided in this embodiment of the application, the left pixel map and the right pixel map of the RAW image corresponding to region 202 shown in (1) of fig. 2c are extracted and processed by the depth-of-field extension model constructed in this embodiment, yielding region 202' shown in (2) of fig. 2c. It is easy to see that in region 202', the image sharpness and the display of image detail are significantly improved.
Thus, after the depth-of-field expansion processing is performed on the region 201 and the region 202 in the above manner, the image shown in fig. 2a (the image shown in (1) in fig. 2 d) becomes the image shown in (2) in fig. 2 d.
It should be noted that fig. 2a to 2d are shown as grayscale images. Comparing the original of the grayscale image shown in (1) of fig. 2d with the original of the grayscale image shown in (2) of fig. 2d, the latter is better than the former in overall sharpness and in the display of image detail.
That is to say, based on the extended depth of field image generation method provided in this embodiment, the generated extended depth of field image has better definition, display details, and color display effect.
In order to better understand the technical solutions provided by the embodiments of the present application, the following describes the hardware structure and the software structure of the electronic device with reference to fig. 3 and 4.
Referring to fig. 3, the electronic device 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like.
For example, in some implementations, the sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., which are not limited herein.
Furthermore, it should be noted that the processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
It is understood that the controller can be a neural center and a command center of the electronic device 100. In practical application, the controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
It should be noted that the processor 110 may also be provided with a memory for storing instructions and data. In some implementations, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
For example, in some implementations, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
Continuing with fig. 3, the charge management module 140 is illustratively configured to receive a charging input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging implementations, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging implementations, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
Continuing to refer to fig. 3, the power management module 141 is illustratively coupled to the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other implementations, the power management module 141 may also be disposed in the processor 110. In other implementations, the power management module 141 and the charging management module 140 may be disposed in the same device.
Continuing to refer to fig. 3, for example, the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
It should be noted that the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other implementations, the antenna may be used in conjunction with a tuning switch.
Continuing to refer to fig. 3, for example, the mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some implementations, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some implementations, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
Further, it should be noted that the modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some implementations, the modem processor may be a stand-alone device. In other implementations, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
With continued reference to fig. 3, the wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. It receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. It may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert it into electromagnetic waves radiated through the antenna 2.
Specifically, in the technical solution provided in the embodiment of the present application, the electronic device 100 may communicate with a cloud server or other servers through the mobile communication module 150 or the wireless communication module 160.
For example, in some implementations, the cloud server or other servers described above are servers that provide the depth-of-field extension model required by this embodiment. In the scenario where the server trains the depth-of-field extension model, the electronic device 100 may send a model acquisition request to the cloud server through the mobile communication module 150 and then store the server-trained depth-of-field extension model locally, so that images subsequently captured with the camera 193 can be processed by the depth-of-field extended image generation method provided in this application based on that model. The cloud may be a server cluster consisting of a plurality of servers.
For example, in other implementation manners, the depth-of-field extension model used in this embodiment may also be preset in the electronic device 100 directly before the electronic device 100 leaves the factory, or obtained by training the electronic device 100 after the electronic device leaves the factory.
It should be understood that the above description is only an example for better understanding of the technical solutions of the present application, and is not intended as the only limitation of the present application. For convenience of description, the following assumes that the depth-of-field extension model is preset in the electronic device 100 in advance and is then refined through autonomous learning as the user uses the device.
In addition, it should be noted that training logics of the depth-of-field extension model required by the server for training the embodiment and the depth-of-field extension model required by the electronic device 100 for training the embodiment are the same, and a specific training process will be described in detail in the following embodiments, and will not be described again here.
In addition, it should be noted that the electronic device 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
Continuing to refer to FIG. 3, the display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some implementations, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
In addition, it should be noted that the electronic device 100 may implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
In addition, it should be noted that the ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some implementations, the ISP may be provided in camera 193.
Further, it is noted that the camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some implementations, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
In the technical solution provided in the embodiment of the present application, the camera 193 is specifically a camera integrating a dual-pixel sensor (DP sensor). In this way, the RAW image captured by the camera can be processed by the ISP to obtain a left pixel map and a right pixel map.
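How the two pixel maps are extracted from a dual-pixel frame depends on the sensor's readout format, so the following is only a hedged sketch assuming the simplest case, namely that the ISP delivers the two sub-pixel planes interleaved column-wise; real DP readout formats vary by sensor, and the helper name is hypothetical.

```python
import numpy as np

def split_dual_pixel_frame(dp_raw: np.ndarray):
    """Split a dual-pixel RAW frame into left and right pixel maps.

    Assumes the two photodiodes of each pixel are interleaved column-wise,
    i.e. dp_raw has shape (H, 2*W) with left/right sub-pixels alternating.
    """
    left = dp_raw[:, 0::2].astype(np.float32)
    right = dp_raw[:, 1::2].astype(np.float32)
    return left, right
```

Row-interleaved or separately streamed layouts would only change the slicing, not the overall flow.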
It should be noted that the digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform fourier transform or the like on the frequency bin energy.
In addition, it should be noted that the video codec is used for compressing or decompressing digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
Continuing to refer to fig. 3, for example, the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
Continuing to refer to fig. 3, internal memory 121 may illustratively be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, phone book, etc.) created during use of the electronic device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
Specifically, in the technical solution provided in the embodiment of the present application, the depth of field extension model used in the above-mentioned embodiment may be extracted and stored in the internal memory 121 of the terminal device, so as to facilitate the depth of field extension.
In addition, it should be noted that the electronic device 100 can implement an audio function through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
It should be noted that the audio module 170 is used for converting digital audio information into analog audio signals and outputting the analog audio signals, and is also used for converting analog audio inputs into digital audio signals. The audio module 170 may also be used to encode and decode audio signals. In some implementations, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
Continuing to refer to fig. 3, exemplary keys 190 include a power on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be touch keys. The electronic apparatus 100 may receive a key input, and generate a key signal input related to user setting and function control of the electronic apparatus 100.
Continuing with fig. 3, illustratively, the motor 191 may generate a vibratory cue. The motor 191 may be used for incoming call vibration prompts as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Continuing to refer to fig. 3, for example, indicator 192 may be an indicator light that may be used to indicate a charge status, a charge change, or a message, missed call, notification, etc.
While the description is made with respect to the hardware configuration of the electronic device 100, it should be understood that the electronic device 100 shown in FIG. 3 is merely an example, and in particular implementations, the electronic device 100 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 3 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
In order to better understand the software structure of the electronic device 100 shown in fig. 3, the following description is made of the software structure of the electronic device 100. Before explaining the software structure of the electronic device 100, first, an architecture that can be adopted by the software system of the electronic device 100 will be explained.
Specifically, in practical applications, the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture.
Furthermore, it is understood that software systems currently used in mainstream electronic devices include, but are not limited to, the Windows system, the Android system, and the iOS system. For convenience of description, in the embodiment of the present application, the software structure of the electronic device 100 is exemplarily described by taking an Android system with a layered architecture as an example.
In addition, the depth-of-field extended image generation method provided in the embodiments of the present application is also applicable to other systems in specific implementations.
Referring to fig. 4, a block diagram of a software structure of the electronic device 100 according to the embodiment of the present application is shown.
As shown in fig. 4, the layered architecture of the electronic device 100 divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through software interfaces. In some implementations, the Android system is divided, from top to bottom, into five layers: an application layer, an application framework layer, an Android runtime (Android runtime) and system library, a hardware abstraction layer (HAL), and a kernel layer.
Wherein the application layer may include a series of application packages. As shown in fig. 4, the application package may include applications such as camera, settings, map, WLAN, bluetooth, gallery, music, etc., which are not listed here, but the present application is not limited thereto.
The application framework layer provides an Application Programming Interface (API) and a programming framework for an application of the application layer. In some implementations, these programming interfaces and programming frameworks can be described as functions. As shown in fig. 4, the application framework layer may include functions such as a depth extension module, a camera service, a view system, a content provider, a window manager, and a resource manager, which are not limited herein.
Illustratively, in this embodiment, the camera service is configured to invoke a camera (including a front camera and/or a rear camera) in response to a request from an application.
Illustratively, in this embodiment, the depth-of-field extension module is configured to perform depth-of-field extension on a left pixel map and a right pixel map provided by a camera service, so as to obtain a depth-of-field extended image.
The interaction between the camera service and the depth-of-field extension module, and the interaction between the camera service and the other functional modules involved in implementing the depth-of-field extended image generation method provided in this embodiment, will be described in detail in the following embodiments and are not described here.
In addition, it should be understood that the above division of each functional module is only an example for better understanding of the technical solution of the present embodiment, and is not a sole limitation to the present embodiment. In practical applications, the functions described above may also be implemented by being integrated into one functional module, which is not limited in this embodiment.
In addition, in practical applications, the functional modules may also be presented as services and frameworks, such as a depth-of-field extension framework/depth-of-field extension service, which is not limited in this embodiment.
In addition, it should be noted that the window manager located in the application framework layer is used for managing window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, capture screenshots, and the like.
In addition, it should be noted that the content provider located in the application framework layer is used to store and acquire data and make the data accessible to the application. The data may include video, images, audio, telephone calls made and received, browsing history and bookmarks, phone books, etc., which are not listed herein, but are not limited in this application.
In addition, it should be further noted that the view system located in the application framework layer includes visual controls, such as a control for displaying text, a control for displaying pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
In addition, it should be further noted that the resource manager located in the application framework layer provides various resources for the application, such as localized character strings, icons, pictures, layout files, video files, and the like, which are not listed here, and the application is not limited thereto.
The Android Runtime comprises a core library and a virtual machine. The Android Runtime is responsible for scheduling and managing the Android system.
The core library comprises two parts: one part is the functions that need to be called by the Java language, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used for performing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
It can be understood that the 2D graphics engine mentioned above is a drawing engine for 2D drawing.
The HAL layer is an interface layer between the operating system kernel and the hardware circuitry. The HAL layer includes, but is not limited to: an Audio hardware abstraction layer (Audio HAL) and a Camera hardware abstraction layer (Camera HAL). The Audio HAL is used for processing the audio stream, for example, performing noise reduction, directional enhancement, and the like on the audio stream, and the Camera HAL is used for processing the image stream.
Specifically, in the technical solution provided in the embodiment of the present application, the Camera HAL of the HAL layer is involved when taking a picture with the camera.
Furthermore, it is understood that the kernel layer in the Android system is a layer between hardware and software. The kernel layer at least comprises a display driver, a camera driver, a microphone driver, a Bluetooth driver, a sensor driver, and the like. Exemplarily, the camera driver can be used for transmitting image data captured by the camera to the camera service through the camera hardware abstraction layer, so that the camera service can send the image data captured by the camera to the depth-of-field extension module for depth-of-field extension processing.
The software structure of the electronic device 100 is described here, and it is understood that the layers in the software structure shown in fig. 4 and the components included in each layer do not constitute a specific limitation to the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer layers than those shown, and may include more or fewer components in each layer, which is not limited in this application.
According to the description of the shooting scene, the depth-of-field extended image generation scheme provided by the present application needs to complete at least the training of the depth-of-field extension model before a user uses the camera of the electronic device to shoot, so that the depth of field of the image shot by the camera can be extended. For convenience of explanation, the depth-of-field extended image generation scheme provided by the present application is divided in the following into a depth-of-field extension model training phase and a depth-of-field extended image generation phase.
In order to better understand the depth-of-field extended image generation scheme provided by the embodiment of the present application, the following specifically describes the technical scheme provided by the embodiment from two stages, namely, a depth-of-field extended model training stage and a depth-of-field extended image generation stage, with reference to the drawings.
1. Depth of field extension model training phase
Understandably, the depth-of-field extension model needs to be obtained by iteratively training a network model based on a specific data set. Thus, in some implementations, the depth-of-field extended model training phase may be divided into a dataset acquisition sub-phase and a network model training sub-phase.
1.1 dataset acquisition sub-phase
In the present embodiment, the data set acquisition sub-phase includes a degradation processing flow and blur kernel (point spread function, PSF) calibration. Blur kernel calibration specifically refers to calibrating the blur kernels on the DP sensor in the camera at different distances; the degradation processing flow degrades known RGB images into RAW images.
Specifically, in this embodiment, in the process of calibrating the blur kernels, a rectangular plane with 12 × 6 (width × height) light-emitting points, as shown in fig. 5a to 5c, is used as the object plane. The electronic device is then fixed on an object such as a bracket, which ensures that the position and posture of the camera of the electronic device remain fixed, and pictures are taken with the camera of the electronic device at different distances by moving the bracket holding the electronic device, so as to obtain the blur kernels on the DP sensor in the camera at the different distances.
It should be noted that, in this embodiment, the blur kernels calibrated on the dual-pixel sensor at different distances include a left blur kernel, a right blur kernel, and a joint blur kernel.
The joint blur kernel refers to the blur kernel corresponding to the RAW image generated by combining the left pixel map and the right pixel map, the left blur kernel is the blur kernel corresponding to the left pixel map in the RAW image, and the right blur kernel is the blur kernel corresponding to the right pixel map in the RAW image.
Illustratively, in some implementations, when the object plane is within the depth of field, i.e., in an in-focus scene, the corresponding image plane is as shown in fig. 5a; that is, in the in-focus scene, the image on the image plane is still a pattern of light spots, apart from any difference in the size of the spots that may appear with distance.
For example, in other implementations, when the object plane is outside the depth of field, such as an out-of-focus scene with the depth of field in front of or behind the object plane, the corresponding image plane is as shown in fig. 5b and fig. 5c; that is, in the out-of-focus scene, the image on the image plane is a pattern of light spots, and the size of the spots differs according to the distance.
It should be understood that the above description is only an example for better understanding of the technical solutions of the present application, and is not intended as the only limitation of the present application. In practical applications, objects with different specifications can be selected as the object plane according to requirements, and the operation of calibrating the blur kernels can be completed before the electronic device leaves the factory.
In addition, it is understood that in practical applications, one or more electronic devices of the same model and the same batch may be calibrated, and then the calibration result may be multiplexed to other electronic devices of the model and the batch.
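To make the calibration step above more concrete, the following is a minimal sketch of how the left, right, and joint blur kernels could be extracted from the captured spot images, assuming the left and right pixel maps for one light-emitting point have already been cropped out as numpy arrays and that the combined RAW response is approximated as the sum of the left and right planes; the function names, window size, and background-removal step are illustrative assumptions rather than the exact calibration procedure of this embodiment.

```python
import numpy as np

def extract_kernel(raw_plane: np.ndarray, center: tuple, half: int = 16) -> np.ndarray:
    """Crop a window around one light spot and normalize it into a blur kernel (PSF)."""
    y, x = center
    patch = raw_plane[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
    patch -= np.median(patch)          # rough background removal
    patch = np.clip(patch, 0.0, None)
    return patch / patch.sum()         # energy-normalized kernel

def calibrate_blur_kernels(captures, spot_center):
    """captures: {distance_mm: (left_map, right_map)} for one light spot at each distance.
    Returns {distance_mm: (left_kernel, right_kernel, joint_kernel)}."""
    kernels = {}
    for dist, (left, right) in captures.items():
        k_left = extract_kernel(left, spot_center)
        k_right = extract_kernel(right, spot_center)
        joint = k_left + k_right                  # combined RAW assumed to be left + right sub-pixel values
        kernels[dist] = (k_left, k_right, joint / joint.sum())
    return kernels
```

The resulting dictionary, keyed by calibration distance, corresponds to the correspondence between depth (distance) and blur kernels described above.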
With regard to the above-mentioned degradation processing flow, in the present embodiment, a large number of RGB images (images after depth-of-field extension processing, that is, depth-of-field extended images) and the depth images corresponding to these RGB images are acquired from public data sets, specifically data sets suitable for electronic devices that have been disclosed so far. Each RGB image is then degraded into a RAW image based on the existing DP sensor imaging path.
Specifically, when an RGB image is degraded into a RAW image, the calibrated joint blur kernel is needed; that is, the RGB image can be degraded into a RAW image according to the joint blur kernel and the imaging path of the DP sensor.
In addition, factors such as image distortion and noise may exist in the degradation process. To avoid these interferences as much as possible, implementation details, such as the adopted processing path or an image noise reduction network such as a CycleISP network, may be found in the related technical documents, which are not described herein again, and this embodiment does not limit this.
Then, for each degraded RAW image, the depth image corresponding to the RGB image that was degraded into the RAW image is used as an assignment to determine the depth information of each pixel in the RAW image, and the depth of field, or distance, is then determined from the depth information. According to this distance, the blur kernels suitable for the RAW image, specifically the left blur kernel and the right blur kernel, can be selected from the blur kernels calibrated at the different distances. Each pixel in the RAW image is then multiplied by the left blur kernel of the corresponding depth to obtain the corresponding pixel of the left pixel map, and multiplied by the right blur kernel of the corresponding depth to obtain the corresponding pixel of the right pixel map.
For example, the spots in the first row and the first column on the image plane in fig. 5b are processed in the above manner, and the resulting left pixel map and right pixel map are shown in fig. 5b; the spots in the first row and the first column on the image plane in fig. 5c are processed in the above manner, and the resulting left pixel map and right pixel map are shown in fig. 5c.
It can be understood that each image in the public data set has an identifier that uniquely identifies it. Therefore, in order to distinguish the left pixel maps and right pixel maps of different images, the left pixel map and the right pixel map obtained in the above manner need to be given corresponding identifiers, so that it can be determined which image a given left pixel map or right pixel map belongs to.
Correspondingly, after processing in the above manner, the same RGB image together with the left pixel map and right pixel map corresponding to that RGB image are used as one group of data. After each RGB image has been subjected to similar processing, a data set meeting the above requirements is obtained, that is, the data set required for training the depth-of-field extension model.
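As a rough illustration of the degradation step described above (selecting the left and right blur kernels by depth and applying them to the degraded RAW image), the following sketch quantizes the depth map to the calibrated distances and applies the per-depth kernels by masked convolution. The binning strategy, the use of scipy's fftconvolve, and the variable names are assumptions, and the distortion/noise modeling (e.g., a CycleISP-style network) mentioned above is omitted.

```python
import numpy as np
from scipy.signal import fftconvolve

def synthesize_dp_pair(raw_plane, depth_map, kernels):
    """raw_plane: single-channel RAW plane obtained by degrading the RGB image.
    depth_map: per-pixel depth at the same resolution.
    kernels: {distance: (k_left, k_right, k_joint)} from the calibration step.
    Returns a (left_map, right_map) pair with depth-dependent blur applied."""
    distances = np.array(sorted(kernels.keys()))
    # Assign every pixel to the nearest calibrated distance (piecewise-constant depth bins).
    idx = np.abs(depth_map[..., None] - distances[None, None, :]).argmin(axis=-1)

    left = np.zeros_like(raw_plane, dtype=np.float64)
    right = np.zeros_like(raw_plane, dtype=np.float64)
    for i, dist in enumerate(distances):
        mask = (idx == i).astype(np.float64)
        if mask.sum() == 0:
            continue
        k_left, k_right, _ = kernels[dist]
        # Blur the whole plane with this depth's kernels, keep only the pixels in this depth bin.
        left += mask * fftconvolve(raw_plane, k_left, mode="same")
        right += mask * fftconvolve(raw_plane, k_right, mode="same")
    return left, right
```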
1.2 network model training sub-stage
For example, in the present embodiment, the network model is taken as a convolutional neural network model, and the structure of the convolutional neural network model is as shown in fig. 6a and 6b. Specifically, the convolutional neural network model may include 7 groups of convolutional layers, namely A1, A2, B1, B2, C1, C2, and D, and each convolutional layer may employ a 3×3 convolution kernel. The convolutional layers of group A1 are connected to the convolutional layers of group A2, the convolutional layers of group B1 are connected to the convolutional layers of group B2, and the convolutional layers of group C1 are connected to the convolutional layers of group C2 by skip connections. Max pooling is employed between A1 and B1, between B1 and C1, and between C1 and D, and up-convolution processing is employed between D and C2, between C2 and B2, and between B2 and A2.
Therefore, when the convolutional neural network model with the above structure is trained based on the data in the data set obtained in 1.1, the left pixel map and the right pixel map in each group of data are used as input parameters. The input is convolved by A1 and then max-pooled, convolved by B1 and then max-pooled, convolved by C1 and then max-pooled, and finally convolved by D. Meanwhile, when convolution processing is carried out through A1, B1, and C1, the processing results are transmitted to the corresponding A2, B2, and C2 through the skip connections for residual calculation, so as to prevent the gradient vanishing and degradation problems caused by the increase in the number of network layers.
Correspondingly, after the processing from A1 to D is completed, the output of D is up-convolved and then convolved by C2, up-convolved again and convolved by B2, and up-convolved once more and convolved by A2, after which image data can be output. The output image data is the depth-of-field extended image obtained after depth-of-field extension, and is essentially an RGB image predicted from the input left pixel map and right pixel map.
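The encoder-decoder structure described above can be sketched in PyTorch as follows. The channel widths, the use of two 3×3 convolutions per group, the two-channel input (left and right pixel maps), the three-channel RGB output, and the use of concatenation for the skip connections are assumptions made for illustration, since the embodiment only specifies the seven groups A1/A2/B1/B2/C1/C2/D, 3×3 kernels, max pooling on the encoder side, and up-convolution on the decoder side.

```python
import torch
import torch.nn as nn

def conv_group(in_ch, out_ch):
    # One group of 3x3 convolutions, standing in for A1/A2/B1/B2/C1/C2/D.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class EDOFNet(nn.Module):
    """U-Net-style model: A1/B1/C1/D encoder, C2/B2/A2 decoder, skip connections A1-A2, B1-B2, C1-C2."""
    def __init__(self, in_ch=2, out_ch=3, base=32):
        super().__init__()
        self.a1 = conv_group(in_ch, base)
        self.b1 = conv_group(base, base * 2)
        self.c1 = conv_group(base * 2, base * 4)
        self.d = conv_group(base * 4, base * 8)
        self.pool = nn.MaxPool2d(2)
        self.up_c = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.c2 = conv_group(base * 8, base * 4)
        self.up_b = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.b2 = conv_group(base * 4, base * 2)
        self.up_a = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.a2 = conv_group(base * 2, base)
        self.head = nn.Conv2d(base, out_ch, 1)

    def forward(self, left, right):
        x = torch.cat([left, right], dim=1)   # left and right pixel maps as the input channels
        a = self.a1(x)
        b = self.b1(self.pool(a))
        c = self.c1(self.pool(b))
        d = self.d(self.pool(c))
        y = self.c2(torch.cat([self.up_c(d), c], dim=1))   # skip connection C1 -> C2
        y = self.b2(torch.cat([self.up_b(y), b], dim=1))   # skip connection B1 -> B2
        y = self.a2(torch.cat([self.up_a(y), a], dim=1))   # skip connection A1 -> A2
        return self.head(y)                                # predicted EDOF (RGB) image
```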
Understandably, the training of the convolutional neural network model needs to be performed iteratively; that is, after each training round, it is determined whether the currently corresponding loss function meets the set requirement. If the set requirement is met, the training ends, and the current convolutional neural network model is used as the depth-of-field extension model required for realizing the depth-of-field extended image generation method provided by this embodiment.
Understandably, since only the output depth-of-field extended image (hereinafter referred to as EDOF) is considered in the embodiment, the loss function substantially considers the loss condition of the depth-of-field extended image, that is, the loss function corresponding to the convolutional neural network model is determined according to the following formula (1):
L = L_EDOF    (1)
where L is the loss function corresponding to the convolutional neural network model, and L_EDOF is the loss function corresponding to the output depth-of-field extended image.
Further, L_EDOF can be determined according to the following formula (2):

L_EDOF = L_MSE(EDOF_pred, EDOF_GT) + λ_x · L_∇x + λ_y · L_∇y    (2)

where EDOF_pred denotes the EDOF predicted (output) by the convolutional neural network model, EDOF_GT denotes the actual RGB image in the data group corresponding to the left pixel map and right pixel map input into the convolutional neural network model, L_∇x is the loss term for the gradient in the x direction, L_∇y is the loss term for the gradient in the y direction, and λ_x and λ_y are the constant coefficients corresponding to the x-direction gradient and the y-direction gradient, respectively, both of which can be adjusted as needed. That is, EDOF_pred and EDOF_GT are processed by the linear minimum mean square error algorithm, and L_EDOF is obtained by summing the result with the x-direction gradient term and the y-direction gradient term, each multiplied by its corresponding coefficient.

Further, L_∇x can be determined according to the following formula (3), and L_∇y can be determined according to the following formula (4):

L_∇x = L_MSE(∇_x EDOF_pred, ∇_x EDOF_GT)    (3)

L_∇y = L_MSE(∇_y EDOF_pred, ∇_y EDOF_GT)    (4)

where ∇_x EDOF_pred is the predicted gradient in the x direction (determined from the output EDOF image based on the edge-detection Sobel operator), ∇_x EDOF_GT is the actual gradient in the x direction (determined from the RGB image based on the Sobel operator), ∇_y EDOF_pred is the predicted gradient in the y direction (determined from the output EDOF image based on the Sobel operator), and ∇_y EDOF_GT is the actual gradient in the y direction (determined from the RGB image based on the Sobel operator).
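A compact sketch of the loss in formulas (1) to (4) is given below, assuming the gradient terms are mean-squared errors between Sobel responses and using placeholder values for λ_x and λ_y; it is an illustrative reading of the formulas rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for the x- and y-direction gradients referred to in formulas (3) and (4).
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def sobel_grad(img, kernel):
    # Apply the Sobel operator channel-wise (depthwise convolution).
    c = img.shape[1]
    return F.conv2d(img, kernel.to(img).repeat(c, 1, 1, 1), padding=1, groups=c)

def edof_loss(edof_pred, edof_gt, lambda_x=0.1, lambda_y=0.1):
    """Formula (2): MSE on the image plus weighted MSE on the x/y Sobel gradients.
    lambda_x / lambda_y are the adjustable constant coefficients (values here are placeholders)."""
    loss = F.mse_loss(edof_pred, edof_gt)
    loss = loss + lambda_x * F.mse_loss(sobel_grad(edof_pred, SOBEL_X), sobel_grad(edof_gt, SOBEL_X))
    loss = loss + lambda_y * F.mse_loss(sobel_grad(edof_pred, SOBEL_Y), sobel_grad(edof_gt, SOBEL_Y))
    return loss
```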
Therefore, after one group or multiple groups of left pixel map and right pixel map are input to obtain corresponding EDOF, whether preset requirements are met or not can be determined according to the formulas (1) to (4), if yes, training is stopped, and the current convolutional neural network model is used as a depth of field extension model required for realizing the scheme of the embodiment; otherwise, the iterative training is continued.
In addition, it should be noted that, in practical applications, if it is desired to also obtain a depth image corresponding to the EDOF image from the left pixel map and the right pixel map, each group of data in the constructed data set also needs to include a depth image carrying the corresponding identifier.
Accordingly, for such a scenario, after the left pixel map and the right pixel map are input into the convolutional neural network model of the structure shown in fig. 6a or fig. 6b, the output includes both the depth-of-field extended image (EDOF image) and the depth image, as shown in fig. 6b. For such a scenario, the loss function needs to consider the loss corresponding to both the EDOF image and the depth image; that is, the loss function corresponding to the convolutional neural network model is determined according to the following formula (5):
L = L_EDOF + L_Depth    (5)
where L_Depth is the loss function corresponding to the depth image, L is the loss function corresponding to the convolutional neural network model, and L_EDOF is the loss function corresponding to the output depth-of-field extended image.
Further, L_Depth can be determined according to the following formula (6):
L_Depth = L_MSE(Depth_pred, Depth_GT)    (6)
where Depth_pred is the depth image predicted (output) by the convolutional neural network model, and Depth_GT is the actual depth image in the data group corresponding to the left pixel map and right pixel map input into the convolutional neural network model.
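For the two-output variant, the combined loss of formulas (5) and (6) can be sketched by reusing the edof_loss function from the previous sketch; the equal weighting of the two terms follows formula (5), and the remaining assumptions carry over.

```python
import torch.nn.functional as F

def total_loss(edof_pred, edof_gt, depth_pred, depth_gt, lambda_x=0.1, lambda_y=0.1):
    """Formulas (5) and (6): L = L_EDOF + L_Depth, with L_Depth a plain MSE on the depth maps."""
    l_edof = edof_loss(edof_pred, edof_gt, lambda_x, lambda_y)   # formula (2), sketched above
    l_depth = F.mse_loss(depth_pred, depth_gt)                   # formula (6)
    return l_edof + l_depth
```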
Therefore, after one group or multiple groups of left pixel map and right pixel map are input to obtain corresponding EDOF and depth image, whether preset requirements are met currently or not can be determined according to the formulas (1) to (6), if yes, training is stopped, and the current convolutional neural network model is used as a depth-of-field extension model required for realizing the scheme of the embodiment; otherwise, the iterative training is continued.
As can be seen from the above description, the depth-of-field extension model constructed in this embodiment is obtained by taking the RGB images and depth images in public data sets generated on the electronic device side (such as a mobile phone) as the original data, constructing the data set required for training through the degradation processing flow and the blur kernel calibration, and then performing iterative training based on the constructed data set. The degradation processing flow and the blur kernel calibration are completed for the electronic device, so that the produced data set matches the electronic device. Therefore, the finally obtained depth-of-field extension model can be better suited to the actual usage scenarios of electronic devices such as mobile phones.
In addition, as can be seen from the above description, the data set used to construct the depth-of-field extension model in this embodiment is completely different from that used in existing depth-of-field extension schemes, so the objects processed during subsequent depth-of-field extension are also different. Specifically, in existing depth-of-field extension schemes, the input to the depth-of-field extension model is the RAW image generated by combining the left pixel map and the right pixel map, that is, the image input into the model is the combined RAW image; in this embodiment, the inputs to the depth-of-field extension model are the left pixel map and the right pixel map acquired by the dual-pixel sensor, that is, the images input into the model are the left pixel map and the right pixel map.
Because the phase information contained in the left pixel map and the right pixel map corresponding to the DP sensor, as well as the scene depth implicit in them, is utilized, the depth-of-field extension model obtained by training on the left pixel map and the right pixel map can capture the blur kernels at different depths, thereby realizing prior processing of the blur kernels. This ensures the stability of the depth-of-field extended image obtained after depth-of-field extension by the depth-of-field extension model, and effectively overcomes the shortcomings of the existing scheme.
2. Extended depth of field image generation stage
The depth-of-field extended image generation phase is specifically involved when the user takes a picture through the camera application. Specifically, when a user takes a picture through the camera application, the camera needs to be invoked. The process in which the camera application invokes the camera can be divided into two parts. The first part is a creation process, which can also be understood as a preparation process; it is mainly the process in which each module creates a corresponding instance and exchanges control information. The second part is a shooting process, that is, the process in which each module, or an instance in each module, processes the images acquired by the camera. It should be noted that an "instance" described in the embodiments of the present application may also be understood as program code or process code running in a process.
The following describes in detail the creation process and the shooting process of the camera application in the process of invoking the camera with reference to an interaction flow diagram of each module shown in fig. 7.
Referring to fig. 7, the creation process specifically includes:
S101, the camera application calls the camera service, and the camera service performs corresponding processing.
For example, after the camera application starts (e.g., the process shown in fig. 1a and 1 b), the camera application calls the camera service, for example, the camera application sends a request message to the camera service, where the request message may include, but is not limited to: an application ID (which may be, for example, an application package name) of the camera application, a PID (Process Identification), configuration information of the camera application, and the like. For example, the configuration information may include a resolution (e.g., 1080 × 720p) corresponding to the image displayed by the camera application.
Optionally, the request message may not include the application ID, for example, the request message includes configuration information, and for example, the camera service may acquire the application ID and the PID of the application corresponding to the received request message through an interface with the application layer.
The corresponding processing performed with respect to the camera service may be, for example, creating a camera service instance, a camera device client instance, a camera device instance, a camera data stream instance, and the like.
The camera service instance is used for providing an API (application programming interface) for an application of which the application program layer can call the camera, and creating a corresponding session (session) based on a request of the application. Taking a camera application as an example, the camera service instance may receive a request (including an ID and a configuration 1 of the application) input by the camera application based on the API interface, and the camera service instance may create a corresponding session (for example, the identification information of the session is session 1) based on the request of the camera application, and output the ID, the configuration 1 of the application of the camera application and the identification information of the session (i.e., session 1) to the camera device client instance.
The example of the client of the camera device can be regarded as a client of the camera service, and is mainly used for providing an E interface (an interface between different mobile service switching centers controlling adjacent areas) for the camera service to perform data interaction with other modules. The camera device client instance saves the corresponding relation between the application ID and the session1, and outputs the application ID, the configuration 1 and the session1 of the camera application to the camera device instance.
Among other things, the camera device instance is used to provide an interface for the HAL layer, as well as transparent transmission of data (e.g., images). Specifically, the camera device instance records the corresponding relationship among the information based on the ID, configuration 1, and session1 of the application of the camera application input by the camera device client instance, and outputs the ID and session1 of the application to the camera data stream instance.
The camera data stream instance is used for carrying out corresponding processing on the image. Specifically, the camera data stream instance stores the ID of the application of the camera application input by the camera device instance and session1 in a corresponding manner.
For the interaction between the above-mentioned examples of the creation of the camera service, reference may be made to the existing API standard, and details thereof are not described here.
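Purely as an illustration of the bookkeeping described above (the correspondence among the application ID, the configuration, and session1), a hypothetical sketch is given below. It does not reflect the real Android camera service API; all class and method names are invented for illustration only.

```python
# Hypothetical bookkeeping only -- not the real Android camera service API.
class CameraServiceSessions:
    def __init__(self):
        self._sessions = {}          # session id -> (app_id, config)
        self._next_id = 1

    def create_session(self, app_id: str, config: dict) -> str:
        """Creation process: record the correspondence between the application and its session."""
        session = f"session{self._next_id}"
        self._next_id += 1
        self._sessions[session] = (app_id, config)   # e.g. ("camera_app", {"resolution": "1080x720"})
        return session

    def route_result(self, session: str, image):
        """Shooting process (S206): find which application a processed image belongs to."""
        app_id, _ = self._sessions[session]
        return app_id, image
```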
It should be understood that the above description is only an example for better understanding of the technical solutions of the present application, and is not intended as the only limitation of the present application. In practical applications, in addition to taking pictures through the camera application, the depth-of-field extended image generation method provided by this embodiment may also be adopted when a photographing operation is executed by invoking the camera from within other applications.
S102, the camera service calls the camera hardware abstraction layer, and the camera hardware abstraction layer performs corresponding processing.
That is, the camera device instance calls the camera hardware abstraction layer of the HAL layer and transmits the photographing request triggered by the camera application to the camera hardware abstraction layer, so that the camera hardware abstraction layer can create a corresponding instance in response to the request.
S103, the camera hardware abstraction layer calls the camera driver in the kernel layer, and the camera driver performs corresponding processing.
Understandably, the camera driver performs corresponding processing, for example, to establish a corresponding instance.
S104, the camera driver calls the corresponding camera.
Illustratively, the camera begins capturing images in response to invocation of the camera driver. It should be noted that, in the creating process, each instance or module in the camera hardware abstraction layer and the camera driver performs corresponding processing on data (for example, an image), and a specific processing process may refer to a technical scheme in an embodiment of the prior art, which is not described in detail herein.
With continued reference to fig. 7, the shooting process specifically includes:
S201, the camera outputs the acquired images to the camera driver.
It should be noted that, since this embodiment is directed to an electronic device whose camera has a dual-pixel sensor, the image acquired by the camera is substantially two RAW images, specifically a left pixel map and a right pixel map.
Correspondingly, after the left pixel map and the right pixel map are obtained, they are transmitted to the depth-of-field extension module through the camera driver, the camera hardware abstraction layer, and the camera service.
S202, the camera driver outputs the images to the camera hardware abstraction layer.
S203, the camera hardware abstraction layer outputs the images to the camera service.
Illustratively, the camera driver acquires an image collected by the camera and outputs the image of the camera to the camera hardware abstraction layer. The camera hardware abstraction layer outputs the images collected by the camera to the camera service.
S204, the camera service outputs the images to the depth-of-field extension module.
Understandably, it can be known from the above description that the images output by the camera driver to the camera hardware abstraction layer are specifically the left pixel map and the right pixel map acquired by the dual-pixel sensor.
Accordingly, the images output by the camera hardware abstraction layer to the camera service are also the left pixel map and the right pixel map. In addition, it should be noted that the camera hardware abstraction layer outputs the images to the camera service, specifically to the camera device instance in the camera service.
Accordingly, the images output by the camera service to the depth-of-field extension module are likewise the left pixel map and the right pixel map.
In addition, as can be known from the description of the above creation process, the camera device instance records the correspondence among the application ID, configuration 1, and session1 of the camera application input by the camera device client instance. Therefore, after the camera device instance receives the images input by the camera hardware abstraction layer, it checks the currently stored session; that is, when session1 and the other information (including the application ID and configuration 1) are currently stored, the camera device instance outputs the images and session1 to the depth-of-field extension module.
It should be noted that, in this embodiment, the purpose of maintaining and associating sessions in the creation process and the shooting process is to facilitate distinguishing different session processes, so that the method provided in this embodiment can be applied to various implementation scenarios, such as multiple times of calling shooting.
S205, the depth-of-field extension module performs depth-of-field processing on the left pixel map and the right pixel map input by the camera service based on the depth-of-field extension model to obtain a depth-of-field extended image, and outputs the obtained depth-of-field extended image to the camera service.
S206, the camera service outputs the depth-of-field extended image to the camera application.
Regarding the process in which the camera service outputs the depth-of-field extended image to the camera application: for example, the camera device instance receives the image (i.e., the processed depth-of-field extended image) input by the depth-of-field extension module together with the session corresponding to the camera application, for example, session1. The camera device instance outputs the depth-of-field extended image and session1 to the camera data stream instance, and the camera data stream instance can then output the depth-of-field extended image to the camera application based on the recorded correspondence between session1 and the application ID of the camera application. In this way, the depth-of-field extended image can be displayed on the current interface, as shown in fig. 8, or directly stored in the gallery application.
Therefore, in the depth-of-field extended image generation method provided by this embodiment, a left pixel map and a right pixel map are obtained by using the existing dual-pixel sensor in the camera of the electronic device, and the left pixel map and the right pixel map are then processed by the depth-of-field extension model obtained through pre-training, so as to obtain the depth-of-field extended image.
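As an illustration of step S205, the following sketch shows how the depth-of-field extension module might feed the left and right pixel maps through the trained model; the tensor shapes, normalization, and function name are assumptions, and EDOFNet refers to the illustrative model sketched in the training stage.

```python
import torch

def run_depth_of_field_extension(model, left_map, right_map, device="cpu"):
    """Sketch of what the depth-of-field extension module does in step S205: feed the
    left/right pixel maps from the dual-pixel sensor through the trained model."""
    model = model.to(device).eval()
    # left_map / right_map: single-channel RAW planes as float arrays scaled to [0, 1].
    left = torch.as_tensor(left_map, dtype=torch.float32)[None, None].to(device)
    right = torch.as_tensor(right_map, dtype=torch.float32)[None, None].to(device)
    with torch.no_grad():
        edof = model(left, right)                   # EDOFNet from the training-stage sketch
    return edof.squeeze(0).clamp(0, 1).cpu()        # depth-of-field extended (RGB) image
```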
Furthermore, it is understood that the electronic device comprises corresponding hardware and/or software modules for performing the respective functions in order to implement the above-described functions. The present application can be implemented in hardware, or in a combination of hardware and computer software, in connection with the exemplary algorithm steps described in the embodiments disclosed herein. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In addition, it should be noted that, the above description is given by taking the electronic device as a mobile phone as an example, and the example is only for better understanding of the technical solution of the present embodiment, and is not meant to be the only limitation to the present embodiment. In practical applications, the electronic device applicable to the extended depth of field image generation method provided in the embodiment of the present application is not limited to the above-mentioned mobile phone, and may be any device that includes a camera and has a fixed and non-adjustable aperture.
In addition, it should be further noted that, in an actual application scenario, the depth-of-field extended image generation method provided by the foregoing embodiments and implemented by the electronic device may also be executed by a chip system included in the electronic device, where the chip system may include a processor. The system-on-chip may be coupled to the memory, such that the computer program stored in the memory is called when the system-on-chip is running to implement the steps performed by the electronic device. The processor in the system on chip may be an application processor or a processor other than an application processor.
In addition, a computer-readable storage medium is provided, where computer instructions are stored, and when the computer instructions are run on an electronic device, the electronic device is caused to execute the relevant method steps to implement the depth-of-field extended image generation method in the foregoing embodiment.
In addition, the embodiment of the present application further provides a computer program product, which when running on an electronic device, causes the electronic device to execute the above related steps, so as to implement the depth-of-field extended image generation method in the above embodiment.
In addition, embodiments of the present application also provide a chip (which may also be a component or a module), which may include one or more processing circuits and one or more transceiver pins; the receiving pin and the processing circuit are communicated with each other through an internal connection path, and the processing circuit executes the related method steps to realize the depth-of-field extended image generation method in the embodiment, so as to control the receiving pin to receive signals and control the sending pin to send signals.
In addition, as can be seen from the foregoing description, the electronic device, the computer readable storage medium, the computer program product, or the chip provided in the embodiments of the present application are all configured to execute the corresponding methods provided above, and therefore, the beneficial effects achieved by the electronic device, the computer readable storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding methods provided above, which are not described herein again.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A method for generating a depth-of-field extended image, applied to a first electronic device, wherein a camera of the first electronic device integrates a dual-pixel sensor, and the method comprises the following steps:
acquiring a left pixel map and a right pixel map of a target object by using the dual-pixel sensor, wherein the left pixel map and the right pixel map are both RAW images;
and performing depth-of-field processing on the left pixel map and the right pixel map based on a depth-of-field extension model to obtain a target depth-of-field extended image corresponding to the target object.
2. The method of claim 1, wherein prior to said acquiring, with said dual pixel sensor, a left pixel map and a right pixel map of a target object, said method further comprises:
constructing a data set required for training the depth-of-field extension model, wherein the data set comprises a plurality of data groups, and each data group at least comprises a sample left pixel map and a sample right pixel map;
taking a sample left pixel map and a sample right pixel map in each data set in the data set as input images, and inputting a network model for iterative training until a set requirement is met;
and taking the network model meeting the set requirement as the depth-of-field extension model.
3. The method of claim 2, wherein constructing the data set required to train the depth-of-field extension model comprises:
calibrating blur kernels on the dual-pixel sensor at different distances to obtain correspondences between different depth information and different blur kernels;
acquiring a sample color-mode (RGB) image and a sample depth image corresponding to the sample RGB image from a public data set, wherein the sample RGB image and the sample depth image in the public data set are provided by a second electronic device with a multi-camera module, and the sample RGB image is an RGB image shot in an in-focus scene;
degrading each acquired sample RGB image into a sample RAW image according to an imaging path corresponding to the dual-pixel sensor;
for each sample RAW image, selecting a blur kernel corresponding to the sample RAW image from the correspondences according to the corresponding sample depth image;
for each sample RAW image, processing the sample RAW image according to the selected blur kernel to obtain a sample left pixel map and a sample right pixel map corresponding to the sample RAW image, and taking the sample left pixel map, the sample right pixel map, and the sample RGB image as a data group;
and summarizing the data group corresponding to each sample RGB image to obtain the data set required for training the depth-of-field extension model.
4. The method according to claim 3, wherein the blur kernels on the dual-pixel sensor comprise a left blur kernel, a right blur kernel, and a joint blur kernel, the joint blur kernel is the blur kernel corresponding to a RAW image generated by combining a left pixel map and a right pixel map, the left blur kernel is the blur kernel corresponding to the left pixel map in the RAW image, and the right blur kernel is the blur kernel corresponding to the right pixel map in the RAW image;
the degrading each acquired sample RGB image into a sample RAW image according to the imaging path corresponding to the dual-pixel sensor comprises:
for each sample RGB image, selecting a corresponding joint blur kernel from the correspondences according to the corresponding sample depth image;
degrading each acquired sample RGB image into a sample RAW image according to the imaging path corresponding to the dual-pixel sensor and the joint blur kernel corresponding to each sample RGB image;
the selecting, for each sample RAW image, a blur kernel corresponding to the sample RAW image from the correspondences according to the corresponding sample depth image comprises:
for each sample RAW image, selecting a left blur kernel and a right blur kernel corresponding to the sample RAW image from the correspondences according to the corresponding sample depth image;
the processing, for each sample RAW image, the sample RAW image according to the selected blur kernel to obtain a sample left pixel map and a sample right pixel map corresponding to the sample RAW image comprises:
and for each sample RAW image, processing the sample RAW image according to the selected left blur kernel to obtain the sample left pixel map corresponding to the sample RAW image, and processing the sample RAW image according to the selected right blur kernel to obtain the sample right pixel map corresponding to the sample RAW image.
5. The method according to claim 3, wherein meeting the set requirement means that a loss condition of a predicted depth-of-field extended image output by the network model meets a set loss value, and the predicted depth-of-field extended image is predicted by the network model based on the sample left pixel map and the sample right pixel map.
6. The method according to claim 5, wherein the loss of the predicted extended depth of field image is determined according to the predicted extended depth of field image, the gradient in the X direction, and the gradient in the Y direction of the sample RGB image.
7. The method of claim 3, wherein each data set further comprises a sample depth image;
the method further comprises the following steps:
for each sample RGB image, taking the sample left pixel map, the sample right pixel map, the sample RGB image and the sample depth image corresponding to the sample RGB image as a data group.
8. The method according to claim 7, wherein meeting the set requirement means that a loss condition of the predicted depth-of-field extended image and a loss condition of the predicted depth image output by the network model meet a set loss value, and both the predicted depth-of-field extended image and the predicted depth image are obtained through prediction by the network model based on the sample left pixel map and the sample right pixel map.
9. The method according to claim 8, wherein the loss of the predicted extended depth field image is determined according to the predicted extended depth field image, the gradient in the X direction, and the gradient in the Y direction of the sample RGB image, and the loss of the predicted depth image is determined according to the sample depth image and the predicted depth image.
10. The method of claim 1, wherein the depth-of-field extension model is a convolutional neural network model.
11. The method of claim 10, wherein convolutional layers on the input side of the convolutional neural network model are connected with convolutional layers on the output side through skip connections, max pooling is adopted between the convolutional layers on the input side, and up-convolution processing is adopted between the convolutional layers on the output side.
12. The method according to any one of claims 1 to 11, wherein the target depth-of-field extended image is an RGB image.
13. An electronic device, characterized in that the electronic device comprises: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the depth-extended image generation method of any one of claims 1 to 12.
14. A computer-readable storage medium characterized by comprising a computer program which, when run on an electronic device, causes the electronic device to execute the depth-extended image generation method according to any one of claims 1 to 12.
CN202210917940.1A 2022-08-01 2022-08-01 Depth-of-field extended image generation method, device and storage medium Active CN115359105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210917940.1A CN115359105B (en) 2022-08-01 2022-08-01 Depth-of-field extended image generation method, device and storage medium


Publications (2)

Publication Number Publication Date
CN115359105A true CN115359105A (en) 2022-11-18
CN115359105B CN115359105B (en) 2023-08-11

Family

ID=84032330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210917940.1A Active CN115359105B (en) 2022-08-01 2022-08-01 Depth-of-field extended image generation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115359105B (en)



Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200825449A (en) * 2006-10-02 2008-06-16 Micron Technology Inc Imaging method, apparatus and system having extended depth of field
KR100835058B1 (en) * 2007-03-15 2008-06-04 삼성전기주식회사 Image processing method for extending depth of field
EP2096483A1 (en) * 2008-02-29 2009-09-02 Global Bionic Optics Pty Ltd. Single-lens extended depth-of-field imaging systems
US20130041215A1 (en) * 2011-08-12 2013-02-14 Ian McDowall Feature differentiation image capture unit and method in a surgical instrument
CN102957871A (en) * 2012-11-06 2013-03-06 广东欧珀移动通信有限公司 Field depth processing method for shot image of mobile terminal
CN104754316A (en) * 2013-12-31 2015-07-01 展讯通信(上海)有限公司 3D imaging method and device and imaging system
CN110036410A (en) * 2016-10-18 2019-07-19 弗托斯传感与算法公司 For obtaining the device and method of range information from view
CN111656163A (en) * 2018-01-29 2020-09-11 加利福尼亚大学董事会 Method and apparatus for extending depth of field during fluorescence microscopy imaging
CN110198418A (en) * 2019-06-28 2019-09-03 Oppo广东移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN110533607A (en) * 2019-07-30 2019-12-03 北京威睛光学技术有限公司 A kind of image processing method based on deep learning, device and electronic equipment
CN112529951A (en) * 2019-09-18 2021-03-19 华为技术有限公司 Method and device for acquiring extended depth of field image and electronic equipment
WO2022044897A1 (en) * 2020-08-31 2022-03-03 ソニーグループ株式会社 Medical imaging system, medical imaging device, and operation method
CN114760480A (en) * 2021-01-08 2022-07-15 华为技术有限公司 Image processing method, device, equipment and storage medium
CN113793262A (en) * 2021-08-10 2021-12-14 西安理工大学 Image demosaicing method based on residual error feature aggregation attention block
CN113850367A (en) * 2021-08-31 2021-12-28 荣耀终端有限公司 Network model training method, image processing method and related equipment thereof
CN113763524A (en) * 2021-09-18 2021-12-07 华中科技大学 Physical optical model and neural network-based dual-flow shot rendering method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
H. DU ET AL: "Image Restoration Based on Deep Convolutional Network in Wavefront Coding Imaging System", pages 1 - 8 *
SHIYU TAN ET AL: "CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo", pages 4 - 5 *
YU MIAO: "Research on large depth-of-field and high-resolution imaging technology based on light field reconstruction", vol. 2022, no. 1, pages 005 - 348 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113403A1 (en) * 2022-11-28 2024-06-06 深圳先进技术研究院 Imaging system depth-of-field extension method and system, electronic device, and storage medium
CN117710233A (en) * 2024-02-05 2024-03-15 之江实验室 Depth of field extension method and device for endoscopic image
CN117710233B (en) * 2024-02-05 2024-05-24 之江实验室 Depth of field extension method and device for endoscopic image

Also Published As

Publication number Publication date
CN115359105B (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN113132620B (en) Image shooting method and related device
CN111179282B (en) Image processing method, image processing device, storage medium and electronic apparatus
EP4020967B1 (en) Photographic method in long focal length scenario, and mobile terminal
US11949978B2 (en) Image content removal method and related apparatus
CN115359105B (en) Depth-of-field extended image generation method, device and storage medium
CN113630558B (en) Camera exposure method and electronic equipment
CN114866681B (en) Cross-equipment collaborative shooting method, related device and system
CN113452898A (en) Photographing method and device
CN117911299A (en) Video processing method and device
WO2022267608A1 (en) Exposure intensity adjusting method and related apparatus
CN113572948A (en) Video processing method and video processing device
CN116347217B (en) Image processing method, device and storage medium
CN116048323B (en) Image processing method and electronic equipment
CN113891008B (en) Exposure intensity adjusting method and related equipment
CN114466131B (en) Cross-device shooting method and related device
CN117082295B (en) Image stream processing method, device and storage medium
CN115460343B (en) Image processing method, device and storage medium
CN117119316B (en) Image processing method, electronic device, and readable storage medium
CN115802144B (en) Video shooting method and related equipment
CN116320784B (en) Image processing method and device
WO2024078275A1 (en) Image processing method and apparatus, electronic device and storage medium
CN116723264B (en) Method, apparatus and storage medium for determining target location information
CN116709042B (en) Image processing method and electronic equipment
CN118096535A (en) Image processing method and device, electronic equipment and storage medium
CN118118789A (en) Image generation method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant