WO2021204202A1 - Image auto white balance method and apparatus - Google Patents

Image auto white balance method and apparatus

Info

Publication number
WO2021204202A1
WO2021204202A1 (PCT/CN2021/085966)
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
channel image
neural network
white balance
Prior art date
Application number
PCT/CN2021/085966
Other languages
French (fr)
Chinese (zh)
Inventor
冯思博
陈梓艺
万磊
贾彦冰
翟其彦
曾毅华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010817963.6A external-priority patent/CN113518210B/en
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021204202A1 publication Critical patent/WO2021204202A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/10: Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00: Details of colour television systems
    • H04N9/64: Circuits for processing colour signals
    • H04N9/73: Colour balance circuits, e.g. white balance circuits or colour temperature control

Definitions

  • This application relates to the field of artificial intelligence, and in particular to methods and apparatuses for automatic image white balance in the field of photography technology.
  • The human visual system exhibits color constancy: it can compensate for changes in the color of the light source so as to perceive the color of an object consistently.
  • In order to eliminate the influence of the light source on the imaging of the image sensor (Sensor) and to simulate the color constancy of the human visual system, ensuring that white seen in any scene is rendered as true white, automatic white balance technology needs to be introduced.
  • White balance is an indicator describing the accuracy with which white is reproduced after the three primary colors red, green, and blue are mixed on a display.
  • Automatic white balance technology is mainly used to solve the problem of image color cast under different light sources, so that the colors of the scene in the image conform to the color vision habits of the human eye.
  • Computational color constancy in automatic white balance processing is dedicated to solving this problem: its main purpose is to estimate the color of the unknown light source represented in an arbitrary image, and then use that light source color to color-correct the input image so that it appears as if displayed under standard white light.
  • the embodiments of the present application provide a method and device for automatic image white balance, which can improve the accuracy and stability of the image white balance of an electronic device, and improve the user experience.
  • An embodiment of the present application provides a method for automatic image white balance, applied to an electronic device including a first camera. The method includes: acquiring the shooting parameters used when the first camera shoots an original RAW domain image; acquiring the multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value of white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • The original RAW domain image may be referred to as the RAW image for short. The RAW image may be the raw data produced when the CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal.
  • the shooting parameters indicate parameters used when performing shooting, such as shooting parameters used by a camera, an image sensor, and so on. Alternatively, shooting parameters can also be understood as control parameters generated when the processor controls the camera and image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
  • a multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
  • At the bottom layer, an image channel refers to the individual red (R), green (G), or blue (B) component of the image.
  • This application uses the shooting parameters to provide a shooting-configuration reference for light source color estimation, so as to assist white balance processing. The processing includes white balance processing implemented using a neural network model; the neural network model is used to obtain, at least from the shooting parameters and the multi-channel image, the white balance gain value or the image light source value required for white balance processing (the gain value and the image light source value are reciprocals of each other).
  • the neural network model described in the embodiments of this application may be a single neural network model, or a combination of two or more neural network models.
  • After the gain value or the image light source value is output, the electronic device can use it to perform white balance processing on the channel image, thereby correcting the image chromatic aberration caused by the color temperature of the light source, so that the color of each object in the image is close to its original color and the overall effect of the image conforms to the visual and cognitive habits of the human eye.
  • The embodiment of the present application uses the multi-channel image corresponding to the RAW image as the input of the neural network model, providing more color information for the AWB neural network model. The shooting parameters are added as a further input of the AWB neural network model to provide shooting-configuration information for light source estimation, which can improve the model's ability to distinguish different light source scenes and ensure good light source estimation accuracy. Therefore, the implementation of this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photo and video scenes, and the stability of the tendency in ambiguous scenes such as multiple light sources.
  • The neural network model may be a model constructed based on deep learning, for example, one of a deep neural network (DNN), a convolutional neural network (CNN), a long short-term memory network (LSTM), or a recurrent neural network (RNN), or a combination of several of these.
  • In a possible implementation, the first neural network model realizes the prediction of the first gain value by fusing the shooting parameters of the first camera with the image features of the multi-channel image.
  • In a possible implementation, the first neural network model may include a first feature extraction network, a feature fusion network, and a light source prediction network. Correspondingly, obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, performing statistical operations on the pixels of the channel image through convolution processing) to obtain a first feature; fusing the shooting parameters of the first camera with the first feature through the feature fusion network (the fusion method may be one or a combination of concat processing, conv2d processing, element-wise multiplication, element-wise addition, and the like) to obtain a fused feature; and performing prediction according to the fused feature through the light source prediction network to obtain the first gain value or the image light source value for use in subsequent white balance processing (a minimal sketch of such a model follows).
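As an illustration only, the following is a minimal PyTorch sketch of such a model: a convolutional branch plays the role of the first feature extraction network, concatenation plus a small MLP plays the role of the feature fusion network, and a final linear layer plays the role of the light source prediction network. The channel counts, layer sizes, and four-channel input are assumptions, not an architecture prescribed by this application.

```python
import torch
import torch.nn as nn

class AWBNet(nn.Module):
    def __init__(self, num_params=4):            # e.g. exposure, shutter, ISO, aperture
        super().__init__()
        # First feature extraction network: convolutions act as learned
        # statistics over the pixels of the multi-channel image.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global image descriptor
        )
        # Feature fusion network: here a simple concat + MLP (an assumption).
        self.fusion = nn.Sequential(nn.Linear(64 + num_params, 64), nn.ReLU())
        # Light source prediction network: outputs the RGB illuminant estimate;
        # its per-channel reciprocal is the white balance gain.
        self.illuminant = nn.Linear(64, 3)

    def forward(self, image, shooting_params):
        f = self.features(image).flatten(1)       # (B, 64)
        fused = self.fusion(torch.cat([f, shooting_params], dim=1))
        return self.illuminant(fused)             # (B, 3) light source color

# Usage: a four-channel (e.g. RGGB) image plus normalized shooting parameters.
model = AWBNet()
light = model(torch.rand(1, 4, 64, 64), torch.rand(1, 4))
gain = 1.0 / light.clamp(min=1e-6)               # the gain is the reciprocal of L
```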
  • the AWB neural network model in the embodiments of the present application can be applied to all scenes, and a large amount of training data is used during model training.
  • the training data includes data obtained in a bright light scene and data obtained in a dark light scene.
  • For a model covering all scenes, it is difficult for the neural network to achieve a high-precision fit across the whole scene range. The added shooting parameters provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for both types of scenes. This is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photo and video scenes, and the stability of the tendency in ambiguous scenes such as multiple light sources.
  • the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device.
  • At this time, the first processing specifically includes: obtaining the first gain value through a first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image. Therefore, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform the neural network calculation, which improves processing efficiency and reduces white balance processing delay.
  • the solution of the present application can be applied to electronic devices in the end-cloud system, and the neural network model can be configured in the cloud server in the end-cloud system.
  • At this time, the first processing specifically includes: sending the shooting parameters of the first camera and the multi-channel image to a server; receiving the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; performing white balance processing on the multi-channel image using the first gain value; and post-processing the white-balance-processed image to obtain the target image.
  • In this way, the computing power of the cloud server can be used to compute the neural network model, ensuring the accuracy and stability of the white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
  • In a possible implementation, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically fuses the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image to predict the first gain value.
  • In a possible implementation, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network. Correspondingly, obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, performing statistical operations on the pixels of the channel image through convolution processing) to obtain a first feature; performing feature extraction on the scene semantic information through the second feature extraction network (for example, analyzing/perceiving the scene information of the channel image through convolution processing) to obtain a second feature; fusing the shooting parameters, the first feature, and the second feature through the feature fusion network (for example, the fusion method may be one or a combination of concat processing, conv2d processing, element-wise multiplication, element-wise addition, and the like) to obtain a fused feature; and performing prediction according to the fused feature through the light source prediction network to obtain the first gain value or the image light source value for use in subsequent white balance processing.
  • the scene semantic information represents the semantic features related to the shooting scene represented by the image.
  • various types of shooting scenes can be defined.
  • shooting scenes can be classified based on light source types, such as cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
  • shooting scenes can be classified based on image content, such as portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
  • the scene semantic information can provide a priori semantic information for the image to a large extent, help the AWB neural network to distinguish different scenes, and then improve the overall accuracy of the AWB neural network.
  • For example, in face scenes the network output may be unstable, which affects the perceived skin tone. If face detection information is added to the neural network as scene semantic information, the neural network will pay more attention to the face region during training, thereby improving the fitting accuracy of the network in face scenes.
  • the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device.
  • At this time, the first processing specifically includes: extracting the scene semantic information from the multi-channel image; obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image.
  • the solution of the present application can be applied to electronic devices in the end-cloud system, and the neural network model can be configured in the cloud server in the end-cloud system.
  • At this time, the first processing specifically includes: sending the shooting parameters, the multi-channel image, and the scene semantic information to a server; receiving the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image.
  • In a possible implementation, performing scene semantic information extraction on the multi-channel image includes: performing at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
  • the scene classification algorithm is used to realize the classification of faces and non-faces, the classification of single light sources and multiple light sources, the color temperature classification of light sources, or the classification of indoor and outdoor scenes, and so on.
  • the image scene segmentation algorithm can be used to segment the picture to generate a mask map; alternatively, the scene classification algorithm, object detection algorithm, face detection algorithm, skin color segmentation algorithm and other technologies can also be used to generate the mask map.
  • The mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the network fit and converge, and achieving higher prediction accuracy (a minimal sketch of generating such a mask follows).
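As one illustration of turning scene semantic information into a mask map, the sketch below marks detected face regions using OpenCV's bundled Haar face detector; the application does not prescribe a specific detection algorithm, so this detector and the binary-mask format are assumptions.

```python
import cv2
import numpy as np

def face_mask(rgb_image: np.ndarray) -> np.ndarray:
    """rgb_image: (H, W, 3) uint8. Returns an (H, W) mask, 1 inside face boxes."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    mask = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in cascade.detectMultiScale(gray):
        mask[y:y + h, x:x + w] = 1.0              # mark the detected face region
    return mask

# The mask can be stacked with the multi-channel image (or fed to a second
# feature extraction network) as the scene semantic input.
```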
  • In a possible implementation, acquiring the multi-channel image corresponding to the original RAW domain image includes: preprocessing the original RAW domain image to obtain the multi-channel image, the preprocessing including demosaicing. Using a simplified demosaicing operation makes the length and width of the multi-channel image half the length and width of the down-sampled RAW image, which can speed up subsequent algorithms; see the sketch below.
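The following is a minimal numpy sketch of such a simplified demosaicing, assuming an RGGB Bayer layout (the layout is an assumption): each 2x2 Bayer cell collapses into one output pixel, so the result has half the length and width of the RAW image.

```python
import numpy as np

def simple_demosaic(raw: np.ndarray) -> np.ndarray:
    """RGGB Bayer RAW (H, W), even dims -> 4-channel image (H/2, W/2, 4)."""
    r  = raw[0::2, 0::2]   # red samples
    gr = raw[0::2, 1::2]   # green samples on red rows
    gb = raw[1::2, 0::2]   # green samples on blue rows
    b  = raw[1::2, 1::2]   # blue samples
    return np.stack([r, gr, gb, b], axis=-1).astype(np.float32)

# Averaging the two green planes instead yields a three-channel image:
# np.stack([r, (gr + gb) / 2, b], axis=-1)
```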
  • In addition, the preprocessing process may also include black level correction (BLC) and lens shading correction (LSC): dark current can be reduced through BLC processing, and the influence of lens shading on the image signal can be eliminated through LSC processing. Optionally, the preprocessing may also include image down-sampling processing and noise reduction processing.
  • Optionally, the white-balance-processed image can also be post-processed through image enhancement algorithms to further improve image quality, obtain the final target image for display, and output it to the display screen of the electronic device.
  • the image enhancement algorithm may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
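As a small illustration of one such post-processing step, the sketch below applies gamma correction; the 2.2 exponent is a common display convention, not a value specified by this application.

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Map linear pixel values in [0, 1] to display-referred values."""
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)
```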
  • the multi-channel image is a three-channel image or a four-channel image.
  • In a possible implementation, a training process for the neural network model may be as follows: the training data includes the annotated light source color of the image, the multi-channel image obtained by preprocessing the RAW image, and the shooting parameters, and optionally also includes scene semantic information. After the training data is input into the model, the model outputs light source color information; a loss function is determined by comparing the output light source color information with the labeled light source color, and is back-propagated through the model, thereby updating the model parameters and training the model.
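The following is a minimal sketch of one training step. The text above only states that the loss compares the predicted and labeled light source colors; the angular (cosine) error used here is a common choice in illuminant estimation and is an assumption, as is the AWBNet-style model interface.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Mean angle (radians) between predicted and labeled illuminant vectors."""
    cos = F.cosine_similarity(pred, label, dim=1).clamp(-0.999999, 0.999999)
    return torch.acos(cos).mean()

def train_step(model, optimizer, image, params, label_light):
    optimizer.zero_grad()
    pred_light = model(image, params)     # e.g. the AWBNet sketched earlier
    loss = angular_loss(pred_light, label_light)
    loss.backward()                       # backpropagate to update model parameters
    optimizer.step()
    return loss.item()
```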
  • the image used for training the neural network model may not be a single frame image, but a labeled video sequence.
  • network structures such as LSTM and RNN can be introduced, and time-domain related strategies can also be used during model training.
  • For example, a video sequence can be used as training data, and the AWB neural network model adds the frames before and after the current image as model inputs.
  • An embodiment of the present application provides a method for automatic image white balance, applied to an electronic device including at least two cameras, the at least two cameras including a first camera and a second camera. The method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, the shooting instruction including a shooting magnification; when the target camera is the second camera, acquiring the shooting parameters used when the second camera shoots a second original RAW domain image and the second multi-channel image corresponding to the second original RAW domain image; performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera; inputting at least the shooting parameters of the second camera and the migrated image into the first neural network model to obtain a first gain value of white balance, where the first neural network model is associated with the first camera (specifically, the first neural network model is trained according to the data collected by the first camera and the shooting parameters of the first camera); processing the first gain value into a second gain value corresponding to the second camera; and performing first processing on the second multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the second multi-channel image and the second gain value.
  • the number of cameras configured in the electronic device is not limited.
  • the type of each camera is not limited.
  • The so-called "different types" can be cameras with different shooting magnifications (or zoom magnifications) or focal lengths, such as a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera.
  • the so-called “different types” may mean that the image sensors corresponding to each camera are different.
  • For example, in one configuration the image sensor corresponding to the wide-angle camera is an RGGB module, while the image sensor corresponding to a conventional camera is an RYYB module.
  • In another example, the image sensor corresponding to the telephoto camera includes an RGGB module, the image sensor corresponding to the main camera includes an RYYB module, and the image sensor corresponding to the wide-angle camera includes an RGGB module; the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera, and the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
  • In a possible implementation, performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera includes: performing a color migration operation on the second multi-channel image based on the difference between the second camera and the first camera, to obtain a migrated image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. The migrated image is then input, together with the camera parameters of the second camera, into the first AWB neural network to calculate a light source color value that conforms to the shooting characteristics of the first camera; the light source color value then undergoes the inverse migration operation, so that it is mapped back to the light source color value corresponding to the second camera.
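The following is a minimal sketch of this cross-camera migration, assuming the migration operator is a 3x3 color conversion matrix M calibrated offline for the sensor pair; the application does not specify the operator, so the linear form is an assumption.

```python
import numpy as np

M = np.eye(3)  # placeholder calibration matrix: camera-2 space -> camera-1 space

def migrate_image(img2: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) camera-2 image into camera-1 color space."""
    return img2 @ M.T

def migrate_light_back(light1: np.ndarray) -> np.ndarray:
    """Map a camera-1 illuminant estimate back into camera-2 color space."""
    return np.linalg.inv(M) @ light1

# Pipeline sketch:
#   light1 = model(migrate_image(img2), params2)   # first camera's AWB model
#   gain2  = 1.0 / migrate_light_back(light1)      # gain for the second camera
```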
  • the automatic white balance method described in this application can make the neural network model compatible with two or more cameras at the same time, expand the applicable scenarios, improve the adaptability to multiple lenses, and greatly improve the user experience.
  • In a possible implementation, when the target camera is the first camera, the method further includes: acquiring the shooting parameters used when the first camera shoots the first original RAW domain image and the first multi-channel image corresponding to the first original RAW domain image.
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the multi-channel image is a three-channel image or a four-channel image.
  • An embodiment of the present application provides a method for automatic white balance of an image. The method is applied to an electronic device including at least two cameras, the at least two cameras including a first camera and a second camera. The method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, the shooting instruction including a shooting magnification; acquiring the shooting parameters used when the target camera shoots an original RAW domain image and the multi-channel image corresponding to the original RAW domain image; determining the neural network model corresponding to the target camera, wherein the first camera is associated with a first neural network model and the second camera is associated with a second neural network model, the first neural network model being trained based on the data collected by the first camera and the shooting parameters of the first camera, and the second neural network model being trained based on the data collected by the second camera and the shooting parameters of the second camera; inputting input data into the neural network model to obtain a white balance gain value, wherein the input data includes at least the shooting parameters of the target camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the gain value.
  • the magnifications of the first camera and the second camera are different, or the image sensors corresponding to the first camera and the second camera are different.
  • the camera types of the first camera and the second camera are different, and the camera types include a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
  • In this design, different cameras can be configured with different neural network models; for example, the first camera corresponds to the first neural network model and the second camera corresponds to the second neural network model. The first neural network model can be trained from data collected by the first camera (or an identical or similar device), and the second neural network model can be trained from data collected by the second camera (or an identical or similar device). In this way, the data of different cameras can be processed independently, improving the pertinence and accuracy of the neural network models (a minimal sketch follows).
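The following is a minimal sketch of such per-camera model association, reusing the AWBNet sketch shown earlier; the camera names and the zoom threshold are illustrative assumptions, not values prescribed by this application.

```python
import torch

# One model instance per camera; in practice each would be trained on data
# collected by (a device identical or similar to) that camera.
MODELS = {
    "main": AWBNet(),        # AWBNet is the sketch shown earlier
    "telephoto": AWBNet(),
}

def select_camera(zoom: float) -> str:
    """Map the user's shooting magnification to a target camera (assumed threshold)."""
    return "telephoto" if zoom >= 3.0 else "main"

def estimate_gain(zoom: float, image: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
    model = MODELS[select_camera(zoom)]     # model associated with the target camera
    light = model(image, params)            # predicted light source color
    return 1.0 / light.clamp(min=1e-6)      # white balance gain (reciprocal of L)
```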
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the multi-channel image is a three-channel image or a four-channel image.
  • An embodiment of the present application provides an apparatus for realizing automatic image white balance, including: a parameter acquisition module for acquiring the shooting parameters used when the first camera captures an original RAW domain image; an image acquisition module for acquiring the multi-channel image corresponding to the original RAW domain image; and a processing module for inputting input data into a first neural network model to obtain a first gain value of white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image, and further for performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes a camera, a memory, and a processor, and optionally a display screen.
  • The display screen is used for displaying images; the camera is used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the first aspect of the present application.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, and optionally a display
  • the display screen is used to display images.
  • the at least two cameras are both used to capture images;
  • the memory is used to store a program;
  • The processor is used to execute the program stored in the memory, and when the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the second aspect of the present application.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, and optionally a display
  • the display screen is used to display images.
  • the at least two cameras are both used to capture images;
  • the memory is used to store a program;
  • The processor is used to execute the program stored in the memory, and when the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the third aspect of the present application.
  • an embodiment of the present application provides a chip.
  • The chip includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface, so as to perform the method described in any embodiment of the first aspect, the second aspect, or the third aspect.
  • An embodiment of the present invention provides yet another non-volatile computer-readable storage medium; the computer-readable storage medium is used to store program code implementing the method described in any embodiment of the first aspect, the second aspect, or the third aspect. When the program code is executed by a computing device, the method described in any embodiment of the first aspect, the second aspect, or the third aspect can be implemented.
  • An embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computing device, it performs the method described in any embodiment of the aforementioned first, second, or third aspect. For example, the computer program product may be a software installation package, which may be downloaded and executed on a controller to implement the method described in any embodiment of the first aspect, the second aspect, or the third aspect.
  • FIG. 1 is an exemplary diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is an example diagram of a terminal cloud interaction scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the device structure of a device-cloud interaction scenario provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the device structure of a chip provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application.
  • FIG. 9 is an example diagram of a RAW image and a three-channel image provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the structure and processing flow of a neural network model provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the structure and processing flow of another neural network model provided by an embodiment of the present application.
  • FIG. 14 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 15 is an example diagram of an image preprocessing process provided by an embodiment of the present application.
  • FIG. 16 is an example diagram of an image post-processing process provided by an embodiment of the present application.
  • FIG. 17 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a user operation scenario provided by an embodiment of the present application.
  • FIG. 19 is a block diagram of a possible software structure of a terminal according to an embodiment of the present application.
  • FIG. 20 is an example diagram of some model training processes provided by embodiments of the present application.
  • FIG. 21 is an example diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application.
  • FIG. 22 is another example diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application.
  • FIG. 23 is an example diagram of a target image under different shooting magnifications according to an embodiment of the present application.
  • FIG. 24 is a schematic diagram of the structure of an apparatus provided by an embodiment of the present application.
  • A system, product, or device that includes a series of units/devices is not limited to the listed units/devices, but optionally includes unlisted units/devices, or optionally includes other units/devices inherent to these products or devices.
  • the color of the light source can also be called color temperature in terms of colorimetry.
  • For example, the color emitted by a black body at 3200 K is defined as white, the color emitted by a black body at 5600 K is defined as blue, and so on.
  • For objects in the environment (including people, objects, scenes, and so on), the color of the light source in the environment affects the imaged color of the object, directly or indirectly changing the object's own color and forming a chromatic aberration.
  • For example, white objects become reddish when illuminated by light with a low color temperature (such as incandescent lamps, candles, or sunrise/sunset light sources), and become bluish when illuminated by light with a high color temperature (such as cloudy sky, snowy sky, or tree shade light sources).
  • The purpose of white balance is to correct the chromatic aberration caused by different color temperatures, so that white objects appear as their original white and objects of other colors are as close to their original colors as possible, making the overall effect of the image conform to the visual and cognitive habits of the human eye.
  • In an example, white balance processing can be implemented based on the Lambertian reflection model.
  • In an example, the processing algorithm of the white balance processing is shown in the following Formula 1:

    $R = I / L$ (element-wise), i.e., $(R_r, G_r, B_r) = (R_i / R_l,\ G_i / G_l,\ B_i / B_l)$  (Formula 1)

  • where R represents the pixel values $(R_r, G_r, B_r)$ of the image after white balance processing, and R is close or equal to the color of the object under neutral light; I represents the image $(R_i, G_i, B_i)$ captured by the electronic device, which may be the multi-channel image described in the embodiments of the present application; and L represents the light source color information $(R_l, G_l, B_l)$, for example, the image light source value described in the embodiments of the present application. It should be noted that L here is a broad concept: in camera imaging, L may also include the bias of the image sensor toward the color of the object.
  • Equivalently, Formula 1 can be written using the white balance gain value $G = (1/R_l, 1/G_l, 1/B_l)$ as the element-wise product $R = I \cdot G$.
  • The task of the white balance processing is to estimate L (or, equivalently, G) from I and possible additional inputs, and to further obtain the color R of the object under neutral light, so as to eliminate as much as possible the imaging chromatic aberration caused by the light source, so that white appears white under different light sources and objects of other colors are as close as possible to their original colors.
  • For ease of description, the white balance processing in this application mainly uses the light source color information as an example to describe the solution; the gain value formulation can be implemented similarly, for example, by obtaining the white balance gain value directly from the neural network model, or by obtaining the light source color information from the neural network model and further deriving the white balance gain value from it. This is not expanded further here; a minimal sketch of applying the gain follows.
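A minimal numpy sketch of applying Formula 1, dividing each channel by the estimated light source color (equivalently, multiplying by the gain); the green-channel normalization is a common convention, assumed here rather than specified by the text.

```python
import numpy as np

def apply_white_balance(img: np.ndarray, light: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) linear image; light: estimated (R_l, G_l, B_l)."""
    gain = 1.0 / np.maximum(light, 1e-6)      # the gain is the reciprocal of L
    gain = gain / gain[1]                     # normalize so green is unchanged
    return np.clip(img * gain, 0.0, 1.0)

# Example: a reddish tungsten illuminant makes the red channel too strong;
# the gain suppresses red and boosts blue to restore neutral white.
balanced = apply_white_balance(np.random.rand(4, 4, 3), np.array([0.8, 0.6, 0.4]))
```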
  • The prior art proposes some methods to determine the color of the light source, such as the gray world algorithm, the perfect reflector algorithm, the dynamic threshold algorithm, or determining the light source color from the color histogram of the image, and so on.
  • The embodiments of the application provide a deep-learning-based automatic white balance method for images/videos, which can overcome the above technical defects, improve the accuracy of AWB across full scenes, improve the stability of AWB for images/videos, ensure a stable tendency in ambiguous scenes such as multiple light sources, and meet real-time requirements.
  • the method described in this application can be applied to an independent electronic device 10.
  • the above-mentioned electronic device 10 may be mobile or fixed.
  • For example, the electronic device 10 may be a mobile phone with an image processing function, a tablet personal computer (TPC), a notebook computer, a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, an SLR camera, a video camera, a smart watch, a surveillance device, an augmented reality (AR) device, a virtual reality (VR) device, a wearable device (WD), an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
  • The electronic device 10 includes at least one general-purpose processor 13, a memory 15 (one or more computer-readable storage media), an image acquisition device 11, an image signal processor (ISP) 12, and a display device 14; these components can communicate over one or more communication buses.
  • the image acquisition device 11 may include components such as a camera 111, an image sensor (Sensor) 112, etc., which are used to collect images or videos of the shooting scene.
  • the images collected by the image acquisition device 11 may be one or more original RAW domain images.
  • the original RAW domain image can be referred to as RAW image for short.
  • the multiple original RAW domain images may form a sequence of image frames.
  • The camera 111 may be a monocular camera or a binocular camera, and is arranged in a front position (i.e., a front camera) or a rear position (i.e., a rear camera) on the housing of the main body of the electronic device 10.
  • the image sensor 112 is a photosensitive element, and this application does not limit the type of the photosensitive element. For example, it may be a Complementary Metal-Oxide Semiconductor (CMOS) or a Charge Coupled Device (CCD).
  • the function of the image sensor 112 is to capture the optical image collected by the camera 111 and convert it into an electrical signal usable by the back-end ISP 12.
  • the image sensor 112 may provide shooting parameters required for actual shooting.
  • the shooting parameters include, for example, at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • ISO sensitivity is the sensitivity specified by the International Organization for Standardization (ISO), also known as the ISO value, which measures the sensitivity of the sensor to light.
  • The main function of the ISP 12 is to process the signal output by the front-end image sensor 112. In the embodiment of this application, the algorithms included in the ISP 12 mainly include the auto white balance (AWB) algorithm.
  • Optionally, the ISP 12 may also include, but is not limited to, one or more of the following processing algorithms: automatic exposure control (AEC), automatic gain control (AGC), color correction, lens correction, noise removal/noise reduction, dead pixel removal, linear correction, color interpolation, image down-sampling, level compensation, and the like.
  • The ISP 12 may also include image enhancement algorithms, such as gamma correction, contrast enhancement and sharpening, color noise removal and edge enhancement in the YUV color space, color enhancement, and color space conversion (such as RGB to YUV). These image enhancement algorithms can also be integrated into a field programmable gate array (FPGA) or a digital signal processor (DSP) that cooperates with the ISP 12 to complete the image processing process together.
  • the general-purpose processor 13 may be any type of device capable of processing electronic instructions.
  • The electronic device 10 in this application may include one or more general-purpose processors 13, such as one or both of a central processing unit (CPU) 131 and a neural-network processing unit (NPU) 132. In addition, it may also include one or more of a graphics processing unit (GPU), a microprocessor, a microcontroller, a main processor, a controller, an application-specific integrated circuit (ASIC), and the like.
  • The general-purpose processor 13 executes various types of digital storage instructions, such as software or firmware programs stored in the memory 15, which enables the electronic device 10 to provide a wide variety of services. For example, the general-purpose processor 13 can execute programs or process data to perform at least a part of the methods discussed herein.
  • The function of the CPU 131 is mainly to parse computer instructions and process data in computer software, realize overall control of the electronic device 10, and deploy control over all hardware resources of the electronic device 10 (such as storage resources, communication resources, and I/O interfaces).
  • The NPU 132 is a general term for a new type of processor built around neural network algorithms and acceleration; it is specifically designed for artificial intelligence to accelerate neural network operations and to address the low efficiency of traditional chips in neural network operations. The NPU 132 does not constitute a limitation on this application; it can be replaced with other processors with similar functions, such as a tensor processing unit (TPU) or a deep learning processing unit (DPU).
  • The NPU 132 can undertake tasks related to neural network calculations. For example, the NPU 132 can compute the AWB neural network according to the image information provided by the ISP 12 (such as the multi-channel image) and the information provided by the image acquisition device (such as the shooting parameters) to obtain the light source color information, and then feed the light source color information back to the ISP 12 so that the ISP 12 further completes the AWB process.
  • In a possible embodiment, when the CPU 131 exists and the NPU 132 does not, the CPU 131 can undertake the tasks related to neural network calculations instead: the CPU 131 performs the AWB neural network calculation based on the same inputs and feeds the light source color information back to the ISP 12 to facilitate the ISP 12 further completing the AWB process.
  • the display device 14 is used to display the currently previewed shooting scene and the shooting interface when the user needs to shoot, or to display the target image after the white balance processing.
  • the display device 14 can also be used to display information requiring user operations or information provided to the user, as well as various graphical user interfaces of the electronic device 10. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the display device 14 may specifically include a display screen (display panel).
  • the display panel may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • The display device 14 may also be a touch panel (touch screen). The touch panel may include a display screen and a touch-sensitive surface; when the touch-sensitive surface detects a touch operation on or near it, the operation is sent to the CPU 131 to determine the type of the touch event, and the CPU 131 then provides a corresponding visual output on the display device 14 according to the type of the touch event.
  • The memory 15 may include a volatile memory, such as random access memory (RAM) and high-speed cache; the memory may also include a non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 15 may also include a combination of the foregoing types of memories.
  • The memory 15 can be used to store the RAW images collected by the image acquisition device 11, the target image after white balance processing, image information of preceding and following frames, shooting parameters, scene semantic information, and other data. The memory 15 can also be used to store program instructions for the processor to call and execute the automatic image white balance method described in this application.
  • In the electronic device 10, automatic white balance of an image can be achieved through the following process: when the electronic device 10 performs shooting, objects (people, objects, scenes, etc.) in the external environment are projected, through the optical image collected by the camera 111, onto the surface of the image sensor 112 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, and the digital image signal is a RAW image (for example, in Bayer format).
  • the image sensor 112 sends the RAW image to the ISP 12 for processing.
  • When the ISP 12 needs to perform AWB, the ISP 12 sends image information (for example, the multi-channel image) to the general-purpose processor 13, and the image acquisition device 11 sends the shooting parameters to the general-purpose processor 13. The general-purpose processor 13 (for example, the CPU 131 or the NPU 132) uses the input information to compute the neural network model and obtain the light source color information corresponding to the image; the light source color information is then fed back to the ISP 12, which completes AWB according to it and performs other image processing to obtain a target image, such as an image in YUV or RGB format. The ISP 12 then transmits the target image to the CPU 131 through the I/O interface, and the CPU 131 sends the target image to the display device 14 for display.
  • The device structure shown in FIG. 2 does not constitute a limitation on the electronic device 10; in some embodiments, the electronic device 10 may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the method described in this application can be applied to a scenario of end-cloud interaction.
  • the end-cloud system includes an electronic device 20 and a cloud server 30.
  • the electronic device 20 and the cloud server 30 can communicate with each other, and the communication method is not limited to a wired or wireless method.
  • the electronic device 20 may be mobile or fixed.
  • For example, the electronic device 20 may be a mobile phone with an image processing function, a tablet PC, a notebook computer, a media player, a smart TV, a personal digital assistant, a personal computer, a camera, an SLR camera, a camcorder, a smart watch, a monitoring device, an augmented reality device, a virtual reality device, a wearable device, an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
  • The cloud server 30 may include one or more servers, one or more processing nodes, or one or more virtual machines running on a server. The cloud server 30 may also be called a server cluster, a management platform, a data processing center, or the like, which is not limited in the embodiment of the present application.
  • The electronic device 20 includes at least one general-purpose processor 23, a memory 25, an image acquisition device 21, an image signal processor (ISP) 22, a display device 24, and a communication device 26. These components can communicate over one or more communication buses to implement the functions of the electronic device 20.
  • the cloud server 30 includes a memory 33, a neural network processor NPU 31, and a communication device 32. These components can communicate on one or more communication buses to realize the functions of the cloud server 30.
  • the electronic device 20 establishes a communication connection with the communication device 32 of the cloud server 30 through the communication device 26, and the communication method is not limited to a wired or wireless method.
  • the communication device 26 and the communication device 32 can be used to send and receive wireless signals to and from each other.
  • The wireless communication methods include, but are not limited to, one or more of radio frequency (RF), data communication, Bluetooth, WiFi, and the like.
  • the electronic device 20 in the end-cloud system may not include the NPU.
  • the embodiments of the present application make full use of the computing resources of the cloud server, which is beneficial to reduce the operating burden and configuration requirements of the electronic device 20, and improve the user experience.
  • In the end-cloud system, automatic white balance of an image can be realized through the following process: when the electronic device 20 performs shooting, objects (people, objects, scenes, etc.) in the external environment are projected, through the optical image collected by the camera in the image acquisition device 21, onto the image sensor in the image acquisition device 21 and converted into an electrical signal. After analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, and the digital image signal is a RAW image (for example, in Bayer format).
  • the image capture device 21 sends the RAW image to the ISP 22 for processing.
  • the ISP 22 When the ISP 22 needs to perform AWB, the ISP 22 sends image information (for example, a multi-channel image) to the general-purpose processor 23, and the image acquisition device 21 sends the shooting parameters to the general-purpose processor 23.
  • the general-purpose processor 23 (for example, the CPU 231) may further send the above-mentioned information to the cloud server 30 through the communication device 26.
  • The NPU 31 uses the above input information (multi-channel image, shooting parameters, etc.) to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the electronic device 20 through the communication device 32 and sent to the ISP 22. The ISP 22 performs AWB according to the light source color information and performs other image processing to obtain the target image, for example, an image in YUV or RGB format.
  • the ISP 22 transmits the target image to the CPU 231 through the I/O interface, and the CPU 231 sends the target image to the display device 24 for display.
  • The device structure shown in FIG. 4 does not constitute a limitation on the present application; in some embodiments, the electronic device 20 and the cloud server 30 may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • FIG. 5 is a hardware structure of a chip provided by an embodiment of the present application.
  • the chip includes a neural network processor NPU300.
  • the NPU 300 may be set in the electronic device 10 as shown in FIG. 2 to complete the calculation work of the neural network.
  • the NPU 300 is the NPU 132 described in FIG. 2.
  • the NPU 300 can be set in the cloud server 30 as shown in FIG. 4 to complete the calculation of the neural network.
  • the NPU 300 is the NPU 31 described in FIG. 4.
  • The NPU 300 can be mounted as a coprocessor onto a host central processing unit (host CPU), and the host CPU allocates tasks to it.
  • the core part of the NPU 300 is the arithmetic circuit 303.
  • the arithmetic circuit 303 is controlled by the controller 304 to extract matrix data from the memory and perform multiplication operations.
  • The arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array; the arithmetic circuit 303 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 303 is a general-purpose matrix processor.
• the arithmetic circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303; the arithmetic circuit 303 then fetches the data of matrix A from the input memory 301, performs a matrix operation with matrix B, and stores the partial or final results of the obtained matrix in the accumulator 308 (accumulator).
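• As an illustration only (not part of the patent text), the following minimal Python sketch mimics how partial results of a blocked matrix multiplication can accumulate, in the spirit of the accumulator 308 described above; the blocking scheme and all names are assumptions:

    import numpy as np

    def blocked_matmul(A, B, block=2):
        # multiply A (m x k) by B (k x n), accumulating partial products
        # slice by slice, loosely mimicking a PE array with an accumulator
        m, k = A.shape
        k2, n = B.shape
        assert k == k2
        acc = np.zeros((m, n))  # plays the role of the accumulator
        for start in range(0, k, block):
            end = min(start + block, k)
            acc += A[:, start:end] @ B[start:end, :]  # partial result
        return acc

    A = np.arange(6).reshape(2, 3).astype(float)
    B = np.arange(12).reshape(3, 4).astype(float)
    assert np.allclose(blocked_matmul(A, B), A @ B)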
  • the vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 307 may be used for network calculations in the non-convolutional/non-FC layer of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
  • the unified memory 306 is used to store input data and output data.
• the direct memory access controller (DMAC) 305 is used to transfer the input data in the external memory into the input memory 301 and/or the unified memory 306, to store the weight data in the external memory into the weight memory 302, and to store the data in the unified memory 306 into the external memory.
  • the bus interface unit 310 (bus interface unit, BIU) is used to implement the interaction between the main CPU, the DMAC, and the fetch memory 309 through the bus.
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304; the controller 304 is used to call the instructions buffered in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • unified memory 306, input memory 301, weight memory 302, and fetch memory 309 are all on-chip (On-Chip) memories.
• the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • each layer in the neural network model described in the embodiment of the present application may be performed by the arithmetic circuit 303 or the vector calculation unit 307.
• since the embodiment of the present application relates to the application of a neural network, in order to better understand the working principle of the neural network described in the embodiment of the present application, the implementation process of the neural network in the present application is described below.
  • the neural network and neural network model can be regarded as the same concept, and the two are used selectively based on the convenience of expression.
  • the neural network model described in the embodiments of this application may be composed of neural units.
• the neural unit may refer to an arithmetic unit that takes inputs $x_s$ and an intercept of 1, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$

• where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next layer.
  • the activation function can be a sigmoid function.
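• For illustration, a minimal Python sketch of such a neural unit with a sigmoid activation (the numeric values and names are illustrative assumptions, not from the patent):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neural_unit(x, W, b):
        # output of a single neural unit: f(sum_s W_s * x_s + b)
        return sigmoid(np.dot(W, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # inputs x_s
    W = np.array([0.1, 0.4, -0.2])   # weights W_s
    b = 0.3                          # bias of the neural unit
    print(neural_unit(x, W, b))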
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • the neural network model may be a model constructed based on deep learning, for example, it may be a deep neural network (DNN) model, a convolutional neural network (CNN) or a recurrent neural network (Recurrent Neural Network, RNN), or a combination of multiple, etc.
  • a convolutional neural network (convolutional neural network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units on the same feature plane can share weights.
  • the convolution kernel can be initialized in the form of a matrix of random size, or can be initialized with all zeros or other general initialization methods, which are not limited here.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
• in training a neural network, a loss function (also referred to as an objective function) is used to measure the difference between the predicted value of the network and the truly desired target value; the training process is a process of making this loss as small as possible.
• the neural network can use the back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
• the back-propagation algorithm is an error-loss-driven back propagation process that aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • FIG. 6 shows a system architecture 100 for neural network model training provided by an embodiment of the present application.
  • a data collection device 160 is used to collect training data.
• the neural network model (i.e., the AWB neural network model described later) can be trained with the training data.
  • the training data for training the neural network model in the embodiment of the present application may include the multi-channel image corresponding to the original raw domain image, the shooting parameters corresponding to the original raw domain image, and the light source color information annotated to the original raw domain image .
  • the training data for training the neural network model in the embodiment of the present application may include multi-channel images corresponding to the original raw domain images, scene semantic information extracted from the multi-channel images, shooting parameters corresponding to the original raw domain images, And the light source color information annotated to the original raw domain image.
  • the image in the training data may be a single frame image or a multi-frame image of a video frame sequence.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 obtains the target model 101 (for example, the AWB neural network model in the embodiment of the present application) based on the training data maintained in the database 130.
• the training device 120 inputs the training data into the target model 101 until the degree of difference between the predicted light source color information output by the target model 101 and the light source color information annotated in the image meets a preset condition, for example, until the angle error between the two corresponding color vectors is smaller than a preset threshold, remains unchanged, or no longer decreases, so as to complete the training of the target model 101.
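• As a hedged illustration, the angle error between the predicted and annotated light source color vectors can be computed as follows (a common formulation; the patent does not give code for it):

    import numpy as np

    def angular_error_deg(pred, gt):
        # angle (degrees) between predicted and annotated illuminant vectors
        cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    pred = np.array([0.9, 1.0, 1.2])   # predicted (r/g, 1, b/g), mock values
    gt   = np.array([0.95, 1.0, 1.1])  # annotated light source color
    print(angular_error_deg(pred, gt))  # training can stop once this is small enough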
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 does not necessarily perform the training of the target model 101 completely based on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training.
• the above description should not be construed as a limitation on the embodiments of this application.
  • the target model 101 obtained by training according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 5.
  • the execution device 110 may use the target model 101 to perform neural network calculations to realize the prediction of the color information of the light source.
  • the execution device 110 may be the electronic device 10 described above.
  • the input data of the execution device 110 may come from the data storage system 150, and the data storage system 150 may be a memory placed in the execution device 110, or may be an external memory independent of the execution device 110.
  • the input data may include, for example, a multi-channel image and shooting parameters; or, may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters.
  • the execution device 110 realizes the prediction of the color information of the light source based on the input data.
  • the execution device 110 may be the cloud server 30 in the end-cloud system described above.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
• the user can input data to the I/O interface 112 through the client device 140.
  • the client device 140 may be the electronic device 20 in the end-cloud system.
  • the client device 140 may automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send input data and the user's authorization is required, the user can set the corresponding authority in the client device 140.
  • the input data may include, for example, a multi-channel image and shooting parameters; or, may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters.
  • the execution device 110 realizes the prediction of the color information of the light source based on the input data. Subsequently, the predicted light source color information can be returned to the client device 140 through the I/O interface 112.
• the association function module 113 can be used to perform relevant processing according to the input data; for example, the association function module 113 can extract scene semantic information from a multi-channel image.
• the training device 120 can generate a corresponding target model 101 based on different training data for different goals or tasks, and the corresponding target model 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results. For example, it can be used to train the AWB neural network model described in the embodiment of FIG. 11 or FIG. 13 below.
  • the execution device 110 may be configured with a chip as shown in FIG. 5 to complete the calculation work of the calculation module 111.
  • the training device 120 may also be configured with a chip as shown in FIG. 5 to complete the training work of the training device 120 and output the trained target model 101 to the execution device 110.
  • FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • FIG. 7 shows another system architecture 400 provided by an embodiment of the present application.
  • the system architecture includes a local device 420, a local device 430, an execution device 410, and a data storage system 450.
  • the local device 420 and the local device 430 are connected to the execution device 410 through a communication network 440.
  • the execution device 410 may be implemented by one or more servers.
• the execution device 410 can be used in conjunction with other computing devices, such as data storage devices, routers, load balancers, and other equipment.
  • the execution device 410 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 410 may use the data in the data storage system 450 or call the program code in the data storage system 450 to implement the image processing method of the embodiment of the present application.
• the execution device 410 may also be a cloud server; at this time, the execution device 410 may be deployed in the cloud, for example as the cloud server 30 described in the embodiment of FIG. 4 above, while the local device may be the electronic device 20 described in the embodiment of FIG. 3 above.
  • the automatic white balance method in the embodiment of the present application may be independently executed by the local device 420 or the local device 430.
  • the local device 420 and the local device 430 may obtain the relevant parameters of the aforementioned neural network model from the execution device 410, deploy the neural network model on the local device 420 and the local device 430, and use the neural network model to implement the AWB process.
• the automatic white balance method of the embodiment of the present application may also be performed by the local device 420 or the local device 430 in cooperation with the execution device 410.
  • the user may operate respective user devices (for example, the local device 420 and the local device 430) to interact with the execution device 410.
• each local device can represent any computing device, for example, a personal computer, computer workstation, smart phone, tablet, camera, smart camera, smart car device or other type of cellular phone, media consumption device, wearable device, set-top box, game console, and so on.
  • the local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application.
  • the method can be applied to an electronic device that includes a camera and a display screen.
• the method includes but is not limited to the following steps:
  • S501 Acquire shooting parameters used when the camera shoots the original RAW domain image.
  • the original RAW image can be referred to as the RAW image.
  • the RAW image can be the raw data of a CMOS or CCD image sensor that converts the light source signal captured by the camera into a digital signal, and the raw data has not been processed by an image signal processor (ISP).
  • the RAW image may specifically be a bayer image in a Bayer format.
  • the shooting parameters indicate parameters used when performing shooting, such as shooting parameters used by a camera, an image sensor, and so on.
  • shooting parameters can also be understood as control parameters generated when the processor controls the camera and the image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
• the color characteristics of the images acquired by the camera and image sensor of the electronic device under the same environment with different shooting parameter configurations will show differences, so the shooting parameters characterize the physical conditions under which the image was captured. This application can use the shooting parameters to provide a shooting configuration reference for light source color estimation.
  • S502 Acquire a multi-channel image corresponding to the original RAW domain image.
  • Multichannel (Multichannel) image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
• an image channel refers to a single color component into which the image is decomposed, that is, the individual red R, green G, and blue B parts.
  • the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
  • the multi-channel image may specifically be a four-channel image, for example, it may refer to an RGGB four-channel image; or, a BGGR four-channel image; or, a RYYB four-channel image.
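• As a minimal sketch (assuming an RGGB Bayer layout, which is an illustrative assumption), a RAW Bayer mosaic can be packed into the four-channel form mentioned above as follows:

    import numpy as np

    def bayer_rggb_to_4ch(raw):
        # pack an RGGB Bayer mosaic (H x W) into an RGGB four-channel image
        # of shape (H/2, W/2, 4); assumes H and W are even
        r  = raw[0::2, 0::2]
        g1 = raw[0::2, 1::2]
        g2 = raw[1::2, 0::2]
        b  = raw[1::2, 1::2]
        return np.stack([r, g1, g2, b], axis=-1)

    raw = np.random.randint(0, 1024, (8, 8)).astype(np.float32)  # mock 10-bit RAW
    print(bayer_rggb_to_4ch(raw).shape)  # (4, 4, 4)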
• further, the input data may be input into the neural network model to obtain the first gain value of the white balance, where the input data includes at least the shooting parameters of the camera and the multi-channel image; first processing is then performed on the multi-channel image to obtain the target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • the neural network model is used to obtain the gain value or light source color information required for the white balance processing at least according to the shooting parameters and the multi-channel image.
  • the neural network model described in the embodiments of this application may be a single neural network model, or a combination of two or more neural network models.
• the neural network model can be a model built based on deep learning, for example, a deep neural network (DNN) model, a convolutional neural network (CNN), a long short-term memory network (Long Short-Term Memory, LSTM), or a recurrent neural network (RNN), or a combination of multiple models, etc.
  • the neural network model provided by the embodiments of the present application can obtain the light source color information required in the white balance processing, such as the image light source value (r/g, 1, b/g), according to the shooting parameters and the multi-channel image.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, thereby correcting the image color cast caused by the light source color temperature, so that the color of objects in the image is close to their original color and the overall effect of the image conforms to the visual and cognitive habits of the human eye.
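• For illustration, a minimal von Kries-style sketch of such a correction, assuming the light source color information takes the form (r/g, 1, b/g); this is a common formulation, not necessarily the ISP's exact method:

    import numpy as np

    def white_balance(rgb, illuminant):
        # divide each channel by the estimated illuminant so that the
        # light source maps back to neutral gray; illuminant = (r/g, 1, b/g)
        gains = 1.0 / np.asarray(illuminant)  # per-channel gains (g/r, 1, g/b)
        out = rgb * gains.reshape(1, 1, 3)
        return np.clip(out, 0.0, 1.0)

    rgb = np.random.rand(4, 4, 3)            # mock linear RGB image in [0, 1]
    print(white_balance(rgb, (1.2, 1.0, 0.8)).shape)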
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image as the input of the AWB neural network model, and provides more color information for the AWB neural network model.
  • the shooting parameters are added as the input of the AWB neural network model to provide shooting configuration information for light source estimation, which can improve the ability of the AWB neural network model to distinguish different light source scenes and ensure good light source estimation accuracy. Therefore, the implementation of this application is beneficial to improve the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • FIG. 10 is a schematic flowchart of a specific image automatic white balance method provided by an embodiment of the present application.
  • the method can be applied to an electronic device.
  • the method includes but is not limited to the following steps:
  • the original RAW image can be referred to as the RAW image.
  • the RAW image can be the raw data of a CMOS or CCD image sensor that converts the light source signal captured by the camera into a digital signal, and the raw data has not been processed by an image signal processor (ISP).
  • the RAW image may specifically be a bayer image in a Bayer format.
  • the shooting parameters refer to shooting parameters used when performing shooting, such as parameters used by a camera, an image sensor, and so on.
  • shooting parameters can also be understood as control parameters generated when the processor controls the camera and the image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
• the color characteristics of the images acquired by the camera and image sensor of the electronic device under the same environment with different shooting parameter configurations will show differences, so the shooting parameters characterize the physical conditions under which the image was captured. This application can use the shooting parameters to provide a shooting configuration reference for light source color estimation.
  • S603 Process the RAW image into a multi-channel image.
  • a multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
• an image channel refers to a single color component into which the image is decomposed, that is, the individual red R, green G, and blue B parts.
  • the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
  • the multi-channel image may specifically be a four-channel image, for example, it may refer to an RGGB four-channel image; or, a BGGR four-channel image; or, a RYYB four-channel image.
  • S604 Input the multi-channel image and shooting parameters into the neural network model to obtain light source color information.
  • the neural network model can obtain the light source color information required in the white balance processing according to the shooting parameters and the multi-channel image.
  • the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
  • the neural network model may be the AWB neural network model shown in FIG. 11.
  • the AWB neural network model specifically includes a first feature extraction network, a feature fusion network and a light source prediction network.
  • the first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain the first feature; the first feature is used to characterize the color information of the channel image.
  • the first feature extraction network may include one or more convolution kernels, and a statistical operation on the pixels of the channel image is implemented through convolution processing, so as to obtain the first feature.
  • the feature fusion network is used to fuse the first feature and the shooting parameter to obtain the fused feature.
  • the fusion method is not limited to one or more combinations of operations such as concat function processing, conv2d function processing, elementwise multiply processing, and elementwise add processing.
  • the aforementioned two-way information (the first feature and the shooting parameter) can be weighted to obtain the fused feature.
• specifically, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature, so that the mathematical forms of the two streams of data are consistent, which facilitates data fusion processing.
  • the light source prediction network is used to make predictions based on the fused features to obtain light source color information.
  • the light source color information can be used to indicate the color temperature of the light source or the color difference of the image, so the light source color information can be used in the subsequent AWB process.
  • the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing process.
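• For illustration, the following PyTorch sketch shows one plausible shape of such a model: feature extraction with small convolution kernels, fusion with broadcast shooting parameters via concat and conv2d, and a light source prediction head. The layer sizes, the choice of two shooting parameters, and all names are assumptions, not the patent's architecture:

    import torch
    import torch.nn as nn

    class AWBNet(nn.Module):
        def __init__(self, in_ch=3, n_params=2):
            super().__init__()
            # first feature extraction network: small conv kernels over the channel image
            self.features = nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            # feature fusion network: concat broadcast shooting parameters, then conv
            self.fuse = nn.Conv2d(32 + n_params, 32, 1)
            # light source prediction network: pool and regress (r/g, b/g)
            self.predict = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

        def forward(self, img, params):
            f = self.features(img)
            # expand shooting parameters to a multi-dimensional array matching f
            p = params[:, :, None, None].expand(-1, -1, f.shape[2], f.shape[3])
            fused = torch.relu(self.fuse(torch.cat([f, p], dim=1)))
            rg_bg = self.predict(fused)
            ones = torch.ones_like(rg_bg[:, :1])
            # assemble the image light source value (r/g, 1, b/g)
            return torch.cat([rg_bg[:, :1], ones, rg_bg[:, 1:]], dim=1)

    net = AWBNet()
    out = net(torch.rand(1, 3, 64, 64), torch.tensor([[0.5, 0.1]]))  # image + mock params
    print(out.shape)  # torch.Size([1, 3])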
  • the AWB neural network model realizes the prediction of the color information of the light source by fusing the characteristics of the channel image and the shooting parameters.
• the AWB neural network model can be configured in the electronic device, and the processor (such as a CPU or NPU) in the electronic device is used to perform the neural network model calculation to obtain the light source color information. Therefore, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform neural network calculations, which improves processing efficiency and reduces white balance processing delay.
  • the specific hardware implementation process has been described in detail in the previous section, and will not be repeated here.
  • the AWB neural network model can be configured in the cloud server in the end-cloud system.
  • the electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, and use the processor (such as CPU or NPU) in the cloud server to realize the neural network model calculation to obtain the color information of the light source.
• the server then feeds back the color information of the light source to the electronic device. Therefore, when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used to run the neural network model calculation, ensuring the accuracy and stability of the white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
  • the specific implementation process has been described in detail above, and will not be repeated here.
  • S605 Perform white balance processing on the multi-channel image according to the color information of the light source to obtain a target image and display it on the display screen.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, so as to correct the image color cast caused by the light source color temperature, so that the color of the object in the image is close to its original color, and the overall effect of the image is in line with the visual and cognitive habits of the human eye.
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image instead of the statistical feature as the input of the AWB neural network model, which provides more color information for the AWB neural network model.
• shooting parameters, such as one or more of the shutter speed, exposure time, exposure value, ISO, and aperture size, are added as input of the AWB neural network model to provide shooting configuration information for light source estimation and a reference for the conditions under which the RAW image was obtained. Taking the shooting parameters as the input of the neural network model can help the network improve the accuracy of light source prediction, improve the discrimination ability of the AWB neural network model for different light source scenes, and ensure good light source estimation accuracy.
  • the AWB neural network model in the embodiment of the present application can be applied to the entire scene, and a large amount of training data is used during model training.
  • the training data includes data obtained in a bright light scene and data obtained in a dark light scene.
• it is difficult for the neural network to achieve high-precision fitting across the whole scene range, and the added camera parameters can provide a priori information about the shooting scene, helping the neural network to distinguish between bright light scenes and dark light scenes, thereby improving the light source estimation accuracy of these two types of scenes.
• the camera parameters can be used to distinguish not only bright light scenes and dark light scenes, but also other scene categories with different attributes, such as outdoor and indoor, or day and night. Therefore, adding camera parameters as the input of the neural network can effectively improve the model's light source estimation accuracy in these types of scenes, thereby improving the overall light source estimation accuracy.
• as for the model input, which of the camera parameters (such as shutter speed, exposure time, exposure value, ISO, or aperture size) are selected can be determined based on the information actually available on the electronic device.
  • One or more of the above-mentioned camera parameters can provide a reference for the shooting conditions of the image, which are all helpful to improve the accuracy of the network.
  • the actual application requires flexible selection according to the hardware and software conditions.
  • the implementation of this application is beneficial to improve the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • FIG. 12 is a schematic flowchart of another method for image automatic white balance provided by an embodiment of the present application.
  • the method can be applied to electronic devices.
• the main difference between this method and the method described in FIG. 10 is that the calculation process of the neural network model also uses scene semantic information to further improve the accuracy of light source color information prediction.
  • the method includes but is not limited to the following steps:
  • S704 Extract scene semantic information of the multi-channel image.
• in different shooting scenes, the color of the light source may be different. For example, indoors the light source may be an incandescent lamp, while outdoors the light source may be the sun or street lights.
  • the embodiments of the present application may use the scene semantic information to provide a reference on the shooting scene for the light source color estimation.
  • the scene semantic information represents the semantic features related to the shooting scene represented by the image.
  • various types of shooting scenes can be defined.
  • shooting scenes can be classified based on light source types, such as cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
  • shooting scenes can be classified based on image content, such as portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
  • the shooting scene can also be a combination of the above-mentioned multiple scenes.
  • other types of shooting scenes may also be defined based on actual application needs, which is not limited in the embodiment of the present application.
  • one or more preset extraction algorithms can be used to extract scene semantic information from a multi-channel image.
• the preset extraction algorithm may be one or a combination of multiple of a scene classification algorithm, an image scene segmentation algorithm, an object detection algorithm, a portrait segmentation algorithm, a face detection algorithm, a human detection algorithm, a skin color segmentation algorithm, and the like.
  • the scene classification algorithm is used to realize the classification of faces and non-faces, the classification of single light sources and multiple light sources, the color temperature classification of light sources, or the classification of indoor and outdoor scenes, and so on.
  • the image scene segmentation algorithm can be used to segment the picture to generate a mask map; alternatively, the scene classification algorithm, object detection algorithm, face detection algorithm, skin color segmentation algorithm and other technologies can also be used to generate the mask map.
• the mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame image alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the neural network to fit and converge, and achieving higher prediction accuracy.
• alternatively, the scene information extraction may not use scene segmentation technology; instead, an object detection algorithm is used to extract the scene semantic information, and the generated object category boxes are used to generate a scene category mask map that is sent to the AWB neural network.
  • object detection technology can be used instead of scene segmentation to extract scene semantic information, which simplifies the complexity of scene information extraction, increases the calculation speed, reduces the calculation complexity, and reduces the performance overhead.
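• A hedged sketch of this idea: detection boxes can be rasterized into a scene-category mask map (the box format and category codes below are assumptions for illustration):

    import numpy as np

    def boxes_to_mask(shape, boxes):
        # rasterize detection boxes into a scene-category mask map;
        # boxes: list of (x0, y0, x1, y1, category_id); 0 means background
        mask = np.zeros(shape, dtype=np.uint8)
        for x0, y0, x1, y1, cat in boxes:
            mask[y0:y1, x0:x1] = cat
        return mask

    # e.g. category 1 = face, 2 = sky (hypothetical codes)
    mask = boxes_to_mask((64, 64), [(10, 10, 30, 30, 1), (0, 0, 64, 16, 2)])
    print(np.unique(mask))  # [0 1 2]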
  • the scene semantic information as an auxiliary input is not necessarily in the form of a mask map, but may also be in other forms.
• for example, the output may be a series of classification confidence vectors, which are used in vector form as the input of the neural network model.
  • S705 Input the multi-channel image, scene semantic information and shooting parameters into the neural network model to obtain light source color information.
  • the neural network model can obtain the light source color information required in the white balance processing according to the shooting parameters, scene semantic information, and multi-channel images.
  • the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
  • the neural network model may be the AWB neural network model shown in FIG. 13.
  • the AWB neural network model specifically includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network.
  • the first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain the first feature; the first feature is used to characterize the color information of the channel image.
  • the first feature extraction network may include one or more small convolution kernels, and the statistical operation on the pixels of the channel image is realized through convolution processing, so as to obtain the first feature.
  • the second feature extraction network is used to perform feature extraction on the scene semantic information to obtain a second feature, and the second feature is used to characterize the scene information corresponding to the channel image.
  • the second feature extraction network may include one or more large convolution kernels, and the analysis/perception of the scene information of the channel image is realized through convolution processing, so as to obtain the second feature.
• the convolution kernels in the second feature extraction network can be set to a larger scale than those in the first feature extraction network, so as to achieve a larger range of image perception and obtain more accurate scene information.
  • the feature fusion network is used to fuse the first feature, the second feature, and the shooting parameter to obtain the fused feature.
  • the fusion method is not limited to one or more combinations of operations such as concat function processing, conv2d function processing, elementwise multiply processing, and elementwise add processing.
  • the aforementioned three-way information (the first feature, the second feature, and the shooting parameters) can be weighted to obtain the fused feature.
• specifically, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature and the second feature, so that the mathematical forms of the three streams of data are consistent, which facilitates data fusion processing.
  • the light source prediction network is used to make predictions based on the fused features to obtain light source color information.
  • the light source color information can be used to indicate the color temperature of the light source or the color difference of the image, so the light source color information can be used in the subsequent AWB process.
  • the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing process.
  • the AWB neural network model realizes the prediction of the color information of the light source by fusing the characteristics of the channel image, the characteristics of the scene semantic information, and the shooting parameters.
• the AWB neural network model can be configured in the electronic device, and the processor (such as a CPU or NPU) in the electronic device is used to perform the neural network model calculation to obtain the light source color information.
  • the specific hardware implementation process has been described in detail in the previous section, and will not be repeated here.
  • the AWB neural network model can be configured in the cloud server in the end-cloud system.
  • the electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, and use the processor (such as CPU or NPU) in the cloud server to realize the neural network model calculation to obtain the color information of the light source.
  • the server then feeds back the color information of the light source to the electronic device.
• S706 Perform white balance processing on the multi-channel image according to the color information of the light source to obtain the target image and display it on the display screen. For details, refer to the description of step S605, which is not repeated here.
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image instead of the statistical feature as the input of the AWB neural network model, and provides more color information for the AWB neural network model.
• the scene semantic information and shooting parameters are added as the input of the AWB neural network model, which provides more effective prior knowledge (shooting configuration information and scene information) for light source estimation, greatly enhances the discrimination ability of the AWB neural network model for different light source scenes, improves the overall light source estimation accuracy, and can effectively help the neural network to converge and fit.
  • the scene semantic information can provide a priori semantic information for the image to a large extent, help the AWB neural network to distinguish different scenes, and then improve the overall accuracy of the AWB neural network.
• for example, in face shooting scenes, the network output may be unstable, which affects the perceived skin color. If face detection information is added as scene semantic information into the neural network, the neural network will pay more attention to the face area during the training process, so as to improve the fitting accuracy of the network in face scenes.
• if the neural network does not perform well in blue sky, grass and other scenes, image segmentation technology can be introduced, and the segmented sky area and grass area are input into the neural network as scene information; the neural network will then pay more attention to sky scenes and grass scenes, thereby improving the accuracy of light source estimation in these scenes.
• many forms of scene semantic information are provided in the embodiments of this application.
  • the specific types of scene semantic information to be adopted can be determined according to the needs of AWB in different scenarios.
• this application does not specially restrict this; neither the specific content of the scene semantic information nor the method of obtaining it is restricted.
• for example, one or more extraction techniques such as image segmentation, instance segmentation, face detection, human body detection, skeleton detection, and scene classification can be used to obtain the scene semantic information as the input of the AWB neural network.
  • the implementation of this application can improve the white balance accuracy of electronic equipment shooting in full scenes, improve the stability of AWB for single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • the method can be applied to an electronic device, including but not limited to the following steps:
  • the mobile phone collects a frame of RAW image in BAYER format while taking a photo, and at the same time obtains the corresponding shooting parameters when the picture is taken.
• the shooting parameters may be selected from the exposure value, shutter time, aperture size, ISO sensitivity, and other parameters. Because the color characteristics of the pictures acquired by the mobile phone's sensor under the same environment with different parameter configurations will show differences, the shooting parameters describe the conditions of the image at the time of shooting and provide a reference for the light source estimation algorithm.
  • S803 Perform preprocessing on the RAW image to obtain a color three-channel image, such as an RGB three-channel image.
  • each pixel has three components of red, green and blue.
  • the preprocessing process of the RAW image can be performed by, for example, the ISP of the electronic device, and the preprocessing process includes all the image processing steps experienced in generating the color three-channel image.
  • FIG. 15 shows an example of a preprocessing process, which may include Black Level Correction (BLC) and Lens Shade Correction (LSC).
• BLC processing can reduce the influence of dark current on the image signal, and LSC processing can eliminate lens shading (vignetting).
  • it also includes image down-sampling processing and noise reduction processing.
  • a specific implementation process is described as follows:
• the RAW image is in Bayer format, and it needs to undergo demosaicing to obtain a color three-channel image. In order not to affect the color, the demosaicing operation can be simplified to averaging the two green channels, and the red, green, and blue samples are then re-arranged to obtain a color three-channel image.
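• A minimal Python sketch of this simplified demosaicing (assuming an RGGB layout, which is an illustrative assumption):

    import numpy as np

    def simple_demosaic(raw):
        # simplified demosaic for an RGGB Bayer mosaic: average the two
        # green samples and stack R, G, B into a half-resolution image
        r = raw[0::2, 0::2]
        g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
        b = raw[1::2, 1::2]
        return np.stack([r, g, b], axis=-1)

    raw = np.random.rand(8, 8).astype(np.float32)
    print(simple_demosaic(raw).shape)  # (4, 4, 3)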
  • preprocessing process may also include other processing algorithms, which are not limited in other embodiments of the present application.
  • S805 Input the multi-channel image, scene semantic information and shooting parameters into the neural network model to obtain light source color information.
  • S806 Perform white balance processing on the image by using the color information of the light source.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, so as to correct the image color cast caused by the light source color temperature, so that the color of the object in the image is close to its original color, and the overall effect of the image is in line with the visual and cognitive habits of the human eye.
  • S807 Further perform image enhancement processing on the image after the white balance processing, to obtain the final target image for display.
• the process of image enhancement processing can be executed, for example, by the ISP of the electronic device, or by other devices of the electronic device, such as a field programmable gate array (FPGA) or a digital signal processor (DSP).
  • the white balance processed image may also be post-processed through some image enhancement algorithms to further improve the image quality, obtain the final target image for display, and output it to the display screen of the electronic device for display.
  • the image enhancement algorithm may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
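• For example, the gamma correction operation mentioned above can be sketched as follows (the gamma value of 2.2 is an assumption for illustration):

    import numpy as np

    def gamma_correct(img, gamma=2.2):
        # apply display gamma to a linear image in [0, 1]
        return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

    print(gamma_correct(np.array([0.0, 0.18, 1.0])))  # mid-gray is brightened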
  • post-processing process may also adopt other processing algorithms according to actual application needs, which is not limited in other embodiments of the present application.
  • the CPU 131 controls the camera 111 to collect light signals from the shooting environment, and the image sensor 112 converts the light signals captured by the camera 111 into digital signals, thereby obtaining one or more RAW images.
• the ISP 12 performs preprocessing to process the RAW image into a color three-channel image, and extracts scene semantic information of the color three-channel image.
  • the color three-channel image and the scene semantic information are further input to the NPU 132, and the control parameters (shutter, exposure time, aperture size, etc.) of the CPU 131 for the camera 111 and the image sensor 112 are also input to the NPU 132.
• the NPU 132 executes the calculation of the AWB neural network model to obtain the light source color value (r/g, 1, b/g), and returns the light source color value to the ISP 12.
• the ISP 12 performs white balance processing according to the color value of the light source, and uses image enhancement algorithms to further optimize the white balance processed image to obtain the target image.
  • the target image is further sent to the display device 14 through the CPU 131 for display.
• on the basis of achieving better AWB, the embodiment of the present application also refines the image preprocessing process and the image post-processing stage. The introduction of the preprocessing process not only facilitates the rapid and efficient generation of multi-channel images for the AWB method of the present application, but also helps to improve image quality (for example, by reducing the influence of dark current, reducing noise, and eliminating vignetting) and the computing speed of the neural network algorithm. Through post-processing, the quality of the image can be further improved, meeting the application needs of the user and improving the user's viewing experience.
  • the solution includes but is not limited to the following steps:
  • the operation for instructing to perform shooting may be, for example, touch, click, voice control, key control, remote control, etc., for triggering the electronic device to shoot.
• the operation used by the user to instruct shooting may include pressing the shooting button in the camera application of the electronic device, instructing the electronic device to shoot through voice, or instructing the electronic device to shoot through a shortcut key; it may also include other behaviors by which the user instructs the electronic device to shoot. This application does not specifically restrict this.
  • the method further includes: detecting an operation for opening the camera by the user; in response to the operation, displaying a shooting interface on the display screen of the electronic device.
• after the electronic device detects that the user has clicked an icon of a camera application (application, APP) on the desktop, it can start the camera application and display the shooting interface.
  • FIG. 18 shows a graphical user interface (GUI) of a shooting interface 91 of a mobile phone.
  • the shooting interface 91 includes a shooting control 93 and other shooting options. After the electronic device detects that the user clicks on the shooting control 93, the mobile phone executes the shooting process.
  • the shooting interface 91 may further include a viewing frame 92; after the electronic device starts the camera, in the preview state, the preview image can be displayed in the viewing frame 92 in real time.
  • the size of the viewfinder frame can be different in the photo mode and the video mode.
• for example, the illustrated viewfinder frame may be the viewfinder frame in the photographing mode, while in the video mode the viewfinder frame can be the entire display screen.
• in the preview state, that is, after the user turns on the camera but before pressing the photo/video button, the preview image can be displayed in the viewfinder frame in real time.
  • the target image is obtained after white balance processing is implemented using a neural network model, and the neural network model is used to obtain light source color information required for the white balance processing according to input data.
• in response to the user's instruction operation, the mobile phone executes the shooting process in the background, including: shooting through the camera to obtain a RAW image; performing preprocessing through the ISP to process the RAW image into a color three-channel image; using the AWB neural network model to perform calculations based on the input data to obtain light source color information; and implementing white balance processing based on the light source color information.
  • image enhancement algorithms can be used to further optimize to obtain the target image.
  • the target image is displayed on the display screen.
  • FIG. 18 shows a GUI of a display interface 94 based on an album, and the target image 95 can be displayed on the display interface 94.
  • the input data of the model includes shooting parameters and multi-channel images.
  • the structure of the model and the process of performing the calculation can be similar to the description of the aforementioned embodiment in FIG. 11, which will not be repeated here.
  • the input data of the model includes shooting parameters, multi-channel images, and scene semantic information extracted from the multi-channel images.
  • the structure of the model and the process of performing the calculation can be similar to the description of the foregoing embodiment in FIG. 13, and details are not repeated here.
  • the software system architecture of the electronic device that can be used to implement the methods shown in FIG. 17 and FIG. 18 is further described below.
  • the software system can adopt a layered architecture, event-driven architecture, micro-kernel architecture, micro-service architecture, or cloud architecture.
  • the following takes the layered architecture of the Android system as an example for description. Refer to FIG. 19, which is a block diagram of a possible software structure of an electronic device in an embodiment of the present application.
• the layered architecture divides the software into several layers, and each layer has a clear role and division of labor; the layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages. As shown in the figure, the application package may include applications such as a camera APP, an image beautification APP, and an album APP.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
• the application framework layer includes some predefined functions. As shown in FIG. 19, the application framework layer can include a window manager, a content provider, a resource manager, a view system, and so on, where:
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include image data, video data, and so on.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to construct the display interface of the application.
  • the shooting interface of a camera APP presented through the view system is shown in Figure 18 (a).
• the shooting interface 91 includes a shooting control 93, a preview box 92, and other related controls, such as image browsing controls, front and rear camera switching controls, etc.
  • the preview frame 92 is used to preview the scene image to be shot.
• through the camera switching controls, the electronic device can be instructed to select the front camera or the rear camera for shooting.
• when the user clicks or touches the shooting control 93, the electronic device will drive the camera device to initiate a shooting operation, and instruct the lower-level system library to process the image and save it in the album.
  • the electronic device can call the album APP and display the image processed by the automatic white balance method proposed in this application.
  • the display interface of a photo album APP presented through the view system is shown in (b) of FIG. 18.
  • the target image 95 can be displayed on the display interface 94.
• the Android runtime is responsible for the scheduling and management of the Android system and can include core libraries and a virtual machine. The core library consists of two parts: one part is the API functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), graphics engine, etc.
  • the surface manager is used to manage the display subsystem and provide layer fusion functions for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, sensor driver, etc.
• the camera driver can be used to drive the camera of the electronic device for shooting.
  • the display driver can be used to display the processed image on the display panel of the display screen.
  • the graphics engine is a drawing engine for image processing.
  • the graphics engine can be used to: process the RAW image into a color three-channel image; extract the scene semantic information of the color three-channel image; input the color three-channel image, the shooting parameters, and the scene semantic information into a neural network to obtain light source color information; and perform white balance processing on the color three-channel image according to the light source color information to obtain an image for display.
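For readers who prefer code, the following is a runnable toy sketch of the flow just described: RAW image, three-channel image, light source estimate, white balance. Everything here is an illustrative stand-in (a gray-world estimate in place of the neural network, a naive block demosaic), not the patent's actual implementation or any Android API.

```python
import numpy as np

def demosaic_rggb(raw: np.ndarray) -> np.ndarray:
    """Naive demosaic: average each 2x2 RGGB block into one RGB pixel,
    so the output is half the RAW width and height."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

def estimate_illuminant(rgb: np.ndarray, shooting_params: dict) -> np.ndarray:
    """Stand-in for the AWB neural network: gray-world estimate.
    A real model would also consume shooting_params (e.g., exposure)."""
    return rgb.reshape(-1, 3).mean(axis=0)

def white_balance(rgb: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    gains = illuminant[1] / illuminant      # normalize to the green channel
    return np.clip(rgb * gains, 0.0, 1.0)

raw = np.random.rand(8, 8)                  # fake single-plane Bayer RAW
params = {"exposure_value": 7.0, "iso": 400}
rgb = demosaic_rggb(raw)
out = white_balance(rgb, estimate_illuminant(rgb, params))
print(out.shape)                            # (4, 4, 3)
```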
  • the training process involved in the AWB neural network model can have multiple implementation forms. For example, two exemplary training processes are shown in FIG. 20.
  • a training process for the AWB neural network model may be: the training data includes the annotation of the light source color of the image, the multi-channel image obtained by preprocessing the RAW image, the shooting parameters, and optionally also the scene semantic information.
  • after the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
  • after training, the target model can be output.
  • in the other exemplary training process, the training data includes the labeling of the light source color of the image, the target image obtained by preprocessing the RAW image and processing it with an image enhancement algorithm, the shooting parameters, and optionally the scene semantic information.
  • after the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
  • after training, the target model can be output.
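As one hedged illustration of the training loop just described, the PyTorch sketch below uses an angular error between the predicted and labeled illuminant vectors, a common choice in color constancy work; the patent does not fix a particular loss, framework, or data layout, so all of these are assumptions.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Angle (radians) between predicted and labeled illuminant vectors."""
    cos = F.cosine_similarity(pred, target, dim=1).clamp(-0.999999, 0.999999)
    return torch.acos(cos).mean()

def train_step(model, optimizer, batch):
    images, params, labels = batch        # multi-channel images, shooting
    optimizer.zero_grad()                 # parameters, labeled illuminants
    pred = model(images, params)          # model outputs light source color
    loss = angular_loss(pred, labels)     # compare output with the label
    loss.backward()                       # backpropagate the loss
    optimizer.step()                      # update the model parameters
    return loss.item()
```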
  • the image used for training the AWB neural network model may not be a single frame image, but a labeled video sequence.
  • network structures such as LSTM and RNN can be introduced, and time-domain related strategies can also be used during model training.
  • the video sequence can be used as training data, and the AWB neural network model adds the images of the previous and subsequent frames of the current image as the model input.
  • by training with video sequences, adding consecutive preceding and following frames as input, introducing structures such as LSTM and RNN, and adding time-domain related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jumping under the same light source can be reduced. The method can thus be extended to video functions, increasing the stability of white balance and improving the user experience.
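A minimal sketch of a temporal AWB model of the kind suggested above is given below: per-frame CNN features are aggregated over neighboring frames with an LSTM before the illuminant is predicted. The layer sizes and overall architecture are arbitrary illustrative choices, not the patent's specified network.

```python
import torch
import torch.nn as nn

class TemporalAWB(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(          # per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 3)      # illuminant color (r, g, b)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W), current frame plus its neighbors
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)               # aggregate over time
        return self.head(out[:, -1])            # estimate for the last frame

# usage: TemporalAWB()(torch.rand(2, 5, 3, 64, 64)) -> shape (2, 3)
```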
  • the number of cameras configured in the electronic device is not limited.
  • the type of each camera is not limited.
  • the so-called “different types” can be cameras with different magnifications (shooting magnification or zoom magnification) or cameras with different focal lengths, such as conventional cameras, main cameras, telephoto cameras, wide-angle cameras, medium-telephoto cameras, ultra-telephoto cameras, or ultra-wide-angle cameras, and so on.
  • the so-called “different type” can mean that the image sensor corresponding to each camera is different.
  • the image sensor corresponding to the wide-angle camera can be an RGGB module;
  • the image sensor corresponding to the main camera can be an RYYB module;
  • the image sensor corresponding to the telephoto camera can be an RGGB module.
  • the automatic white balance method (or the method of obtaining image light source information) described in this application can be adjusted and adapted in multiple ways.
  • a shooting scene is shown in FIG. 23, and the user will perform a viewfinder operation when taking a photo.
  • the user can zoom in or zoom out the viewfinder (mobile phone screen) to achieve the effect of zooming in and out of the scene.
  • the example effects of several target images are shown in (1), (2), and (3) of FIG. 23, respectively.
  • in case (1), when the user needs to shoot the details of a distant view, the picture must be zoomed in.
  • when the zoom is 10 times (10x) or above, the focal length of the main camera is not enough to provide a very clear effect.
  • the telephoto lens may use an RGGB module, and its sensitivity and spectral response curves will differ from those of the main camera.
  • in case (2), for general shooting with the viewfinder in the range of 1x to 10x, the focal length of the main camera is sufficient to provide a clear effect. At this time, the RAW image collected by the main camera is cropped according to the focal length to achieve the effect of zooming in.
  • the main camera may use an RYYB module, which has better sensitivity; its spectral response curve will differ from that of an RGGB module.
  • in case (3), when the viewfinder is smaller than 1x, the current focal length of the main camera is not enough to provide a larger field of view (field of view, FOV). If a wide-angle lens is available, the camera device switches to the wide-angle camera to provide a larger viewing angle.
  • the wide-angle camera may use an RGGB module, or a photosensitive module different from those of the main camera and the telephoto camera; its sensitivity and spectral response will differ from the above two cameras.
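An illustrative sketch of the camera-selection logic described above follows. The zoom thresholds (1x and 10x) mirror the example in the text; the enum values and the function itself are hypothetical, not a real camera HAL interface.

```python
from enum import Enum

class Camera(Enum):
    WIDE = "wide-angle (RGGB)"
    MAIN = "main (RYYB)"
    TELE = "telephoto (RGGB)"

def select_camera(zoom: float) -> Camera:
    if zoom < 1.0:
        return Camera.WIDE     # needs a larger field of view
    if zoom < 10.0:
        return Camera.MAIN     # crop the main camera's RAW to zoom in
    return Camera.TELE         # 10x and above: switch to telephoto

print(select_camera(0.6), select_camera(3.0), select_camera(12.0))
```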
  • FIG. 21 shows a possible shooting process.
  • in FIG. 21, an electronic device configured with a first camera and a second camera is taken as an example.
  • the first camera and the second camera may be different types of cameras.
  • the two cameras share the neural network model.
  • the electronic device is equipped with a first AWB neural network model, which can be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
  • when shooting with the first camera, the obtained RAW image of the first camera is preprocessed to obtain a multi-channel image; the multi-channel image, matched with the camera parameters of the first camera, is used as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
  • the obtained RAW image of the second camera is preprocessed to obtain a multi-channel image.
  • the electronic device also performs an image migration operation on the multi-channel image, that is, the image colors of the multi-channel image are migrated to colors that match the shooting characteristics of the first camera. Specifically, based on the difference between the second camera and the first camera, a color migration operation is performed on the multi-channel image corresponding to the second camera, to obtain a migration image that fits the photosensitive characteristics of the image sensor corresponding to the first camera.
  • the migration image, matched with the camera parameters of the second camera, is used as the input of the first AWB neural network, and the light source color value (or gain value) that meets the shooting characteristics of the first camera is calculated. On this basis, a migration operation is further performed on this light source color value (or gain value), so as to migrate it to the light source color value (or gain value) corresponding to the second camera.
  • for the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera), the camera parameters of the first camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the second camera or other cameras can also be used, but the collected image data needs to be migrated to the first camera to participate in the training.
  • if the first AWB neural network model is instead trained on the data collected by the second camera, it is equivalent to swapping the roles of the first camera and the second camera in the embodiment of FIG. 21, and the implementation method is similar; the above implementation process will not be repeated here. A code sketch of this migration flow is given below.
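The sketch below is one hedged realization of the shared-model flow of FIG. 21. The key assumption (plausible but not mandated by the text) is that the sensor difference can be approximated by a 3x3 color transform M that maps the second camera's colors into the first camera's space; the predicted illuminant is then mapped back with the inverse transform. The model here is a trivial gray-world stand-in.

```python
import numpy as np

def awb_with_shared_model(rgb2: np.ndarray, params2: dict, model,
                          M: np.ndarray) -> np.ndarray:
    """rgb2: multi-channel image from the second camera (HxWx3).
    M: assumed 3x3 migration matrix from camera-2 to camera-1 color space."""
    migrated = rgb2 @ M.T                       # 1. migrate image to camera-1
    illum1 = model.predict(migrated, params2)   # 2. run camera-1's AWB model
    illum2 = np.linalg.inv(M) @ illum1          # 3. migrate estimate back
    return illum2[1] / illum2                   #    and convert to gains

class GrayWorldModel:                           # stand-in for the first
    def predict(self, rgb, params):             # AWB neural network
        return rgb.reshape(-1, 3).mean(axis=0)

gains = awb_with_shared_model(np.random.rand(4, 4, 3), {"ev": 6.0},
                              GrayWorldModel(), np.eye(3))
print(gains)
```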
  • FIG. 22 shows another possible shooting process.
  • in FIG. 22, an electronic device configured with a first camera and a second camera is again taken as an example.
  • the first camera and the second camera may be different types of cameras.
  • the two cameras correspond to different neural network models.
  • the first camera corresponds to the first AWB neural network model
  • the second camera corresponds to the second AWB neural network model
  • the first AWB neural network model can be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
  • the second AWB neural network model can be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera).
  • when shooting with the first camera, the obtained RAW image of the first camera is preprocessed to obtain a multi-channel image, and the multi-channel image, combined with the camera parameters of the first camera, is used as the input of the first AWB neural network to calculate the light source color value (or gain value) corresponding to the first camera.
  • when shooting with the second camera, the obtained RAW image of the second camera is preprocessed to obtain a multi-channel image; the multi-channel image, combined with the camera parameters of the second camera, is used as the input of the second AWB neural network to calculate the light source color value (or gain value) corresponding to the second camera.
  • for the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera), the camera parameters of the first camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the second camera or other cameras can also be used, but the collected image data needs to be migrated to the first camera to participate in the training.
  • for the second AWB neural network model, the image data collected by the second camera (or a device of the same model, or a device similar to the second camera), the camera parameters of the second camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the first camera or other cameras can also be used, but the collected image data needs to be migrated to the second camera to participate in the training.
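An illustrative dispatch for the per-camera scheme of FIG. 22 follows: each camera is bound to its own AWB model trained on data it collected, so no color migration is needed at inference time. The model class is a trivial stand-in and the registry keys are hypothetical identifiers, not a real driver interface.

```python
import numpy as np

class AwbModel:
    """Stand-in for a trained per-camera AWB network."""
    def __init__(self, bias: np.ndarray):
        self.bias = bias                       # mimics sensor-specific behavior
    def predict(self, rgb: np.ndarray, params: dict) -> np.ndarray:
        return rgb.reshape(-1, 3).mean(axis=0) * self.bias

MODELS = {
    "first":  AwbModel(np.array([1.00, 1.00, 1.00])),   # e.g., main camera
    "second": AwbModel(np.array([1.05, 1.00, 0.95])),   # e.g., telephoto
}

def awb_gains(camera_id: str, rgb: np.ndarray, params: dict) -> np.ndarray:
    illum = MODELS[camera_id].predict(rgb, params)      # model matches camera,
    return illum[1] / illum                             # so no migration step

print(awb_gains("second", np.random.rand(4, 4, 3), {"ev": 5.0}))
```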
  • an embodiment of the present application also provides an apparatus for realizing automatic white balance of an image. See FIG. 24, which is a schematic structural diagram of an apparatus for automatic white balance of an image provided by an embodiment of the present application.
  • the device includes: a parameter acquisition module 1001, an image acquisition module 1002, and a processing module 1003.
  • the above-mentioned functional modules may run in a processor of an electronic device having a camera (for example, referred to as a first camera). Among them:
  • the parameter acquisition module 1001 is configured to acquire the shooting parameters used when the first camera shoots the original RAW domain image.
  • the image acquisition module 1002 is configured to acquire a multi-channel image corresponding to the original RAW domain image.
  • the processing module 1003 is configured to input input data into the first neural network model to obtain the first gain value of the white balance; the input data includes at least the shooting parameters of the first camera and the multi-channel image; The multi-channel image is subjected to first processing to obtain a target image; wherein, the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the first neural network model implements the prediction of the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
  • in a possible embodiment, the processing module is specifically configured to: obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the processing module is specifically configured to: send the shooting parameters of the first camera and the multi-channel image to a server; receive the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the first neural network model includes a first feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature; and use the light source prediction network to predict according to the fused feature to obtain the first gain value.
  • in a possible embodiment, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model predicts the gain value specifically by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
  • in a possible embodiment, the processing module is specifically configured to: extract scene semantic information from the multi-channel image; obtain the first gain value through a first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the processing module is specifically configured to: send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured in the server; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature; fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature; and use the light source prediction network to predict according to the fused feature to obtain the first gain value (a code sketch of this structure is given after the module descriptions below).
  • in a possible embodiment, the processing module is specifically configured to perform at least one operation of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image, so as to obtain the scene semantic information.
  • the image acquisition module is specifically configured to perform preprocessing on the original RAW domain image to obtain the multi-channel image, and the preprocessing includes demosaicing.
  • the multi-channel image is a three-channel image or a four-channel image.
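To make the two-branch structure referenced above concrete, the following is a hedged PyTorch sketch: one feature extraction network for the multi-channel image, one for the scene semantic map, a fusion step (here concatenation followed by a linear layer, one of the fusion choices the description allows), and a light source prediction head. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchAWB(nn.Module):
    def __init__(self, n_params: int = 4, dim: int = 32):
        super().__init__()
        def branch(in_ch):                      # small conv feature extractor
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.image_branch = branch(3)           # first feature extraction net
        self.semantic_branch = branch(1)        # second feature extraction net
        self.fusion = nn.Linear(dim * 2 + n_params, dim)   # feature fusion
        self.head = nn.Linear(dim, 3)           # light source prediction

    def forward(self, image, semantics, params):
        f1 = self.image_branch(image)           # first feature
        f2 = self.semantic_branch(semantics)    # second feature
        fused = torch.relu(self.fusion(torch.cat([f1, f2, params], dim=1)))
        return self.head(fused)                 # illuminant / gain estimate

# usage: TwoBranchAWB()(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
#                       torch.rand(2, 4)) -> shape (2, 3)
```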
  • the embodiments of the present application also provide another electronic device. The electronic device includes a camera, a display screen, a memory, and a processor, wherein: the camera is used to capture images; the display screen is used to display images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it is specifically used to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, and FIG. 17.
  • the embodiments of the present application also provide yet another electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, wherein: the at least two cameras are both used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it can be used to execute the method steps described in any of the method embodiments of FIG. 21 or FIG. 22, or the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, and FIG. 17.
  • the embodiment of the present application also provides a chip, which includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input/output circuit or a communication interface;
  • the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip.
  • the chip can execute the method steps described in any of the above-mentioned method embodiments in FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22.
  • the embodiment of the present application also provides a computer-readable storage medium on which instructions are stored.
  • when the instructions are executed, the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 are executed.
  • the embodiment of the present application also provides a computer program product containing instructions that, when executed, execute the method steps described in any of the above-mentioned method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. If a function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a U disk, a mobile hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Color Television Image Signal Generators (AREA)

Abstract

The present application provides an image auto white balance method and apparatus. The method comprises: obtaining a photographing parameter used when a first camera of an electronic device captures an original RAW domain image; obtaining a multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value for white balance, the input data comprising at least the photographing parameter of the first camera, and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, the first processing comprising a white balance processing based on the multi-channel image and the first gain value. The implementation of embodiments of the present application can improve the accuracy and the stability of image auto white balance of the electronic device, and improve the user experience.

Description

Image auto white balance method and apparatus
This application claims priority to the Chinese patent application No. 202010280949.7, filed with the Chinese Patent Office on April 10, 2020 and entitled "Image auto white balance method and apparatus", and to the Chinese patent application No. 202010817963.6, filed with the Chinese Patent Office on August 14, 2020 and entitled "Image auto white balance method and apparatus", both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method and apparatus for image auto white balance in the field of photography technology.
Background
With the rapid development of mobile phone chips, the camera functions of mobile phones have become more and more abundant, and users have put forward higher requirements for the basic quality (color, clarity, etc.) of pictures taken by mobile phones. Among these factors, color is one of the important criteria for evaluating the quality of mobile phone photos, and automatic white balance (Auto White Balance, AWB) is an important part of how the colors of a picture are formed.
The human visual system has the characteristic of color constancy, that is, the human visual system can resist changes in the color of the light source and thus perceive the colors of objects consistently. An image sensor (Sensor), however, records different colors for the same object under different light: in a natural environment, the same object presents different colors under illumination of different colors, for example, green leaves appear yellowish in morning light but bluish in the evening. In order to eliminate the influence of the light source on the imaging of the image sensor, simulate the color constancy of the human visual system, and ensure that white seen in any scene is rendered as true white, automatic white balance technology needs to be introduced.
White balance is an indicator that describes the accuracy of the white color generated after the three primary colors of red, green, and blue are mixed in a display. Automatic white balance technology is mainly used to solve the problem of image color cast under different light sources, so that the image of the scene conforms to the color vision habits of the human eye. Computational color constancy in automatic white balance processing is dedicated to solving this problem: its main purpose is to calculate the color of the unknown light source represented by an arbitrary image, and then use that light source color to perform color correction on the input image, so as to achieve display under standard white light.
At present, how to achieve AWB that meets such high requirements is a technical challenge that urgently needs to be solved.
Summary
The embodiments of the present application provide a method and apparatus for image auto white balance, which can improve the accuracy and stability of the image white balance of an electronic device and improve the user experience.
In a first aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including a first camera, including: acquiring shooting parameters used when the first camera shoots an original RAW domain image; acquiring a multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value of white balance, where the input data includes at least the shooting parameters of the first camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The original RAW domain image may be referred to as a RAW image for short; a RAW image can be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal.
The shooting parameters indicate the parameters used when performing shooting, for example, the shooting parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting. The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
A multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels refer to the separate red R, green G, and blue B components.
This application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation, so as to assist the white balance process. The processing includes white balance processing implemented by a neural network model, and the neural network model is used to obtain, at least according to the shooting parameters and the multi-channel image, the white balance gain value or the image light source value required for white balance processing (the gain value and the image light source value are reciprocals of each other). The neural network model described in the embodiments of this application may, in terms of type, be a single neural network model or a combination of two or more neural network models.
After the gain value or the image light source value is output, the electronic device can use it to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall appearance of the image conforms to the visual and cognitive habits of the human eye.
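A small numeric illustration of the reciprocal relation stated above: the white balance gain is the inverse of the estimated light source color (here normalized to the green channel, a common convention rather than a requirement of the text).

```python
import numpy as np

illuminant = np.array([0.8, 1.0, 0.6])   # warm light: strong R, weak B
gains = illuminant[1] / illuminant       # -> [1.25, 1.0, 1.667]

pixel = np.array([0.4, 0.5, 0.3])        # a white patch tinted by the light
print(pixel * gains)                     # -> [0.5, 0.5, 0.5], neutral again
```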
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image as the input of the neural network model, providing more color information for the AWB neural network model. Shooting parameters are additionally used as input of the AWB neural network model, providing shooting configuration information for light source estimation, which can improve the ability of the AWB neural network model to discriminate between different light source scenes and ensures good light source estimation accuracy. Therefore, implementing this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multi-light-source scenes.
Based on the first aspect, in possible embodiments, the neural network model may be a model constructed based on deep learning, for example, one of a deep neural network (deep neural network, DNN) model, a convolutional neural network (convolutional neural network, CNN), a long short-term memory network (Long Short-Term Memory, LSTM), or a recurrent neural network (Recurrent Neural Network, RNN), or a fusion of several of them, and so on.
Based on the first aspect, in one model implementation, the first neural network model realizes the prediction of the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
In an embodiment, the first neural network model may include a first feature extraction network, a feature fusion network, and a light source prediction network; correspondingly, the process of obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, implementing a statistical operation on the pixels of the channel image through convolution processing) to obtain a first feature; fusing the shooting parameters of the first camera and the first feature through the feature fusion network (the fusion method may be, for example, one or more combinations of operations such as concat function processing, conv2d function processing, element-wise multiplication, and element-wise addition) to obtain a fused feature; and predicting, through the light source prediction network, according to the fused feature, to obtain the first gain value or the image light source value for use in the subsequent white balance process.
The AWB neural network model in the embodiments of the present application is applicable to all scenes, and a large amount of training data is used during model training; the training data includes data obtained in bright-light scenes and data obtained in dark-light scenes. With such massive data, it is difficult for a neural network to achieve high-precision fitting across all scenes, but the added camera parameters can provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for these two types of scenes. Implementing this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multi-light-source scenes.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device. Correspondingly, the first processing specifically includes: obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image. Thus, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform the neural network calculation, which improves processing efficiency and reduces white balance processing delay.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an electronic device in a device-cloud system, and the neural network model can be configured in the cloud server of the device-cloud system. Correspondingly, the first processing specifically includes: sending the shooting parameters of the first camera and the multi-channel image to a server; receiving the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image. Thus, even when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used to perform the neural network model calculation, ensuring the accuracy and stability of white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
Based on the first aspect, in yet another model implementation, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically realizes the prediction of the first gain value by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
In an embodiment, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; correspondingly, the process of obtaining the first gain value through the first neural network specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, implementing a statistical operation on the pixels of the channel image through convolution processing) to obtain a first feature; performing feature extraction on the scene semantic information through the second feature extraction network (for example, implementing analysis/perception of the scene information of the channel image through convolution processing) to obtain a second feature; fusing the shooting parameters, the first feature, and the second feature through the feature fusion network (the fusion method may be, for example, one or more combinations of operations such as concat function processing, conv2d function processing, element-wise multiplication, and element-wise addition) to obtain a fused feature; and predicting, through the light source prediction network, according to the fused feature, to obtain the first gain value or the image light source value for use in the subsequent white balance process.
The scene semantic information represents semantic features, represented by the image, that are related to the shooting scene. In specific implementations, various types of shooting scenes can be defined. For example, shooting scenes can be classified based on light source type, for example, into cold light source scenes, warm light source scenes, single light source scenes, multi-light-source scenes, and so on. For another example, shooting scenes can be classified based on image content, for example, into portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on. Scene semantic information can to a large extent provide prior semantic information for the image, help the AWB neural network distinguish different scenes, and thereby improve the overall accuracy of the AWB neural network.
For example, during model training with massive training data, it is difficult for the neural network to achieve high-precision fitting across all scenes. For instance, for faces under different light source conditions, the network output is unstable, which affects the perceived skin color; if face detection information is added as scene semantic information input into the neural network, the neural network will increase its attention on the face region during training, thereby improving the fitting accuracy of the network in face scenes.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device. The first processing specifically includes: extracting scene semantic information from the multi-channel image; obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image.
Based on the above model implementation, the solution of the present application can be applied to an electronic device in a device-cloud system, and the neural network model can be configured in the cloud server of the device-cloud system. The processing specifically includes: sending the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receiving the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image.
Based on the first aspect, in a possible embodiment, extracting the scene semantic information from the multi-channel image includes: performing at least one operation of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
For example, a scene classification algorithm can be used to realize the classification of face versus non-face scenes, single light source versus multi-light-source scenes, light source color temperature classes, or indoor versus outdoor scenes, and so on.
For another example, an image scene segmentation algorithm can be used to segment the picture and generate a mask map; optionally, technologies such as a scene classification algorithm, an object detection algorithm, a face detection algorithm, or a skin color segmentation algorithm can also be used to generate the mask map. The mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame image alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the neural network fit and converge, and achieving higher prediction accuracy.
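One illustrative way to turn a detection result into the kind of mask map mentioned above, usable as an extra input channel for the network, is sketched below. The detector is faked with a fixed bounding box; a real system would use an actual face detection or segmentation algorithm.

```python
import numpy as np

def face_mask(height: int, width: int, box) -> np.ndarray:
    """box = (top, left, bottom, right) from a hypothetical detector."""
    mask = np.zeros((height, width), dtype=np.float32)
    t, l, b, r = box
    mask[t:b, l:r] = 1.0        # 1 inside the face region, 0 elsewhere
    return mask

mask = face_mask(64, 64, (10, 20, 40, 50))
print(mask.mean())              # fraction of the image covered by the face
```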
Based on the first aspect, in a possible embodiment, acquiring the multi-channel image corresponding to the original RAW domain image includes: preprocessing the original RAW domain image to obtain the multi-channel image, where the preprocessing includes demosaicing. Using a simplified demosaicing operation makes the length and width of the multi-channel image half the length and width of the downsampled RAW image, which can increase the speed of subsequent algorithms.
Based on the first aspect, in a possible embodiment, the preprocessing may further include black level correction (Black Level Correction, BLC) and lens shading correction (Lens Shade Correction, LSC); BLC processing can reduce the influence of dark current on the image signal, and LSC processing can eliminate the influence of vignetting on the image. Optionally, the preprocessing may also include image downsampling and noise reduction.
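A hedged sketch of the two correction steps named above: black level correction subtracts the dark-current offset and rescales, and lens shading correction multiplies by a per-pixel gain map that compensates vignetting. The black/white levels and the radial gain profile are illustrative constants, not values from the patent.

```python
import numpy as np

def black_level_correction(raw: np.ndarray, black_level: float = 64.0,
                           white_level: float = 1023.0) -> np.ndarray:
    return np.clip((raw - black_level) / (white_level - black_level), 0.0, 1.0)

def lens_shading_correction(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    # radial gain map: stronger brightening toward the corners
    r2 = ((y - h / 2) ** 2 + (x - w / 2) ** 2) / ((h / 2) ** 2 + (w / 2) ** 2)
    return np.clip(img * (1.0 + 0.5 * r2), 0.0, 1.0)

raw = np.random.randint(64, 1024, size=(8, 8)).astype(np.float32)
print(lens_shading_correction(black_level_correction(raw)).shape)
```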
Based on the first aspect, in a possible embodiment, the white balance processed image can also be post-processed through image enhancement algorithms to further improve image quality, obtaining the final target image for display, which is output to the display screen of the electronic device. The image enhancement algorithms may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
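As a tiny example of the gamma correction mentioned among these post-processing steps (an sRGB-style gamma of 1/2.2, a common choice assumed here):

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

print(gamma_correct(np.array([0.0, 0.18, 1.0])))   # mid-gray brightens
```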
Based on the first aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
Based on the first aspect, in a possible embodiment, a training process for the neural network model may be as follows: the training data includes the labeled light source color of the image, the multi-channel image obtained by preprocessing the RAW image, the shooting parameters, and optionally the scene semantic information. After the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
Based on the first aspect, in a possible embodiment, the images used for training the neural network model may not be single-frame images but labeled video sequences. Network structures such as LSTM and RNN can be introduced into the AWB neural network model, and time-domain related strategies can also be adopted during model training. That is, video sequences can be used as training data, and the AWB neural network model takes the frames preceding and following the current image as additional model input. By training with video sequences, adding consecutive frames as input, introducing structures such as LSTM and RNN, and adding time-domain related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jumping under the same light source can be reduced. The method can thus be extended to video functions, increasing the stability of white balance and improving the user experience.
In a second aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including at least two cameras, where the at least two cameras include a first camera and a second camera, and the method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, where the shooting instruction includes a shooting magnification; when the target camera is the second camera, acquiring the shooting parameters used when the second camera shoots a second original RAW domain image and a second multi-channel image corresponding to the second original RAW domain image; performing color migration on the second multi-channel image to obtain a migration image that fits the first camera; inputting at least the shooting parameters of the second camera and the migration image into a first neural network model to obtain a first gain value of white balance, where the first neural network model is associated with the first camera, specifically, the first neural network model is trained according to data collected by the first camera and the shooting parameters of the first camera; processing the first gain value into a second gain value corresponding to the second camera; and performing first processing on the second multi-channel image to obtain a second target image, where the first processing includes white balance processing based on the second multi-channel image and the second gain value.
In the embodiments of the present application, the number of cameras configured in the electronic device is not limited. In a scenario with two or more cameras, the type of each camera is not limited. For example, the so-called "different types" can be cameras with different shooting magnifications (or zoom magnifications) or focal lengths, for example, a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera, and so on. For another example, the so-called "different types" can mean that the image sensors corresponding to the cameras are different; for example, the image sensor corresponding to the wide-angle camera is an RGGB module, while the image sensor corresponding to a conventional camera is an RYYB module.
For another example, when the first camera and the second camera are two of a main camera, a telephoto camera, and a wide-angle camera, at least one of the following holds: the image sensor corresponding to the telephoto camera includes an RGGB module; the image sensor corresponding to the main camera includes an RYYB module; the image sensor corresponding to the wide-angle camera includes an RGGB module; the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera; the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
In the embodiments of the present application, performing color migration on the second multi-channel image to obtain a migration image that fits the first camera includes: performing a color migration operation on the second multi-channel image based on the difference between the second camera and the first camera, to obtain a migration image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. In this way, the migrated image (referred to as the migration image for short), matched with the camera parameters of the second camera, is used as the input of the first AWB neural network, and the light source color value that meets the shooting characteristics of the first camera is calculated; on this basis, a migration operation is further performed on this light source color value, so as to migrate it to the light source color value corresponding to the second camera.
Current electronic devices such as mobile phones are equipped with multiple cameras, and a user will zoom in or out or select a camera when shooting; because the image sensors or camera types corresponding to the multiple cameras differ, the value ranges of RAW images taken of the same scene may differ greatly (image sensor devices of the same type may differ only slightly). The automatic white balance method described in this application enables the neural network model to be compatible with two or more cameras at the same time, expands the applicable scenarios, improves the adaptability to multiple lenses, and greatly improves the user experience.
Based on the second aspect, in a possible embodiment, when the target camera is the first camera, the method further includes: acquiring the shooting parameters used when the first camera shoots a first original RAW domain image and a first multi-channel image corresponding to the first original RAW domain image; inputting at least the shooting parameters of the first camera and the first multi-channel image into the first neural network model to obtain a third gain value of white balance; and performing white balance processing according to the first multi-channel image and the third gain value to obtain a first target image.
Based on the second aspect, in a possible embodiment, the shooting parameters include at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
Based on the second aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
In a third aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including at least two cameras, where the at least two cameras include a first camera and a second camera, and the method includes:
selecting a target camera from the at least two cameras according to a user's shooting instruction, where the shooting instruction includes a shooting magnification; acquiring the shooting parameters used when the target camera shoots an original RAW domain image and a multi-channel image corresponding to the original RAW domain image; determining the neural network model corresponding to the target camera, where the first camera is associated with a first neural network model and the second camera is associated with a second neural network model, specifically, the first neural network model is trained according to data collected by the first camera and the shooting parameters of the first camera, and the second neural network model is trained according to data collected by the second camera and the shooting parameters of the second camera; inputting input data into the neural network model to obtain a white balance gain value, where the input data includes at least the shooting parameters of the target camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the gain value.
其中,所述第一摄像头和所述第二摄像头各自的倍率不同,或者,所述第一摄像头和所述第二摄像头各自对应的图像传感器不同。或者所述第一摄像头和所述第二摄像头各自的摄像头类型不同,所述摄像头类型包括主摄像头、长焦摄像头、广角摄像头、中长焦摄像头、超长焦摄像头、超广角摄像头。Wherein, the magnifications of the first camera and the second camera are different, or the image sensors corresponding to the first camera and the second camera are different. Or the camera types of the first camera and the second camera are different, and the camera types include a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
It can be seen that, by implementing this solution, different cameras can each be configured with a different neural network model; for example, the first camera corresponds to the first neural network model and the second camera corresponds to the second neural network model. The first neural network model may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera), and the second neural network model may be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera). In this way, the data of different cameras can be processed independently, improving the specificity and accuracy of the neural network models.
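For illustration only, the following Python sketch shows one way the per-camera model association described above might be organized in software. The function names, the model registry, and the magnification thresholds are hypothetical assumptions and are not part of the claimed method.

```python
# Hypothetical sketch of per-camera model routing; names and thresholds
# are illustrative assumptions, not part of the claimed method.

# Each physical camera is associated with its own trained AWB model.
MODEL_REGISTRY = {
    "wide": "awb_model_wide.bin",   # trained on data from the wide-angle camera
    "main": "awb_model_main.bin",   # trained on data from the main camera
    "tele": "awb_model_tele.bin",   # trained on data from the telephoto camera
}

def select_target_camera(magnification: float) -> str:
    """Map the user's shooting magnification to a target camera."""
    if magnification < 1.0:
        return "wide"
    elif magnification < 3.0:
        return "main"
    return "tele"

def model_for_capture(magnification: float) -> str:
    """Return the AWB model associated with the selected target camera."""
    camera = select_target_camera(magnification)
    return MODEL_REGISTRY[camera]

print(model_for_capture(0.6))   # awb_model_wide.bin
print(model_for_capture(5.0))   # awb_model_tele.bin
```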
Based on the third aspect, in a possible embodiment, the shooting parameters include at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
Based on the third aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
In a fourth aspect, an embodiment of the present application provides an apparatus for implementing automatic image white balance, including: a parameter acquisition module configured to acquire the shooting parameters used when the first camera captures an original RAW domain image; an image acquisition module configured to acquire a multi-channel image corresponding to the original RAW domain image; and a processing module configured to input input data into a first neural network model to obtain a first gain value for white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image, and further configured to perform first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The different functional modules of the apparatus may cooperate with one another to implement the method described in any embodiment of the first aspect of the present application.
In a fifth aspect, an embodiment of the present application provides an electronic device. The electronic device includes a camera, a memory, and a processor, and optionally a display screen for displaying images. The camera is configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the first aspect of the present application.
In a sixth aspect, an embodiment of the present application provides an electronic device. The electronic device includes at least two cameras, a memory, and a processor, the at least two cameras including a first camera and a second camera, and optionally a display screen for displaying images. The at least two cameras are each configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the second aspect of the present application.
In a seventh aspect, an embodiment of the present application provides an electronic device. The electronic device includes at least two cameras, a memory, and a processor, the at least two cameras including a first camera and a second camera, and optionally a display screen for displaying images. The at least two cameras are each configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the third aspect of the present application.
In an eighth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method described in any embodiment of the first, second, or third aspect.
In a ninth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium. The computer-readable storage medium stores implementation code for the method described in any embodiment of the first, second, or third aspect. When the program code is executed by a computing device, the method described in any embodiment of the first, second, or third aspect is implemented.
In a tenth aspect, an embodiment of the present invention provides a computer program product. The computer program product includes program instructions, and when the computer program product is executed by a computing device, the method described in any embodiment of the first, second, or third aspect is performed. The computer program product may be a software installation package that can be downloaded and executed on a controller to implement the method described in any embodiment of the first, second, or third aspect.
Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
FIG. 1 is an exemplary diagram of an electronic device provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 3 is an exemplary diagram of a device-cloud interaction scenario provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the device structure in a device-cloud interaction scenario provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a chip provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another system architecture provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of an automatic image white balance method provided by an embodiment of the present application;
FIG. 9 is an exemplary diagram of a RAW image and a three-channel image provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the structure and processing flow of a neural network model provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of the structure and processing flow of another neural network model provided by an embodiment of the present application;
FIG. 14 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 15 is an exemplary diagram of an image preprocessing process provided by an embodiment of the present application;
FIG. 16 is an exemplary diagram of an image post-processing process provided by an embodiment of the present application;
FIG. 17 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of a user operation scenario provided by an embodiment of the present application;
FIG. 19 is a block diagram of a possible software structure of a terminal according to an embodiment of the present application;
FIG. 20 is an exemplary diagram of some model training processes provided by embodiments of the present application;
FIG. 21 is an exemplary diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application;
FIG. 22 is another exemplary diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application;
FIG. 23 is an exemplary diagram of target images at different shooting magnifications provided by an embodiment of the present application;
FIG. 24 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
Detailed Description of Embodiments
The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. The singular forms "a", "said", and "the" used in the embodiments of the present application and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should be noted that, when used in this specification and the appended claims, the term "including" and any variants thereof are intended to cover non-exclusive inclusion. For example, a system, product, or apparatus that includes a series of units/devices is not limited to the listed units/devices, but may optionally include units/devices that are not listed, or may optionally include other units/devices inherent to such products or apparatuses.
It should also be noted that the terms "first", "second", "third", and so on in this specification and the claims are used to distinguish different objects, not to describe a particular order or particular meaning.
The terms used in the implementation part of this application are only used to explain specific embodiments of this application and are not intended to limit this application.
In a shooting scene, different light sources have different spectral components and distributions; in colorimetry, the color of a light source can also be described as a color temperature. For example, the color emitted by a black body at 3200 K may be defined as white, the color emitted by a black body at 5600 K as blue, and so on. In imaging, objects in the environment (including people, things, scenery, etc.) present their colors by reflecting incident light onto the image sensor, so the color of the light source in the environment affects the imaged color of an object, directly or indirectly changing the object's own color and creating a color cast. For example, a white object appears reddish under low-color-temperature light (such as incandescent lamps, candles, or sunrise and sunset scenes) and bluish under high-color-temperature light (such as cloudy sky, snow, or tree shade scenes).
Automatic white balance (AWB) automatically corrects the colors of images captured by the camera. "White balance" means correcting the color cast caused by different color temperatures so that white objects appear truly white and objects of other colors stay as close as possible to their original colors, making the overall appearance of the image conform to the visual and cognitive habits of the human eye.
For example, white balance processing can be implemented based on the Lambert reflection model. In one embodiment, the white balance processing algorithm is shown in the following formula ①:
R = I/L ①
Here, R represents the pixel values (Rr, Gr, Br) of the image after white balance processing; R is close or equal to the color the photographed object would present under neutral light.
I represents the image (Ri, Gi, Bi) captured by the electronic device; this image may be the multi-channel image described in the embodiments of the present application.
L represents the light source color information (Rl, Gl, Bl), which may specifically be the image light source value described in the embodiments of the present application. It should be noted that L here is a broad concept: in camera imaging, L may also include the bias that the image sensing device introduces into the object's color.
The task of the white balance processing is to estimate L from I and possibly additional inputs, and then to derive the color R of the object under neutral light, so as to eliminate as far as possible the imaging color cast caused by the light source, so that white appears white under different light sources and objects of other colors stay as close as possible to their original colors.
In another embodiment, the white balance processing algorithm is shown in the following formula ②:
R = I*G ②
Here, R represents the pixel values (Rr, Gr, Br) of the image after white balance processing; R is close or equal to the color the photographed object would present under neutral light.
I represents the image (Ri, Gi, Bi) captured by the electronic device; this image may be the multi-channel image described in the embodiments of the present application.
G represents the white balance gain values (1/Rl, 1/Gl, 1/Bl). Comparing formulas ① and ② shows that the gain values and the light source color information can be in the following reciprocal relationship:
G = 1/L ③
The task of the white balance processing is to estimate G from I and possibly additional inputs, and then to derive the color R of the object under neutral light, so as to eliminate as far as possible the imaging color cast caused by the light source, so that white appears white under different light sources and objects of other colors stay as close as possible to their original colors.
It should be noted that, for convenience of description, the white balance processing herein is described mainly using the light source color information as an example; a gain-value-based scheme can be implemented similarly, for example by obtaining the white balance gain values directly from the neural network model, or by obtaining the image color information from the neural network model and then deriving the white balance gain values from the image color information. This is not elaborated further herein.
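For illustration only, the following Python sketch applies formulas ② and ③: given an illuminant estimate L, the per-channel gains G = 1/L are multiplied into the multi-channel image. The array shapes and the [0, 1] value range are assumptions.

```python
import numpy as np

def apply_white_balance(image: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    """Apply per-channel white balance gains G = 1/L to a multi-channel image.

    image: H x W x 3 array (Ri, Gi, Bi), linear-light values assumed in [0, 1].
    illuminant: length-3 estimate (Rl, Gl, Bl) of the light source color.
    Returns R = I * G, the image corrected toward neutral light.
    """
    gains = 1.0 / illuminant          # formula (3): G = 1/L
    corrected = image * gains         # formula (2): R = I * G
    return np.clip(corrected, 0.0, 1.0)

# Example: a warm (reddish) illuminant; white balance pulls it back to neutral.
img = np.full((2, 2, 3), [0.8, 0.5, 0.3])
L = np.array([0.8, 0.5, 0.3])
print(apply_white_balance(img, L))    # all channels equal after correction
```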
The prior art proposes several methods for determining the light source color, for example the gray world algorithm, the perfect reflector algorithm, or a dynamic threshold algorithm, or using the color histogram of the image to determine the light source color, and so on.
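As an example of one such prior-art method, the following is a minimal sketch of the gray-world algorithm mentioned above, which assumes the scene averages to gray so that the per-channel means reveal the illuminant color; the brightness-preserving scale in the usage line is an assumption.

```python
import numpy as np

def gray_world_illuminant(image: np.ndarray) -> np.ndarray:
    """Gray-world estimate: assume the average reflectance of the scene is
    achromatic, so the per-channel means of the image reveal the illuminant
    color (up to scale)."""
    means = image.reshape(-1, 3).mean(axis=0)   # mean of R, G, B
    return means / means.sum()                  # normalized illuminant estimate

# Usage: estimate L, then correct with gains 1/L as in formulas (2)/(3).
img = np.random.rand(64, 64, 3) * np.array([0.9, 0.6, 0.4])  # warm color cast
L = gray_world_illuminant(img)
balanced = np.clip(img / (3.0 * L), 0.0, 1.0)   # 3*L roughly preserves brightness
```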
Higher requirements are now placed on automatic white balance algorithms and methods, reflected in one or more of the following: (1) AWB needs to achieve higher light source estimation accuracy in all kinds of scenes; (2) in ambiguous scenes such as multi-light-source scenes, a single estimated light source value cannot satisfy all light source regions in the image, so the AWB algorithm needs to show a stable tendency in ambiguous scenes; (3) for photos taken under the same lighting conditions, the white balance should be as stable as possible to avoid color jumps; (4) the computational overhead of the AWB algorithm must be small enough to meet real-time requirements.
The embodiments of the present application provide a deep-learning-based automatic white balance method applicable to images and videos, which can overcome the above technical defects, improve the accuracy of AWB in all scenes, improve the stability of AWB for images and videos, ensure a stable tendency in ambiguous scenes such as multi-light-source scenes, and meet real-time requirements.
Possible application scenarios of the method described in this application are introduced below.
Referring to FIG. 1, in one application scenario, the method described in this application can be applied to a standalone electronic device 10.
The electronic device 10 may be mobile or fixed; for example, the electronic device 10 may be a mobile phone with image processing functions, a tablet personal computer (TPC), a notebook computer, a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, an SLR camera, a video camera, a smart watch, a monitoring device, an augmented reality (AR) device, a virtual reality (VR) device, a wearable device (WD), an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
Please refer to FIG. 2 for a better understanding of the internal structure of the electronic device 10. As shown in FIG. 2, the electronic device 10 includes at least one general-purpose processor 13, a memory 15 (one or more computer-readable storage media), an image capture apparatus 11, an image signal processor (ISP) 12, and a display apparatus 14; these components can communicate over one or more communication buses. Specifically:
The image capture apparatus 11 may include components such as a camera 111 and an image sensor 112, and is configured to capture images or videos of the shooting scene. The images captured by the image capture apparatus 11 may be one or more original RAW domain images; herein, an original RAW domain image may be referred to simply as a RAW image. The multiple original RAW domain images may form a sequence of image frames.
The camera 111 may specifically be a monocular camera or a binocular camera, arranged at a front position (i.e., a front camera) or a rear position (i.e., a rear camera) on the housing of the main body of the electronic device 10.
The image sensor 112 is a photosensitive element; this application does not limit the type of the photosensitive element, which may be, for example, a complementary metal-oxide semiconductor (CMOS) or a charge-coupled device (CCD). The function of the image sensor 112 is to capture the optical image collected by the camera 111 and convert it into an electrical signal usable by the downstream ISP 12.
The image sensor 112 can provide the shooting parameters used in actual shooting; the shooting parameters include, for example, at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity. ISO sensitivity is the sensitivity specified by the International Standards Organization (ISO), also called the ISO value, and measures how sensitive the sensor is to light.
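For illustration only, the following sketch shows one possible way such shooting parameters could be packed into a fixed-length feature vector as an auxiliary input to the neural network model; the normalization constants are assumptions, not values specified by this application.

```python
import numpy as np

def shooting_param_vector(exposure_value: float, shutter_time_s: float,
                          aperture_f: float, iso: float) -> np.ndarray:
    """Pack shooting parameters into a fixed-length feature vector.

    The normalization ranges below are illustrative assumptions; in practice
    they would be chosen to match the sensor's actual operating range.
    """
    return np.array([
        exposure_value / 16.0,            # EV, assumed range about [-16, 16]
        np.log10(shutter_time_s + 1e-6),  # shutter time spans orders of magnitude
        aperture_f / 22.0,                # f-number, assumed maximum f/22
        np.log2(iso / 100.0),             # ISO in stops relative to ISO 100
    ], dtype=np.float32)

print(shooting_param_vector(0.0, 1 / 120, 1.8, 400))
```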
The main function of the ISP 12 is to process the signal output by the upstream image sensor 112. In the embodiments of this application, the algorithms included in the ISP 12 mainly include an automatic white balance (AWB) algorithm; in addition, they may include, but are not limited to, one or more of the following processing algorithms: automatic exposure control (AEC), automatic gain control (AGC), color correction, lens correction, noise removal/noise reduction, dead pixel removal, linear correction, color interpolation, image downsampling, level compensation, and so on. In some instances, some image enhancement algorithms may also be included, such as gamma correction, contrast enhancement and sharpening, color noise removal and edge enhancement in the YUV color space, color enhancement, color space conversion (for example, RGB to YUV), and so on.
It should be noted that, in possible implementations, some of the algorithms described above for the ISP 12 can also be integrated into other components for processing; for example, the image enhancement algorithms can be integrated into a field-programmable gate array (FPGA) or a digital signal processor (DSP) that cooperates with the ISP 12 to complete the image processing.
The general-purpose processor 13 may be any type of device capable of processing electronic instructions. The electronic device 10 in this application may include one or more general-purpose processors 13, for example one or both of a central processing unit (CPU) 131 and a neural-network processing unit (NPU) 132. It may further include one or more of a graphics processing unit (GPU), a microprocessor, a microcontroller, a main processor, a controller, an application-specific integrated circuit (ASIC), and the like. The general-purpose processor 13 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 15, enabling the electronic device 10 to provide a wide variety of services. For example, the general-purpose processor 13 can execute programs or process data to perform at least part of the methods discussed herein.
The main function of the CPU 131 is to parse computer instructions and process data in computer software, to implement overall control of the electronic device 10, and to control and allocate all hardware resources of the electronic device 10 (such as storage resources, communication resources, and I/O interfaces).
The NPU 132 is a general term for a new type of processor based on neural network algorithms and acceleration. An NPU is designed specifically for artificial intelligence to accelerate neural network operations and to solve the inefficiency of traditional chips in neural network computation.
It should be noted that the name NPU 132 does not limit this application; in other application scenarios, the NPU 132 may also be varied or replaced with other processors of similar function, such as a tensor processing unit (TPU), a deep learning processing unit (DPU), and so on.
In one embodiment of the present application, when the NPU 132 is present, the NPU 132 can undertake the tasks related to neural network computation. For example, the NPU 132 can compute the AWB neural network based on the image information provided by the ISP 12 (such as a multi-channel image) and the information provided by the image capture apparatus (such as the shooting parameters) to obtain the light source color information, and then feed the light source color information back to the ISP 12 so that the ISP 12 can further complete the AWB process.
In another embodiment of the present application, when the CPU 131 is present and the NPU 132 is not, the CPU 131 can undertake the tasks related to neural network computation. That is, the CPU 131 computes the AWB neural network based on the image information provided by the ISP 12 (such as a multi-channel image) and the information provided by the image capture apparatus (such as the shooting parameters) to obtain the light source color information, and then feeds the light source color information back to the ISP 12 so that the ISP 12 can further complete the AWB process.
The display apparatus 14 is configured to display the shooting scene currently previewed when the user needs to shoot, the shooting interface, or the target image after white balance processing. The display apparatus 14 can also display information requiring user operation or information provided to the user, as well as various graphical user interfaces of the electronic device 10; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof.
The display apparatus 14 may specifically include a display screen (display panel); optionally, the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The display apparatus 14 may also be a touch panel (touchscreen), which may include a display screen and a touch-sensitive surface. When the touch-sensitive surface detects a touch operation on or near it, the operation is sent to the CPU 131 to determine the type of the touch event, and the CPU 131 then provides a corresponding visual output on the display apparatus 14 according to the type of the touch event.
The memory 15 may include volatile memory, such as random access memory (RAM) and a cache; it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 15 may also include a combination of the above types of memory. The memory 15 can be used to store data such as the RAW images captured by the image capture apparatus 11, the target image after white balance processing, information of preceding and following frames, shooting parameters, and scene semantic information; the memory 15 can also store program instructions for the processor to call and execute the automatic image white balance method described in this application.
Based on the above components of the electronic device 10, automatic white balance of an image can be achieved through the following process. When the electronic device 10 performs shooting, the optical image of objects in the external environment (people, things, scenery, etc.) collected by the camera 111 is projected onto the surface of the image sensor 112 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, which is a RAW image (for example, in Bayer format). The image sensor 112 sends the RAW image to the ISP 12 for processing. When the ISP 12 needs to perform AWB, the ISP 12 sends the image information (for example, a multi-channel image) to the general-purpose processor 13, and the image capture apparatus 11 sends the shooting parameters to the general-purpose processor 13. The general-purpose processor 13 (for example, the CPU 131 or the NPU 132) can use this input information to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the ISP 12, which completes AWB according to the light source color information and performs other image processing to obtain a target image, for example an image in YUV or RGB format. The ISP 12 then transmits the target image to the CPU 131 through the I/O interface, and the CPU 131 sends the target image to the display apparatus 14 for display.
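For illustration only, the following Python sketch outlines the capture-to-display flow described above, with a gray-world stand-in for the neural network model and a simplified 2x2 Bayer demosaic; all names, the RGGB pattern, and the omitted ISP stages are hypothetical assumptions.

```python
import numpy as np

class StubModel:
    """Stand-in for the AWB neural network: here just a gray-world estimate.
    The real model would also consume the shooting parameters."""
    def predict(self, image, params):
        means = image.reshape(-1, 3).mean(axis=0)
        return means / means.max()

def demosaic(raw: np.ndarray) -> np.ndarray:
    """Placeholder demosaic: average each 2x2 Bayer cell into R, G, B planes
    (assumes an RGGB pattern; a real ISP uses proper color interpolation)."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

def awb_pipeline(raw_bayer, shooting_params, model):
    multi_channel = demosaic(raw_bayer)                # RAW -> multi-channel image
    L = model.predict(multi_channel, shooting_params)  # illuminant (Rl, Gl, Bl)
    balanced = multi_channel / L                       # white balance: R = I / L
    return np.clip(balanced, 0.0, 1.0)                 # remaining ISP stages omitted

out = awb_pipeline(np.random.rand(8, 8), np.zeros(4), StubModel())
print(out.shape)   # (4, 4, 3)
```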
Those skilled in the art should understand that the electronic device 10 may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The device structure shown in FIG. 2 does not limit the electronic device 10.
Referring to FIG. 3, in another application scenario, the method described in this application can be applied to a device-cloud interaction scenario. As shown in FIG. 3, the device-cloud system includes an electronic device 20 and a cloud server 30. The electronic device 20 and the cloud server 30 can communicate with each other, and the communication mode is not limited to wired or wireless.
The electronic device 20 may be mobile or fixed; for example, the electronic device 20 may be a mobile phone with image processing functions, a tablet personal computer, a notebook computer, a media player, a smart TV, a laptop, a personal digital assistant, a personal computer, a camera, an SLR camera, a video camera, a smart watch, a monitoring device, an augmented reality device, a virtual reality device, a wearable device, an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
The cloud server 30 may include one or more servers, one or more processing nodes, or one or more virtual machines running on a server; the cloud server 30 may also be called a server cluster, a management platform, a data processing center, and so on, which is not limited in the embodiments of the present application.
Please refer to FIG. 4 for a better understanding of the internal structure of the devices in the device-cloud system. As shown in FIG. 4, the device-cloud system includes an electronic device 20 and a cloud server 30. The electronic device 20 includes at least one general-purpose processor 23, a memory 25, an image capture apparatus 21, an image signal processor ISP 22, a display apparatus 24, and a communication apparatus 26; these components can communicate over one or more communication buses to realize the functions of the electronic device 20. The cloud server 30 includes a memory 33, a neural-network processing unit NPU 31, and a communication apparatus 32; these components can communicate over one or more communication buses to realize the functions of the cloud server 30. The electronic device 20 establishes a communication connection with the communication apparatus 32 of the cloud server 30 through the communication apparatus 26, and the communication mode is not limited to wired or wireless. For example, the communication apparatus 26 and the communication apparatus 32 can be used to send and receive wireless signals to and from each other; the wireless communication modes include, but are not limited to, one or more of radio frequency (RF), data communication, Bluetooth, WiFi, and so on.
It can be seen that, compared with the electronic device 10 in FIG. 2 described above, the main difference of the device-cloud system is that the electronic device 10 performs the neural network computation locally, whereas in the device-cloud system this function is implemented on the cloud server 30; that is, the NPU 31 of the cloud server 30 performs the neural network computation. Therefore, the electronic device 20 in the device-cloud system need not include an NPU. The embodiments of the present application make full use of the computing resources of the cloud server, which helps reduce the operating burden and configuration requirements of the electronic device 20 and improves the user experience.
Based on the above components of the device-cloud system, automatic white balance of an image can be achieved through the following process. When the electronic device 20 performs shooting, the optical image of objects in the external environment (people, things, scenery, etc.) collected by the camera in the image capture apparatus 21 is projected onto the image sensor in the image capture apparatus 21 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, which is a RAW image (for example, in Bayer format). The image capture apparatus 21 sends the RAW image to the ISP 22 for processing. When the ISP 22 needs to perform AWB, the ISP 22 sends the image information (for example, a multi-channel image) to the general-purpose processor 23, and the image capture apparatus 21 sends the shooting parameters to the general-purpose processor 23. The general-purpose processor 23 (for example, the CPU 231) may further send this information to the cloud server 30 through the communication apparatus 26. After the cloud server 30 receives the information through the communication apparatus 32, the NPU 31 uses the input information (the multi-channel image, the shooting parameters, etc.) to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the electronic device 20 through the communication apparatus 32 and sent to the ISP 22, which completes AWB according to the light source color information and performs other image processing to obtain a target image, for example an image in YUV or RGB format. The ISP 22 then transmits the target image to the CPU 231 through the I/O interface, and the CPU 231 sends the target image to the display apparatus 24 for display.
It should be noted that the functions of the relevant components of the electronic device 20 in the device-cloud system can be understood by analogy with the description of the relevant components of the electronic device 10 in FIG. 2; for brevity, details are not repeated here.
Those skilled in the art should understand that the electronic device 20 and the cloud server 30 may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The device structure shown in FIG. 4 does not limit the present application.
FIG. 5 shows a hardware structure of a chip provided by an embodiment of the present application; the chip includes a neural-network processing unit NPU 300.
In one implementation, the NPU 300 may be provided in the electronic device 10 shown in FIG. 2 to perform the computation work of the neural network; in this case, the NPU 300 is the NPU 132 described in FIG. 2.
In another implementation, the NPU 300 may be provided in the cloud server 30 shown in FIG. 4 to perform the computation work of the neural network; in this case, the NPU 300 is the NPU 31 described in FIG. 4.
The NPU 300 can be mounted on a main central processing unit (CPU) as a coprocessor, with the main CPU allocating tasks. The core part of the NPU 300 is the arithmetic circuit 303; the controller 304 controls the arithmetic circuit 303 to extract matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 303 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array; the arithmetic circuit 303 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303; the arithmetic circuit 303 fetches the data of matrix A from the input memory 301 and performs a matrix operation with matrix B, and the partial or final results of the resulting matrix are stored in the accumulator 308.
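For illustration only, the following sketch emulates the described data flow in software: matrix B is consumed one tile at a time (as it would be cached on the PEs), and the partial results of C = A x B are accumulated as the accumulator 308 would hold them. The tile size is an assumption.

```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 4) -> np.ndarray:
    """Compute C = A @ B tile by tile, accumulating partial results the way
    the accumulator described above would hold them."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))                      # accumulator contents
    for t in range(0, k, tile):
        A_tile = A[:, t:t + tile]             # streamed from the input memory
        B_tile = B[t:t + tile, :]             # cached weight tile (per-PE data)
        C += A_tile @ B_tile                  # accumulate the partial result
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```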
The vector calculation unit 307 can further process the output of the arithmetic circuit 303, for example performing vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for network computation in the non-convolutional/non-FC layers of a neural network, such as pooling, batch normalization, local response normalization, and so on.
In some implementations, the vector calculation unit 307 can store the processed output vector in the unified memory 306. For example, the vector calculation unit 307 can apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 307 generates normalized values, merged values, or both.
In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer of the neural network.
The unified memory 306 is used to store input data and output data. Through the direct memory access controller (DMAC) 305, input data in the external memory is stored into the input memory 301 and/or the unified memory 306, weight data in the external memory is stored into the weight memory 302, and data in the unified memory 306 is stored into the external memory.
The bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch buffer 309 via the bus.
The instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304; the controller 304 calls the instructions buffered in the instruction fetch buffer 309 to control the working process of the computation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch buffer 309 are all on-chip memories, while the external memory is memory outside the NPU; the external memory may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or other readable and writable memory.
Specifically, the operations of each layer in the neural network model described in the embodiments of the present application (namely the AWB neural network model described later) may be performed by the arithmetic circuit 303 or the vector calculation unit 307.
Since the embodiments of the present application involve the application of neural networks, to better understand the working principles of the neural networks described in the embodiments of the present application, the implementation of the neural networks in this application is described below.
First, relevant terms and concepts of the neural networks involved in the embodiments of the present application are introduced.
(1) Neural network model
Herein, "neural network" and "neural network model" may be regarded as the same concept, used interchangeably for convenience of expression. The neural network model described in the embodiments of the present application may be composed of neural units. A neural unit may refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:
h_{W,b}(x) = f(W^T x) = f( ∑_{s=1}^{n} W_s x_s + b )
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next layer; the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
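For illustration only, a minimal sketch of one such neural unit with a sigmoid activation, following the formula above; the example input and weight values are arbitrary.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x: np.ndarray, W: np.ndarray, b: float) -> float:
    """One neural unit: f( sum_s W_s * x_s + b ) with a sigmoid activation f."""
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -0.2, 0.1])
W = np.array([0.8, 0.3, -0.5])
print(neural_unit(x, W, b=0.1))   # scalar output, passed on to the next layer
```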
In the embodiments of the present application, the neural network model may be a model constructed based on deep learning, for example a deep neural network (DNN) model, a convolutional neural network (CNN), a recurrent neural network (RNN), or a fusion of several of these, and so on.
Illustratively, taking a convolutional neural network model as an example, a convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units on the same feature plane can share weights. The convolution kernel may be initialized in the form of a matrix of random size, or with all zeros or other common initialization methods, which is not limited here. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
In this application, for some specific implementations of the neural network model, reference may be made to the AWB neural network model described later.
(2) Loss function
In the process of training a neural network model, because one hopes that the output of the neural network model is as close as possible to the value one actually wants to predict, the weight vectors of each layer of the neural network can be updated by comparing the current network's predicted value with the truly desired target value and acting on the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the neural network model). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and adjustment continues until the neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so the training of the deep neural network becomes a process of reducing this loss as much as possible.
(3) Backpropagation algorithm
A neural network can use the error backpropagation (BP) algorithm to correct the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The backpropagation algorithm is a backward-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
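For illustration only, the following one-parameter example shows the loop the two sections above describe: a forward pass, an error loss, a gradient computed by propagating the loss backward, and a parameter update that makes the loss converge. The learning rate and target values are arbitrary.

```python
# Fit y = w * x by gradient descent on a squared-error loss.
x, y_true = 2.0, 6.0        # target relation: w should converge to 3
w = 0.0                     # initialization of the model parameter
lr = 0.05                   # learning rate

for step in range(100):
    y_pred = w * x                    # forward pass
    loss = (y_pred - y_true) ** 2     # error loss
    grad = 2 * (y_pred - y_true) * x  # backward pass: d(loss)/dw
    w -= lr * grad                    # update against the gradient
print(w)   # ~3.0: the error loss has converged
```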
Referring to FIG. 6, FIG. 6 shows a system architecture 100 for neural network model training provided by an embodiment of the present application.
In FIG. 6, a data collection device 160 is used to collect training data. For the method of the embodiments of the present application, the neural network model (namely the AWB neural network model described later) can be further trained on the training data.
In one example, the training data for training the neural network model in the embodiments of the present application may include the multi-channel images corresponding to original RAW domain images, the shooting parameters corresponding to the original RAW domain images, and the light source color information annotated on the original RAW domain images.
In another example, the training data for training the neural network model in the embodiments of the present application may include the multi-channel images corresponding to original RAW domain images, scene semantic information extracted from the multi-channel images, the shooting parameters corresponding to the original RAW domain images, and the light source color information annotated on the original RAW domain images.
It should be noted that an image in the training data may be a single frame image or multiple frames of a video frame sequence.
After the training data is collected, the data collection device 160 stores the training data in the database 130, and the training device 120 trains the target model 101 (for example, the AWB neural network model in the embodiments of the present application) based on the training data maintained in the database 130. The training device 120 feeds the training data into the target model 101 until the difference between the light source color information predicted by the target model 101 and the light source color information annotated for the image satisfies a preset condition, for example, until the angular error between the two corresponding color vectors is smaller than a preset threshold, remains unchanged, or no longer decreases, thereby completing the training of the target model 101.
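For illustration only, the angular error between the two color vectors mentioned above can be computed as follows (a minimal numpy sketch; the 2-degree threshold is an assumed example value, not one prescribed by this application):

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle, in degrees, between predicted and labelled illuminant vectors."""
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example preset condition: stop training once the error is below 2 degrees.
done = angular_error_deg(np.array([0.60, 1.0, 0.70]),
                         np.array([0.62, 1.0, 0.68])) < 2.0
```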
It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the collection of the data collection device 160; it may also be received from other devices.
It should also be noted that the training device 120 does not necessarily train the target model 101 solely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.
The target model 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 6. The execution device 110 can use the target model 101 to perform neural network computation, so as to predict the light source color information.
In the application scenario of the standalone electronic device shown in FIG. 1, the execution device 110 may be the electronic device 10 described above. The input data of the execution device 110 may come from the data storage system 150, which may be a memory placed inside the execution device 110 or an external memory independent of the execution device 110. In the embodiments of the present application, the input data may include, for example, a multi-channel image and shooting parameters; or it may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters. The execution device 110 thus predicts the light source color information based on the input data.
In the end-cloud application scenario shown in FIG. 3, the execution device 110 may be the cloud server 30 in the end-cloud system described above. In this case, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. For example, a user can input data to the I/O interface 112 through the client device 140, which may be, for example, the electronic device 20 in the end-cloud system. In one case, the client device 140 may automatically send input data to the I/O interface 112. If the user's authorization is required for the client device 140 to automatically send input data, the user can set the corresponding permission in the client device 140. In the embodiments of the present application, the input data may include, for example, a multi-channel image and shooting parameters; or it may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters. The execution device 110 thus predicts the light source color information based on the input data, and the predicted light source color information can subsequently be returned to the client device 140 through the I/O interface 112.
The associated function module 113 can be used to perform related processing based on the input data. For example, in one embodiment of the present application, the associated function module 113 can extract scene semantic information from the multi-channel image.
It is worth noting that the training device 120 can generate, for different goals or tasks, a corresponding target model 101 based on different training data, and that corresponding target model 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results. For example, it can be used to train the AWB neural network model described below in the embodiments of FIG. 11 or FIG. 13.
In one implementation, the execution device 110 may be configured with the chip shown in FIG. 5 to complete the computation work of the computing module 111.
In another implementation, the training device 120 may also be configured with the chip shown in FIG. 5 to complete its training work and output the trained target model 101 to the execution device 110.
It is worth noting that FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation.
Referring to FIG. 7, FIG. 7 shows another system architecture 400 provided by an embodiment of the present application. The system architecture includes a local device 420, a local device 430, an execution device 410, and a data storage system 450, where the local device 420 and the local device 430 are connected to the execution device 410 through a communication network 440.
Exemplarily, the execution device 410 may be implemented by one or more servers.
Optionally, the execution device 410 can be used in conjunction with other computing devices, for example, data storage, routers, load balancers, and other equipment. The execution device 410 may be arranged at one physical site or distributed across multiple physical sites. The execution device 410 may use the data in the data storage system 450, or call the program code in the data storage system 450, to implement the image processing method of the embodiments of the present application.
It should be noted that the above execution device 410 may also be a cloud server, in which case the execution device 410 can be deployed in the cloud; the execution device 410 may be the cloud server 30 described above in the embodiment of FIG. 3, and the local device 420/local device 430 may then be the electronic device 20 described above in the embodiment of FIG. 3.
In a possible implementation, the automatic white balance method of the embodiments of the present application may be executed independently by the local device 420 or the local device 430. For example, the local device 420 and the local device 430 may obtain the relevant parameters of the aforementioned neural network model from the execution device 410, deploy the neural network model on the local device 420 and the local device 430, and use the neural network model to implement the AWB process.
In another possible implementation, the automatic white balance method of the embodiments of the present application may be completed cooperatively by the local device 420 or the local device 430 interacting with the execution device 410. For example, users may operate their respective user devices (for example, the local device 420 and the local device 430) to interact with the execution device 410.
Each local device may be any computing device, for example, a personal computer, computer workstation, smartphone, tablet, camera, smart camera, smart in-vehicle device or other type of cellular phone, media consumption device, wearable device, set-top box, game console, and so on.
The local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
It should be understood that the foregoing is merely an example of application scenarios and does not limit the application scenarios of the present application in any way.
Based on the application scenarios, devices, and systems described above, some automatic white balance methods provided by the embodiments of the present application are described below.
Referring to FIG. 8, FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device that includes a camera and a display screen, and includes but is not limited to the following steps:
S501: Acquire shooting parameters used when the camera shoots an original RAW domain image.
Herein, an original RAW domain image may be referred to simply as a RAW image. A RAW image may be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal; this raw data has not yet been processed by an image signal processor (ISP). The RAW image may specifically be a bayer image in the Bayer format.
The shooting parameters are the parameters used when performing shooting, for example, parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting.
The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
Images acquired by the camera and image sensor of an electronic device in the same environment but with different shooting parameter configurations will show differences in color characteristics; the shooting parameters therefore describe the physical conditions under which the image was captured. The present application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation.
S502: Acquire a multi-channel image corresponding to the original RAW domain image.
After the electronic device obtains the RAW image, it can process the RAW image into a multi-channel image, as shown in FIG. 9. A multi-channel image is an image in which each pixel is represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels are the separate red (R), green (G), and blue (B) components.
For example, in one example, the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
In another example, the multi-channel image may specifically be a four-channel image, for example, an RGGB four-channel image, a BGGR four-channel image, or an RYYB four-channel image.
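As a hedged illustration of the four-channel form, the following numpy sketch packs a Bayer mosaic into an RGGB four-channel image; it assumes an RGGB channel layout, whereas real sensors may use BGGR, RYYB, or other patterns:

```python
import numpy as np

def bayer_to_rggb(raw):
    """Pack an H x W RGGB Bayer mosaic into a (4, H/2, W/2) channel image."""
    r  = raw[0::2, 0::2]   # red sample sites
    g1 = raw[0::2, 1::2]   # green sample sites on red rows
    g2 = raw[1::2, 0::2]   # green sample sites on blue rows
    b  = raw[1::2, 1::2]   # blue sample sites
    return np.stack([r, g1, g2, b], axis=0)
```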
S503: Process the multi-channel image to obtain a target image for display on the display screen. Specifically, input data may be fed into the neural network model to obtain a first gain value for white balance, where the input data includes at least the shooting parameters of the camera and the multi-channel image; first processing is then performed on the multi-channel image to obtain the target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The neural network model is used to obtain, at least according to the shooting parameters and the multi-channel image, the gain value or the light source color information required for the white balance processing.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
For example, the neural network model may be a model built by deep learning, such as one of, or a fusion of several of, a deep neural network (DNN) model, a convolutional neural network (CNN), a long short-term memory network (LSTM), a recurrent neural network (RNN), and so on.
The neural network model provided by the embodiments of the present application can obtain the light source color information required in white balance processing, such as the image light source value (r/g, 1, b/g), according to the shooting parameters and the multi-channel image. After the light source color information is output, the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
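As a minimal sketch of how such an image light source value might be turned into per-channel gains and applied, consider the following; the reciprocal-gain convention below is a common practice assumed for illustration, not necessarily the first gain value as defined by this application:

```python
import numpy as np

def apply_white_balance(rgb, illuminant):
    """rgb: float image (H, W, 3) in [0, 1]; illuminant: (r/g, 1, b/g)."""
    r_over_g, _, b_over_g = illuminant
    # Channels that the illuminant tints more strongly receive larger gains.
    gains = np.array([1.0 / r_over_g, 1.0, 1.0 / b_over_g])
    return np.clip(rgb * gains, 0.0, 1.0)
```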
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image as the input of the AWB neural network model, providing the model with more color information. The shooting parameters are added as a further input to the AWB neural network model, providing shooting configuration information for light source estimation; this improves the ability of the AWB neural network model to distinguish different light source scenes and ensures good light source estimation accuracy. Therefore, implementing the present application helps improve the white balance accuracy of electronic devices, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
Referring to FIG. 10, FIG. 10 is a schematic flowchart of a specific image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device, and includes but is not limited to the following steps:
S601: Shoot at least one original RAW domain image. Single-frame shooting may correspond to a photographing scenario, and multi-frame shooting may correspond to a video recording scenario or a time-lapse shooting scenario.
Herein, an original RAW domain image may be referred to simply as a RAW image. A RAW image may be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal; this raw data has not yet been processed by an image signal processor (ISP). The RAW image may specifically be a bayer image in the Bayer format.
S602: Acquire the shooting parameters used for shooting the RAW image.
Specifically, the shooting parameters are the parameters used when performing shooting, for example, parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting.
The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
Images acquired by the camera and image sensor of an electronic device in the same environment but with different shooting parameter configurations will show differences in color characteristics; the shooting parameters therefore describe the physical conditions under which the image was captured. The present application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation.
S603: Process the RAW image into a multi-channel image.
A multi-channel image is an image in which each pixel is represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels are the separate red (R), green (G), and blue (B) components.
For example, in one example, the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
In another example, the multi-channel image may specifically be a four-channel image, for example, an RGGB four-channel image, a BGGR four-channel image, or an RYYB four-channel image.
S604: Input the multi-channel image and the shooting parameters into the neural network model to obtain the light source color information.
In other words, the neural network model can obtain the light source color information required in white balance processing according to the shooting parameters and the multi-channel image.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
Referring to FIG. 11, the neural network model may be the AWB neural network model shown in FIG. 11. The AWB neural network model specifically includes a first feature extraction network, a feature fusion network, and a light source prediction network.
The first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain a first feature; the first feature characterizes the color information of the channel image. In a specific implementation, the first feature extraction network may include one or more convolution kernels, and a statistical operation over the pixels of the channel image is implemented through convolution, so as to obtain the first feature.
The feature fusion network is used to fuse the first feature and the shooting parameters to obtain a fused feature. The fusion method is not limited to one or a combination of operations such as concat processing, conv2d processing, elementwise multiply processing, and elementwise add processing. For example, the above two streams of information (the first feature and the shooting parameters) can be weighted to obtain the fused feature.
It should be noted that, during fusion in the feature fusion network, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature, so that the two streams of data have a consistent mathematical form, which facilitates the data fusion processing.
The light source prediction network is used to make a prediction based on the fused feature to obtain the light source color information. The light source color information can indicate the color temperature of the light source or the color cast of the image, so it can be used in the subsequent AWB processing.
For example, after the fused feature is processed by the light source prediction network, the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing.
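One possible, purely illustrative realization of the structure in FIG. 11 is sketched below in PyTorch; the layer sizes, the broadcast-and-concatenate fusion, and all names are assumptions rather than the claimed design:

```python
import torch
import torch.nn as nn

class AwbNet(nn.Module):
    """Sketch: first feature extraction + feature fusion + light source prediction."""
    def __init__(self, n_params=3):
        super().__init__()
        # First feature extraction network: convolutions acting as a
        # statistical operation over the channel image's pixels.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Fusion + prediction head; predicts (r/g, b/g), the middle 1 is fixed.
        self.head = nn.Sequential(
            nn.Conv2d(64 + n_params, 64, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))

    def forward(self, image, shoot_params):
        f = self.features(image)  # first feature
        # Expand the shooting parameters into a multi-dimensional array
        # matching the spatial shape of the first feature.
        p = shoot_params[:, :, None, None].expand(-1, -1, f.shape[2], f.shape[3])
        fused = torch.cat([f, p], dim=1)  # feature fusion by concatenation
        rg_bg = self.head(fused)          # light source prediction
        ones = torch.ones_like(rg_bg[:, :1])
        return torch.cat([rg_bg[:, :1], ones, rg_bg[:, 1:]], dim=1)  # (r/g, 1, b/g)
```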
As can be seen from the above, the AWB neural network model predicts the light source color information by fusing the features of the channel image with the shooting parameters.
In the application scenario of the standalone electronic device shown in FIG. 1 above, the AWB neural network model can be configured in the electronic device, and a processor in the electronic device (for example, a CPU or an NPU) is used to perform the neural network model computation, so as to obtain the light source color information. Thus, when the electronic device has sufficient computing resources, its computing power is fully utilized for neural network computation, which improves processing efficiency and reduces white balance processing latency. The specific hardware implementation process has been described in detail above and will not be repeated here.
In the end-cloud application scenario shown in FIG. 3 above, the AWB neural network model can be configured in the cloud server of the end-cloud system. The electronic device can send the multi-channel image and the shooting parameters to the cloud server, which uses its processor (for example, a CPU or an NPU) to perform the neural network model computation, so as to obtain the light source color information; the cloud server then feeds the light source color information back to the electronic device. Thus, even when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used for the neural network model computation, ensuring the accuracy and stability of the white balance processing, so that the solution of the present application can be applied to different types of devices and improve the user experience. The specific implementation process has been described in detail above and will not be repeated here.
S605: Perform white balance processing on the multi-channel image according to the light source color information to obtain the target image, and display it on the display screen.
Specifically, after the AWB neural network model outputs the light source color information (for example, the image light source value), the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image, rather than statistical features, as the input of the AWB neural network model, providing the model with more color information. Shooting parameters, such as one or more of shutter speed, exposure time, exposure value, ISO, aperture size, and the like, are added as a further input to the AWB neural network model, providing shooting configuration information for light source estimation and a reference for the conditions under which the RAW image was captured. Using the shooting parameters as an input to the neural network model helps the network improve the accuracy of light source prediction, improves the ability of the AWB neural network model to distinguish different light source scenes, and ensures good light source estimation accuracy.
For example, the AWB neural network model in the embodiments of the present application can be applied to all scenes, and a large amount of training data is used during model training, including data obtained in bright-light scenes and data obtained in dark-light scenes. With massive data, it is difficult for a neural network to achieve a high-precision fit across all scenes; the added shooting parameters can provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for both types of scenes.
It should be noted that the above is only one example. The shooting parameters can be used not only to distinguish bright-light scenes from dark-light scenes, but also to distinguish other categories of scenes, for example, outdoor from indoor, or daytime from nighttime. Therefore, adding the shooting parameters as an input to the neural network can effectively improve the model's light source estimation accuracy in these categories of scenes, thereby improving the overall light source estimation accuracy.
In addition, which of the shooting parameters (shutter speed, exposure time, exposure value, ISO, aperture size, etc.) are selected as model inputs can be decided based on the information actually obtainable by the electronic device. Any one or more of the above-mentioned shooting parameters can provide a reference for the shooting conditions of the image and help improve the accuracy of the network; in practice, the selection should be made flexibly according to the hardware and software conditions.
Therefore, implementing the present application helps improve the white balance accuracy of electronic devices, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of another image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device. The main difference between this method and the method described above with reference to FIG. 10 is that the computation of the neural network model also uses scene semantic information, to further improve the accuracy of light source color information prediction. The method includes but is not limited to the following steps:
S701: Shoot at least one frame of an original RAW domain image.
S702: Acquire the shooting parameters used when shooting the RAW image.
S703: Process the RAW image to obtain a multi-channel image.
The implementation of the above S701-S703 may refer to the description of steps S601-S603 above, and will not be repeated here.
S704: Extract scene semantic information of the multi-channel image.
The color of the light source may differ across shooting scenes. For example, in an indoor portrait shooting scene, the light source may be an incandescent lamp; in an outdoor landscape shooting scene, the light source may be the sun or street lights. To further improve the accuracy of light source color information prediction, the embodiments of the present application may use scene semantic information to provide a shooting-scene reference for light source color estimation.
In the embodiments of the present application, the scene semantic information represents semantic features, conveyed by the image, that are related to the shooting scene. In a specific implementation, various types of shooting scenes can be defined.
For example, shooting scenes can be classified by light source type, for example, into cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
For another example, shooting scenes can be classified by image content, for example, into portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
In addition, a shooting scene can also be a combination of the above scene types. Other types of shooting scenes can also be defined based on the needs of the actual application, which is not limited in the embodiments of the present application.
Specifically, the scene semantic information can be extracted from the multi-channel image by one or more preset extraction algorithms.
For example, the preset extraction algorithm may be one or a combination of a scene classification algorithm, an image scene segmentation algorithm, an object detection algorithm, a portrait segmentation algorithm, a face detection algorithm, a human body detection algorithm, a skin color segmentation algorithm, and the like.
For example, a scene classification algorithm can be used to classify faces versus non-faces, single light sources versus multiple light sources, light source color temperatures, or indoor versus outdoor scenes, and so on.
For another example, an image scene segmentation algorithm can be used to segment the picture and generate a mask map; optionally, a scene classification algorithm, an object detection algorithm, a face detection algorithm, a skin color segmentation algorithm, or other techniques can also be used to generate the mask map. The mask map can provide the AWB neural network model of the present application with more shooting-scene-related information than a single frame alone, thereby increasing the AWB neural network's attention to different shooting scenes, helping the network fit and converge, and achieving higher prediction accuracy.
For another example, the scene semantic information of the image may be extracted without scene segmentation: an object detection algorithm is used instead, and the generated object category boxes are turned into a scene category mask map that is fed into the AWB neural network, as in the sketch below. In this way, object detection replaces scene segmentation for extracting scene semantic information, which simplifies scene information extraction, increases the computation speed, reduces the computational complexity, and reduces the performance overhead.
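A rough sketch of this detection-based alternative is shown below: object category boxes are rasterized into a scene category mask map. The box format and the class encoding are assumptions for illustration:

```python
import numpy as np

def boxes_to_mask(boxes, height, width):
    """boxes: iterable of (x0, y0, x1, y1, class_id); returns an H x W mask map."""
    mask = np.zeros((height, width), dtype=np.uint8)  # 0 = background
    for x0, y0, x1, y1, class_id in boxes:
        mask[y0:y1, x0:x1] = class_id  # later boxes overwrite earlier ones
    return mask

# e.g. a detected face (assumed class 1) and a sky region (assumed class 2):
mask = boxes_to_mask([(10, 10, 60, 60, 1), (0, 0, 200, 40, 2)], 120, 200)
```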
It should be noted that, in the above embodiments, the scene semantic information serving as an auxiliary input is not necessarily in the form of a mask map; it can also take other forms. For example, after the image is processed by a scene classification algorithm, the output may be a set of classification confidences (a vector), which is fed to the neural network model in vector form.
S705: Input the multi-channel image, the scene semantic information, and the shooting parameters into the neural network model to obtain the light source color information.
In other words, the neural network model can obtain the light source color information required in white balance processing according to the shooting parameters, the scene semantic information, and the multi-channel image.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
Referring to FIG. 13, the neural network model may be the AWB neural network model shown in FIG. 13. The AWB neural network model specifically includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network.
The first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain a first feature; the first feature characterizes the color information of the channel image.
In an optional embodiment, the first feature extraction network may include one or more small convolution kernels, and a statistical operation over the pixels of the channel image is implemented through convolution, so as to obtain the first feature.
The second feature extraction network is used to perform feature extraction on the scene semantic information to obtain a second feature; the second feature characterizes the scene information corresponding to the channel image.
In an optional embodiment, the second feature extraction network may include one or more large convolution kernels, and analysis/perception of the scene information of the channel image is implemented through convolution, so as to obtain the second feature.
It should be noted that the so-called "large convolution kernel" and "small convolution kernel" are relative to each other conceptually. That is, in an optional solution, the convolution kernels in the second feature extraction network can be made larger in scale than those in the first feature extraction network, so as to perceive a larger range of the image and thereby obtain more accurate scene information.
The feature fusion network is used to fuse the first feature, the second feature, and the shooting parameters to obtain a fused feature. The fusion method is not limited to one or a combination of operations such as concat processing, conv2d processing, elementwise multiply processing, and elementwise add processing. For example, the above three streams of information (the first feature, the second feature, and the shooting parameters) can be weighted to obtain the fused feature.
It should be noted that, during fusion in the feature fusion network, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array forms of the first feature and the second feature, so that the three streams of data have a consistent mathematical form, which facilitates the data fusion processing.
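A minimal sketch of this three-stream fusion, choosing concatenation among the operations listed above (shapes and names are assumptions):

```python
import torch

def fuse(first_feat, second_feat, shoot_params):
    """first_feat, second_feat: (B, C, H, W) tensors; shoot_params: (B, P)."""
    b, _, h, w = first_feat.shape
    # Expand the shooting parameters into a multi-dimensional array so that
    # all three streams of data share a consistent mathematical form.
    p = shoot_params.view(b, -1, 1, 1).expand(b, shoot_params.shape[1], h, w)
    return torch.cat([first_feat, second_feat, p], dim=1)
```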
The light source prediction network is used to make a prediction based on the fused feature to obtain the light source color information. The light source color information can indicate the color temperature of the light source or the color cast of the image, so it can be used in the subsequent AWB processing.
For example, after the fused feature is processed by the light source prediction network, the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing.
As can be seen from the above, the AWB neural network model predicts the light source color information by fusing the features of the channel image, the features of the scene semantic information, and the shooting parameters.
In the application scenario of the standalone electronic device shown in FIG. 1 above, the AWB neural network model can be configured in the electronic device, and a processor in the electronic device (for example, a CPU or an NPU) is used to perform the neural network model computation, so as to obtain the light source color information. The specific hardware implementation process has been described in detail above and will not be repeated here.
In the end-cloud application scenario shown in FIG. 3 above, the AWB neural network model can be configured in the cloud server of the end-cloud system. The electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, which uses its processor (for example, a CPU or an NPU) to perform the neural network model computation, so as to obtain the light source color information; the cloud server then feeds the light source color information back to the electronic device. The specific implementation process has been described in detail above and will not be repeated here.
S706: Perform white balance processing on the multi-channel image according to the light source color information to obtain the target image, and display it on the display screen. For details, refer to the description of step S605 above, which will not be repeated here.
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image, rather than statistical features, as the input of the AWB neural network model, providing the model with more color information. Scene semantic information and shooting parameters are added as further inputs to the AWB neural network model, providing more effective prior knowledge (shooting configuration information and scene information) for light source estimation; this greatly enhances the AWB neural network model's ability to distinguish different light source scenes, improves the overall light source estimation accuracy, and effectively helps the neural network converge and fit.
Among these, the scene semantic information can, to a large extent, provide prior semantic information for the image, help the AWB neural network distinguish different scenes, and thereby improve the overall accuracy of the AWB neural network.
For example, during model training with massive training data, it is difficult for a neural network to achieve a high-precision fit across all scenes. For instance, with faces under different light source conditions, the network output may be unstable, affecting the perceived skin tone; if face detection information is then added as scene semantic information input to the neural network, the network will pay more attention to the face region during training, thereby improving its fitting accuracy in face scenes.
For another example, if the neural network does not perform well in scenes such as blue sky or grass, image segmentation can be introduced, and the segmented sky and grass regions can be fed into the neural network as scene information; the network will then pay more attention to sky and grass scenes, thereby improving the light source estimation accuracy in those scenes.
It should be noted that the embodiments of the present application provide many forms of scene semantic information. In practical applications, which types of scene semantic information to adopt can be decided according to the needs of AWB in different scenes; the present application places no particular limitation on this, including on the specific content of the scene semantic information and on how it is obtained. For example, one or more of extraction techniques such as image segmentation, instance segmentation, face detection, human body detection, skeleton detection, and scene classification can be used to obtain the scene semantic information as the input of the AWB neural network.
Therefore, implementing the present application can improve the white balance accuracy of electronic devices shooting in all scenes, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
To better understand the methods provided by the above embodiments of the present application, a more detailed embodiment is described below. Referring to FIG. 14, the method can be applied to an electronic device and includes but is not limited to the following steps:
S801: Shoot at least one RAW image.
S802: Acquire the shooting parameters used for shooting the RAW image.
When the user takes a photo on the interactive interface of the terminal, the mobile phone captures one frame of a RAW image in BAYER format during the photographing action, and at the same time obtains the shooting parameters corresponding to the capture of that picture.
Parameters such as exposure value, shutter time, aperture size, and ISO sensitivity can be selected as the shooting parameters. Because the pictures acquired by the mobile phone's sensor under the same environment but different parameter configurations will show differences in color characteristics, the shooting parameters describe the conditions under which the image was captured and provide a reference for the light source estimation algorithm.
S803: Preprocess the RAW image to obtain a color three-channel image, such as an RGB three-channel image; in an RGB three-channel image, every pixel has red, green, and blue components.
Referring to FIG. 15, the preprocessing of the RAW image can be performed, for example, by the ISP of the electronic device, and the preprocessing includes all the image processing steps involved in generating the color three-channel image.
Specifically, FIG. 15 shows an example of a preprocessing process, which may include black level correction (BLC) and lens shade correction (LSC). BLC reduces the influence of dark current on the image signal, and LSC eliminates the influence of vignetting on the image. Optionally, the process also includes image down-sampling and noise reduction. A specific implementation process is described as follows:
The RAW image can first be down-sampled to a size suitable for the network input, to speed up subsequent computation. Simple noise reduction is then performed on the down-sampled image (the noise reduction should avoid affecting the image colors as far as possible). BLC and LSC are then performed to eliminate the level offset of the image sensor and the brightness and color non-uniformity caused by the imaging of the camera's convex lens. The RAW image after the above processing is in the Bayer format, and a demosaicing operation is needed to obtain the color three-channel image. So as not to affect the colors, the demosaicing operation can be simplified to averaging the green channels and rearranging the red, blue, and green components, thereby obtaining the color three-channel image.
With this simplified demosaicing operation, the length and width of the color three-channel image are half those of the down-sampled RAW image (see FIG. 9 above), which increases the speed of subsequent algorithms.
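The simplified demosaicing just described (average the two green samples, rearrange R/G/B) might be sketched as follows; this assumes an RGGB layout and is an illustration, not the ISP's actual pipeline:

```python
import numpy as np

def simple_demosaic(raw):
    """RGGB Bayer mosaic (H x W) -> RGB image of half the width and height."""
    raw = raw.astype(np.float32)
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0  # average the green channel
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)  # shape (H/2, W/2, 3)
```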
It should be noted that the preprocessing process may also include other processing algorithms, which is not limited in other embodiments of the present application.
S804: Extract scene semantic information of the multi-channel image.
S805: Input the multi-channel image, the scene semantic information, and the shooting parameters into the neural network model to obtain the light source color information.
The specific implementation of the above S804-S805 may refer to the description of S704-S705, and will not be repeated here.
S806: Perform white balance processing on the image using the light source color information.
Specifically, after the AWB neural network model outputs the light source color information (for example, the image light source value), the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
S807: Further perform image enhancement processing on the white-balanced image to obtain the final target image for display.
Referring to FIG. 15, the image enhancement processing can be performed, for example, by the ISP of the electronic device, or by other components of the electronic device, for example, by a field programmable gate array (FPGA) or a digital signal processor (DSP).
In the embodiments of the present application, the white-balanced image can also be post-processed by image enhancement algorithms to further improve the image quality; the final target image for display is thus obtained and output to the display screen of the electronic device for display.
The image enhancement algorithms may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
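For instance, a hedged sketch of one such enhancement, gamma correction, is shown below (the exponent 2.2 is a common but assumed display gamma):

```python
import numpy as np

def gamma_correct(rgb, gamma=2.2):
    """Apply display gamma to a linear-light image with values in [0, 1]."""
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)
```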
需要说明的是,后处理过程还可以根据实际应用需要采用其他的处理算法,本申请其他实施例对此不作限制。It should be noted that the post-processing process may also adopt other processing algorithms according to actual application needs, which is not limited in other embodiments of the present application.
为了更好理解本申请方案,下面以图2所示的电子设备10的结构为例对方案做一种示例性的描述:In order to better understand the solution of the present application, the following takes the structure of the electronic device 10 shown in FIG. 2 as an example to describe the solution as an example:
在用户使用电子设备10执行拍摄时,CPU131控制摄像头111采集拍摄环境的光信号,图像传感器112将摄像头111捕捉到的光信号转化为数字信号,从而获得一张或多张RAW图像,RAW图像进一步被送到ISP12。ISP12执行预处理将所述RAW图像处理为彩色三通道图像,并提取所述彩色三通道图像的场景语义信息。所述彩色三通道图像和所述场景语义信息被进一步输入到NPU132,以及,CPU131对摄像头111、图像传感器112的控制参数(快门,曝光时间,光圈大小等)也被送入到NPU132。NPU132根据输入数据,执行AWB神经网络模型的计算,获得光源颜色值(r/g,1,b/g);并将光源颜色值返回给ISP12。ISP12根据光源颜色值执行白平衡处理,并对白平衡处理后的图像采用图像增强算法进一步优化,获得目标图像。目标图像通过CPU131进一步被送到显示装置14进行显示。When the user uses the electronic device 10 to perform shooting, the CPU 131 controls the camera 111 to collect light signals from the shooting environment, and the image sensor 112 converts the light signals captured by the camera 111 into digital signals, thereby obtaining one or more RAW images. Was sent to ISP12. The ISP12 performs preprocessing to process the RAW image into a color three-channel image, and extracts scene semantic information of the color three-channel image. The color three-channel image and the scene semantic information are further input to the NPU 132, and the control parameters (shutter, exposure time, aperture size, etc.) of the CPU 131 for the camera 111 and the image sensor 112 are also input to the NPU 132. According to the input data, the NPU132 executes the calculation of the AWB neural network model to obtain the light source color value (r/g, 1, b/g); and returns the light source color value to the ISP12. ISP12 performs white balance processing according to the color value of the light source, and uses image enhancement algorithms to further optimize the white balance processed image to obtain the target image. The target image is further sent to the display device 14 through the CPU 131 for display.
It can be seen that, compared with the embodiment shown in FIG. 12, this embodiment of the present application, on the basis of achieving better AWB, further provides refined implementations of the image preprocessing process and of the image post-processing stage. The introduction of the preprocessing process not only facilitates fast and efficient generation of multi-channel images, enabling the implementation of the AWB method of the present application, but also helps improve image quality (for example, by reducing the influence of dark current, reducing noise, and eliminating vignetting) and the computing speed of the neural network algorithm. The introduction of the post-processing process can further improve image quality, meet users' application needs, and enhance the user's viewing experience.
For a more comprehensive understanding of the solution of the present application, the AWB solution is described below more explicitly from the perspective of an application program on an electronic device (such as a mobile phone). Referring to FIG. 17 and FIG. 18, the solution includes, but is not limited to, the following steps:
S901: Detect an operation by which the user instructs the camera to shoot.
The operation instructing shooting may be, for example, a touch, a click, voice control, a key press, a remote control, or any other manner of triggering the electronic device to shoot. For example, the operation by which the user instructs shooting may include pressing the shooting button in the camera application of the electronic device, instructing the electronic device by voice to shoot, instructing the electronic device to shoot through a shortcut key, or other operations by which the user instructs the electronic device to shoot. This application imposes no specific restrictions.
In a possible implementation, before step S901, the method further includes: detecting an operation by the user to open the camera; and, in response to the operation, displaying a shooting interface on the display screen of the electronic device.
For example, after detecting that the user has tapped the icon of the camera application (APP) on the desktop, the electronic device may start the camera application and display the shooting interface.
(a) of FIG. 18 shows a graphical user interface (GUI) of a shooting interface 91 of a mobile phone. The shooting interface 91 includes a shooting control 93 and other shooting options. After the electronic device detects that the user taps the shooting control 93, the mobile phone executes the shooting process.
Exemplarily, the shooting interface 91 may further include a viewfinder frame 92. After the electronic device starts the camera, in the preview state, a preview image may be displayed in the viewfinder frame 92 in real time. It is understandable that the size of the viewfinder frame may differ between the photo mode and the video mode. For example, the viewfinder frame may be the viewfinder frame of the photo mode; in the video mode, the viewfinder frame may occupy the entire display screen. In the preview state, that is, after the user opens the camera and before the photo/video button is pressed, the preview image may be displayed in the viewfinder frame in real time.
S902: In response to the operation, display the target image on the display screen.
The target image is obtained after white balance processing implemented by using a neural network model, and the neural network model is used to obtain, according to the input data, the light source color information required for the white balance processing.
Exemplarily, as shown in FIG. 18, in response to the user's instruction, the mobile phone executes the shooting process in the background, including: shooting with the camera to obtain a RAW image; performing preprocessing through the ISP to convert the RAW image into a color three-channel image; computing with the AWB neural network model according to the input data to obtain the light source color information, and implementing white balance processing based on the light source color information; subsequently, image enhancement algorithms may further be used for optimization to obtain the target image. The target image is displayed on the display screen. For example, (b) of FIG. 18 shows a GUI of an album-based display interface 94, on which the target image 95 may be displayed.
In an embodiment, the input data of the model includes the shooting parameters and the multi-channel image. The structure of the model and the computation process may be understood with reference to the description of the foregoing embodiment of FIG. 11, and will not be repeated here.
In yet another manner, the input data of the model includes the shooting parameters, the multi-channel image, and the scene semantic information extracted from the multi-channel image. The structure of the model and the computation process may be understood with reference to the description of the foregoing embodiment of FIG. 13, and will not be repeated here.
It should be understood that the specific implementation of the above method may be understood with reference to the descriptions of the foregoing embodiments of FIG. 8 to FIG. 16; that is, the expansions, definitions, explanations, and descriptions of the relevant content in FIG. 8 to FIG. 16 also apply to the same content in FIG. 17 and FIG. 18, and will not be repeated here.
The software system architecture of an electronic device that can be used to implement the methods shown in FIG. 17 and FIG. 18 is further described below. The software system may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture. The following description takes the Android system with a layered architecture as an example. Referring to FIG. 19, FIG. 19 is a block diagram of a possible software structure of the electronic device in an embodiment of the present application.
As shown in the figure, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in the figure, the application packages may include applications such as a camera APP, an image beautification APP, and an album APP.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions. As shown in FIG. 19, the application framework layer may include a window manager, a content provider, a resource manager, a view system, and so on. Among them:
The window manager is used to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include image data, video data, and so on.
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to construct the display interface of an application.
For example, a shooting interface of a camera APP presented through the view system is shown in (a) of FIG. 18. The shooting interface 91 includes a shooting control 93, a preview frame 92, and some other related controls, such as an image browsing control and a front/rear camera switching control. The preview frame 92 is used to preview the scene image to be shot.
When the user taps or touches the front/rear camera switching control, the electronic device can be instructed to select the front camera or the rear camera for shooting.
When the user taps or touches the shooting control 93, the electronic device drives the camera device to initiate a shooting operation and instructs the lower-level system library to process the image and save it to the album.
When the user taps or touches the image browsing control, the electronic device may invoke the album APP and display the image processed by the automatic white balance method proposed in this application.
For example, a display interface of an album APP presented through the view system is shown in (b) of FIG. 18. The target image 95 may be displayed on the display interface 94.
The Android runtime is responsible for the scheduling and management of the Android system and may include a core library and a virtual machine. The core library consists of two parts: one part is the function library that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include multiple functional modules, for example, a surface manager, media libraries, and a graphics engine.
The surface manager is used to manage the display subsystem and provides layer composition functions for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver. The camera driver can be used to drive the camera of the electronic device to shoot, and the display driver can be used to present the processed image on the display panel of the display screen.
The graphics engine is a drawing engine for image processing. In the embodiments of this application, the graphics engine may be used, for example, to: convert the RAW image into a color three-channel image; extract scene semantic information from the color three-channel image; input the color three-channel image, the shooting parameters, and the scene semantic information into the neural network to obtain the light source color information; and perform white balance processing on the color three-channel image according to the light source color information to obtain an image for display.
It should be noted that the training process of the AWB neural network model may take multiple forms; for example, FIG. 20 shows two exemplary training processes.
One training process for the AWB neural network model may be as follows: the training data includes annotations of the light source color of the images, multi-channel images obtained by preprocessing RAW images, shooting parameters, and optionally scene semantic information. After the training data is input into the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the annotated light source color, and is backpropagated through the model to update the model parameters, thereby training the model. When extensive training makes the model meet the application targets, the target model can be output.
Another training process for the AWB neural network model may be as follows: the training data includes annotations of the light source color of the images, target images obtained by preprocessing RAW images and applying image enhancement algorithms, shooting parameters, and optionally scene semantic information. After the training data is input into the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the annotated light source color, and is backpropagated through the model to update the model parameters, thereby training the model. When extensive training makes the model meet the application targets, the target model can be output.
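As a minimal illustration common to both variants, the sketch below shows one training step, assuming an angular-error loss between the predicted and annotated illuminant colors (the application does not fix the form of the loss function) and a model with the hypothetical signature model(images, params); PyTorch is used here purely for illustration.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target):
    """Angle between predicted and annotated illuminants (N x 3 tensors)."""
    cos = F.cosine_similarity(pred, target, dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def train_step(model, optimizer, images, params, labels):
    """One parameter update on a batch of annotated training data."""
    optimizer.zero_grad()
    pred = model(images, params)       # model outputs light source color info
    loss = angular_loss(pred, labels)  # compare with the annotated illuminant
    loss.backward()                    # backpropagate the loss through the model
    optimizer.step()                   # update the model parameters
    return loss.item()
```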
In addition, it should be noted that, in a possible implementation, the images used to train the AWB neural network model may not be single-frame images but annotated video sequences. Network structures such as LSTM and RNN may be introduced into the AWB neural network model, and time-domain-related strategies may also be adopted during training. In other words, video sequences can be used as training data, and the AWB neural network model can take the frames preceding and following the current image as additional model input. By training on video sequences, adding the input of consecutive preceding and following frames, introducing structures such as LSTM and RNN, and adding time-domain-related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jitter under the same light source can be reduced. This allows the method to be extended to video functions, increases the stability of the white balance, and improves the user experience.
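As one hedged sketch of such a temporal variant, the model below extracts per-frame features with a small CNN and aggregates them with an LSTM before predicting the illuminant for the current frame; the topology and layer sizes are illustrative assumptions, not a structure prescribed by this application.

```python
import torch
import torch.nn as nn

class TemporalAWB(nn.Module):
    """Illuminant estimation over a short frame sequence (sketch)."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Per-frame feature extractor, kept deliberately small here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # LSTM aggregates features across consecutive frames.
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 3)  # predicts e.g. (r/g, 1, b/g)

    def forward(self, frames):
        # frames: N x T x 3 x H x W (preceding frames plus the current frame)
        n, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(n, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # estimate for the current frame
```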
In addition, in the embodiments of the present application, the number of cameras configured in the electronic device is not limited. In a scenario with two or more cameras, the type of each camera is not limited. For example, the cameras being of "different types" may mean cameras with different magnifications (shooting magnification or zoom magnification) or different focal lengths, such as a conventional camera, a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera. As another example, "different types" may mean that the image sensors corresponding to the cameras differ: for instance, the image sensor of the wide-angle camera may be an RGGB module, the image sensor of the main camera may be a RYYB module, and the image sensor of the telephoto camera may be an RGGB module.
Today's electronic devices such as mobile phones are equipped with multiple cameras. When shooting, the user may zoom in or out or select a camera, and the image sensors or camera types corresponding to the multiple cameras differ, so the value ranges of RAW images captured in the same scene may differ considerably (image sensor devices of the same type may differ less). In the case of two or more cameras, the automatic white balance method (or the way of obtaining the image light source information) described in this application can be adjusted and adapted in multiple ways.
For example, a shooting scenario is shown in FIG. 23; the user frames the scene when taking a photo. The user can zoom the viewfinder device (the mobile phone screen) in or out to bring the scene closer or push it farther away. Example effects for several target images are shown in (1), (2), and (3) of FIG. 23. For (1), when the user needs to capture distant details, the picture must be zoomed in; when the magnification reaches 10 times (10x) or above, the focal length of the main camera is not enough to provide a sufficiently clear result, so the device switches to the telephoto lens for framing and shooting. The telephoto lens may use an RGGB module, and its sensitivity and spectral response curve will differ from those of the main camera. For (2), in general shooting, if the framing is in the range of 1x to 10x, the focal length of the main camera is sufficient to provide a clear result, and the RAW image collected by the main camera is cropped according to the focal length to achieve the zoom-in effect. The main camera may use a RYYB module, which has better sensitivity, and its spectral response curve will differ from that of an RGGB module. For (3), if the framing is below 1x, the focal length of the main camera is not enough to provide a larger field of view (FOV); if there is a wide-angle lens, the camera device switches to the wide-angle camera to provide a larger viewing angle. The wide-angle camera may use an RGGB module, or a photosensitive module different from those of the main camera or the telephoto camera, and its sensitivity and spectral response will differ from those of the above two cameras.
Referring to FIG. 21, FIG. 21 shows a possible shooting process. This scenario takes as an example an electronic device configured with a first camera and a second camera, which may be cameras of different types. The two cameras share one neural network model: in this scenario, the electronic device is configured with a first AWB neural network model, which may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
As shown in FIG. 21, in actual shooting, if the user selects the first camera for shooting, the obtained RAW image of the first camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the first camera, serves as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
If the user selects the second camera for shooting, the obtained RAW image of the second camera is preprocessed into a multi-channel image. In addition, the electronic device performs an image migration operation on this multi-channel image, that is, migrates its image colors to image colors that conform to the shooting characteristics of the first camera. Specifically, based on the difference between the second camera and the first camera, a color migration operation may be performed on the multi-channel image corresponding to the second camera to obtain a migrated image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. Then, the migrated image, together with the shooting parameters of the second camera, serves as the input of the first AWB neural network, and a light source color value (or gain value) conforming to the shooting characteristics of the first camera is calculated. On this basis, a further migration operation is performed on this light source color value (or gain value), so as to migrate it to the light source color value (or gain value) corresponding to the second camera.
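As a hedged sketch of these two migration operations, assume the inter-camera difference can be approximated by a fixed 3x3 color transform M calibrated offline from paired captures; this is our own simplifying assumption, as the application does not prescribe the form of the color migration.

```python
import numpy as np

# Hypothetical calibrated transform mapping the second camera's color
# space to the first camera's (identity used here as a stand-in).
M = np.eye(3)

def migrate_image(image_cam2):
    """Map an H x W x 3 image from camera 2's color space to camera 1's."""
    return np.clip(image_cam2 @ M.T, 0.0, 1.0)

def migrate_gains(gains_cam1):
    """Map per-channel white balance gains predicted for camera 1 back to
    camera 2 by pushing the underlying illuminant through M's inverse."""
    illuminant_cam1 = 1.0 / np.asarray(gains_cam1)
    illuminant_cam2 = np.linalg.inv(M) @ illuminant_cam1
    return 1.0 / illuminant_cam2
```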
It should be noted that, when training the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera) and the shooting parameters of the first camera, among others, can be used as training data. Image data and shooting parameters collected by the second camera or other cameras can also be used, but the collected image data must first be migrated to the first camera before it can participate in the training.
It should also be noted that, when the first AWB neural network model is trained on data collected by the second camera, the roles of the first camera and the second camera in the embodiment of FIG. 21 above are effectively swapped, and the implementation is similar to the above process, which will not be repeated here.
Referring to FIG. 22, FIG. 22 shows another possible shooting process. This scenario also takes as an example an electronic device configured with a first camera and a second camera, which may be cameras of different types. The two cameras correspond to different neural network models: for example, the first camera corresponds to a first AWB neural network model and the second camera corresponds to a second AWB neural network model. The first AWB neural network model may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera), and the second AWB neural network model may be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera).
As shown in FIG. 22, in actual shooting, if the user selects the first camera for shooting, the obtained RAW image of the first camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the first camera, serves as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
If the user selects the second camera for shooting, the obtained RAW image of the second camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the second camera, serves as the input of the second AWB neural network, and the light source color value (or gain value) corresponding to the second camera is calculated.
It should be noted that, when training the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera) and the shooting parameters of the first camera, among others, can be used as training data. Image data and shooting parameters collected by the second camera or other cameras can also be used, but the collected image data must first be migrated to the first camera before it can participate in the training.
Similarly, when training the second AWB neural network model, the image data collected by the second camera (or a device of the same model, or a device similar to the second camera) and the shooting parameters of the second camera, among others, can be used as training data. Image data and shooting parameters collected by the first camera or other cameras can also be used, but the collected image data must first be migrated to the second camera before it can participate in the training.
The embodiments of FIG. 21 and FIG. 22 above are only intended to explain the solution of the present application, not to limit it. For example, in practical applications the above processes can be similarly extended to the case of multiple (more than two) cameras; or, in practical applications, the training and use of the models may also make use of scene semantic information, and the specific implementation may be combined with the descriptions of the foregoing embodiments of FIG. 12 and FIG. 13, which will not be repeated here.
Based on the same application concept, an embodiment of the present application further provides an apparatus for realizing automatic white balance of an image. Referring to FIG. 24, FIG. 24 is a schematic structural diagram of an apparatus for automatic white balance of an image provided by an embodiment of the present application. The apparatus includes: a parameter acquisition module 1001, an image acquisition module 1002, and a processing module 1003. In an example, the above functional modules may run in a processor of an electronic device having a camera (which may be referred to as a first camera, for example). Among them:
The parameter acquisition module 1001 is configured to acquire the shooting parameters used when the first camera captures an original RAW domain image.
The image acquisition module 1002 is configured to acquire a multi-channel image corresponding to the original RAW domain image.
The processing module 1003 is configured to input the input data into the first neural network model to obtain the first gain value of the white balance, where the input data includes at least the shooting parameters of the first camera and the multi-channel image; and is further configured to perform first processing on the multi-channel image to obtain the target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
In some possible embodiments, the shooting parameters include at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
In some possible embodiments, the first neural network model predicts the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
In some possible embodiments, the processing module is specifically configured to: obtain the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera and the multi-channel image; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the processing module is specifically configured to: send the shooting parameters of the first camera and the multi-channel image to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the first neural network model includes a first feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature; and perform prediction according to the fused feature through the light source prediction network to obtain the first gain value.
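For illustration, the following is a minimal sketch of this three-part structure (feature extraction, fusion with the shooting parameters, light source prediction); the layer sizes and the PyTorch framing are our own assumptions, since the application does not fix a concrete network topology.

```python
import torch
import torch.nn as nn

class AWBNet(nn.Module):
    """First feature extraction -> fusion with shooting parameters -> prediction."""

    def __init__(self, n_params=4, feat_dim=64):
        super().__init__()
        # First feature extraction network: image features.
        self.extract = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Feature fusion network: concatenate parameters with image features.
        self.fuse = nn.Sequential(nn.Linear(feat_dim + n_params, feat_dim), nn.ReLU())
        # Light source prediction network: outputs the gain/illuminant estimate.
        self.predict = nn.Linear(feat_dim, 3)

    def forward(self, image, params):
        # image: N x 3 x H x W; params: N x n_params (exposure value, shutter, ...)
        first_feature = self.extract(image)
        fused = self.fuse(torch.cat([first_feature, params], dim=1))
        return self.predict(fused)
```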
In some possible embodiments, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically predicts the gain value by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
In some possible embodiments, the processing module is specifically configured to: extract scene semantic information from the multi-channel image; obtain the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the processing module is specifically configured to: send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature; fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature; and perform prediction according to the fused feature through the light source prediction network to obtain the first gain value.
In some possible embodiments, the processing module is specifically configured to: perform at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
In some possible embodiments, the image acquisition module is specifically configured to: preprocess the original RAW domain image to obtain the multi-channel image, where the preprocessing includes demosaicing.
In some possible embodiments, the multi-channel image is a three-channel image or a four-channel image.
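For illustration, the sketch below shows one common way such multi-channel images can be obtained from an RGGB Bayer RAW image: packing the 2x2 mosaic into four half-resolution channels, optionally collapsed to three channels by averaging the green planes. The RGGB layout is an assumption for the example; the application does not mandate this particular preprocessing.

```python
import numpy as np

def pack_rggb(raw):
    """Pack a single-channel RGGB Bayer RAW image (H x W, H and W even)
    into a half-resolution four-channel image (H/2 x W/2 x 4)."""
    r  = raw[0::2, 0::2]  # red sites
    g1 = raw[0::2, 1::2]  # green sites on red rows
    g2 = raw[1::2, 0::2]  # green sites on blue rows
    b  = raw[1::2, 1::2]  # blue sites
    return np.stack([r, g1, g2, b], axis=-1)

def to_three_channel(packed):
    """Collapse the packed image to three channels by averaging the greens."""
    r, g1, g2, b = np.moveaxis(packed, -1, 0)
    return np.stack([r, (g1 + g2) / 2.0, b], axis=-1)
```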
It should be noted that, through the detailed descriptions of the foregoing embodiments of FIG. 8 to FIG. 16, those skilled in the art can clearly understand the implementation methods of the functional modules included in this apparatus, so for brevity of the description, details are not repeated here.
Based on the same inventive concept, an embodiment of the present application further provides another electronic device, which includes a camera, a display screen, a memory, and a processor, where: the camera is used to capture images; the display screen is used to display images; the memory is used to store a program; and the processor is used to execute the program stored in the memory and, when executing the program stored in the memory, is specifically used to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, or FIG. 17.
Based on the same inventive concept, an embodiment of the present application further provides yet another electronic device, which includes at least two cameras, a memory, and a processor, where the at least two cameras include a first camera and a second camera, and where: the at least two cameras are both used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory and, when executing the program stored in the memory, can be used to execute the method steps described in any of the method embodiments of FIG. 21 or FIG. 22, or to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, or FIG. 17.
An embodiment of the present application further provides a chip, which includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip. The chip can execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
An embodiment of the present application further provides a computer-readable storage medium storing instructions that, when executed, perform the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
An embodiment of the present application further provides a computer program product containing instructions that, when executed, perform the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
It should be understood that, in the various method embodiments of the present application, the numbering of the above processes does not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the above embodiments, the description of each embodiment has its own focus. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here. In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that any modifications, variations, or equivalent replacements of some of the technical features based on the technical solutions described in the above embodiments shall fall within the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (43)

1. A method for automatic image white balance, wherein the method is applied to an electronic device comprising a first camera, and comprises:
    acquiring shooting parameters used when the first camera captures an original RAW domain image;
    acquiring a multi-channel image corresponding to the original RAW domain image;
    inputting input data into a first neural network model to obtain a first gain value of white balance, wherein the input data comprises at least the shooting parameters of the first camera and the multi-channel image;
    performing first processing on the multi-channel image to obtain a target image;
    wherein the first processing comprises white balance processing based on the multi-channel image and the first gain value.
2. The method according to claim 1, wherein the shooting parameters comprise at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
3. The method according to claim 1 or 2, wherein the first neural network model predicts the first gain value by fusing the shooting parameters of the first camera and image features of the multi-channel image.
4. The method according to claim 3, wherein the first processing specifically comprises:
    obtaining the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera and the multi-channel image;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
5. The method according to claim 3, wherein the first processing specifically comprises:
    sending the shooting parameters of the first camera and the multi-channel image to a server;
    receiving the first gain value from the server, wherein the first gain value is obtained through a first neural network model configured on the server;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
6. The method according to any one of claims 3 to 5, wherein the first neural network model comprises a first feature extraction network, a feature fusion network, and a light source prediction network; and correspondingly, the process of obtaining the first gain value through the first neural network model specifically comprises:
    performing feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    fusing the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature;
    performing prediction according to the fused feature through the light source prediction network to obtain the first gain value.
7. The method according to claim 1 or 2, wherein the input data further comprises scene semantic information represented by the multi-channel image; and the first neural network model specifically predicts the first gain value by fusing the shooting parameters of the first camera, image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
8. The method according to claim 7, wherein the first processing specifically comprises:
    extracting scene semantic information from the multi-channel image;
    obtaining the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
9. The method according to claim 7, wherein the processing specifically comprises:
    sending the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server;
    receiving the first gain value from the server, wherein the first gain value is obtained through a first neural network model configured on the server;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
10. The method according to any one of claims 7 to 9, wherein the first neural network model comprises a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; and correspondingly, the process of obtaining the first gain value through the first neural network specifically comprises:
    performing feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    performing feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature;
    fusing the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature;
    performing prediction according to the fused feature through the light source prediction network to obtain the first gain value.
11. The method according to claim 8, wherein extracting scene semantic information from the multi-channel image comprises:
    performing at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
12. The method according to any one of claims 1 to 11, wherein acquiring the multi-channel image corresponding to the original RAW domain image comprises:
    preprocessing the original RAW domain image to obtain the multi-channel image, wherein the preprocessing comprises demosaicing.
13. The method according to any one of claims 1 to 12, wherein the multi-channel image is a three-channel image or a four-channel image.
14. A method for automatic image white balance, wherein the method is applied to an electronic device comprising at least two cameras, the at least two cameras comprising a first camera and a second camera, and the method comprises:
    selecting a target camera from the at least two cameras according to a shooting instruction of a user, wherein the shooting instruction comprises a shooting magnification;
    when the target camera is the second camera, acquiring shooting parameters used when the second camera captures a second original RAW domain image and a second multi-channel image corresponding to the second original RAW domain image;
    performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera;
    inputting at least the shooting parameters of the second camera and the migrated image into a first neural network model to obtain a first gain value of white balance, wherein the first neural network model is associated with the first camera;
    obtaining a second gain value corresponding to the second camera according to the first gain value;
    performing first processing on the second multi-channel image to obtain a second target image;
    wherein the first processing comprises white balance processing based on the second multi-channel image and the second gain value.
15. The method according to claim 14, wherein performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera comprises:
    based on a difference between the second camera and the first camera, performing a color migration operation on the second multi-channel image to obtain a migrated image that fits photosensitive characteristics of an image sensor corresponding to the first camera.
16. The method according to claim 14 or 15, wherein, when the target camera is the first camera, the method further comprises:
    acquiring shooting parameters used when the first camera captures a first original RAW domain image and a first multi-channel image corresponding to the first original RAW domain image;
    inputting at least the shooting parameters of the first camera and the first multi-channel image into the first neural network model to obtain a third gain value of white balance;
    performing white balance processing according to the first multi-channel image and the third gain value to obtain a first target image.
17. The method according to any one of claims 14-16, wherein the first camera and the second camera have different magnifications, or the first camera and the second camera correspond to different image sensors.
18. The method according to claim 17, wherein the first camera and the second camera are of different camera types, the camera types comprising a main camera, a telephoto camera, a wide-angle camera, a medium telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
19. The method according to claim 18, wherein when the first camera and the second camera are two of the main camera, the telephoto camera, and the wide-angle camera, at least one of the following holds:
    the image sensor corresponding to the telephoto camera comprises an RGGB module;
    the image sensor corresponding to the main camera comprises an RYYB module;
    the image sensor corresponding to the wide-angle camera comprises an RGGB module;
    the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera;
    the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
20. The method according to any one of claims 14-19, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
21. The method according to any one of claims 14-20, wherein the multi-channel image is a three-channel image or a four-channel image.
22. A method for image auto white balance, wherein the method is applied to an electronic device comprising a first camera and a second camera, and the method comprises:
    selecting one of the first camera and the second camera as a target camera according to a shooting instruction of a user, the shooting instruction comprising a shooting magnification;
    acquiring shooting parameters used when the target camera captures an original RAW domain image, and a multi-channel image corresponding to the original RAW domain image;
    determining a neural network model corresponding to the target camera, wherein the first camera is associated with a first neural network model, and the second camera is associated with a second neural network model;
    inputting input data into the neural network model to obtain a gain value of white balance, wherein the input data comprises at least the shooting parameters of the target camera and the multi-channel image;
    performing first processing on the multi-channel image to obtain a target image, wherein the first processing comprises white balance processing based on the multi-channel image and the gain value.
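A minimal sketch of this per-camera model selection. The magnification thresholds, the dictionary layout, and the placeholder predictors are assumptions for illustration; the claim only requires that each camera be associated with its own model.

```python
def predictor_stub(name):
    # stand-in for a trained neural network associated with one camera
    return lambda params, image: {"camera": name, "gain": (1.8, 1.0, 1.4)}

MODELS = {
    "first_camera": predictor_stub("first_camera"),    # e.g. the main camera
    "second_camera": predictor_stub("second_camera"),  # e.g. the telephoto camera
}

def select_target_camera(magnification):
    # assumed routing rule: high zoom factors go to the second (telephoto) camera
    return "second_camera" if magnification >= 3.0 else "first_camera"

camera = select_target_camera(5.0)
gain = MODELS[camera](params={"iso": 100}, image=None)["gain"]
```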
23. The method according to claim 22, wherein the first camera and the second camera have different magnifications, or the first camera and the second camera correspond to different image sensors.
24. The method according to claim 23, wherein the first camera and the second camera are of different camera types, the camera types comprising a main camera, a telephoto camera, a wide-angle camera, a medium telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
25. The method according to claim 24, wherein when the first camera and the second camera are two of the main camera, the telephoto camera, and the wide-angle camera, at least one of the following holds:
    the image sensor corresponding to the telephoto camera comprises an RGGB module;
    the image sensor corresponding to the main camera comprises an RYYB module;
    the image sensor corresponding to the wide-angle camera comprises an RGGB module;
    the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera;
    the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
26. The method according to any one of claims 22-25, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
27. An apparatus for implementing image auto white balance, comprising:
    a parameter acquisition module, configured to acquire shooting parameters used when the first camera captures an original RAW domain image;
    an image acquisition module, configured to acquire a multi-channel image corresponding to the original RAW domain image;
    a processing module, configured to input input data into a first neural network model to obtain a first gain value of white balance, the input data comprising at least the shooting parameters of the first camera and the multi-channel image; and further configured to perform first processing on the multi-channel image to obtain a target image, wherein the first processing comprises white balance processing based on the multi-channel image and the first gain value.
28. The apparatus according to claim 27, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
29. The apparatus according to claim 27 or 28, wherein the first neural network model implements the prediction of the first gain value by fusing the shooting parameters of the first camera and image features of the multi-channel image.
30. The apparatus according to claim 29, wherein the processing module is specifically configured to:
    obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
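A minimal sketch of the tail of this on-device pipeline: applying the predicted gain per channel, then one post-processing step. The claim leaves the post-processing unspecified; gamma encoding here is just an assumed example of such a step.

```python
import numpy as np

def first_processing(multichannel, gain, gamma=2.2):
    """White balance with the predicted gain, then simple post-processing.
    `multichannel` is assumed to be a float image in [0, 1] with shape (H, W, 3)."""
    balanced = np.clip(multichannel * np.asarray(gain), 0.0, 1.0)
    return balanced ** (1.0 / gamma)  # post-processing -> target image
```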
31. The apparatus according to claim 29, wherein the processing module is specifically configured to:
    send the shooting parameters of the first camera and the multi-channel image to a server;
    receive the first gain value from the server, the first gain value being obtained through a first neural network model configured on the server;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
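A sketch of the device side of this server variant. The claim only says the model runs on a server; the endpoint URL, payload layout, and response field below are entirely hypothetical.

```python
import requests

def remote_gain(shooting_params, multichannel_image_png):
    """Request the first gain value from a server-hosted model (illustrative)."""
    resp = requests.post(
        "https://example.com/awb/predict",      # placeholder URL, not a real API
        data={"params": shooting_params},
        files={"image": multichannel_image_png},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["gain"]                  # assumed response field
```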
32. The apparatus according to any one of claims 29-31, wherein the first neural network model comprises a first feature extraction network, a feature fusion network, and a light source prediction network;
    the processing module is specifically configured to:
    perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature;
    perform prediction through the light source prediction network according to the fused feature to obtain the first gain value.
33. The apparatus according to claim 27 or 28, wherein the input data further comprises scene semantic information represented by the multi-channel image; the first neural network model implements the prediction of the gain value by fusing the shooting parameters of the first camera, image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
34. The apparatus according to claim 33, wherein the processing module is specifically configured to:
    extract scene semantic information from the multi-channel image;
    obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
35. The apparatus according to claim 33, wherein the processing module is specifically configured to:
    send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server;
    receive the first gain value from the server, the first gain value being obtained through a first neural network model configured on the server;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
36. The apparatus according to any one of claims 33-35, wherein the first neural network model comprises a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network;
    the processing module is specifically configured to:
    perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature;
    fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature;
    perform prediction through the light source prediction network according to the fused feature to obtain the first gain value.
37. The apparatus according to claim 34, wherein the processing module is specifically configured to:
    perform at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
38. The apparatus according to any one of claims 27-37, wherein the image acquisition module is specifically configured to:
    preprocess the original RAW domain image to obtain the multi-channel image, the preprocessing comprising demosaicing.
39. The apparatus according to any one of claims 27-38, wherein the multi-channel image is a three-channel image or a four-channel image.
40. An electronic device, comprising a camera, a memory, and a processor, wherein the camera is configured to capture images, the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the processor executes the program stored in the memory, the processor is specifically configured to perform the method according to any one of claims 1 to 13.
41. An electronic device, comprising at least two cameras, a memory, and a processor, the at least two cameras comprising a first camera and a second camera, wherein each of the at least two cameras is configured to capture images, the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the processor executes the program stored in the memory, the processor is specifically configured to perform the method according to any one of claims 14-21 or 22-26.
42. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions, and when the program instructions are run by a processor, the method according to any one of claims 1-13, 14-21, or 22-26 is implemented.
43. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1-13, 14-21, or 22-24.
PCT/CN2021/085966 2020-04-10 2021-04-08 Image auto white balance method and apparatus WO2021204202A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010280949.7 2020-04-10
CN202010280949 2020-04-10
CN202010817963.6A CN113518210B (en) 2020-04-10 2020-08-14 Method and device for automatic white balance of image
CN202010817963.6 2020-08-14

Publications (1)

Publication Number Publication Date
WO2021204202A1 (en)

Family

ID=78022779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085966 WO2021204202A1 (en) 2020-04-10 2021-04-08 Image auto white balance method and apparatus

Country Status (1)

Country Link
WO (1) WO2021204202A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151426A (en) * 2017-06-28 2019-01-04 杭州海康威视数字技术股份有限公司 A kind of white balance adjustment method, device, camera and medium
CN107343190A (en) * 2017-07-25 2017-11-10 广东欧珀移动通信有限公司 White balance adjusting method, apparatus and terminal device
CN107578390A (en) * 2017-09-14 2018-01-12 长沙全度影像科技有限公司 A kind of method and device that image white balance correction is carried out using neutral net
CN110647930A (en) * 2019-09-20 2020-01-03 北京达佳互联信息技术有限公司 Image processing method and device and electronic equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220394223A1 (en) * 2021-06-08 2022-12-08 Black Sesame International Holding Limited Neural network based auto-white-balancing
US11606544B2 (en) * 2021-06-08 2023-03-14 Black Sesame Technologies Inc. Neural network based auto-white-balancing
CN114222075A (en) * 2022-01-28 2022-03-22 广州华多网络科技有限公司 Mobile terminal image processing method and device, equipment, medium and product thereof
CN114638951A (en) * 2022-03-29 2022-06-17 北京有竹居网络技术有限公司 House model display method and device, electronic equipment and readable storage medium
CN114638951B (en) * 2022-03-29 2023-08-15 北京有竹居网络技术有限公司 House model display method and device, electronic equipment and readable storage medium
CN115334234A (en) * 2022-07-01 2022-11-11 北京讯通安添通讯科技有限公司 Method and device for supplementing image information by taking pictures in dark environment
CN115334234B (en) * 2022-07-01 2024-03-29 北京讯通安添通讯科技有限公司 Method and device for taking photo supplementary image information in dim light environment
WO2024055764A1 (en) * 2022-09-14 2024-03-21 华为技术有限公司 Image processing method and apparatus
CN116709003A (en) * 2022-10-09 2023-09-05 荣耀终端有限公司 Image processing method and electronic equipment

Similar Documents

Publication Publication Date Title
WO2021204202A1 (en) Image auto white balance method and apparatus
CN113518210B (en) Method and device for automatic white balance of image
TWI805869B (en) System and method for computing dominant class of scene
WO2019237992A1 (en) Photographing method and device, terminal and computer readable storage medium
US8666191B2 (en) Systems and methods for image capturing
CN110505411B (en) Image shooting method and device, storage medium and electronic equipment
US10074165B2 (en) Image composition device, image composition method, and recording medium
US11158027B2 (en) Image capturing method and apparatus, and terminal
JP2021530911A (en) Night view photography methods, devices, electronic devices and storage media
WO2022076116A1 (en) Segmentation for image effects
WO2020082382A1 (en) Method and system of neural network object recognition for image processing
US11825179B2 (en) Auto exposure for spherical images
JP2021005846A (en) Stacked imaging device, imaging device, imaging method, learning method, and image readout circuit
CN116744120B (en) Image processing method and electronic device
Yang et al. Personalized exposure control using adaptive metering and reinforcement learning
TW202223834A (en) Camera image or video processing pipelines with neural embedding and neural network training system
US20230419505A1 (en) Automatic exposure metering for regions of interest that tracks moving subjects using artificial intelligence
WO2021154807A1 (en) Sensor prioritization for composite image capture
WO2023160220A1 (en) Image processing method and electronic device
US11671714B1 (en) Motion based exposure control
CN111382753A (en) Light field semantic segmentation method and system, electronic terminal and storage medium
US20230370727A1 (en) High dynamic range (hdr) image generation using a combined short exposure image
CN116708996B (en) Photographing method, image optimization model training method and electronic equipment
CN116055855B (en) Image processing method and related device
CN116051368B (en) Image processing method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21784553; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21784553; Country of ref document: EP; Kind code of ref document: A1)