WO2021204202A1 - Image auto white balance method and apparatus - Google Patents

Image auto white balance method and apparatus

Info

Publication number
WO2021204202A1
WO2021204202A1 (PCT/CN2021/085966)
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
channel image
neural network
white balance
Prior art date
Application number
PCT/CN2021/085966
Other languages
French (fr)
Chinese (zh)
Inventor
冯思博
陈梓艺
万磊
贾彦冰
翟其彦
曾毅华
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010817963.6A external-priority patent/CN113518210B/en
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021204202A1 publication Critical patent/WO2021204202A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/10: Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00: Details of colour television systems
    • H04N9/64: Circuits for processing colour signals
    • H04N9/73: Colour balance circuits, e.g. white balance circuits or colour temperature control

Definitions

  • This application relates to the field of artificial intelligence, and in particular to methods and apparatuses for automatic image white balance in the field of photography technology.
  • The human visual system exhibits color constancy: it can compensate for changes in the color of the light source so as to perceive the color of an object consistently.
  • In order to eliminate the influence of the light source on the imaging of the image sensor (Sensor) and to simulate the color constancy of the human visual system, ensuring that white seen in any scene is rendered as true white, automatic white balance technology needs to be introduced.
  • White balance is an indicator describing the accuracy with which white is reproduced after the three primary colors red, green, and blue are mixed on a display.
  • Automatic white balance technology is mainly used to solve the problem of image color cast under different light sources, so that the colors of the scene in the image conform to the color vision habits of the human eye.
  • Computational color constancy in automatic white balance processing is dedicated to solving this problem: its main purpose is to estimate the color of the unknown light source represented in an arbitrary image, and then use that light source color to color-correct the input image so that it appears as if displayed under standard white light.
  • the embodiments of the present application provide a method and device for automatic image white balance, which can improve the accuracy and stability of the image white balance of an electronic device, and improve the user experience.
  • An embodiment of the present application provides a method for automatic image white balance, applied to an electronic device including a first camera. The method includes: acquiring the shooting parameters used when the first camera shoots an original RAW domain image; acquiring the multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value of white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • The original RAW domain image may be referred to as the RAW image for short. The RAW image may be the raw data produced when the CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal.
  • the shooting parameters indicate parameters used when performing shooting, such as shooting parameters used by a camera, an image sensor, and so on. Alternatively, shooting parameters can also be understood as control parameters generated when the processor controls the camera and image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
  • a multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
  • At the bottom layer, an image channel refers to the individual red (R), green (G), or blue (B) component of the image.
  • This application uses the shooting parameters to provide a shooting-configuration reference for light source color estimation, so as to assist white balance processing. The processing includes white balance processing implemented using a neural network model; the neural network model is used to obtain, at least from the shooting parameters and the multi-channel image, the white balance gain value or the image light source value required for white balance processing (the gain value and the image light source value are reciprocals of each other).
  • the neural network model described in the embodiments of this application may be a single neural network model, or a combination of two or more neural network models.
  • After the gain value or the image light source value is output, the electronic device can use it to perform white balance processing on the channel image, thereby correcting the image chromatic aberration caused by the color temperature of the light source, so that the color of each object in the image is close to its original color and the overall effect of the image conforms to the visual and cognitive habits of the human eye.
  • The embodiment of the present application uses the multi-channel image corresponding to the RAW image as the input of the neural network model, providing more color information for the AWB neural network model. The shooting parameters are added as a further input of the AWB neural network model to provide shooting-configuration information for light source estimation, which can improve the model's ability to distinguish different light source scenes and ensure good light source estimation accuracy. Therefore, the implementation of this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photo and video scenes, and the stability of the tendency in ambiguous scenes such as multiple light sources.
  • The neural network model may be a model constructed based on deep learning, for example, one of a deep neural network (DNN), a convolutional neural network (CNN), a long short-term memory network (LSTM), or a recurrent neural network (RNN), or a combination of several of these.
  • In a possible implementation, the first neural network model realizes the prediction of the first gain value by fusing the shooting parameters of the first camera with the image features of the multi-channel image.
  • In a possible implementation, the first neural network model may include a first feature extraction network, a feature fusion network, and a light source prediction network. Correspondingly, obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, performing statistical operations on the pixels of the channel image through convolution processing) to obtain a first feature; fusing the shooting parameters of the first camera with the first feature through the feature fusion network (the fusion method may be one or a combination of concat processing, conv2d processing, element-wise multiplication, element-wise addition, and the like) to obtain a fused feature; and performing prediction according to the fused feature through the light source prediction network to obtain the first gain value or the image light source value for use in subsequent white balance processing (a minimal sketch of such a model follows).
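As an illustration only, the following is a minimal PyTorch sketch of such a model: a convolutional branch plays the role of the first feature extraction network, concatenation plus a small MLP plays the role of the feature fusion network, and a final linear layer plays the role of the light source prediction network. The channel counts, layer sizes, and four-channel input are assumptions, not an architecture prescribed by this application.

```python
import torch
import torch.nn as nn

class AWBNet(nn.Module):
    def __init__(self, num_params=4):            # e.g. exposure, shutter, ISO, aperture
        super().__init__()
        # First feature extraction network: convolutions act as learned
        # statistics over the pixels of the multi-channel image.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),              # global image descriptor
        )
        # Feature fusion network: here a simple concat + MLP (an assumption).
        self.fusion = nn.Sequential(nn.Linear(64 + num_params, 64), nn.ReLU())
        # Light source prediction network: outputs the RGB illuminant estimate;
        # its per-channel reciprocal is the white balance gain.
        self.illuminant = nn.Linear(64, 3)

    def forward(self, image, shooting_params):
        f = self.features(image).flatten(1)       # (B, 64)
        fused = self.fusion(torch.cat([f, shooting_params], dim=1))
        return self.illuminant(fused)             # (B, 3) light source color

# Usage: a four-channel (e.g. RGGB) image plus normalized shooting parameters.
model = AWBNet()
light = model(torch.rand(1, 4, 64, 64), torch.rand(1, 4))
gain = 1.0 / light.clamp(min=1e-6)               # the gain is the reciprocal of L
```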
  • the AWB neural network model in the embodiments of the present application can be applied to all scenes, and a large amount of training data is used during model training.
  • the training data includes data obtained in a bright light scene and data obtained in a dark light scene.
  • For a model covering all scenes, it is difficult for the neural network to achieve a high-precision fit across the whole scene range. The added shooting parameters provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for both types of scenes. This is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photo and video scenes, and the stability of the tendency in ambiguous scenes such as multiple light sources.
  • the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device.
  • At this time, the first processing specifically includes: obtaining the first gain value through a first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image. Therefore, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform the neural network calculation, which improves processing efficiency and reduces white balance processing delay.
  • the solution of the present application can be applied to electronic devices in the end-cloud system, and the neural network model can be configured in the cloud server in the end-cloud system.
  • At this time, the first processing specifically includes: sending the shooting parameters of the first camera and the multi-channel image to a server; receiving the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; performing white balance processing on the multi-channel image using the first gain value; and post-processing the white-balance-processed image to obtain the target image.
  • In this way, the computing power of the cloud server can be used to compute the neural network model, ensuring the accuracy and stability of the white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
  • In a possible implementation, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically fuses the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image to predict the first gain value.
  • In a possible implementation, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network. Correspondingly, obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, performing statistical operations on the pixels of the channel image through convolution processing) to obtain a first feature; performing feature extraction on the scene semantic information through the second feature extraction network (for example, analyzing/perceiving the scene information of the channel image through convolution processing) to obtain a second feature; fusing the shooting parameters, the first feature, and the second feature through the feature fusion network (for example, the fusion method may be one or a combination of concat processing, conv2d processing, element-wise multiplication, element-wise addition, and the like) to obtain a fused feature; and performing prediction according to the fused feature through the light source prediction network to obtain the first gain value or the image light source value for use in subsequent white balance processing.
  • the scene semantic information represents the semantic features related to the shooting scene represented by the image.
  • various types of shooting scenes can be defined.
  • shooting scenes can be classified based on light source types, such as cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
  • shooting scenes can be classified based on image content, such as portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
  • the scene semantic information can provide a priori semantic information for the image to a large extent, help the AWB neural network to distinguish different scenes, and then improve the overall accuracy of the AWB neural network.
  • For example, in face scenes the network output may be unstable, which affects the perceived skin tone. If face detection information is added to the neural network as scene semantic information, the neural network will pay more attention to the face region during training, thereby improving the fitting accuracy of the network in face scenes.
  • the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device.
  • At this time, the first processing specifically includes: extracting the scene semantic information from the multi-channel image; obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image.
  • the solution of the present application can be applied to electronic devices in the end-cloud system, and the neural network model can be configured in the cloud server in the end-cloud system.
  • At this time, the first processing specifically includes: sending the shooting parameters, the multi-channel image, and the scene semantic information to a server; receiving the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; performing white balance processing on the multi-channel image using the first gain value; and performing post-processing on the white-balance-processed image to obtain the target image.
  • In a possible implementation, performing scene semantic information extraction on the multi-channel image includes: performing at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
  • the scene classification algorithm is used to realize the classification of faces and non-faces, the classification of single light sources and multiple light sources, the color temperature classification of light sources, or the classification of indoor and outdoor scenes, and so on.
  • the image scene segmentation algorithm can be used to segment the picture to generate a mask map; alternatively, the scene classification algorithm, object detection algorithm, face detection algorithm, skin color segmentation algorithm and other technologies can also be used to generate the mask map.
  • The mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the network fit and converge, and achieving higher prediction accuracy (a minimal sketch of generating such a mask follows).
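As one illustration of turning scene semantic information into a mask map, the sketch below marks detected face regions using OpenCV's bundled Haar face detector; the application does not prescribe a specific detection algorithm, so this detector and the binary-mask format are assumptions.

```python
import cv2
import numpy as np

def face_mask(rgb_image: np.ndarray) -> np.ndarray:
    """rgb_image: (H, W, 3) uint8. Returns an (H, W) mask, 1 inside face boxes."""
    gray = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    mask = np.zeros(gray.shape, dtype=np.float32)
    for (x, y, w, h) in cascade.detectMultiScale(gray):
        mask[y:y + h, x:x + w] = 1.0              # mark the detected face region
    return mask

# The mask can be stacked with the multi-channel image (or fed to a second
# feature extraction network) as the scene semantic input.
```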
  • In a possible implementation, acquiring the multi-channel image corresponding to the original RAW domain image includes: preprocessing the original RAW domain image to obtain the multi-channel image, the preprocessing including demosaicing. Using a simplified demosaicing operation makes the length and width of the multi-channel image half the length and width of the down-sampled RAW image, which can speed up subsequent algorithms; see the sketch below.
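The following is a minimal numpy sketch of such a simplified demosaicing, assuming an RGGB Bayer layout (the layout is an assumption): each 2x2 Bayer cell collapses into one output pixel, so the result has half the length and width of the RAW image.

```python
import numpy as np

def simple_demosaic(raw: np.ndarray) -> np.ndarray:
    """RGGB Bayer RAW (H, W), even dims -> 4-channel image (H/2, W/2, 4)."""
    r  = raw[0::2, 0::2]   # red samples
    gr = raw[0::2, 1::2]   # green samples on red rows
    gb = raw[1::2, 0::2]   # green samples on blue rows
    b  = raw[1::2, 1::2]   # blue samples
    return np.stack([r, gr, gb, b], axis=-1).astype(np.float32)

# Averaging the two green planes instead yields a three-channel image:
# np.stack([r, (gr + gb) / 2, b], axis=-1)
```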
  • In addition, the preprocessing process may also include black level correction (BLC) and lens shading correction (LSC): dark current can be reduced through BLC processing, and the influence of lens shading on the image signal can be eliminated through LSC processing. Optionally, the preprocessing may also include image down-sampling processing and noise reduction processing.
  • Optionally, the white-balance-processed image can also be post-processed through image enhancement algorithms to further improve image quality, obtain the final target image for display, and output it to the display screen of the electronic device.
  • the image enhancement algorithm may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
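As a small illustration of one such post-processing step, the sketch below applies gamma correction; the 2.2 exponent is a common display convention, not a value specified by this application.

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Map linear pixel values in [0, 1] to display-referred values."""
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)
```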
  • the multi-channel image is a three-channel image or a four-channel image.
  • In a possible implementation, a training process for the neural network model may be as follows: the training data includes the annotated light source color of the image, the multi-channel image obtained by preprocessing the RAW image, and the shooting parameters, and optionally also includes scene semantic information. After the training data is input into the model, the model outputs light source color information; a loss function is determined by comparing the output light source color information with the labeled light source color, and is back-propagated through the model, thereby updating the model parameters and training the model.
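The following is a minimal sketch of one training step. The text above only states that the loss compares the predicted and labeled light source colors; the angular (cosine) error used here is a common choice in illuminant estimation and is an assumption, as is the AWBNet-style model interface.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
    """Mean angle (radians) between predicted and labeled illuminant vectors."""
    cos = F.cosine_similarity(pred, label, dim=1).clamp(-0.999999, 0.999999)
    return torch.acos(cos).mean()

def train_step(model, optimizer, image, params, label_light):
    optimizer.zero_grad()
    pred_light = model(image, params)     # e.g. the AWBNet sketched earlier
    loss = angular_loss(pred_light, label_light)
    loss.backward()                       # backpropagate to update model parameters
    optimizer.step()
    return loss.item()
```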
  • the image used for training the neural network model may not be a single frame image, but a labeled video sequence.
  • network structures such as LSTM and RNN can be introduced, and time-domain related strategies can also be used during model training.
  • For example, a video sequence can be used as training data, and the AWB neural network model adds the frames before and after the current image as model inputs.
  • An embodiment of the present application provides a method for automatic image white balance, applied to an electronic device including at least two cameras, the at least two cameras including a first camera and a second camera. The method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, the shooting instruction including a shooting magnification; when the target camera is the second camera, acquiring the shooting parameters used when the second camera shoots a second original RAW domain image and the second multi-channel image corresponding to the second original RAW domain image; performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera; inputting at least the shooting parameters of the second camera and the migrated image into the first neural network model to obtain a first gain value of white balance, where the first neural network model is associated with the first camera (specifically, the first neural network model is trained according to the data collected by the first camera and the shooting parameters of the first camera); processing the first gain value into a second gain value corresponding to the second camera; and performing first processing on the second multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the second multi-channel image and the second gain value.
  • the number of cameras configured in the electronic device is not limited.
  • the type of each camera is not limited.
  • The so-called "different types" can be cameras with different shooting magnifications (or zoom magnifications) or focal lengths, such as a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera.
  • the so-called “different types” may mean that the image sensors corresponding to each camera are different.
  • For example, in one configuration the image sensor corresponding to the wide-angle camera is an RGGB module, while the image sensor corresponding to a conventional camera is an RYYB module.
  • In another example, the image sensor corresponding to the telephoto camera includes an RGGB module, the image sensor corresponding to the main camera includes an RYYB module, and the image sensor corresponding to the wide-angle camera includes an RGGB module; the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera, and the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
  • In a possible implementation, performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera includes: performing a color migration operation on the second multi-channel image based on the difference between the second camera and the first camera, to obtain a migrated image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. The migrated image is then input, together with the camera parameters of the second camera, into the first AWB neural network to calculate a light source color value that conforms to the shooting characteristics of the first camera; the light source color value then undergoes the inverse migration operation, so that it is mapped back to the light source color value corresponding to the second camera.
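The following is a minimal sketch of this cross-camera migration, assuming the migration operator is a 3x3 color conversion matrix M calibrated offline for the sensor pair; the application does not specify the operator, so the linear form is an assumption.

```python
import numpy as np

M = np.eye(3)  # placeholder calibration matrix: camera-2 space -> camera-1 space

def migrate_image(img2: np.ndarray) -> np.ndarray:
    """Map an (H, W, 3) camera-2 image into camera-1 color space."""
    return img2 @ M.T

def migrate_light_back(light1: np.ndarray) -> np.ndarray:
    """Map a camera-1 illuminant estimate back into camera-2 color space."""
    return np.linalg.inv(M) @ light1

# Pipeline sketch:
#   light1 = model(migrate_image(img2), params2)   # first camera's AWB model
#   gain2  = 1.0 / migrate_light_back(light1)      # gain for the second camera
```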
  • the automatic white balance method described in this application can make the neural network model compatible with two or more cameras at the same time, expand the applicable scenarios, improve the adaptability to multiple lenses, and greatly improve the user experience.
  • In a possible implementation, when the target camera is the first camera, the method further includes: acquiring the shooting parameters used when the first camera shoots the first original RAW domain image and the first multi-channel image corresponding to the first original RAW domain image.
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the multi-channel image is a three-channel image or a four-channel image.
  • An embodiment of the present application provides a method for automatic white balance of an image. The method is applied to an electronic device including at least two cameras, the at least two cameras including a first camera and a second camera. The method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, the shooting instruction including a shooting magnification; acquiring the shooting parameters used when the target camera shoots an original RAW domain image and the multi-channel image corresponding to the original RAW domain image; determining the neural network model corresponding to the target camera, wherein the first camera is associated with a first neural network model and the second camera is associated with a second neural network model, the first neural network model being trained based on the data collected by the first camera and the shooting parameters of the first camera, and the second neural network model being trained based on the data collected by the second camera and the shooting parameters of the second camera; inputting input data into the neural network model to obtain a white balance gain value, wherein the input data includes at least the shooting parameters of the target camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the gain value.
  • the magnifications of the first camera and the second camera are different, or the image sensors corresponding to the first camera and the second camera are different.
  • the camera types of the first camera and the second camera are different, and the camera types include a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
  • In this design, different cameras can be configured with different neural network models; for example, the first camera corresponds to the first neural network model and the second camera corresponds to the second neural network model. The first neural network model can be trained from data collected by the first camera (or an identical or similar device), and the second neural network model can be trained from data collected by the second camera (or an identical or similar device). In this way, the data of different cameras can be processed independently, improving the pertinence and accuracy of the neural network models (a minimal sketch follows).
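The following is a minimal sketch of such per-camera model association, reusing the AWBNet sketch shown earlier; the camera names and the zoom threshold are illustrative assumptions, not values prescribed by this application.

```python
import torch

# One model instance per camera; in practice each would be trained on data
# collected by (a device identical or similar to) that camera.
MODELS = {
    "main": AWBNet(),        # AWBNet is the sketch shown earlier
    "telephoto": AWBNet(),
}

def select_camera(zoom: float) -> str:
    """Map the user's shooting magnification to a target camera (assumed threshold)."""
    return "telephoto" if zoom >= 3.0 else "main"

def estimate_gain(zoom: float, image: torch.Tensor, params: torch.Tensor) -> torch.Tensor:
    model = MODELS[select_camera(zoom)]     # model associated with the target camera
    light = model(image, params)            # predicted light source color
    return 1.0 / light.clamp(min=1e-6)      # white balance gain (reciprocal of L)
```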
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the multi-channel image is a three-channel image or a four-channel image.
  • An embodiment of the present application provides an apparatus for realizing automatic image white balance, including: a parameter acquisition module for acquiring the shooting parameters used when the first camera captures an original RAW domain image; an image acquisition module for acquiring the multi-channel image corresponding to the original RAW domain image; and a processing module for inputting input data into a first neural network model to obtain a first gain value of white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image, and further for performing first processing on the multi-channel image to obtain a target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes a camera, a memory, and a processor, and optionally a display screen.
  • The display screen is used for displaying images; the camera is used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the first aspect of the present application.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, and optionally a display
  • the display screen is used to display images.
  • the at least two cameras are both used to capture images;
  • the memory is used to store a program;
  • The processor is used to execute the program stored in the memory, and when the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the second aspect of the present application.
  • an embodiment of the present application provides an electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, and optionally a display
  • the display screen is used to display images.
  • the at least two cameras are both used to capture images;
  • the memory is used to store a program;
  • The processor is used to execute the program stored in the memory, and when the processor executes the program stored in the memory, it is specifically configured to perform the method described in any embodiment of the third aspect of the present application.
  • an embodiment of the present application provides a chip.
  • The chip includes a processor and a data interface; the processor reads instructions stored in a memory through the data interface, so as to perform the method described in any embodiment of the first aspect, the second aspect, or the third aspect.
  • An embodiment of the present invention provides yet another non-volatile computer-readable storage medium; the computer-readable storage medium is used to store program code implementing the method described in any embodiment of the first aspect, the second aspect, or the third aspect. When the program code is executed by a computing device, the method described in any embodiment of the first aspect, the second aspect, or the third aspect can be implemented.
  • An embodiment of the present invention provides a computer program product; the computer program product includes program instructions, and when the computer program product is executed by a computing device, it performs the method described in any embodiment of the aforementioned first, second, or third aspect. For example, the computer program product may be a software installation package, which may be downloaded and executed on a controller to implement the method described in any embodiment of the first aspect, the second aspect, or the third aspect.
  • FIG. 1 is an exemplary diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is an example diagram of a terminal cloud interaction scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the device structure of a device-cloud interaction scenario provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the device structure of a chip provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of another system architecture provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application.
  • FIG. 9 is an example diagram of a RAW image and a three-channel image provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the structure and processing flow of a neural network model provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of the structure and processing flow of another neural network model provided by an embodiment of the present application.
  • FIG. 14 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 15 is an example diagram of an image preprocessing process provided by an embodiment of the present application.
  • FIG. 16 is an example diagram of an image post-processing process provided by an embodiment of the present application.
  • FIG. 17 is a schematic flowchart of yet another image automatic white balance method provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of a user operation scenario provided by an embodiment of the present application.
  • FIG. 19 is a block diagram of a possible software structure of a terminal according to an embodiment of the present application.
  • FIG. 20 is an example diagram of some model training processes provided by embodiments of the present application.
  • FIG. 21 is an example diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application.
  • FIG. 22 is another example diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application.
  • FIG. 23 is an example diagram of a target image under different shooting magnifications according to an embodiment of the present application.
  • FIG. 24 is a schematic diagram of the structure of an apparatus provided by an embodiment of the present application.
  • A system, product, or device that includes a series of units/devices is not limited to the listed units/devices, but optionally includes unlisted units/devices, or optionally includes other units/devices inherent to these products or devices.
  • the color of the light source can also be called color temperature in terms of colorimetry.
  • For example, the color emitted by a black body at 3200 K is defined as white, the color emitted by a black body at 5600 K is defined as blue, and so on.
  • For objects in the environment (including people, objects, scenes, and so on), the color of the light source in the environment affects the imaged color of the object, directly or indirectly changing the object's own color and forming a chromatic aberration.
  • For example, white objects become reddish when illuminated by light with a low color temperature (such as incandescent lamps, candles, or sunrise/sunset light sources), and become bluish when illuminated by light with a high color temperature (such as cloudy sky, snowy sky, or tree shade light sources).
  • The purpose of white balance is to correct the chromatic aberration caused by different color temperatures, so that white objects appear as their original white and objects of other colors are as close to their original colors as possible, making the overall effect of the image conform to the visual and cognitive habits of the human eye.
  • In an example, white balance processing can be implemented based on the Lambertian reflection model.
  • In an example, the processing algorithm of the white balance processing is shown in the following Formula 1:

    $R = I / L$ (element-wise), i.e., $(R_r, G_r, B_r) = (R_i / R_l,\ G_i / G_l,\ B_i / B_l)$  (Formula 1)

  • where R represents the pixel values $(R_r, G_r, B_r)$ of the image after white balance processing, and R is close or equal to the color of the object under neutral light; I represents the image $(R_i, G_i, B_i)$ captured by the electronic device, which may be the multi-channel image described in the embodiments of the present application; and L represents the light source color information $(R_l, G_l, B_l)$, for example, the image light source value described in the embodiments of the present application. It should be noted that L here is a broad concept: in camera imaging, L may also include the bias of the image sensor toward the color of the object.
  • Equivalently, Formula 1 can be written using the white balance gain value $G = (1/R_l, 1/G_l, 1/B_l)$ as the element-wise product $R = I \cdot G$.
  • The task of the white balance processing is to estimate L (or, equivalently, G) from I and possible additional inputs, and to further obtain the color R of the object under neutral light, so as to eliminate as much as possible the imaging chromatic aberration caused by the light source, so that white appears white under different light sources and objects of other colors are as close as possible to their original colors.
  • For ease of description, the white balance processing in this application mainly uses the light source color information as an example to describe the solution; the gain value formulation can be implemented similarly, for example, by obtaining the white balance gain value directly from the neural network model, or by obtaining the light source color information from the neural network model and further deriving the white balance gain value from it. This is not expanded further here; a minimal sketch of applying the gain follows.
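A minimal numpy sketch of applying Formula 1, dividing each channel by the estimated light source color (equivalently, multiplying by the gain); the green-channel normalization is a common convention, assumed here rather than specified by the text.

```python
import numpy as np

def apply_white_balance(img: np.ndarray, light: np.ndarray) -> np.ndarray:
    """img: (H, W, 3) linear image; light: estimated (R_l, G_l, B_l)."""
    gain = 1.0 / np.maximum(light, 1e-6)      # the gain is the reciprocal of L
    gain = gain / gain[1]                     # normalize so green is unchanged
    return np.clip(img * gain, 0.0, 1.0)

# Example: a reddish tungsten illuminant makes the red channel too strong;
# the gain suppresses red and boosts blue to restore neutral white.
balanced = apply_white_balance(np.random.rand(4, 4, 3), np.array([0.8, 0.6, 0.4]))
```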
  • The prior art proposes some methods to determine the color of the light source, such as the gray world algorithm, the perfect reflector algorithm, the dynamic threshold algorithm, or determining the light source color from the color histogram of the image, and so on.
  • The embodiments of the application provide a deep-learning-based automatic white balance method for images/videos, which can overcome the above technical defects, improve the accuracy of AWB across full scenes, improve the stability of AWB for images/videos, ensure a stable tendency in ambiguous scenes such as multiple light sources, and meet real-time requirements.
  • the method described in this application can be applied to an independent electronic device 10.
  • the above-mentioned electronic device 10 may be mobile or fixed.
  • For example, the electronic device 10 may be a mobile phone with an image processing function, a tablet personal computer (TPC), a notebook computer, a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, an SLR camera, a video camera, a smart watch, a surveillance device, an augmented reality (AR) device, a virtual reality (VR) device, a wearable device (WD), an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
  • The electronic device 10 includes at least one general-purpose processor 13, a memory 15 (one or more computer-readable storage media), an image acquisition device 11, an image signal processor (ISP) 12, and a display device 14; these components can communicate over one or more communication buses.
  • the image acquisition device 11 may include components such as a camera 111, an image sensor (Sensor) 112, etc., which are used to collect images or videos of the shooting scene.
  • the images collected by the image acquisition device 11 may be one or more original RAW domain images.
  • the original RAW domain image can be referred to as RAW image for short.
  • the multiple original RAW domain images may form a sequence of image frames.
  • The camera 111 may be a monocular camera or a binocular camera, and is arranged in a front position (i.e., a front camera) or a rear position (i.e., a rear camera) on the housing of the main body of the electronic device 10.
  • the image sensor 112 is a photosensitive element, and this application does not limit the type of the photosensitive element. For example, it may be a Complementary Metal-Oxide Semiconductor (CMOS) or a Charge Coupled Device (CCD).
  • the function of the image sensor 112 is to capture the optical image collected by the camera 111 and convert it into an electrical signal usable by the back-end ISP 12.
  • the image sensor 112 may provide shooting parameters required for actual shooting.
  • the shooting parameters include, for example, at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • ISO sensitivity is the sensitivity specified by the International Organization for Standardization (ISO), also known as the ISO value, which measures the sensitivity of the sensor to light.
  • The main function of the ISP 12 is to process the signal output by the front-end image sensor 112. In the embodiment of this application, the algorithms included in the ISP 12 mainly include the auto white balance (AWB) algorithm.
  • Optionally, the ISP 12 may also include, but is not limited to, one or more of the following processing algorithms: automatic exposure control (AEC), automatic gain control (AGC), color correction, lens correction, noise removal/noise reduction, dead pixel removal, linear correction, color interpolation, image down-sampling, level compensation, and the like.
  • The ISP 12 may also include image enhancement algorithms, such as gamma correction, contrast enhancement and sharpening, color noise removal and edge enhancement in the YUV color space, color enhancement, and color space conversion (such as RGB to YUV). These image enhancement algorithms can also be integrated into a field programmable gate array (FPGA) or a digital signal processor (DSP) that cooperates with the ISP 12 to complete the image processing process together.
  • the general-purpose processor 13 may be any type of device capable of processing electronic instructions.
  • The electronic device 10 in this application may include one or more general-purpose processors 13, such as one or both of a central processing unit (CPU) 131 and a neural-network processing unit (NPU) 132. In addition, it may also include one or more of a graphics processing unit (GPU), a microprocessor, a microcontroller, a main processor, a controller, an application-specific integrated circuit (ASIC), and the like.
  • The general-purpose processor 13 executes various types of digital storage instructions, such as software or firmware programs stored in the memory 15, which enables the electronic device 10 to provide a wide variety of services. For example, the general-purpose processor 13 can execute programs or process data to perform at least a part of the methods discussed herein.
  • The function of the CPU 131 is mainly to parse computer instructions and process data in computer software, realize overall control of the electronic device 10, and deploy control over all hardware resources of the electronic device 10 (such as storage resources, communication resources, and I/O interfaces).
  • The NPU 132 is a general term for a new type of processor built around neural network algorithms and acceleration; it is specifically designed for artificial intelligence to accelerate neural network operations and to address the low efficiency of traditional chips in neural network operations. The NPU 132 does not constitute a limitation on this application; it can be replaced with other processors with similar functions, such as a tensor processing unit (TPU) or a deep learning processing unit (DPU).
  • The NPU 132 can undertake tasks related to neural network calculations. For example, the NPU 132 can compute the AWB neural network according to the image information provided by the ISP 12 (such as the multi-channel image) and the information provided by the image acquisition device (such as the shooting parameters) to obtain the light source color information, and then feed the light source color information back to the ISP 12 so that the ISP 12 further completes the AWB process.
  • In a possible embodiment, when the CPU 131 exists and the NPU 132 does not, the CPU 131 can undertake the tasks related to neural network calculations instead: the CPU 131 performs the AWB neural network calculation based on the same inputs and feeds the light source color information back to the ISP 12 to facilitate the ISP 12 further completing the AWB process.
  • the display device 14 is used to display the currently previewed shooting scene and the shooting interface when the user needs to shoot, or to display the target image after the white balance processing.
  • the display device 14 can also be used to display information requiring user operations or information provided to the user, as well as various graphical user interfaces of the electronic device 10. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the display device 14 may specifically include a display screen (display panel).
  • the display panel may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • The display device 14 may also be a touch panel (touch screen). The touch panel may include a display screen and a touch-sensitive surface; when the touch-sensitive surface detects a touch operation on or near it, the operation is sent to the CPU 131 to determine the type of the touch event, and the CPU 131 then provides a corresponding visual output on the display device 14 according to the type of the touch event.
  • The memory 15 may include a volatile memory, such as random access memory (RAM) and high-speed cache; the memory may also include a non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 15 may also include a combination of the foregoing types of memories.
  • The memory 15 can be used to store the RAW images collected by the image acquisition device 11, the target image after white balance processing, image information of preceding and following frames, shooting parameters, scene semantic information, and other data. The memory 15 can also be used to store program instructions for the processor to call and execute the automatic image white balance method described in this application.
  • In the electronic device 10, automatic white balance of an image can be achieved through the following process: when the electronic device 10 performs shooting, objects (people, objects, scenes, etc.) in the external environment are projected, through the optical image collected by the camera 111, onto the surface of the image sensor 112 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, and the digital image signal is a RAW image (for example, in Bayer format).
  • the image sensor 112 sends the RAW image to the ISP 12 for processing.
  • When the ISP 12 needs to perform AWB, the ISP 12 sends image information (for example, the multi-channel image) to the general-purpose processor 13, and the image acquisition device 11 sends the shooting parameters to the general-purpose processor 13. The general-purpose processor 13 (for example, the CPU 131 or the NPU 132) uses the input information to compute the neural network model and obtain the light source color information corresponding to the image; the light source color information is then fed back to the ISP 12, which completes AWB according to it and performs other image processing to obtain a target image, such as an image in YUV or RGB format. The ISP 12 then transmits the target image to the CPU 131 through the I/O interface, and the CPU 131 sends the target image to the display device 14 for display.
  • The device structure shown in FIG. 2 does not constitute a limitation on the electronic device 10; in some embodiments, the electronic device 10 may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the method described in this application can be applied to a scenario of end-cloud interaction.
  • the end-cloud system includes an electronic device 20 and a cloud server 30.
  • the electronic device 20 and the cloud server 30 can communicate with each other, and the communication method is not limited to a wired or wireless method.
  • the electronic device 20 may be mobile or fixed.
  • For example, the electronic device 20 may be a mobile phone with an image processing function, a tablet PC, a notebook computer, a media player, a smart TV, a personal digital assistant, a personal computer, a camera, an SLR camera, a camcorder, a smart watch, a monitoring device, an augmented reality device, a virtual reality device, a wearable device, an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
  • The cloud server 30 may include one or more servers, one or more processing nodes, or one or more virtual machines running on a server. The cloud server 30 may also be called a server cluster, a management platform, a data processing center, or the like, which is not limited in the embodiment of the present application.
  • The electronic device 20 includes at least one general-purpose processor 23, a memory 25, an image acquisition device 21, an image signal processor (ISP) 22, a display device 24, and a communication device 26. These components can communicate over one or more communication buses to implement the functions of the electronic device 20.
  • the cloud server 30 includes a memory 33, a neural network processor NPU 31, and a communication device 32. These components can communicate on one or more communication buses to realize the functions of the cloud server 30.
  • the electronic device 20 establishes a communication connection with the communication device 32 of the cloud server 30 through the communication device 26, and the communication method is not limited to a wired or wireless method.
  • the communication device 26 and the communication device 32 can be used to send and receive wireless signals to and from each other.
  • The wireless communication methods include, but are not limited to, one or more of radio frequency (RF), data communication, Bluetooth, WiFi, and the like.
  • the electronic device 20 in the end-cloud system may not include the NPU.
  • the embodiments of the present application make full use of the computing resources of the cloud server, which is beneficial to reduce the operating burden and configuration requirements of the electronic device 20, and improve the user experience.
  • In the end-cloud system, automatic white balance of an image can be realized through the following process: when the electronic device 20 performs shooting, objects (people, objects, scenes, etc.) in the external environment are projected, through the optical image collected by the camera in the image acquisition device 21, onto the image sensor in the image acquisition device 21 and converted into an electrical signal. After analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, and the digital image signal is a RAW image (for example, in Bayer format).
  • the image capture device 21 sends the RAW image to the ISP 22 for processing.
  • the ISP 22 When the ISP 22 needs to perform AWB, the ISP 22 sends image information (for example, a multi-channel image) to the general-purpose processor 23, and the image acquisition device 21 sends the shooting parameters to the general-purpose processor 23.
  • the general-purpose processor 23 (for example, the CPU 231) may further send the above-mentioned information to the cloud server 30 through the communication device 26.
  • The NPU 31 uses the above input information (multi-channel image, shooting parameters, etc.) to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the electronic device 20 through the communication device 32 and sent to the ISP 22. The ISP 22 performs AWB according to the light source color information and performs other image processing to obtain the target image, for example, an image in YUV or RGB format.
  • the ISP 22 transmits the target image to the CPU 231 through the I/O interface, and the CPU 231 sends the target image to the display device 24 for display.
  • The device structure shown in FIG. 4 does not constitute a limitation on the present application; in some embodiments, the electronic device 20 and the cloud server 30 may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • FIG. 5 is a hardware structure of a chip provided by an embodiment of the present application.
  • the chip includes a neural network processor NPU300.
  • the NPU 300 may be set in the electronic device 10 as shown in FIG. 2 to complete the calculation work of the neural network.
  • the NPU 300 is the NPU 132 described in FIG. 2.
  • the NPU 300 can be set in the cloud server 30 as shown in FIG. 4 to complete the calculation of the neural network.
  • the NPU 300 is the NPU 31 described in FIG. 4.
  • The NPU 300 can be mounted as a coprocessor onto a host central processing unit (host CPU), and the host CPU allocates tasks to it.
  • the core part of the NPU 300 is the arithmetic circuit 303.
  • the arithmetic circuit 303 is controlled by the controller 304 to extract matrix data from the memory and perform multiplication operations.
  • The arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array; the arithmetic circuit 303 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
  • the arithmetic circuit 303 is a general-purpose matrix processor.
• the arithmetic circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303; the arithmetic circuit 303 then fetches the data of matrix A from the input memory 301, performs a matrix operation with matrix B, and stores the partial or final results of the obtained matrix in the accumulator 308 (accumulator).
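• As an illustration only (not part of the patent text), the following minimal Python sketch mimics how partial results of a blocked matrix multiplication can accumulate, in the spirit of the accumulator 308 described above; the blocking scheme and all names are assumptions:

    import numpy as np

    def blocked_matmul(A, B, block=2):
        # multiply A (m x k) by B (k x n), accumulating partial products
        # slice by slice, loosely mimicking a PE array with an accumulator
        m, k = A.shape
        k2, n = B.shape
        assert k == k2
        acc = np.zeros((m, n))  # plays the role of the accumulator
        for start in range(0, k, block):
            end = min(start + block, k)
            acc += A[:, start:end] @ B[start:end, :]  # partial result
        return acc

    A = np.arange(6).reshape(2, 3).astype(float)
    B = np.arange(12).reshape(3, 4).astype(float)
    assert np.allclose(blocked_matmul(A, B), A @ B)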
  • the vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 307 may be used for network calculations in the non-convolutional/non-FC layer of the neural network, such as pooling, batch normalization, local response normalization, and so on.
  • the vector calculation unit 307 can store the processed output vector to the unified memory 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
  • the unified memory 306 is used to store input data and output data.
• the direct memory access controller (DMAC) 305 is used to transfer the input data in the external memory into the input memory 301 and/or the unified memory 306, to store the weight data in the external memory into the weight memory 302, and to store the data in the unified memory 306 into the external memory.
  • the bus interface unit 310 (bus interface unit, BIU) is used to implement the interaction between the main CPU, the DMAC, and the fetch memory 309 through the bus.
  • the instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304; the controller 304 is used to call the instructions buffered in the instruction fetch memory 309 to control the working process of the computing accelerator.
  • unified memory 306, input memory 301, weight memory 302, and fetch memory 309 are all on-chip (On-Chip) memories.
• the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a high bandwidth memory (HBM), or another readable and writable memory.
  • each layer in the neural network model described in the embodiment of the present application may be performed by the arithmetic circuit 303 or the vector calculation unit 307.
• since the embodiment of the present application relates to the application of a neural network, in order to better understand the working principle of the neural network described in the embodiment of the present application, the implementation process of the neural network in the present application is described below.
  • the neural network and neural network model can be regarded as the same concept, and the two are used selectively based on the convenience of expression.
  • the neural network model described in the embodiments of this application may be composed of neural units.
• the neural unit may refer to an arithmetic unit that takes inputs $x_s$ and an intercept of 1, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f\left(W^{T}x\right) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$

• where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next layer.
  • the activation function can be a sigmoid function.
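• For illustration, a minimal Python sketch of such a neural unit with a sigmoid activation (the numeric values and names are illustrative assumptions, not from the patent):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neural_unit(x, W, b):
        # output of a single neural unit: f(sum_s W_s * x_s + b)
        return sigmoid(np.dot(W, x) + b)

    x = np.array([0.5, -1.2, 3.0])   # inputs x_s
    W = np.array([0.1, 0.4, -0.2])   # weights W_s
    b = 0.3                          # bias of the neural unit
    print(neural_unit(x, W, b))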
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • the neural network model may be a model constructed based on deep learning, for example, it may be a deep neural network (DNN) model, a convolutional neural network (CNN) or a recurrent neural network (Recurrent Neural Network, RNN), or a combination of multiple, etc.
  • a convolutional neural network (convolutional neural network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units.
  • Neural units on the same feature plane can share weights.
  • the convolution kernel can be initialized in the form of a matrix of random size, or can be initialized with all zeros or other general initialization methods, which are not limited here.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
• in training a neural network, a loss function (also referred to as an objective function) is used to measure the difference between the predicted value of the network and the truly desired target value; the training process is a process of making this loss as small as possible.
• the neural network can use the back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
• the back-propagation algorithm is an error-loss-driven back propagation process that aims to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • FIG. 6 shows a system architecture 100 for neural network model training provided by an embodiment of the present application.
  • a data collection device 160 is used to collect training data.
• the neural network model (i.e., the AWB neural network model described later) can be trained with the training data.
  • the training data for training the neural network model in the embodiment of the present application may include the multi-channel image corresponding to the original raw domain image, the shooting parameters corresponding to the original raw domain image, and the light source color information annotated to the original raw domain image .
  • the training data for training the neural network model in the embodiment of the present application may include multi-channel images corresponding to the original raw domain images, scene semantic information extracted from the multi-channel images, shooting parameters corresponding to the original raw domain images, And the light source color information annotated to the original raw domain image.
  • the image in the training data may be a single frame image or a multi-frame image of a video frame sequence.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 obtains the target model 101 (for example, the AWB neural network model in the embodiment of the present application) based on the training data maintained in the database 130.
• the training device 120 inputs the training data into the target model 101 until the degree of difference between the predicted light source color information output by the target model 101 and the light source color information annotated in the image meets a preset condition, for example, until the angle error between the two corresponding color vectors is smaller than a preset threshold, remains unchanged, or no longer decreases, so as to complete the training of the target model 101.
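• As a hedged illustration, the angle error between the predicted and annotated light source color vectors can be computed as follows (a common formulation; the patent does not give code for it):

    import numpy as np

    def angular_error_deg(pred, gt):
        # angle (degrees) between predicted and annotated illuminant vectors
        cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    pred = np.array([0.9, 1.0, 1.2])   # predicted (r/g, 1, b/g), mock values
    gt   = np.array([0.95, 1.0, 1.1])  # annotated light source color
    print(angular_error_deg(pred, gt))  # training can stop once this is small enough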
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 does not necessarily perform the training of the target model 101 completely based on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training.
• the above description should not be construed as a limitation on the embodiments of this application.
  • the target model 101 obtained by training according to the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 5.
  • the execution device 110 may use the target model 101 to perform neural network calculations to realize the prediction of the color information of the light source.
  • the execution device 110 may be the electronic device 10 described above.
  • the input data of the execution device 110 may come from the data storage system 150, and the data storage system 150 may be a memory placed in the execution device 110, or may be an external memory independent of the execution device 110.
  • the input data may include, for example, a multi-channel image and shooting parameters; or, may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters.
  • the execution device 110 realizes the prediction of the color information of the light source based on the input data.
  • the execution device 110 may be the cloud server 30 in the end-cloud system described above.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices.
• the user can input data to the I/O interface 112 through the client device 140.
  • the client device 140 may be the electronic device 20 in the end-cloud system.
  • the client device 140 may automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send input data and the user's authorization is required, the user can set the corresponding authority in the client device 140.
  • the input data may include, for example, a multi-channel image and shooting parameters; or, may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters.
  • the execution device 110 realizes the prediction of the color information of the light source based on the input data. Subsequently, the predicted light source color information can be returned to the client device 140 through the I/O interface 112.
• the association function module 113 can be used to perform relevant processing according to the input data; for example, the association function module 113 can extract scene semantic information from a multi-channel image.
• the training device 120 can generate a corresponding target model 101 based on different training data for different goals or tasks, and the corresponding target model 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results. For example, it can be used to train the AWB neural network model described in the embodiment of FIG. 11 or FIG. 13 below.
  • the execution device 110 may be configured with a chip as shown in FIG. 5 to complete the calculation work of the calculation module 111.
  • the training device 120 may also be configured with a chip as shown in FIG. 5 to complete the training work of the training device 120 and output the trained target model 101 to the execution device 110.
  • FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • FIG. 7 shows another system architecture 400 provided by an embodiment of the present application.
  • the system architecture includes a local device 420, a local device 430, an execution device 410, and a data storage system 450.
  • the local device 420 and the local device 430 are connected to the execution device 410 through a communication network 440.
  • the execution device 410 may be implemented by one or more servers.
• the execution device 410 can be used in conjunction with other computing devices, such as data storage devices, routers, load balancers, and other equipment.
  • the execution device 410 may be arranged on one physical site or distributed on multiple physical sites.
  • the execution device 410 may use the data in the data storage system 450 or call the program code in the data storage system 450 to implement the image processing method of the embodiment of the present application.
• the execution device 410 may also be a cloud server; at this time, the execution device 410 may be deployed in the cloud, for example as the cloud server 30 described in the embodiment of FIG. 4 above, while the local device may be the electronic device 20 described in the embodiment of FIG. 3 above.
  • the automatic white balance method in the embodiment of the present application may be independently executed by the local device 420 or the local device 430.
  • the local device 420 and the local device 430 may obtain the relevant parameters of the aforementioned neural network model from the execution device 410, deploy the neural network model on the local device 420 and the local device 430, and use the neural network model to implement the AWB process.
• the automatic white balance method of the embodiment of the present application may also be performed by the local device 420 or the local device 430 in cooperation with the execution device 410.
  • the user may operate respective user devices (for example, the local device 420 and the local device 430) to interact with the execution device 410.
• each local device can represent any computing device, for example, a personal computer, computer workstation, smart phone, tablet, camera, smart camera, smart car device or other type of cellular phone, media consumption device, wearable device, set-top box, game console, and so on.
  • the local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard.
  • the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application.
  • the method can be applied to an electronic device that includes a camera and a display screen.
• the method includes but is not limited to the following steps:
  • S501 Acquire shooting parameters used when the camera shoots the original RAW domain image.
  • the original RAW image can be referred to as the RAW image.
  • the RAW image can be the raw data of a CMOS or CCD image sensor that converts the light source signal captured by the camera into a digital signal, and the raw data has not been processed by an image signal processor (ISP).
  • the RAW image may specifically be a bayer image in a Bayer format.
  • the shooting parameters indicate parameters used when performing shooting, such as shooting parameters used by a camera, an image sensor, and so on.
  • shooting parameters can also be understood as control parameters generated when the processor controls the camera and the image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
• the color characteristics of the images acquired by the camera and image sensor of the electronic device under the same environment with different shooting parameter configurations will show differences, so the shooting parameters characterize the physical conditions under which the image was captured. This application can use the shooting parameters to provide a shooting configuration reference for light source color estimation.
  • S502 Acquire a multi-channel image corresponding to the original RAW domain image.
  • Multichannel (Multichannel) image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
• an image channel refers to a single color component into which the image is decomposed, that is, the individual red R, green G, and blue B parts.
  • the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
  • the multi-channel image may specifically be a four-channel image, for example, it may refer to an RGGB four-channel image; or, a BGGR four-channel image; or, a RYYB four-channel image.
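• As a minimal sketch (assuming an RGGB Bayer layout, which is an illustrative assumption), a RAW Bayer mosaic can be packed into the four-channel form mentioned above as follows:

    import numpy as np

    def bayer_rggb_to_4ch(raw):
        # pack an RGGB Bayer mosaic (H x W) into an RGGB four-channel image
        # of shape (H/2, W/2, 4); assumes H and W are even
        r  = raw[0::2, 0::2]
        g1 = raw[0::2, 1::2]
        g2 = raw[1::2, 0::2]
        b  = raw[1::2, 1::2]
        return np.stack([r, g1, g2, b], axis=-1)

    raw = np.random.randint(0, 1024, (8, 8)).astype(np.float32)  # mock 10-bit RAW
    print(bayer_rggb_to_4ch(raw).shape)  # (4, 4, 4)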
• further, the input data may be input into the neural network model to obtain the first gain value of the white balance, where the input data includes at least the shooting parameters of the camera and the multi-channel image; first processing is then performed on the multi-channel image to obtain the target image, wherein the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • the neural network model is used to obtain the gain value or light source color information required for the white balance processing at least according to the shooting parameters and the multi-channel image.
  • the neural network model described in the embodiments of this application may be a single neural network model, or a combination of two or more neural network models.
• the neural network model can be a model built based on deep learning, for example, a deep neural network (DNN) model, a convolutional neural network (CNN), a long short-term memory network (Long Short-Term Memory, LSTM), or a recurrent neural network (RNN), or a combination of multiple models, etc.
  • the neural network model provided by the embodiments of the present application can obtain the light source color information required in the white balance processing, such as the image light source value (r/g, 1, b/g), according to the shooting parameters and the multi-channel image.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, thereby correcting the image color cast caused by the light source color temperature, so that the color of objects in the image is close to their original color and the overall effect of the image conforms to the visual and cognitive habits of the human eye.
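• For illustration, a minimal von Kries-style sketch of such a correction, assuming the light source color information takes the form (r/g, 1, b/g); this is a common formulation, not necessarily the ISP's exact method:

    import numpy as np

    def white_balance(rgb, illuminant):
        # divide each channel by the estimated illuminant so that the
        # light source maps back to neutral gray; illuminant = (r/g, 1, b/g)
        gains = 1.0 / np.asarray(illuminant)  # per-channel gains (g/r, 1, g/b)
        out = rgb * gains.reshape(1, 1, 3)
        return np.clip(out, 0.0, 1.0)

    rgb = np.random.rand(4, 4, 3)            # mock linear RGB image in [0, 1]
    print(white_balance(rgb, (1.2, 1.0, 0.8)).shape)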
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image as the input of the AWB neural network model, and provides more color information for the AWB neural network model.
  • the shooting parameters are added as the input of the AWB neural network model to provide shooting configuration information for light source estimation, which can improve the ability of the AWB neural network model to distinguish different light source scenes and ensure good light source estimation accuracy. Therefore, the implementation of this application is beneficial to improve the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • FIG. 10 is a schematic flowchart of a specific image automatic white balance method provided by an embodiment of the present application.
  • the method can be applied to an electronic device.
  • the method includes but is not limited to the following steps:
  • the original RAW image can be referred to as the RAW image.
  • the RAW image can be the raw data of a CMOS or CCD image sensor that converts the light source signal captured by the camera into a digital signal, and the raw data has not been processed by an image signal processor (ISP).
  • the RAW image may specifically be a bayer image in a Bayer format.
  • the shooting parameters refer to shooting parameters used when performing shooting, such as parameters used by a camera, an image sensor, and so on.
  • shooting parameters can also be understood as control parameters generated when the processor controls the camera and the image sensor during shooting.
  • the shooting parameter may preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
• the color characteristics of the images acquired by the camera and image sensor of the electronic device under the same environment with different shooting parameter configurations will show differences, so the shooting parameters characterize the physical conditions under which the image was captured. This application can use the shooting parameters to provide a shooting configuration reference for light source color estimation.
  • S603 Process the RAW image into a multi-channel image.
  • a multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels.
• an image channel refers to a single color component into which the image is decomposed, that is, the individual red R, green G, and blue B parts.
  • the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
  • the multi-channel image may specifically be a four-channel image, for example, it may refer to an RGGB four-channel image; or, a BGGR four-channel image; or, a RYYB four-channel image.
  • S604 Input the multi-channel image and shooting parameters into the neural network model to obtain light source color information.
  • the neural network model can obtain the light source color information required in the white balance processing according to the shooting parameters and the multi-channel image.
  • the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
  • the neural network model may be the AWB neural network model shown in FIG. 11.
  • the AWB neural network model specifically includes a first feature extraction network, a feature fusion network and a light source prediction network.
  • the first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain the first feature; the first feature is used to characterize the color information of the channel image.
  • the first feature extraction network may include one or more convolution kernels, and a statistical operation on the pixels of the channel image is implemented through convolution processing, so as to obtain the first feature.
  • the feature fusion network is used to fuse the first feature and the shooting parameter to obtain the fused feature.
  • the fusion method is not limited to one or more combinations of operations such as concat function processing, conv2d function processing, elementwise multiply processing, and elementwise add processing.
  • the aforementioned two-way information (the first feature and the shooting parameter) can be weighted to obtain the fused feature.
• specifically, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature, so that the mathematical forms of the two streams of data are consistent, which facilitates data fusion processing.
  • the light source prediction network is used to make predictions based on the fused features to obtain light source color information.
  • the light source color information can be used to indicate the color temperature of the light source or the color difference of the image, so the light source color information can be used in the subsequent AWB process.
  • the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing process.
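• For illustration, the following PyTorch sketch shows one plausible shape of such a model: feature extraction with small convolution kernels, fusion with broadcast shooting parameters via concat and conv2d, and a light source prediction head. The layer sizes, the choice of two shooting parameters, and all names are assumptions, not the patent's architecture:

    import torch
    import torch.nn as nn

    class AWBNet(nn.Module):
        def __init__(self, in_ch=3, n_params=2):
            super().__init__()
            # first feature extraction network: small conv kernels over the channel image
            self.features = nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            )
            # feature fusion network: concat broadcast shooting parameters, then conv
            self.fuse = nn.Conv2d(32 + n_params, 32, 1)
            # light source prediction network: pool and regress (r/g, b/g)
            self.predict = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2))

        def forward(self, img, params):
            f = self.features(img)
            # expand shooting parameters to a multi-dimensional array matching f
            p = params[:, :, None, None].expand(-1, -1, f.shape[2], f.shape[3])
            fused = torch.relu(self.fuse(torch.cat([f, p], dim=1)))
            rg_bg = self.predict(fused)
            ones = torch.ones_like(rg_bg[:, :1])
            # assemble the image light source value (r/g, 1, b/g)
            return torch.cat([rg_bg[:, :1], ones, rg_bg[:, 1:]], dim=1)

    net = AWBNet()
    out = net(torch.rand(1, 3, 64, 64), torch.tensor([[0.5, 0.1]]))  # image + mock params
    print(out.shape)  # torch.Size([1, 3])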
  • the AWB neural network model realizes the prediction of the color information of the light source by fusing the characteristics of the channel image and the shooting parameters.
• the AWB neural network model can be configured in the electronic device, and the processor (such as a CPU or NPU) in the electronic device is used to perform the neural network model calculation to obtain the light source color information. Therefore, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform neural network calculations, which improves processing efficiency and reduces white balance processing delay.
  • the specific hardware implementation process has been described in detail in the previous section, and will not be repeated here.
  • the AWB neural network model can be configured in the cloud server in the end-cloud system.
  • the electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, and use the processor (such as CPU or NPU) in the cloud server to realize the neural network model calculation to obtain the color information of the light source.
• the server then feeds back the color information of the light source to the electronic device. Therefore, when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used to run the neural network model calculation, ensuring the accuracy and stability of the white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
  • the specific implementation process has been described in detail above, and will not be repeated here.
  • S605 Perform white balance processing on the multi-channel image according to the color information of the light source to obtain a target image and display it on the display screen.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, so as to correct the image color cast caused by the light source color temperature, so that the color of the object in the image is close to its original color, and the overall effect of the image is in line with the visual and cognitive habits of the human eye.
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image instead of the statistical feature as the input of the AWB neural network model, which provides more color information for the AWB neural network model.
• shooting parameters, such as one or more of the shutter speed, exposure time, exposure value, ISO, and aperture size, are added as input of the AWB neural network model to provide shooting configuration information for light source estimation and a reference for the conditions under which the RAW image was obtained. Taking the shooting parameters as the input of the neural network model can help the network improve the accuracy of light source prediction, improve the discrimination ability of the AWB neural network model for different light source scenes, and ensure good light source estimation accuracy.
  • the AWB neural network model in the embodiment of the present application can be applied to the entire scene, and a large amount of training data is used during model training.
  • the training data includes data obtained in a bright light scene and data obtained in a dark light scene.
• it is difficult for the neural network to achieve high-precision fitting across the whole scene range, and the added camera parameters can provide a priori information about the shooting scene, helping the neural network to distinguish between bright light scenes and dark light scenes, thereby improving the light source estimation accuracy of these two types of scenes.
• the camera parameters can be used to distinguish not only bright light scenes and dark light scenes, but also other scene categories with different attributes, such as outdoor and indoor, or day and night. Therefore, adding camera parameters as the input of the neural network can effectively improve the model's light source estimation accuracy in these types of scenes, thereby improving the overall light source estimation accuracy.
• as for the model input, which of the camera parameters (such as shutter speed, exposure time, exposure value, ISO, or aperture size) are selected can be determined based on the information actually available on the electronic device.
  • One or more of the above-mentioned camera parameters can provide a reference for the shooting conditions of the image, which are all helpful to improve the accuracy of the network.
  • the actual application requires flexible selection according to the hardware and software conditions.
  • the implementation of this application is beneficial to improve the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • FIG. 12 is a schematic flowchart of another method for image automatic white balance provided by an embodiment of the present application.
  • the method can be applied to electronic devices.
• the main difference between this method and the method described in FIG. 10 is that the calculation process of the neural network model also uses scene semantic information to further improve the accuracy of light source color information prediction.
  • the method includes but is not limited to the following steps:
  • S704 Extract scene semantic information of the multi-channel image.
• in different shooting scenes, the color of the light source may be different. For example, indoors the light source may be an incandescent lamp, while outdoors the light source may be the sun or street lights.
  • the embodiments of the present application may use the scene semantic information to provide a reference on the shooting scene for the light source color estimation.
  • the scene semantic information represents the semantic features related to the shooting scene represented by the image.
  • various types of shooting scenes can be defined.
  • shooting scenes can be classified based on light source types, such as cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
  • shooting scenes can be classified based on image content, such as portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
  • the shooting scene can also be a combination of the above-mentioned multiple scenes.
  • other types of shooting scenes may also be defined based on actual application needs, which is not limited in the embodiment of the present application.
  • one or more preset extraction algorithms can be used to extract scene semantic information from a multi-channel image.
• the preset extraction algorithm may be one or a combination of multiple of a scene classification algorithm, an image scene segmentation algorithm, an object detection algorithm, a portrait segmentation algorithm, a face detection algorithm, a human detection algorithm, a skin color segmentation algorithm, and the like.
  • the scene classification algorithm is used to realize the classification of faces and non-faces, the classification of single light sources and multiple light sources, the color temperature classification of light sources, or the classification of indoor and outdoor scenes, and so on.
  • the image scene segmentation algorithm can be used to segment the picture to generate a mask map; alternatively, the scene classification algorithm, object detection algorithm, face detection algorithm, skin color segmentation algorithm and other technologies can also be used to generate the mask map.
• the mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame image alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the neural network to fit and converge, and achieving higher prediction accuracy.
• alternatively, the scene information extraction may not use scene segmentation technology; instead, an object detection algorithm is used to extract the scene semantic information, and the generated object category boxes are used to generate a scene category mask map that is sent to the AWB neural network.
  • object detection technology can be used instead of scene segmentation to extract scene semantic information, which simplifies the complexity of scene information extraction, increases the calculation speed, reduces the calculation complexity, and reduces the performance overhead.
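• A hedged sketch of this idea: detection boxes can be rasterized into a scene-category mask map (the box format and category codes below are assumptions for illustration):

    import numpy as np

    def boxes_to_mask(shape, boxes):
        # rasterize detection boxes into a scene-category mask map;
        # boxes: list of (x0, y0, x1, y1, category_id); 0 means background
        mask = np.zeros(shape, dtype=np.uint8)
        for x0, y0, x1, y1, cat in boxes:
            mask[y0:y1, x0:x1] = cat
        return mask

    # e.g. category 1 = face, 2 = sky (hypothetical codes)
    mask = boxes_to_mask((64, 64), [(10, 10, 30, 30, 1), (0, 0, 64, 16, 2)])
    print(np.unique(mask))  # [0 1 2]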
  • the scene semantic information as an auxiliary input is not necessarily in the form of a mask map, but may also be in other forms.
• for example, the output may be a series of classification confidence vectors, which are used in vector form as the input of the neural network model.
  • S705 Input the multi-channel image, scene semantic information and shooting parameters into the neural network model to obtain light source color information.
  • the neural network model can obtain the light source color information required in the white balance processing according to the shooting parameters, scene semantic information, and multi-channel images.
  • the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
  • the neural network model may be the AWB neural network model shown in FIG. 13.
  • the AWB neural network model specifically includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network.
  • the first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain the first feature; the first feature is used to characterize the color information of the channel image.
  • the first feature extraction network may include one or more small convolution kernels, and the statistical operation on the pixels of the channel image is realized through convolution processing, so as to obtain the first feature.
  • the second feature extraction network is used to perform feature extraction on the scene semantic information to obtain a second feature, and the second feature is used to characterize the scene information corresponding to the channel image.
  • the second feature extraction network may include one or more large convolution kernels, and the analysis/perception of the scene information of the channel image is realized through convolution processing, so as to obtain the second feature.
• the convolution kernels in the second feature extraction network can be set to a larger scale than those in the first feature extraction network, so as to achieve a larger range of image perception and obtain more accurate scene information.
  • the feature fusion network is used to fuse the first feature, the second feature, and the shooting parameter to obtain the fused feature.
  • the fusion method is not limited to one or more combinations of operations such as concat function processing, conv2d function processing, elementwise multiply processing, and elementwise add processing.
  • the aforementioned three-way information (the first feature, the second feature, and the shooting parameters) can be weighted to obtain the fused feature.
• specifically, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature and the second feature, so that the mathematical forms of the three streams of data are consistent, which facilitates data fusion processing.
  • the light source prediction network is used to make predictions based on the fused features to obtain light source color information.
  • the light source color information can be used to indicate the color temperature of the light source or the color difference of the image, so the light source color information can be used in the subsequent AWB process.
  • the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing process.
  • the AWB neural network model realizes the prediction of the color information of the light source by fusing the characteristics of the channel image, the characteristics of the scene semantic information, and the shooting parameters.
• the AWB neural network model can be configured in the electronic device, and the processor (such as a CPU or NPU) in the electronic device is used to perform the neural network model calculation to obtain the light source color information.
  • the specific hardware implementation process has been described in detail in the previous section, and will not be repeated here.
  • the AWB neural network model can be configured in the cloud server in the end-cloud system.
  • the electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, and use the processor (such as CPU or NPU) in the cloud server to realize the neural network model calculation to obtain the color information of the light source.
  • the server then feeds back the color information of the light source to the electronic device.
• S706 Perform white balance processing on the multi-channel image according to the color information of the light source to obtain the target image and display it on the display screen. For details, refer to the description of step S605, which is not repeated here.
  • the embodiment of the present application uses the multi-channel image corresponding to the RAW image instead of the statistical feature as the input of the AWB neural network model, and provides more color information for the AWB neural network model.
• the scene semantic information and shooting parameters are added as the input of the AWB neural network model, which provides more effective prior knowledge (shooting configuration information and scene information) for light source estimation, greatly enhances the discrimination ability of the AWB neural network model for different light source scenes, improves the overall light source estimation accuracy, and can effectively help the neural network to converge and fit.
  • the scene semantic information can provide a priori semantic information for the image to a large extent, help the AWB neural network to distinguish different scenes, and then improve the overall accuracy of the AWB neural network.
• for example, in face shooting scenes, the network output may be unstable, which affects the perceived skin color. If face detection information is added as scene semantic information into the neural network, the neural network will pay more attention to the face area during the training process, so as to improve the fitting accuracy of the network in face scenes.
• if the neural network does not perform well in blue sky, grass and other scenes, image segmentation technology can be introduced, and the segmented sky area and grass area are input into the neural network as scene information; the neural network will then pay more attention to sky scenes and grass scenes, thereby improving the accuracy of light source estimation in these scenes.
• many forms of scene semantic information are provided in the embodiments of this application.
  • the specific types of scene semantic information to be adopted can be determined according to the needs of AWB in different scenarios.
• this application does not specially restrict this; neither the specific content of the scene semantic information nor the method of obtaining it is restricted.
• for example, one or more extraction techniques such as image segmentation, instance segmentation, face detection, human body detection, skeleton detection, and scene classification can be used to obtain the scene semantic information as the input of the AWB neural network.
  • the implementation of this application can improve the white balance accuracy of electronic equipment shooting in full scenes, improve the stability of AWB for single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multiple light sources.
  • the method can be applied to an electronic device, including but not limited to the following steps:
  • the mobile phone collects a frame of RAW image in BAYER format while taking a photo, and at the same time obtains the corresponding shooting parameters when the picture is taken.
• the shooting parameters may be selected from the exposure value, shutter time, aperture size, ISO sensitivity, and other parameters. Because the color characteristics of the pictures acquired by the mobile phone's sensor under the same environment with different parameter configurations will show differences, the shooting parameters describe the conditions of the image at the time of shooting and provide a reference for the light source estimation algorithm.
  • S803 Perform preprocessing on the RAW image to obtain a color three-channel image, such as an RGB three-channel image.
  • each pixel has three components of red, green and blue.
  • the preprocessing process of the RAW image can be performed by, for example, the ISP of the electronic device, and the preprocessing process includes all the image processing steps experienced in generating the color three-channel image.
  • FIG. 15 shows an example of a preprocessing process, which may include Black Level Correction (BLC) and Lens Shade Correction (LSC).
• BLC processing can reduce the influence of dark current on the image signal, and LSC processing can eliminate lens shading (vignetting).
  • it also includes image down-sampling processing and noise reduction processing.
  • a specific implementation process is described as follows:
• the RAW image is in Bayer format, and it needs to undergo demosaicing to obtain a color three-channel image. In order not to affect the color, the demosaicing operation can be simplified to averaging the two green channels, and the red, green, and blue samples are then re-arranged to obtain a color three-channel image.
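• A minimal Python sketch of this simplified demosaicing (assuming an RGGB layout, which is an illustrative assumption):

    import numpy as np

    def simple_demosaic(raw):
        # simplified demosaic for an RGGB Bayer mosaic: average the two
        # green samples and stack R, G, B into a half-resolution image
        r = raw[0::2, 0::2]
        g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
        b = raw[1::2, 1::2]
        return np.stack([r, g, b], axis=-1)

    raw = np.random.rand(8, 8).astype(np.float32)
    print(simple_demosaic(raw).shape)  # (4, 4, 3)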
  • preprocessing process may also include other processing algorithms, which are not limited in other embodiments of the present application.
  • S805 Input the multi-channel image, scene semantic information and shooting parameters into the neural network model to obtain light source color information.
  • S806 Perform white balance processing on the image by using the color information of the light source.
• the electronic device can use the light source color information to perform white balance processing on the channel image through its own configured ISP, so as to correct the image color cast caused by the light source color temperature, so that the color of the object in the image is close to its original color, and the overall effect of the image is in line with the visual and cognitive habits of the human eye.
  • S807 Further perform image enhancement processing on the image after the white balance processing, to obtain the final target image for display.
• the process of image enhancement processing can be executed, for example, by the ISP of the electronic device, or by other devices of the electronic device, such as a field programmable gate array (FPGA) or a digital signal processor (DSP).
  • the white balance processed image may also be post-processed through some image enhancement algorithms to further improve the image quality, obtain the final target image for display, and output it to the display screen of the electronic device for display.
  • the image enhancement algorithm may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
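• For example, the gamma correction operation mentioned above can be sketched as follows (the gamma value of 2.2 is an assumption for illustration):

    import numpy as np

    def gamma_correct(img, gamma=2.2):
        # apply display gamma to a linear image in [0, 1]
        return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

    print(gamma_correct(np.array([0.0, 0.18, 1.0])))  # mid-gray is brightened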
  • post-processing process may also adopt other processing algorithms according to actual application needs, which is not limited in other embodiments of the present application.
  • the CPU 131 controls the camera 111 to collect light signals from the shooting environment, and the image sensor 112 converts the light signals captured by the camera 111 into digital signals, thereby obtaining one or more RAW images.
• the ISP 12 performs preprocessing to process the RAW image into a color three-channel image, and extracts scene semantic information of the color three-channel image.
  • the color three-channel image and the scene semantic information are further input to the NPU 132, and the control parameters (shutter, exposure time, aperture size, etc.) of the CPU 131 for the camera 111 and the image sensor 112 are also input to the NPU 132.
• the NPU 132 executes the calculation of the AWB neural network model to obtain the light source color value (r/g, 1, b/g), and returns the light source color value to the ISP 12.
• the ISP 12 performs white balance processing according to the color value of the light source, and uses image enhancement algorithms to further optimize the white balance processed image to obtain the target image.
  • the target image is further sent to the display device 14 through the CPU 131 for display.
• on the basis of achieving better AWB, the embodiment of the present application also refines the image preprocessing process and the image post-processing stage. The introduction of the preprocessing process not only facilitates the rapid and efficient generation of multi-channel images for the AWB method of the present application, but also helps to improve image quality (for example, by reducing the influence of dark current, reducing noise, and eliminating vignetting) and the computing speed of the neural network algorithm. Through post-processing, the quality of the image can be further improved, meeting the application needs of the user and improving the user's viewing experience.
  • the solution includes but is not limited to the following steps:
  • the operation for instructing to perform shooting may be, for example, touch, click, voice control, key control, remote control, etc., for triggering the electronic device to shoot.
• the operation used by the user to instruct shooting may include pressing the shooting button in the camera application of the electronic device, instructing the electronic device to shoot through voice, or instructing the electronic device to shoot through a shortcut key; it may also include other behaviors by which the user instructs the electronic device to shoot. This application does not specifically restrict this.
  • the method further includes: detecting an operation for opening the camera by the user; in response to the operation, displaying a shooting interface on the display screen of the electronic device.
• after the electronic device detects that the user has clicked an icon of a camera application (application, APP) on the desktop, it can start the camera application and display the shooting interface.
  • FIG. 18 shows a graphical user interface (GUI) of a shooting interface 91 of a mobile phone.
  • the shooting interface 91 includes a shooting control 93 and other shooting options. After the electronic device detects that the user clicks on the shooting control 93, the mobile phone executes the shooting process.
  • the shooting interface 91 may further include a viewing frame 92; after the electronic device starts the camera, in the preview state, the preview image can be displayed in the viewing frame 92 in real time.
  • the size of the viewfinder frame can be different in the photo mode and the video mode.
• for example, the illustrated viewfinder frame may be the viewfinder frame in the photographing mode, while in the video mode the viewfinder frame can be the entire display screen.
• in the preview state, that is, after the user turns on the camera but before pressing the photo/video button, the preview image can be displayed in the viewfinder frame in real time.
  • the target image is obtained after white balance processing is implemented using a neural network model, and the neural network model is used to obtain light source color information required for the white balance processing according to input data.
• in response to the user's instruction operation, the mobile phone executes the shooting process in the background, including: shooting through the camera to obtain a RAW image; performing preprocessing through the ISP to process the RAW image into a color three-channel image; using the AWB neural network model to perform calculations based on the input data to obtain light source color information; and implementing white balance processing based on the light source color information.
  • image enhancement algorithms can be used to further optimize to obtain the target image.
  • the target image is displayed on the display screen.
  • FIG. 18 shows a GUI of a display interface 94 based on an album, and the target image 95 can be displayed on the display interface 94.
  • the input data of the model includes shooting parameters and multi-channel images.
  • the structure of the model and the process of performing the calculation can be similar to the description of the aforementioned embodiment in FIG. 11, which will not be repeated here.
  • the input data of the model includes shooting parameters, multi-channel images, and scene semantic information extracted from the multi-channel images.
  • the structure of the model and the process of performing the calculation can be similar to the description of the foregoing embodiment in FIG. 13, and details are not repeated here.
  • the software system architecture of the electronic device that can be used to implement the methods shown in FIG. 17 and FIG. 18 is further described below.
  • the software system can adopt a layered architecture, event-driven architecture, micro-kernel architecture, micro-service architecture, or cloud architecture.
  • the following takes the layered architecture of the Android system as an example for description. Refer to FIG. 19, which is a block diagram of a possible software structure of an electronic device in an embodiment of the present application.
• the layered architecture divides the software into several layers, and each layer has a clear role and division of labor; the layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages. As shown in the figure, the application package may include applications such as a camera APP, an image beautification APP, and an album APP.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
• the application framework layer includes some predefined functions. As shown in FIG. 19, the application framework layer can include a window manager, a content provider, a resource manager, a view system, and so on, where:
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include image data, video data, and so on.
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, and so on.
  • the view system can be used to construct the display interface of the application.
  • the shooting interface of a camera APP presented through the view system is shown in Figure 18 (a).
• the shooting interface 91 includes a shooting control 93, a preview box 92, and other related controls, such as image browsing controls, front and rear camera switching controls, etc.
  • the preview frame 92 is used to preview the scene image to be shot.
• through the camera switching controls, the electronic device can be instructed to select the front camera or the rear camera for shooting.
• when the user clicks or touches the shooting control 93, the electronic device will drive the camera device to initiate a shooting operation, and instruct the lower-level system library to process the image and save it in the album.
  • the electronic device can call the album APP and display the image processed by the automatic white balance method proposed in this application.
  • the display interface of a photo album APP presented through the view system is shown in (b) of FIG. 18.
  • the target image 95 can be displayed on the display interface 94.
• the Android runtime is responsible for the scheduling and management of the Android system and can include core libraries and a virtual machine. The core library consists of two parts: one part is the API functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), graphics engine, etc.
  • the surface manager is used to manage the display subsystem and provide layer fusion functions for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, sensor driver, etc.
• the camera driver can be used to drive the camera of the electronic device for shooting.
  • the display driver can be used to display the processed image on the display panel of the display screen.
  • the graphics engine is a drawing engine for image processing.
  • the graphics engine can be used to: process the RAW image into a color three-channel image; extract the scene semantic information of the color three-channel image; input the color three-channel image, the shooting parameters, and the scene semantic information into a neural network to obtain light source color information; and perform white balance processing on the color three-channel image according to the light source color information to obtain an image for display.
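For readers who prefer code, the following is a runnable toy sketch of the flow just described: RAW image, three-channel image, light source estimate, white balance. Everything here is an illustrative stand-in (a gray-world estimate in place of the neural network, a naive block demosaic), not the patent's actual implementation or any Android API.

```python
import numpy as np

def demosaic_rggb(raw: np.ndarray) -> np.ndarray:
    """Naive demosaic: average each 2x2 RGGB block into one RGB pixel,
    so the output is half the RAW width and height."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

def estimate_illuminant(rgb: np.ndarray, shooting_params: dict) -> np.ndarray:
    """Stand-in for the AWB neural network: gray-world estimate.
    A real model would also consume shooting_params (e.g., exposure)."""
    return rgb.reshape(-1, 3).mean(axis=0)

def white_balance(rgb: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    gains = illuminant[1] / illuminant      # normalize to the green channel
    return np.clip(rgb * gains, 0.0, 1.0)

raw = np.random.rand(8, 8)                  # fake single-plane Bayer RAW
params = {"exposure_value": 7.0, "iso": 400}
rgb = demosaic_rggb(raw)
out = white_balance(rgb, estimate_illuminant(rgb, params))
print(out.shape)                            # (4, 4, 3)
```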
  • the training process involved in the AWB neural network model can have multiple implementation forms. For example, two exemplary training processes are shown in FIG. 20.
  • a training process for the AWB neural network model may be: the training data includes the annotation of the light source color of the image, the multi-channel image obtained by preprocessing the RAW image, the shooting parameters, and optionally also the scene semantic information.
  • after the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
  • after training, the target model can be output.
  • in the other exemplary training process, the training data includes the labeling of the light source color of the image, the target image obtained by preprocessing the RAW image and processing it with an image enhancement algorithm, the shooting parameters, and optionally the scene semantic information.
  • after the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
  • after training, the target model can be output.
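As one hedged illustration of the training loop just described, the PyTorch sketch below uses an angular error between the predicted and labeled illuminant vectors, a common choice in color constancy work; the patent does not fix a particular loss, framework, or data layout, so all of these are assumptions.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Angle (radians) between predicted and labeled illuminant vectors."""
    cos = F.cosine_similarity(pred, target, dim=1).clamp(-0.999999, 0.999999)
    return torch.acos(cos).mean()

def train_step(model, optimizer, batch):
    images, params, labels = batch        # multi-channel images, shooting
    optimizer.zero_grad()                 # parameters, labeled illuminants
    pred = model(images, params)          # model outputs light source color
    loss = angular_loss(pred, labels)     # compare output with the label
    loss.backward()                       # backpropagate the loss
    optimizer.step()                      # update the model parameters
    return loss.item()
```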
  • the image used for training the AWB neural network model may not be a single frame image, but a labeled video sequence.
  • network structures such as LSTM and RNN can be introduced, and time-domain related strategies can also be used during model training.
  • the video sequence can be used as training data, and the AWB neural network model adds the images of the previous and subsequent frames of the current image as the model input.
  • by training with video sequences, adding consecutive preceding and following frames as input, introducing structures such as LSTM and RNN, and adding time-domain related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jumping under the same light source can be reduced. The method can thus be extended to video functions, increasing the stability of white balance and improving the user experience.
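A minimal sketch of a temporal AWB model of the kind suggested above is given below: per-frame CNN features are aggregated over neighboring frames with an LSTM before the illuminant is predicted. The layer sizes and overall architecture are arbitrary illustrative choices, not the patent's specified network.

```python
import torch
import torch.nn as nn

class TemporalAWB(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(          # per-frame feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 3)      # illuminant color (r, g, b)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W), current frame plus its neighbors
        b, t, c, h, w = frames.shape
        feats = self.backbone(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(feats)               # aggregate over time
        return self.head(out[:, -1])            # estimate for the last frame

# usage: TemporalAWB()(torch.rand(2, 5, 3, 64, 64)) -> shape (2, 3)
```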
  • the number of cameras configured in the electronic device is not limited.
  • the type of each camera is not limited.
  • the so-called “different types” can be cameras with different magnifications (shooting magnification or zoom magnification) or cameras with different focal lengths, such as conventional cameras, main cameras, telephoto cameras, wide-angle cameras, medium-telephoto cameras, ultra-telephoto cameras, or ultra-wide-angle cameras, and so on.
  • the so-called “different type” can mean that the image sensor corresponding to each camera is different.
  • the image sensor corresponding to the wide-angle camera can be an RGGB module;
  • the image sensor corresponding to the main camera can be an RYYB module;
  • the image sensor corresponding to the telephoto camera can be an RGGB module.
  • the automatic white balance method (or the method of obtaining image light source information) described in this application can be adjusted and adapted in multiple ways.
  • a shooting scene is shown in FIG. 23, and the user will perform a viewfinder operation when taking a photo.
  • the user can zoom in or zoom out the viewfinder (mobile phone screen) to achieve the effect of zooming in and out of the scene.
  • the example effects of several target images are shown in (1), (2), and (3) of FIG. 23, respectively.
  • in case (1), when the user needs to shoot the details of a distant view, the picture must be zoomed in.
  • when the zoom is 10 times (10x) or above, the focal length of the main camera is not enough to provide a very clear effect.
  • the telephoto lens may use an RGGB module, and its sensitivity and spectral response curves will differ from those of the main camera.
  • in case (2), for general shooting with the viewfinder in the range of 1x to 10x, the focal length of the main camera is sufficient to provide a clear effect. At this time, the RAW image collected by the main camera is cropped according to the focal length to achieve the effect of zooming in.
  • the main camera may use an RYYB module, which has better sensitivity; its spectral response curve will differ from that of an RGGB module.
  • in case (3), when the viewfinder is smaller than 1x, the current focal length of the main camera is not enough to provide a larger field of view (field of view, FOV). If a wide-angle lens is available, the camera device switches to the wide-angle camera to provide a larger viewing angle.
  • the wide-angle camera may use an RGGB module, or a photosensitive module different from those of the main camera and the telephoto camera; its sensitivity and spectral response will differ from the above two cameras.
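An illustrative sketch of the camera-selection logic described above follows. The zoom thresholds (1x and 10x) mirror the example in the text; the enum values and the function itself are hypothetical, not a real camera HAL interface.

```python
from enum import Enum

class Camera(Enum):
    WIDE = "wide-angle (RGGB)"
    MAIN = "main (RYYB)"
    TELE = "telephoto (RGGB)"

def select_camera(zoom: float) -> Camera:
    if zoom < 1.0:
        return Camera.WIDE     # needs a larger field of view
    if zoom < 10.0:
        return Camera.MAIN     # crop the main camera's RAW to zoom in
    return Camera.TELE         # 10x and above: switch to telephoto

print(select_camera(0.6), select_camera(3.0), select_camera(12.0))
```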
  • FIG. 21 shows a possible shooting process.
  • in FIG. 21, an electronic device configured with a first camera and a second camera is taken as an example.
  • the first camera and the second camera may be different types of cameras.
  • the two cameras share the neural network model.
  • the electronic device is equipped with a first AWB neural network model, which can be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
  • when shooting with the first camera, the obtained RAW image of the first camera is preprocessed to obtain a multi-channel image; the multi-channel image, matched with the camera parameters of the first camera, is used as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
  • the obtained RAW image of the second camera is preprocessed to obtain a multi-channel image.
  • the electronic device also performs an image migration operation on the multi-channel image, that is, the image colors of the multi-channel image are migrated to colors that match the shooting characteristics of the first camera. Specifically, based on the difference between the second camera and the first camera, a color migration operation is performed on the multi-channel image corresponding to the second camera, to obtain a migration image that fits the photosensitive characteristics of the image sensor corresponding to the first camera.
  • the migration image, matched with the camera parameters of the second camera, is used as the input of the first AWB neural network, and the light source color value (or gain value) that meets the shooting characteristics of the first camera is calculated. On this basis, a migration operation is further performed on this light source color value (or gain value), so as to migrate it to the light source color value (or gain value) corresponding to the second camera.
  • for the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera), the camera parameters of the first camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the second camera or other cameras can also be used, but the collected image data needs to be migrated to the first camera to participate in the training.
  • if the first AWB neural network model is instead trained on the data collected by the second camera, it is equivalent to swapping the roles of the first camera and the second camera in the embodiment of FIG. 21, and the implementation method is similar; the above implementation process will not be repeated here. A code sketch of this migration flow is given below.
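The sketch below is one hedged realization of the shared-model flow of FIG. 21. The key assumption (plausible but not mandated by the text) is that the sensor difference can be approximated by a 3x3 color transform M that maps the second camera's colors into the first camera's space; the predicted illuminant is then mapped back with the inverse transform. The model here is a trivial gray-world stand-in.

```python
import numpy as np

def awb_with_shared_model(rgb2: np.ndarray, params2: dict, model,
                          M: np.ndarray) -> np.ndarray:
    """rgb2: multi-channel image from the second camera (HxWx3).
    M: assumed 3x3 migration matrix from camera-2 to camera-1 color space."""
    migrated = rgb2 @ M.T                       # 1. migrate image to camera-1
    illum1 = model.predict(migrated, params2)   # 2. run camera-1's AWB model
    illum2 = np.linalg.inv(M) @ illum1          # 3. migrate estimate back
    return illum2[1] / illum2                   #    and convert to gains

class GrayWorldModel:                           # stand-in for the first
    def predict(self, rgb, params):             # AWB neural network
        return rgb.reshape(-1, 3).mean(axis=0)

gains = awb_with_shared_model(np.random.rand(4, 4, 3), {"ev": 6.0},
                              GrayWorldModel(), np.eye(3))
print(gains)
```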
  • FIG. 22 shows another possible shooting process.
  • in FIG. 22, an electronic device configured with a first camera and a second camera is again taken as an example.
  • the first camera and the second camera may be different types of cameras.
  • the two cameras correspond to different neural network models.
  • the first camera corresponds to the first AWB neural network model
  • the second camera corresponds to the second AWB neural network model
  • the first AWB neural network model can be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
  • the second AWB neural network model can be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera).
  • when shooting with the first camera, the obtained RAW image of the first camera is preprocessed to obtain a multi-channel image, and the multi-channel image, combined with the camera parameters of the first camera, is used as the input of the first AWB neural network to calculate the light source color value (or gain value) corresponding to the first camera.
  • when shooting with the second camera, the obtained RAW image of the second camera is preprocessed to obtain a multi-channel image; the multi-channel image, combined with the camera parameters of the second camera, is used as the input of the second AWB neural network to calculate the light source color value (or gain value) corresponding to the second camera.
  • for the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera), the camera parameters of the first camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the second camera or other cameras can also be used, but the collected image data needs to be migrated to the first camera to participate in the training.
  • for the second AWB neural network model, the image data collected by the second camera (or a device of the same model, or a device similar to the second camera), the camera parameters of the second camera, and so on can be used as training data for model training.
  • the image data and camera parameters collected by the first camera or other cameras can also be used, but the collected image data needs to be migrated to the second camera to participate in the training.
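An illustrative dispatch for the per-camera scheme of FIG. 22 follows: each camera is bound to its own AWB model trained on data it collected, so no color migration is needed at inference time. The model class is a trivial stand-in and the registry keys are hypothetical identifiers, not a real driver interface.

```python
import numpy as np

class AwbModel:
    """Stand-in for a trained per-camera AWB network."""
    def __init__(self, bias: np.ndarray):
        self.bias = bias                       # mimics sensor-specific behavior
    def predict(self, rgb: np.ndarray, params: dict) -> np.ndarray:
        return rgb.reshape(-1, 3).mean(axis=0) * self.bias

MODELS = {
    "first":  AwbModel(np.array([1.00, 1.00, 1.00])),   # e.g., main camera
    "second": AwbModel(np.array([1.05, 1.00, 0.95])),   # e.g., telephoto
}

def awb_gains(camera_id: str, rgb: np.ndarray, params: dict) -> np.ndarray:
    illum = MODELS[camera_id].predict(rgb, params)      # model matches camera,
    return illum[1] / illum                             # so no migration step

print(awb_gains("second", np.random.rand(4, 4, 3), {"ev": 5.0}))
```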
  • an embodiment of the present application also provides an apparatus for realizing automatic white balance of an image. See FIG. 24, which is a schematic structural diagram of an apparatus for automatic white balance of an image provided by an embodiment of the present application.
  • the device includes: a parameter acquisition module 1001, an image acquisition module 1002, and a processing module 1003.
  • the above-mentioned functional modules may run in a processor of an electronic device having a camera (for example, referred to as a first camera). Among them:
  • the parameter acquisition module 1001 is configured to acquire the shooting parameters used when the first camera shoots the original RAW domain image.
  • the image acquisition module 1002 is configured to acquire a multi-channel image corresponding to the original RAW domain image.
  • the processing module 1003 is configured to input input data into the first neural network model to obtain the first gain value of the white balance; the input data includes at least the shooting parameters of the first camera and the multi-channel image; The multi-channel image is subjected to first processing to obtain a target image; wherein, the first processing includes white balance processing based on the multi-channel image and the first gain value.
  • the shooting parameter includes at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
  • the first neural network model implements the prediction of the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
  • in a possible embodiment, the processing module is specifically configured to: obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the processing module is specifically configured to: send the shooting parameters of the first camera and the multi-channel image to a server; receive the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the first neural network model includes a first feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature; and use the light source prediction network to predict according to the fused feature to obtain the first gain value.
  • in a possible embodiment, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model predicts the gain value specifically by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
  • in a possible embodiment, the processing module is specifically configured to: extract scene semantic information from the multi-channel image; obtain the first gain value through a first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the processing module is specifically configured to: send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured in the server; perform white balance processing on the multi-channel image by using the first gain value; and perform post-processing on the white balance processed image to obtain the target image.
  • in a possible embodiment, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature; fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature; and use the light source prediction network to predict according to the fused feature to obtain the first gain value (a code sketch of this structure is given after the module descriptions below).
  • in a possible embodiment, the processing module is specifically configured to perform at least one operation of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image, so as to obtain the scene semantic information.
  • the image acquisition module is specifically configured to perform preprocessing on the original RAW domain image to obtain the multi-channel image, and the preprocessing includes demosaicing.
  • the multi-channel image is a three-channel image or a four-channel image.
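To make the two-branch structure referenced above concrete, the following is a hedged PyTorch sketch: one feature extraction network for the multi-channel image, one for the scene semantic map, a fusion step (here concatenation followed by a linear layer, one of the fusion choices the description allows), and a light source prediction head. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoBranchAWB(nn.Module):
    def __init__(self, n_params: int = 4, dim: int = 32):
        super().__init__()
        def branch(in_ch):                      # small conv feature extractor
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.image_branch = branch(3)           # first feature extraction net
        self.semantic_branch = branch(1)        # second feature extraction net
        self.fusion = nn.Linear(dim * 2 + n_params, dim)   # feature fusion
        self.head = nn.Linear(dim, 3)           # light source prediction

    def forward(self, image, semantics, params):
        f1 = self.image_branch(image)           # first feature
        f2 = self.semantic_branch(semantics)    # second feature
        fused = torch.relu(self.fusion(torch.cat([f1, f2, params], dim=1)))
        return self.head(fused)                 # illuminant / gain estimate

# usage: TwoBranchAWB()(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64),
#                       torch.rand(2, 4)) -> shape (2, 3)
```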
  • the embodiments of the present application also provide another electronic device. The electronic device includes a camera, a display screen, a memory, and a processor, wherein: the camera is used to capture images; the display screen is used to display images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it is specifically used to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, and FIG. 17.
  • the embodiments of the present application also provide yet another electronic device.
  • the electronic device includes at least two cameras, a memory, and a processor.
  • the at least two cameras include a first camera and a second camera, wherein: the at least two cameras are both used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory. When the processor executes the program stored in the memory, it can be used to execute the method steps described in any of the method embodiments of FIG. 21 or FIG. 22, or the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, and FIG. 17.
  • the embodiment of the present application also provides a chip, which includes a transceiver unit and a processing unit.
  • the transceiver unit may be an input/output circuit or a communication interface;
  • the processing unit is a processor, microprocessor, or integrated circuit integrated on the chip.
  • the chip can execute the method steps described in any of the above-mentioned method embodiments in FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22.
  • the embodiment of the present application also provides a computer-readable storage medium on which instructions are stored.
  • when the instructions are executed, the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 are executed.
  • the embodiment of the present application also provides a computer program product containing instructions that, when executed, execute the method steps described in any of the above-mentioned method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22.
  • the size of the sequence numbers of the above-mentioned processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. If a function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a U disk, a mobile hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Color Television Image Signal Generators (AREA)

Abstract

The present application provides an image auto white balance method and apparatus. The method comprises: obtaining a photographing parameter used when a first camera of an electronic device captures an original RAW domain image; obtaining a multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value for white balance, the input data comprising at least the photographing parameter of the first camera, and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, the first processing comprising a white balance processing based on the multi-channel image and the first gain value. The implementation of embodiments of the present application can improve the accuracy and the stability of image auto white balance of the electronic device, and improve the user experience.

Description

Image auto white balance method and apparatus
This application claims priority to the Chinese patent application No. 202010280949.7, filed with the Chinese Patent Office on April 10, 2020 and entitled "Image auto white balance method and apparatus", and to the Chinese patent application No. 202010817963.6, filed with the Chinese Patent Office on August 14, 2020 and entitled "Image auto white balance method and apparatus", both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a method and apparatus for image auto white balance in the field of photography technology.
Background
With the rapid development of mobile phone chips, the camera functions of mobile phones have become more and more abundant, and users have put forward higher requirements for the basic quality (color, clarity, etc.) of pictures taken by mobile phones. Among these factors, color is one of the important criteria for evaluating the quality of mobile phone photos, and automatic white balance (Auto White Balance, AWB) is an important part of how the colors of a picture are formed.
The human visual system has the characteristic of color constancy, that is, the human visual system can resist changes in the color of the light source and thus perceive the colors of objects consistently. An image sensor (Sensor), however, records different colors for the same object under different light: in a natural environment, the same object presents different colors under illumination of different colors, for example, green leaves appear yellowish in morning light but bluish in the evening. In order to eliminate the influence of the light source on the imaging of the image sensor, simulate the color constancy of the human visual system, and ensure that white seen in any scene is rendered as true white, automatic white balance technology needs to be introduced.
White balance is an indicator that describes the accuracy of the white color generated after the three primary colors of red, green, and blue are mixed in a display. Automatic white balance technology is mainly used to solve the problem of image color cast under different light sources, so that the image of the scene conforms to the color vision habits of the human eye. Computational color constancy in automatic white balance processing is dedicated to solving this problem: its main purpose is to calculate the color of the unknown light source represented by an arbitrary image, and then use that light source color to perform color correction on the input image, so as to achieve display under standard white light.
At present, how to achieve AWB that meets such high requirements is a technical challenge that urgently needs to be solved.
Summary
The embodiments of the present application provide a method and apparatus for image auto white balance, which can improve the accuracy and stability of the image white balance of an electronic device and improve the user experience.
In a first aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including a first camera, including: acquiring shooting parameters used when the first camera shoots an original RAW domain image; acquiring a multi-channel image corresponding to the original RAW domain image; inputting input data into a first neural network model to obtain a first gain value of white balance, where the input data includes at least the shooting parameters of the first camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The original RAW domain image may be referred to as a RAW image for short; a RAW image can be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal.
The shooting parameters indicate the parameters used when performing shooting, for example, the shooting parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting. The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
A multi-channel image refers to an image in which each pixel can be represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels refer to the separate red R, green G, and blue B components.
This application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation, so as to assist the white balance process. The processing includes white balance processing implemented by a neural network model, and the neural network model is used to obtain, at least according to the shooting parameters and the multi-channel image, the white balance gain value or the image light source value required for white balance processing (the gain value and the image light source value are reciprocals of each other). The neural network model described in the embodiments of this application may, in terms of type, be a single neural network model or a combination of two or more neural network models.
After the gain value or the image light source value is output, the electronic device can use it to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall appearance of the image conforms to the visual and cognitive habits of the human eye.
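A small numeric illustration of the reciprocal relation stated above: the white balance gain is the inverse of the estimated light source color (here normalized to the green channel, a common convention rather than a requirement of the text).

```python
import numpy as np

illuminant = np.array([0.8, 1.0, 0.6])   # warm light: strong R, weak B
gains = illuminant[1] / illuminant       # -> [1.25, 1.0, 1.667]

pixel = np.array([0.4, 0.5, 0.3])        # a white patch tinted by the light
print(pixel * gains)                     # -> [0.5, 0.5, 0.5], neutral again
```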
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image as the input of the neural network model, providing more color information for the AWB neural network model. Shooting parameters are additionally used as input of the AWB neural network model, providing shooting configuration information for light source estimation, which can improve the ability of the AWB neural network model to discriminate between different light source scenes and ensures good light source estimation accuracy. Therefore, implementing this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multi-light-source scenes.
Based on the first aspect, in possible embodiments, the neural network model may be a model constructed based on deep learning, for example, one of a deep neural network (deep neural network, DNN) model, a convolutional neural network (convolutional neural network, CNN), a long short-term memory network (Long Short-Term Memory, LSTM), or a recurrent neural network (Recurrent Neural Network, RNN), or a fusion of several of them, and so on.
Based on the first aspect, in one model implementation, the first neural network model realizes the prediction of the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
In an embodiment, the first neural network model may include a first feature extraction network, a feature fusion network, and a light source prediction network; correspondingly, the process of obtaining the first gain value through the first neural network model specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, implementing a statistical operation on the pixels of the channel image through convolution processing) to obtain a first feature; fusing the shooting parameters of the first camera and the first feature through the feature fusion network (the fusion method may be, for example, one or more combinations of operations such as concat function processing, conv2d function processing, element-wise multiplication, and element-wise addition) to obtain a fused feature; and predicting, through the light source prediction network, according to the fused feature, to obtain the first gain value or the image light source value for use in the subsequent white balance process.
The AWB neural network model in the embodiments of the present application is applicable to all scenes, and a large amount of training data is used during model training; the training data includes data obtained in bright-light scenes and data obtained in dark-light scenes. With such massive data, it is difficult for a neural network to achieve high-precision fitting across all scenes, but the added camera parameters can provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for these two types of scenes. Implementing this application is beneficial to improving the white balance accuracy of electronic devices, the stability of AWB in single-frame photography and video scenes, and the stability of tendencies in ambiguous scenes such as multi-light-source scenes.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device. Correspondingly, the first processing specifically includes: obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image. Thus, when the electronic device has sufficient computing resources, the computing power of the electronic device is fully utilized to perform the neural network calculation, which improves processing efficiency and reduces white balance processing delay.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an electronic device in a device-cloud system, and the neural network model can be configured in the cloud server of the device-cloud system. Correspondingly, the first processing specifically includes: sending the shooting parameters of the first camera and the multi-channel image to a server; receiving the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image. Thus, even when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used to perform the neural network model calculation, ensuring the accuracy and stability of white balance processing, so that the solution of this application can be applied to different types of devices and improve the user experience.
Based on the first aspect, in yet another model implementation, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically realizes the prediction of the first gain value by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
In an embodiment, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; correspondingly, the process of obtaining the first gain value through the first neural network specifically includes: performing feature extraction on the multi-channel image through the first feature extraction network (for example, implementing a statistical operation on the pixels of the channel image through convolution processing) to obtain a first feature; performing feature extraction on the scene semantic information through the second feature extraction network (for example, implementing analysis/perception of the scene information of the channel image through convolution processing) to obtain a second feature; fusing the shooting parameters, the first feature, and the second feature through the feature fusion network (the fusion method may be, for example, one or more combinations of operations such as concat function processing, conv2d function processing, element-wise multiplication, and element-wise addition) to obtain a fused feature; and predicting, through the light source prediction network, according to the fused feature, to obtain the first gain value or the image light source value for use in the subsequent white balance process.
The scene semantic information represents semantic features, represented by the image, that are related to the shooting scene. In specific implementations, various types of shooting scenes can be defined. For example, shooting scenes can be classified based on light source type, for example, into cold light source scenes, warm light source scenes, single light source scenes, multi-light-source scenes, and so on. For another example, shooting scenes can be classified based on image content, for example, into portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on. Scene semantic information can to a large extent provide prior semantic information for the image, help the AWB neural network distinguish different scenes, and thereby improve the overall accuracy of the AWB neural network.
For example, during model training with massive training data, it is difficult for the neural network to achieve high-precision fitting across all scenes. For instance, for faces under different light source conditions, the network output is unstable, which affects the perceived skin color; if face detection information is added as scene semantic information input into the neural network, the neural network will increase its attention on the face region during training, thereby improving the fitting accuracy of the network in face scenes.
Based on the above model implementation, in a possible embodiment, the solution of the present application can be applied to an independent electronic device, and the neural network model can be configured in the electronic device. The first processing specifically includes: extracting scene semantic information from the multi-channel image; obtaining the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image.
Based on the above model implementation, the solution of the present application can be applied to an electronic device in a device-cloud system, and the neural network model can be configured in the cloud server of the device-cloud system. The processing specifically includes: sending the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receiving the first gain value from the server, where the first gain value is obtained through the first neural network model configured on the server; performing white balance processing on the multi-channel image by using the first gain value; and performing post-processing on the white balance processed image to obtain the target image.
Based on the first aspect, in a possible embodiment, extracting the scene semantic information from the multi-channel image includes: performing at least one operation of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
For example, a scene classification algorithm can be used to realize the classification of face versus non-face scenes, single light source versus multi-light-source scenes, light source color temperature classes, or indoor versus outdoor scenes, and so on.
For another example, an image scene segmentation algorithm can be used to segment the picture and generate a mask map; optionally, technologies such as a scene classification algorithm, an object detection algorithm, a face detection algorithm, or a skin color segmentation algorithm can also be used to generate the mask map. The mask map can provide the AWB neural network model of this application with more information related to the shooting scene than a single frame image alone, thereby enhancing the AWB neural network's attention to different shooting scenes, helping the neural network fit and converge, and achieving higher prediction accuracy.
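One illustrative way to turn a detection result into the kind of mask map mentioned above, usable as an extra input channel for the network, is sketched below. The detector is faked with a fixed bounding box; a real system would use an actual face detection or segmentation algorithm.

```python
import numpy as np

def face_mask(height: int, width: int, box) -> np.ndarray:
    """box = (top, left, bottom, right) from a hypothetical detector."""
    mask = np.zeros((height, width), dtype=np.float32)
    t, l, b, r = box
    mask[t:b, l:r] = 1.0        # 1 inside the face region, 0 elsewhere
    return mask

mask = face_mask(64, 64, (10, 20, 40, 50))
print(mask.mean())              # fraction of the image covered by the face
```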
Based on the first aspect, in a possible embodiment, acquiring the multi-channel image corresponding to the original RAW domain image includes: preprocessing the original RAW domain image to obtain the multi-channel image, where the preprocessing includes demosaicing. Using a simplified demosaicing operation makes the length and width of the multi-channel image half the length and width of the downsampled RAW image, which can increase the speed of subsequent algorithms.
Based on the first aspect, in a possible embodiment, the preprocessing may further include black level correction (Black Level Correction, BLC) and lens shading correction (Lens Shade Correction, LSC); BLC processing can reduce the influence of dark current on the image signal, and LSC processing can eliminate the influence of vignetting on the image. Optionally, the preprocessing may also include image downsampling and noise reduction.
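A hedged sketch of the two correction steps named above: black level correction subtracts the dark-current offset and rescales, and lens shading correction multiplies by a per-pixel gain map that compensates vignetting. The black/white levels and the radial gain profile are illustrative constants, not values from the patent.

```python
import numpy as np

def black_level_correction(raw: np.ndarray, black_level: float = 64.0,
                           white_level: float = 1023.0) -> np.ndarray:
    return np.clip((raw - black_level) / (white_level - black_level), 0.0, 1.0)

def lens_shading_correction(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w]
    # radial gain map: stronger brightening toward the corners
    r2 = ((y - h / 2) ** 2 + (x - w / 2) ** 2) / ((h / 2) ** 2 + (w / 2) ** 2)
    return np.clip(img * (1.0 + 0.5 * r2), 0.0, 1.0)

raw = np.random.randint(64, 1024, size=(8, 8)).astype(np.float32)
print(lens_shading_correction(black_level_correction(raw)).shape)
```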
Based on the first aspect, in a possible embodiment, the white balance processed image can also be post-processed through image enhancement algorithms to further improve image quality, obtaining the final target image for display, which is output to the display screen of the electronic device. The image enhancement algorithms may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
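As a tiny example of the gamma correction mentioned among these post-processing steps (an sRGB-style gamma of 1/2.2, a common choice assumed here):

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    return np.clip(img, 0.0, 1.0) ** (1.0 / gamma)

print(gamma_correct(np.array([0.0, 0.18, 1.0])))   # mid-gray brightens
```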
Based on the first aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
Based on the first aspect, in a possible embodiment, a training process for the neural network model may be as follows: the training data includes the labeled light source color of the image, the multi-channel image obtained by preprocessing the RAW image, the shooting parameters, and optionally the scene semantic information. After the training data is input to the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the labeled light source color, and the loss is backpropagated through the model, thereby updating the model parameters and realizing the training of the model.
Based on the first aspect, in a possible embodiment, the images used for training the neural network model may not be single-frame images but labeled video sequences. Network structures such as LSTM and RNN can be introduced into the AWB neural network model, and time-domain related strategies can also be adopted during model training. That is, video sequences can be used as training data, and the AWB neural network model takes the frames preceding and following the current image as additional model input. By training with video sequences, adding consecutive frames as input, introducing structures such as LSTM and RNN, and adding time-domain related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jumping under the same light source can be reduced. The method can thus be extended to video functions, increasing the stability of white balance and improving the user experience.
In a second aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including at least two cameras, where the at least two cameras include a first camera and a second camera, and the method includes: selecting a target camera from the at least two cameras according to a user's shooting instruction, where the shooting instruction includes a shooting magnification; when the target camera is the second camera, acquiring the shooting parameters used when the second camera shoots a second original RAW domain image and a second multi-channel image corresponding to the second original RAW domain image; performing color migration on the second multi-channel image to obtain a migration image that fits the first camera; inputting at least the shooting parameters of the second camera and the migration image into a first neural network model to obtain a first gain value of white balance, where the first neural network model is associated with the first camera, specifically, the first neural network model is trained according to data collected by the first camera and the shooting parameters of the first camera; processing the first gain value into a second gain value corresponding to the second camera; and performing first processing on the second multi-channel image to obtain a second target image, where the first processing includes white balance processing based on the second multi-channel image and the second gain value.
In the embodiments of the present application, the number of cameras configured in the electronic device is not limited. In a scenario with two or more cameras, the type of each camera is not limited. For example, the so-called "different types" can be cameras with different shooting magnifications (or zoom magnifications) or focal lengths, for example, a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera, and so on. For another example, the so-called "different types" can mean that the image sensors corresponding to the cameras are different; for example, the image sensor corresponding to the wide-angle camera is an RGGB module, while the image sensor corresponding to a conventional camera is an RYYB module.
For another example, when the first camera and the second camera are two of a main camera, a telephoto camera, and a wide-angle camera, at least one of the following holds: the image sensor corresponding to the telephoto camera includes an RGGB module; the image sensor corresponding to the main camera includes an RYYB module; the image sensor corresponding to the wide-angle camera includes an RGGB module; the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera; the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
In the embodiments of the present application, performing color migration on the second multi-channel image to obtain a migration image that fits the first camera includes: performing a color migration operation on the second multi-channel image based on the difference between the second camera and the first camera, to obtain a migration image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. In this way, the migrated image (referred to as the migration image for short), matched with the camera parameters of the second camera, is used as the input of the first AWB neural network, and the light source color value that meets the shooting characteristics of the first camera is calculated; on this basis, a migration operation is further performed on this light source color value, so as to migrate it to the light source color value corresponding to the second camera.
Current electronic devices such as mobile phones are equipped with multiple cameras, and a user will zoom in or out or select a camera when shooting; because the image sensors or camera types corresponding to the multiple cameras differ, the value ranges of RAW images taken of the same scene may differ greatly (image sensor devices of the same type may differ only slightly). The automatic white balance method described in this application enables the neural network model to be compatible with two or more cameras at the same time, expands the applicable scenarios, improves the adaptability to multiple lenses, and greatly improves the user experience.
Based on the second aspect, in a possible embodiment, when the target camera is the first camera, the method further includes: acquiring the shooting parameters used when the first camera shoots a first original RAW domain image and a first multi-channel image corresponding to the first original RAW domain image; inputting at least the shooting parameters of the first camera and the first multi-channel image into the first neural network model to obtain a third gain value of white balance; and performing white balance processing according to the first multi-channel image and the third gain value to obtain a first target image.
Based on the second aspect, in a possible embodiment, the shooting parameters include at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
Based on the second aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
In a third aspect, an embodiment of the present application provides a method for image auto white balance, applied to an electronic device including at least two cameras, where the at least two cameras include a first camera and a second camera, and the method includes:
selecting a target camera from the at least two cameras according to a user's shooting instruction, where the shooting instruction includes a shooting magnification; acquiring the shooting parameters used when the target camera shoots an original RAW domain image and a multi-channel image corresponding to the original RAW domain image; determining the neural network model corresponding to the target camera, where the first camera is associated with a first neural network model and the second camera is associated with a second neural network model, specifically, the first neural network model is trained according to data collected by the first camera and the shooting parameters of the first camera, and the second neural network model is trained according to data collected by the second camera and the shooting parameters of the second camera; inputting input data into the neural network model to obtain a white balance gain value, where the input data includes at least the shooting parameters of the target camera and the multi-channel image; and performing first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the gain value.
其中,所述第一摄像头和所述第二摄像头各自的倍率不同,或者,所述第一摄像头和所述第二摄像头各自对应的图像传感器不同。或者所述第一摄像头和所述第二摄像头各自的摄像头类型不同,所述摄像头类型包括主摄像头、长焦摄像头、广角摄像头、中长焦摄像头、超长焦摄像头、超广角摄像头。Wherein, the magnifications of the first camera and the second camera are different, or the image sensors corresponding to the first camera and the second camera are different. Or the camera types of the first camera and the second camera are different, and the camera types include a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
It can be seen that, by implementing this solution, different cameras can each be configured with a different neural network model; for example, the first camera corresponds to the first neural network model and the second camera corresponds to the second neural network model. The first neural network model may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera), and the second neural network model may be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera). In this way, the data of different cameras can be processed independently, improving the specificity and accuracy of the neural network models.
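For illustration only, the following Python sketch shows one way the per-camera model association described above might be organized in software. The function names, the model registry, and the magnification thresholds are hypothetical assumptions and are not part of the claimed method.

```python
# Hypothetical sketch of per-camera model routing; names and thresholds
# are illustrative assumptions, not part of the claimed method.

# Each physical camera is associated with its own trained AWB model.
MODEL_REGISTRY = {
    "wide": "awb_model_wide.bin",   # trained on data from the wide-angle camera
    "main": "awb_model_main.bin",   # trained on data from the main camera
    "tele": "awb_model_tele.bin",   # trained on data from the telephoto camera
}

def select_target_camera(magnification: float) -> str:
    """Map the user's shooting magnification to a target camera."""
    if magnification < 1.0:
        return "wide"
    elif magnification < 3.0:
        return "main"
    return "tele"

def model_for_capture(magnification: float) -> str:
    """Return the AWB model associated with the selected target camera."""
    camera = select_target_camera(magnification)
    return MODEL_REGISTRY[camera]

print(model_for_capture(0.6))   # awb_model_wide.bin
print(model_for_capture(5.0))   # awb_model_tele.bin
```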
Based on the third aspect, in a possible embodiment, the shooting parameters include at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
Based on the third aspect, in a possible embodiment, the multi-channel image is a three-channel image or a four-channel image.
In a fourth aspect, an embodiment of the present application provides an apparatus for implementing automatic image white balance, including: a parameter acquisition module configured to acquire the shooting parameters used when the first camera captures an original RAW domain image; an image acquisition module configured to acquire a multi-channel image corresponding to the original RAW domain image; and a processing module configured to input input data into a first neural network model to obtain a first gain value for white balance, the input data including at least the shooting parameters of the first camera and the multi-channel image, and further configured to perform first processing on the multi-channel image to obtain a target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The different functional modules of the apparatus may cooperate with one another to implement the method described in any embodiment of the first aspect of the present application.
In a fifth aspect, an embodiment of the present application provides an electronic device. The electronic device includes a camera, a memory, and a processor, and optionally a display screen for displaying images. The camera is configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the first aspect of the present application.
In a sixth aspect, an embodiment of the present application provides an electronic device. The electronic device includes at least two cameras, a memory, and a processor, the at least two cameras including a first camera and a second camera, and optionally a display screen for displaying images. The at least two cameras are each configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the second aspect of the present application.
In a seventh aspect, an embodiment of the present application provides an electronic device. The electronic device includes at least two cameras, a memory, and a processor, the at least two cameras including a first camera and a second camera, and optionally a display screen for displaying images. The at least two cameras are each configured to capture images; the memory is configured to store a program; and the processor is configured to execute the program stored in the memory and, when executing that program, to perform the method described in any embodiment of the third aspect of the present application.
In an eighth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to perform the method described in any embodiment of the first, second, or third aspect.
In a ninth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium. The computer-readable storage medium stores implementation code for the method described in any embodiment of the first, second, or third aspect. When the program code is executed by a computing device, the method described in any embodiment of the first, second, or third aspect is implemented.
In a tenth aspect, an embodiment of the present invention provides a computer program product. The computer program product includes program instructions, and when the computer program product is executed by a computing device, the method described in any embodiment of the first, second, or third aspect is performed. The computer program product may be a software installation package that can be downloaded and executed on a controller to implement the method described in any embodiment of the first, second, or third aspect.
Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below.
FIG. 1 is an exemplary diagram of an electronic device provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 3 is an exemplary diagram of a device-cloud interaction scenario provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the device structure in a device-cloud interaction scenario provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a chip provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a system architecture provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of another system architecture provided by an embodiment of the present application;
FIG. 8 is a schematic flowchart of an automatic image white balance method provided by an embodiment of the present application;
FIG. 9 is an exemplary diagram of a RAW image and a three-channel image provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the structure and processing flow of a neural network model provided by an embodiment of the present application;
FIG. 12 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of the structure and processing flow of another neural network model provided by an embodiment of the present application;
FIG. 14 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 15 is an exemplary diagram of an image preprocessing process provided by an embodiment of the present application;
FIG. 16 is an exemplary diagram of an image post-processing process provided by an embodiment of the present application;
FIG. 17 is a schematic flowchart of another automatic image white balance method provided by an embodiment of the present application;
FIG. 18 is a schematic diagram of a user operation scenario provided by an embodiment of the present application;
FIG. 19 is a block diagram of a possible software structure of a terminal according to an embodiment of the present application;
FIG. 20 is an exemplary diagram of some model training processes provided by embodiments of the present application;
FIG. 21 is an exemplary diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application;
FIG. 22 is another exemplary diagram of a processing flow in a multi-camera scenario provided by an embodiment of the present application;
FIG. 23 is an exemplary diagram of target images at different shooting magnifications provided by an embodiment of the present application;
FIG. 24 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
Detailed Description of Embodiments
The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. The singular forms "a", "said", and "the" used in the embodiments of the present application and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should be noted that, when used in this specification and the appended claims, the term "including" and any variants thereof are intended to cover non-exclusive inclusion. For example, a system, product, or apparatus that includes a series of units/devices is not limited to the listed units/devices, but may optionally include units/devices that are not listed, or may optionally include other units/devices inherent to such products or apparatuses.
It should also be noted that the terms "first", "second", "third", and so on in this specification and the claims are used to distinguish different objects, not to describe a particular order or particular meaning.
The terms used in the implementation part of this application are only used to explain specific embodiments of this application and are not intended to limit this application.
In a shooting scene, different light sources have different spectral components and distributions; in colorimetry, the color of a light source can also be described as a color temperature. For example, the color emitted by a black body at 3200 K may be defined as white, the color emitted by a black body at 5600 K as blue, and so on. In imaging, objects in the environment (including people, things, scenery, etc.) present their colors by reflecting incident light onto the image sensor, so the color of the light source in the environment affects the imaged color of an object, directly or indirectly changing the object's own color and creating a color cast. For example, a white object appears reddish under low-color-temperature light (such as incandescent lamps, candles, or sunrise and sunset scenes) and bluish under high-color-temperature light (such as cloudy sky, snow, or tree shade scenes).
Automatic white balance (AWB) automatically corrects the colors of images captured by the camera. "White balance" means correcting the color cast caused by different color temperatures so that white objects appear truly white and objects of other colors stay as close as possible to their original colors, making the overall appearance of the image conform to the visual and cognitive habits of the human eye.
For example, white balance processing can be implemented based on the Lambert reflection model. In one embodiment, the white balance processing algorithm is shown in the following formula ①:
R = I/L ①
Here, R represents the pixel values (Rr, Gr, Br) of the image after white balance processing; R is close or equal to the color the photographed object would present under neutral light.
I represents the image (Ri, Gi, Bi) captured by the electronic device; this image may be the multi-channel image described in the embodiments of the present application.
L represents the light source color information (Rl, Gl, Bl), which may specifically be the image light source value described in the embodiments of the present application. It should be noted that L here is a broad concept: in camera imaging, L may also include the bias that the image sensing device introduces into the object's color.
The task of the white balance processing is to estimate L from I and possibly additional inputs, and then to derive the color R of the object under neutral light, so as to eliminate as far as possible the imaging color cast caused by the light source, so that white appears white under different light sources and objects of other colors stay as close as possible to their original colors.
In another embodiment, the white balance processing algorithm is shown in the following formula ②:
R = I*G ②
Here, R represents the pixel values (Rr, Gr, Br) of the image after white balance processing; R is close or equal to the color the photographed object would present under neutral light.
I represents the image (Ri, Gi, Bi) captured by the electronic device; this image may be the multi-channel image described in the embodiments of the present application.
G represents the white balance gain values (1/Rl, 1/Gl, 1/Bl). Comparing formulas ① and ② shows that the gain values and the light source color information can be in the following reciprocal relationship:
G = 1/L ③
The task of the white balance processing is to estimate G from I and possibly additional inputs, and then to derive the color R of the object under neutral light, so as to eliminate as far as possible the imaging color cast caused by the light source, so that white appears white under different light sources and objects of other colors stay as close as possible to their original colors.
It should be noted that, for convenience of description, the white balance processing herein is described mainly using the light source color information as an example; a gain-value-based scheme can be implemented similarly, for example by obtaining the white balance gain values directly from the neural network model, or by obtaining the image color information from the neural network model and then deriving the white balance gain values from the image color information. This is not elaborated further herein.
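For illustration only, the following Python sketch applies formulas ② and ③: given an illuminant estimate L, the per-channel gains G = 1/L are multiplied into the multi-channel image. The array shapes and the [0, 1] value range are assumptions.

```python
import numpy as np

def apply_white_balance(image: np.ndarray, illuminant: np.ndarray) -> np.ndarray:
    """Apply per-channel white balance gains G = 1/L to a multi-channel image.

    image: H x W x 3 array (Ri, Gi, Bi), linear-light values assumed in [0, 1].
    illuminant: length-3 estimate (Rl, Gl, Bl) of the light source color.
    Returns R = I * G, the image corrected toward neutral light.
    """
    gains = 1.0 / illuminant          # formula (3): G = 1/L
    corrected = image * gains         # formula (2): R = I * G
    return np.clip(corrected, 0.0, 1.0)

# Example: a warm (reddish) illuminant; white balance pulls it back to neutral.
img = np.full((2, 2, 3), [0.8, 0.5, 0.3])
L = np.array([0.8, 0.5, 0.3])
print(apply_white_balance(img, L))    # all channels equal after correction
```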
The prior art proposes several methods for determining the light source color, for example the gray world algorithm, the perfect reflector algorithm, or a dynamic threshold algorithm, or using the color histogram of the image to determine the light source color, and so on.
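As an example of one such prior-art method, the following is a minimal sketch of the gray-world algorithm mentioned above, which assumes the scene averages to gray so that the per-channel means reveal the illuminant color; the brightness-preserving scale in the usage line is an assumption.

```python
import numpy as np

def gray_world_illuminant(image: np.ndarray) -> np.ndarray:
    """Gray-world estimate: assume the average reflectance of the scene is
    achromatic, so the per-channel means of the image reveal the illuminant
    color (up to scale)."""
    means = image.reshape(-1, 3).mean(axis=0)   # mean of R, G, B
    return means / means.sum()                  # normalized illuminant estimate

# Usage: estimate L, then correct with gains 1/L as in formulas (2)/(3).
img = np.random.rand(64, 64, 3) * np.array([0.9, 0.6, 0.4])  # warm color cast
L = gray_world_illuminant(img)
balanced = np.clip(img / (3.0 * L), 0.0, 1.0)   # 3*L roughly preserves brightness
```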
Higher requirements are now placed on automatic white balance algorithms and methods, reflected in one or more of the following: (1) AWB needs to achieve higher light source estimation accuracy in all kinds of scenes; (2) in ambiguous scenes such as multi-light-source scenes, a single estimated light source value cannot satisfy all light source regions in the image, so the AWB algorithm needs to show a stable tendency in ambiguous scenes; (3) for photos taken under the same lighting conditions, the white balance should be as stable as possible to avoid color jumps; (4) the computational overhead of the AWB algorithm must be small enough to meet real-time requirements.
The embodiments of the present application provide a deep-learning-based automatic white balance method applicable to images and videos, which can overcome the above technical defects, improve the accuracy of AWB in all scenes, improve the stability of AWB for images and videos, ensure a stable tendency in ambiguous scenes such as multi-light-source scenes, and meet real-time requirements.
Possible application scenarios of the method described in this application are introduced below.
Referring to FIG. 1, in one application scenario, the method described in this application can be applied to a standalone electronic device 10.
The electronic device 10 may be mobile or fixed; for example, the electronic device 10 may be a mobile phone with image processing functions, a tablet personal computer (TPC), a notebook computer, a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, an SLR camera, a video camera, a smart watch, a monitoring device, an augmented reality (AR) device, a virtual reality (VR) device, a wearable device (WD), an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
Please refer to FIG. 2 for a better understanding of the internal structure of the electronic device 10. As shown in FIG. 2, the electronic device 10 includes at least one general-purpose processor 13, a memory 15 (one or more computer-readable storage media), an image capture apparatus 11, an image signal processor (ISP) 12, and a display apparatus 14; these components can communicate over one or more communication buses. Specifically:
The image capture apparatus 11 may include components such as a camera 111 and an image sensor 112, and is configured to capture images or videos of the shooting scene. The images captured by the image capture apparatus 11 may be one or more original RAW domain images; herein, an original RAW domain image may be referred to simply as a RAW image. The multiple original RAW domain images may form a sequence of image frames.
The camera 111 may specifically be a monocular camera or a binocular camera, arranged at a front position (i.e., a front camera) or a rear position (i.e., a rear camera) on the housing of the main body of the electronic device 10.
The image sensor 112 is a photosensitive element; this application does not limit the type of the photosensitive element, which may be, for example, a complementary metal-oxide semiconductor (CMOS) or a charge-coupled device (CCD). The function of the image sensor 112 is to capture the optical image collected by the camera 111 and convert it into an electrical signal usable by the downstream ISP 12.
The image sensor 112 can provide the shooting parameters used in actual shooting; the shooting parameters include, for example, at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity. ISO sensitivity is the sensitivity specified by the International Standards Organization (ISO), also called the ISO value, and measures how sensitive the sensor is to light.
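For illustration only, the following sketch shows one possible way such shooting parameters could be packed into a fixed-length feature vector as an auxiliary input to the neural network model; the normalization constants are assumptions, not values specified by this application.

```python
import numpy as np

def shooting_param_vector(exposure_value: float, shutter_time_s: float,
                          aperture_f: float, iso: float) -> np.ndarray:
    """Pack shooting parameters into a fixed-length feature vector.

    The normalization ranges below are illustrative assumptions; in practice
    they would be chosen to match the sensor's actual operating range.
    """
    return np.array([
        exposure_value / 16.0,            # EV, assumed range about [-16, 16]
        np.log10(shutter_time_s + 1e-6),  # shutter time spans orders of magnitude
        aperture_f / 22.0,                # f-number, assumed maximum f/22
        np.log2(iso / 100.0),             # ISO in stops relative to ISO 100
    ], dtype=np.float32)

print(shooting_param_vector(0.0, 1 / 120, 1.8, 400))
```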
The main function of the ISP 12 is to process the signal output by the upstream image sensor 112. In the embodiments of this application, the algorithms included in the ISP 12 mainly include an automatic white balance (AWB) algorithm; in addition, they may include, but are not limited to, one or more of the following processing algorithms: automatic exposure control (AEC), automatic gain control (AGC), color correction, lens correction, noise removal/noise reduction, dead pixel removal, linear correction, color interpolation, image downsampling, level compensation, and so on. In some instances, some image enhancement algorithms may also be included, such as gamma correction, contrast enhancement and sharpening, color noise removal and edge enhancement in the YUV color space, color enhancement, color space conversion (for example, RGB to YUV), and so on.
It should be noted that, in possible implementations, some of the algorithms described above for the ISP 12 can also be integrated into other components for processing; for example, the image enhancement algorithms can be integrated into a field-programmable gate array (FPGA) or a digital signal processor (DSP) that cooperates with the ISP 12 to complete the image processing.
The general-purpose processor 13 may be any type of device capable of processing electronic instructions. The electronic device 10 in this application may include one or more general-purpose processors 13, for example one or both of a central processing unit (CPU) 131 and a neural-network processing unit (NPU) 132. It may further include one or more of a graphics processing unit (GPU), a microprocessor, a microcontroller, a main processor, a controller, an application-specific integrated circuit (ASIC), and the like. The general-purpose processor 13 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 15, enabling the electronic device 10 to provide a wide variety of services. For example, the general-purpose processor 13 can execute programs or process data to perform at least part of the methods discussed herein.
The main function of the CPU 131 is to parse computer instructions and process data in computer software, to implement overall control of the electronic device 10, and to control and allocate all hardware resources of the electronic device 10 (such as storage resources, communication resources, and I/O interfaces).
The NPU 132 is a general term for a new type of processor based on neural network algorithms and acceleration. An NPU is designed specifically for artificial intelligence to accelerate neural network operations and to solve the inefficiency of traditional chips in neural network computation.
It should be noted that the name NPU 132 does not limit this application; in other application scenarios, the NPU 132 may also be varied or replaced with other processors of similar function, such as a tensor processing unit (TPU), a deep learning processing unit (DPU), and so on.
In one embodiment of the present application, when the NPU 132 is present, the NPU 132 can undertake the tasks related to neural network computation. For example, the NPU 132 can compute the AWB neural network based on the image information provided by the ISP 12 (such as a multi-channel image) and the information provided by the image capture apparatus (such as the shooting parameters) to obtain the light source color information, and then feed the light source color information back to the ISP 12 so that the ISP 12 can further complete the AWB process.
In another embodiment of the present application, when the CPU 131 is present and the NPU 132 is not, the CPU 131 can undertake the tasks related to neural network computation. That is, the CPU 131 computes the AWB neural network based on the image information provided by the ISP 12 (such as a multi-channel image) and the information provided by the image capture apparatus (such as the shooting parameters) to obtain the light source color information, and then feeds the light source color information back to the ISP 12 so that the ISP 12 can further complete the AWB process.
The display apparatus 14 is configured to display the shooting scene currently previewed when the user needs to shoot, the shooting interface, or the target image after white balance processing. The display apparatus 14 can also display information requiring user operation or information provided to the user, as well as various graphical user interfaces of the electronic device 10; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof.
The display apparatus 14 may specifically include a display screen (display panel); optionally, the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The display apparatus 14 may also be a touch panel (touchscreen), which may include a display screen and a touch-sensitive surface. When the touch-sensitive surface detects a touch operation on or near it, the operation is sent to the CPU 131 to determine the type of the touch event, and the CPU 131 then provides a corresponding visual output on the display apparatus 14 according to the type of the touch event.
The memory 15 may include volatile memory, such as random access memory (RAM) and a cache; it may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 15 may also include a combination of the above types of memory. The memory 15 can be used to store data such as the RAW images captured by the image capture apparatus 11, the target image after white balance processing, information of preceding and following frames, shooting parameters, and scene semantic information; the memory 15 can also store program instructions for the processor to call and execute the automatic image white balance method described in this application.
Based on the above components of the electronic device 10, automatic white balance of an image can be achieved through the following process. When the electronic device 10 performs shooting, the optical image of objects in the external environment (people, things, scenery, etc.) collected by the camera 111 is projected onto the surface of the image sensor 112 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, which is a RAW image (for example, in Bayer format). The image sensor 112 sends the RAW image to the ISP 12 for processing. When the ISP 12 needs to perform AWB, the ISP 12 sends the image information (for example, a multi-channel image) to the general-purpose processor 13, and the image capture apparatus 11 sends the shooting parameters to the general-purpose processor 13. The general-purpose processor 13 (for example, the CPU 131 or the NPU 132) can use this input information to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the ISP 12, which completes AWB according to the light source color information and performs other image processing to obtain a target image, for example an image in YUV or RGB format. The ISP 12 then transmits the target image to the CPU 131 through the I/O interface, and the CPU 131 sends the target image to the display apparatus 14 for display.
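For illustration only, the following Python sketch outlines the capture-to-display flow described above, with a gray-world stand-in for the neural network model and a simplified 2x2 Bayer demosaic; all names, the RGGB pattern, and the omitted ISP stages are hypothetical assumptions.

```python
import numpy as np

class StubModel:
    """Stand-in for the AWB neural network: here just a gray-world estimate.
    The real model would also consume the shooting parameters."""
    def predict(self, image, params):
        means = image.reshape(-1, 3).mean(axis=0)
        return means / means.max()

def demosaic(raw: np.ndarray) -> np.ndarray:
    """Placeholder demosaic: average each 2x2 Bayer cell into R, G, B planes
    (assumes an RGGB pattern; a real ISP uses proper color interpolation)."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

def awb_pipeline(raw_bayer, shooting_params, model):
    multi_channel = demosaic(raw_bayer)                # RAW -> multi-channel image
    L = model.predict(multi_channel, shooting_params)  # illuminant (Rl, Gl, Bl)
    balanced = multi_channel / L                       # white balance: R = I / L
    return np.clip(balanced, 0.0, 1.0)                 # remaining ISP stages omitted

out = awb_pipeline(np.random.rand(8, 8), np.zeros(4), StubModel())
print(out.shape)   # (4, 4, 3)
```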
Those skilled in the art should understand that the electronic device 10 may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The device structure shown in FIG. 2 does not limit the electronic device 10.
Referring to FIG. 3, in another application scenario, the method described in this application can be applied to a device-cloud interaction scenario. As shown in FIG. 3, the device-cloud system includes an electronic device 20 and a cloud server 30. The electronic device 20 and the cloud server 30 can communicate with each other, and the communication mode is not limited to wired or wireless.
The electronic device 20 may be mobile or fixed; for example, the electronic device 20 may be a mobile phone with image processing functions, a tablet personal computer, a notebook computer, a media player, a smart TV, a laptop, a personal digital assistant, a personal computer, a camera, an SLR camera, a video camera, a smart watch, a monitoring device, an augmented reality device, a virtual reality device, a wearable device, an in-vehicle device, or the like, which is not limited in the embodiments of the present application.
The cloud server 30 may include one or more servers, one or more processing nodes, or one or more virtual machines running on a server; the cloud server 30 may also be called a server cluster, a management platform, a data processing center, and so on, which is not limited in the embodiments of the present application.
Please refer to FIG. 4 for a better understanding of the internal structure of the devices in the device-cloud system. As shown in FIG. 4, the device-cloud system includes an electronic device 20 and a cloud server 30. The electronic device 20 includes at least one general-purpose processor 23, a memory 25, an image capture apparatus 21, an image signal processor ISP 22, a display apparatus 24, and a communication apparatus 26; these components can communicate over one or more communication buses to realize the functions of the electronic device 20. The cloud server 30 includes a memory 33, a neural-network processing unit NPU 31, and a communication apparatus 32; these components can communicate over one or more communication buses to realize the functions of the cloud server 30. The electronic device 20 establishes a communication connection with the communication apparatus 32 of the cloud server 30 through the communication apparatus 26, and the communication mode is not limited to wired or wireless. For example, the communication apparatus 26 and the communication apparatus 32 can be used to send and receive wireless signals to and from each other; the wireless communication modes include, but are not limited to, one or more of radio frequency (RF), data communication, Bluetooth, WiFi, and so on.
It can be seen that, compared with the electronic device 10 in FIG. 2 described above, the main difference of the device-cloud system is that the electronic device 10 performs the neural network computation locally, whereas in the device-cloud system this function is implemented on the cloud server 30; that is, the NPU 31 of the cloud server 30 performs the neural network computation. Therefore, the electronic device 20 in the device-cloud system need not include an NPU. The embodiments of the present application make full use of the computing resources of the cloud server, which helps reduce the operating burden and configuration requirements of the electronic device 20 and improves the user experience.
Based on the above components of the device-cloud system, automatic white balance of an image can be achieved through the following process. When the electronic device 20 performs shooting, the optical image of objects in the external environment (people, things, scenery, etc.) collected by the camera in the image capture apparatus 21 is projected onto the image sensor in the image capture apparatus 21 and converted into an electrical signal; after analog-to-digital (A/D) conversion, the electrical signal becomes a digital image signal, which is a RAW image (for example, in Bayer format). The image capture apparatus 21 sends the RAW image to the ISP 22 for processing. When the ISP 22 needs to perform AWB, the ISP 22 sends the image information (for example, a multi-channel image) to the general-purpose processor 23, and the image capture apparatus 21 sends the shooting parameters to the general-purpose processor 23. The general-purpose processor 23 (for example, the CPU 231) may further send this information to the cloud server 30 through the communication apparatus 26. After the cloud server 30 receives the information through the communication apparatus 32, the NPU 31 uses the input information (the multi-channel image, the shooting parameters, etc.) to compute the neural network model and obtain the light source color information corresponding to the image. The light source color information is then fed back to the electronic device 20 through the communication apparatus 32 and sent to the ISP 22, which completes AWB according to the light source color information and performs other image processing to obtain a target image, for example an image in YUV or RGB format. The ISP 22 then transmits the target image to the CPU 231 through the I/O interface, and the CPU 231 sends the target image to the display apparatus 24 for display.
It should be noted that the functions of the relevant components of the electronic device 20 in the device-cloud system can be understood by analogy with the description of the relevant components of the electronic device 10 in FIG. 2; for brevity, details are not repeated here.
Those skilled in the art should understand that the electronic device 20 and the cloud server 30 may include more or fewer components than shown, combine certain components, or use a different arrangement of components. The device structure shown in FIG. 4 does not limit the present application.
FIG. 5 shows a hardware structure of a chip provided by an embodiment of the present application; the chip includes a neural-network processing unit NPU 300.
In one implementation, the NPU 300 may be provided in the electronic device 10 shown in FIG. 2 to perform the computation work of the neural network; in this case, the NPU 300 is the NPU 132 described in FIG. 2.
In another implementation, the NPU 300 may be provided in the cloud server 30 shown in FIG. 4 to perform the computation work of the neural network; in this case, the NPU 300 is the NPU 31 described in FIG. 4.
The NPU 300 can be mounted on a main central processing unit (CPU) as a coprocessor, with the main CPU allocating tasks. The core part of the NPU 300 is the arithmetic circuit 303; the controller 304 controls the arithmetic circuit 303 to extract matrix data from memory and perform multiplication operations.
In some implementations, the arithmetic circuit 303 internally includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array; the arithmetic circuit 303 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
For example, suppose there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 303 fetches the data corresponding to matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303; the arithmetic circuit 303 fetches the data of matrix A from the input memory 301 and performs a matrix operation with matrix B, and the partial or final results of the resulting matrix are stored in the accumulator 308.
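For illustration only, the following sketch emulates the described data flow in software: matrix B is consumed one tile at a time (as it would be cached on the PEs), and the partial results of C = A x B are accumulated as the accumulator 308 would hold them. The tile size is an assumption.

```python
import numpy as np

def tiled_matmul(A: np.ndarray, B: np.ndarray, tile: int = 4) -> np.ndarray:
    """Compute C = A @ B tile by tile, accumulating partial results the way
    the accumulator described above would hold them."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))                      # accumulator contents
    for t in range(0, k, tile):
        A_tile = A[:, t:t + tile]             # streamed from the input memory
        B_tile = B[t:t + tile, :]             # cached weight tile (per-PE data)
        C += A_tile @ B_tile                  # accumulate the partial result
    return C

A = np.random.rand(8, 8)
B = np.random.rand(8, 8)
assert np.allclose(tiled_matmul(A, B), A @ B)
```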
The vector calculation unit 307 can further process the output of the arithmetic circuit 303, for example performing vector multiplication, vector addition, exponential operations, logarithmic operations, magnitude comparison, and so on. For example, the vector calculation unit 307 can be used for network computation in the non-convolutional/non-FC layers of a neural network, such as pooling, batch normalization, local response normalization, and so on.
In some implementations, the vector calculation unit 307 can store the processed output vector in the unified memory 306. For example, the vector calculation unit 307 can apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate activation values. In some implementations, the vector calculation unit 307 generates normalized values, merged values, or both.
In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer of the neural network.
The unified memory 306 is used to store input data and output data. Through the direct memory access controller (DMAC) 305, input data in the external memory is stored into the input memory 301 and/or the unified memory 306, weight data in the external memory is stored into the weight memory 302, and data in the unified memory 306 is stored into the external memory.
The bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch buffer 309 via the bus.
The instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304; the controller 304 calls the instructions buffered in the instruction fetch buffer 309 to control the working process of the computation accelerator.
Generally, the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch buffer 309 are all on-chip memories, while the external memory is memory outside the NPU; the external memory may be double data rate synchronous dynamic random access memory (DDR SDRAM), high bandwidth memory (HBM), or other readable and writable memory.
Specifically, the operations of each layer in the neural network model described in the embodiments of the present application (namely the AWB neural network model described later) may be performed by the arithmetic circuit 303 or the vector calculation unit 307.
Since the embodiments of the present application involve the application of neural networks, to better understand the working principles of the neural networks described in the embodiments of the present application, the implementation of the neural networks in this application is described below.
First, relevant terms and concepts of the neural networks involved in the embodiments of the present application are introduced.
(1) Neural network model
Herein, "neural network" and "neural network model" may be regarded as the same concept, used interchangeably for convenience of expression. The neural network model described in the embodiments of the present application may be composed of neural units. A neural unit may refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs, and the output of the arithmetic unit may be:
h_{W,b}(x) = f(W^T x) = f( ∑_{s=1}^{n} W_s x_s + b )
where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce nonlinearity into the neural network so as to convert the input signal of the neural unit into an output signal. The output signal of the activation function can serve as the input of the next layer; the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be a region composed of several neural units.
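For illustration only, a minimal sketch of one such neural unit with a sigmoid activation, following the formula above; the example input and weight values are arbitrary.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(x: np.ndarray, W: np.ndarray, b: float) -> float:
    """One neural unit: f( sum_s W_s * x_s + b ) with a sigmoid activation f."""
    return sigmoid(np.dot(W, x) + b)

x = np.array([0.5, -0.2, 0.1])
W = np.array([0.8, 0.3, -0.5])
print(neural_unit(x, W, b=0.1))   # scalar output, passed on to the next layer
```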
In the embodiments of the present application, the neural network model may be a model constructed based on deep learning, for example a deep neural network (DNN) model, a convolutional neural network (CNN), a recurrent neural network (RNN), or a fusion of several of these, and so on.
Illustratively, taking a convolutional neural network model as an example, a convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network contains a feature extractor composed of convolutional layers and sub-sampling layers; the feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons in the convolutional neural network that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units on the same feature plane can share weights. The convolution kernel may be initialized in the form of a matrix of random size, or with all zeros or other common initialization methods, which is not limited here. During the training of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the convolutional neural network while also reducing the risk of overfitting.
In this application, for some specific implementations of the neural network model, reference may be made to the AWB neural network model described later.
(2) Loss function
In the process of training a neural network model, because one hopes that the output of the neural network model is as close as possible to the value one actually wants to predict, the weight vectors of each layer of the neural network can be updated by comparing the current network's predicted value with the truly desired target value and acting on the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the neural network model). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and adjustment continues until the neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation used to measure the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so the training of the deep neural network becomes a process of reducing this loss as much as possible.
(3) Backpropagation algorithm
A neural network can use the error backpropagation (BP) algorithm to correct the parameters in the initial neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, forward propagation of the input signal to the output produces an error loss, and the parameters in the initial neural network model are updated by propagating the error loss information backward, so that the error loss converges. The backpropagation algorithm is a backward-propagation movement dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
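For illustration only, the following one-parameter example shows the loop the two sections above describe: a forward pass, an error loss, a gradient computed by propagating the loss backward, and a parameter update that makes the loss converge. The learning rate and target values are arbitrary.

```python
# Fit y = w * x by gradient descent on a squared-error loss.
x, y_true = 2.0, 6.0        # target relation: w should converge to 3
w = 0.0                     # initialization of the model parameter
lr = 0.05                   # learning rate

for step in range(100):
    y_pred = w * x                    # forward pass
    loss = (y_pred - y_true) ** 2     # error loss
    grad = 2 * (y_pred - y_true) * x  # backward pass: d(loss)/dw
    w -= lr * grad                    # update against the gradient
print(w)   # ~3.0: the error loss has converged
```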
Referring to FIG. 6, FIG. 6 shows a system architecture 100 for neural network model training provided by an embodiment of the present application.
In FIG. 6, a data collection device 160 is used to collect training data. For the method of the embodiments of the present application, the neural network model (namely the AWB neural network model described later) can be further trained on the training data.
In one example, the training data for training the neural network model in the embodiments of the present application may include the multi-channel images corresponding to original RAW domain images, the shooting parameters corresponding to the original RAW domain images, and the light source color information annotated on the original RAW domain images.
In another example, the training data for training the neural network model in the embodiments of the present application may include the multi-channel images corresponding to original RAW domain images, scene semantic information extracted from the multi-channel images, the shooting parameters corresponding to the original RAW domain images, and the light source color information annotated on the original RAW domain images.
It should be noted that an image in the training data may be a single frame image or multiple frames of a video frame sequence.
After the training data is collected, the data collection device 160 stores the training data in the database 130, and the training device 120 trains the target model 101 (for example, the AWB neural network model in the embodiments of the present application) based on the training data maintained in the database 130. The training device 120 feeds the training data into the target model 101 until the difference between the light source color information predicted by the target model 101 and the light source color information annotated for the image satisfies a preset condition, for example, until the angular error between the two corresponding color vectors is smaller than a preset threshold, remains unchanged, or no longer decreases, thereby completing the training of the target model 101.
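For illustration only, the angular error between the two color vectors mentioned above can be computed as follows (a minimal numpy sketch; the 2-degree threshold is an assumed example value, not one prescribed by this application):

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle, in degrees, between predicted and labelled illuminant vectors."""
    cos = np.dot(pred, gt) / (np.linalg.norm(pred) * np.linalg.norm(gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example preset condition: stop training once the error is below 2 degrees.
done = angular_error_deg(np.array([0.60, 1.0, 0.70]),
                         np.array([0.62, 1.0, 0.68])) < 2.0
```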
It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the collection of the data collection device 160; it may also be received from other devices.
It should also be noted that the training device 120 does not necessarily train the target model 101 solely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.
The target model 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 6. The execution device 110 can use the target model 101 to perform neural network computation, so as to predict the light source color information.
In the application scenario of the standalone electronic device shown in FIG. 1, the execution device 110 may be the electronic device 10 described above. The input data of the execution device 110 may come from the data storage system 150, which may be a memory placed inside the execution device 110 or an external memory independent of the execution device 110. In the embodiments of the present application, the input data may include, for example, a multi-channel image and shooting parameters; or it may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters. The execution device 110 thus predicts the light source color information based on the input data.
In the end-cloud application scenario shown in FIG. 3, the execution device 110 may be the cloud server 30 in the end-cloud system described above. In this case, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. For example, a user can input data to the I/O interface 112 through the client device 140, which may be, for example, the electronic device 20 in the end-cloud system. In one case, the client device 140 may automatically send input data to the I/O interface 112. If the user's authorization is required for the client device 140 to automatically send input data, the user can set the corresponding permission in the client device 140. In the embodiments of the present application, the input data may include, for example, a multi-channel image and shooting parameters; or it may include a multi-channel image, scene semantic information extracted from the image, and shooting parameters. The execution device 110 thus predicts the light source color information based on the input data, and the predicted light source color information can subsequently be returned to the client device 140 through the I/O interface 112.
The associated function module 113 can be used to perform related processing based on the input data. For example, in one embodiment of the present application, the associated function module 113 can extract scene semantic information from the multi-channel image.
It is worth noting that the training device 120 can generate, for different goals or tasks, a corresponding target model 101 based on different training data, and that corresponding target model 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results. For example, it can be used to train the AWB neural network model described below in the embodiments of FIG. 11 or FIG. 13.
In one implementation, the execution device 110 may be configured with the chip shown in FIG. 5 to complete the computation work of the computing module 111.
In another implementation, the training device 120 may also be configured with the chip shown in FIG. 5 to complete its training work and output the trained target model 101 to the execution device 110.
It is worth noting that FIG. 6 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation.
Referring to FIG. 7, FIG. 7 shows another system architecture 400 provided by an embodiment of the present application. The system architecture includes a local device 420, a local device 430, an execution device 410, and a data storage system 450, where the local device 420 and the local device 430 are connected to the execution device 410 through a communication network 440.
Exemplarily, the execution device 410 may be implemented by one or more servers.
Optionally, the execution device 410 can be used in conjunction with other computing devices, for example, data storage, routers, load balancers, and other equipment. The execution device 410 may be arranged at one physical site or distributed across multiple physical sites. The execution device 410 may use the data in the data storage system 450, or call the program code in the data storage system 450, to implement the image processing method of the embodiments of the present application.
It should be noted that the above execution device 410 may also be a cloud server, in which case the execution device 410 can be deployed in the cloud; the execution device 410 may be the cloud server 30 described above in the embodiment of FIG. 3, and the local device 420/local device 430 may then be the electronic device 20 described above in the embodiment of FIG. 3.
In a possible implementation, the automatic white balance method of the embodiments of the present application may be executed independently by the local device 420 or the local device 430. For example, the local device 420 and the local device 430 may obtain the relevant parameters of the aforementioned neural network model from the execution device 410, deploy the neural network model on the local device 420 and the local device 430, and use the neural network model to implement the AWB process.
In another possible implementation, the automatic white balance method of the embodiments of the present application may be completed cooperatively by the local device 420 or the local device 430 interacting with the execution device 410. For example, users may operate their respective user devices (for example, the local device 420 and the local device 430) to interact with the execution device 410.
Each local device may be any computing device, for example, a personal computer, computer workstation, smartphone, tablet, camera, smart camera, smart in-vehicle device or other type of cellular phone, media consumption device, wearable device, set-top box, game console, and so on.
The local device of each user can interact with the execution device 410 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
It should be understood that the foregoing is merely an example of application scenarios and does not limit the application scenarios of the present application in any way.
Based on the application scenarios, devices, and systems described above, some automatic white balance methods provided by the embodiments of the present application are described below.
Referring to FIG. 8, FIG. 8 is a schematic flowchart of an image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device that includes a camera and a display screen, and includes but is not limited to the following steps:
S501: Acquire shooting parameters used when the camera shoots an original RAW domain image.
Herein, an original RAW domain image may be referred to simply as a RAW image. A RAW image may be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal; this raw data has not yet been processed by an image signal processor (ISP). The RAW image may specifically be a bayer image in the Bayer format.
The shooting parameters are the parameters used when performing shooting, for example, parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting.
The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
Images acquired by the camera and image sensor of an electronic device in the same environment but with different shooting parameter configurations will show differences in color characteristics; the shooting parameters therefore describe the physical conditions under which the image was captured. The present application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation.
S502: Acquire a multi-channel image corresponding to the original RAW domain image.
After the electronic device obtains the RAW image, it can process the RAW image into a multi-channel image, as shown in FIG. 9. A multi-channel image is an image in which each pixel is represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels are the separate red (R), green (G), and blue (B) components.
For example, in one example, the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
In another example, the multi-channel image may specifically be a four-channel image, for example, an RGGB four-channel image, a BGGR four-channel image, or an RYYB four-channel image.
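As a hedged illustration of the four-channel form, the following numpy sketch packs a Bayer mosaic into an RGGB four-channel image; it assumes an RGGB channel layout, whereas real sensors may use BGGR, RYYB, or other patterns:

```python
import numpy as np

def bayer_to_rggb(raw):
    """Pack an H x W RGGB Bayer mosaic into a (4, H/2, W/2) channel image."""
    r  = raw[0::2, 0::2]   # red sample sites
    g1 = raw[0::2, 1::2]   # green sample sites on red rows
    g2 = raw[1::2, 0::2]   # green sample sites on blue rows
    b  = raw[1::2, 1::2]   # blue sample sites
    return np.stack([r, g1, g2, b], axis=0)
```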
S503: Process the multi-channel image to obtain a target image for display on the display screen. Specifically, input data may be fed into the neural network model to obtain a first gain value for white balance, where the input data includes at least the shooting parameters of the camera and the multi-channel image; first processing is then performed on the multi-channel image to obtain the target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
The neural network model is used to obtain, at least according to the shooting parameters and the multi-channel image, the gain value or the light source color information required for the white balance processing.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
For example, the neural network model may be a model built by deep learning, such as one of, or a fusion of several of, a deep neural network (DNN) model, a convolutional neural network (CNN), a long short-term memory network (LSTM), a recurrent neural network (RNN), and so on.
The neural network model provided by the embodiments of the present application can obtain the light source color information required in white balance processing, such as the image light source value (r/g, 1, b/g), according to the shooting parameters and the multi-channel image. After the light source color information is output, the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
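As a minimal sketch of how such an image light source value might be turned into per-channel gains and applied, consider the following; the reciprocal-gain convention below is a common practice assumed for illustration, not necessarily the first gain value as defined by this application:

```python
import numpy as np

def apply_white_balance(rgb, illuminant):
    """rgb: float image (H, W, 3) in [0, 1]; illuminant: (r/g, 1, b/g)."""
    r_over_g, _, b_over_g = illuminant
    # Channels that the illuminant tints more strongly receive larger gains.
    gains = np.array([1.0 / r_over_g, 1.0, 1.0 / b_over_g])
    return np.clip(rgb * gains, 0.0, 1.0)
```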
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image as the input of the AWB neural network model, providing the model with more color information. The shooting parameters are added as a further input to the AWB neural network model, providing shooting configuration information for light source estimation; this improves the ability of the AWB neural network model to distinguish different light source scenes and ensures good light source estimation accuracy. Therefore, implementing the present application helps improve the white balance accuracy of electronic devices, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
Referring to FIG. 10, FIG. 10 is a schematic flowchart of a specific image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device, and includes but is not limited to the following steps:
S601: Shoot at least one original RAW domain image. Single-frame shooting may correspond to a photographing scenario, and multi-frame shooting may correspond to a video recording scenario or a time-lapse shooting scenario.
Herein, an original RAW domain image may be referred to simply as a RAW image. A RAW image may be the raw data produced when a CMOS or CCD image sensor converts the light signal captured by the camera into a digital signal; this raw data has not yet been processed by an image signal processor (ISP). The RAW image may specifically be a bayer image in the Bayer format.
S602: Acquire the shooting parameters used for shooting the RAW image.
Specifically, the shooting parameters are the parameters used when performing shooting, for example, parameters used by the camera, the image sensor, and so on. Alternatively, the shooting parameters can also be understood as the control parameters generated when the processor controls the camera and the image sensor during shooting.
The shooting parameters preferably include an exposure value, and optionally may also include one or more of exposure time (shutter time), ISO sensitivity, aperture size, and the like.
Images acquired by the camera and image sensor of an electronic device in the same environment but with different shooting parameter configurations will show differences in color characteristics; the shooting parameters therefore describe the physical conditions under which the image was captured. The present application can use the shooting parameters to provide a shooting-configuration reference for light source color estimation.
S603: Process the RAW image into a multi-channel image.
A multi-channel image is an image in which each pixel is represented by the values (or color components) of multiple image channels. In the RGB color mode, the image channels are the separate red (R), green (G), and blue (B) components.
For example, in one example, the multi-channel image may specifically be a color three-channel image, such as an RGB three-channel image.
In another example, the multi-channel image may specifically be a four-channel image, for example, an RGGB four-channel image, a BGGR four-channel image, or an RYYB four-channel image.
S604: Input the multi-channel image and the shooting parameters into the neural network model to obtain the light source color information.
In other words, the neural network model can obtain the light source color information required in white balance processing according to the shooting parameters and the multi-channel image.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
Referring to FIG. 11, the neural network model may be the AWB neural network model shown in FIG. 11. The AWB neural network model specifically includes a first feature extraction network, a feature fusion network, and a light source prediction network.
The first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain a first feature; the first feature characterizes the color information of the channel image. In a specific implementation, the first feature extraction network may include one or more convolution kernels, and a statistical operation over the pixels of the channel image is implemented through convolution, so as to obtain the first feature.
The feature fusion network is used to fuse the first feature and the shooting parameters to obtain a fused feature. The fusion method is not limited to one or a combination of operations such as concat processing, conv2d processing, elementwise multiply processing, and elementwise add processing. For example, the above two streams of information (the first feature and the shooting parameters) can be weighted to obtain the fused feature.
It should be noted that, during fusion in the feature fusion network, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array form of the first feature, so that the two streams of data have a consistent mathematical form, which facilitates the data fusion processing.
The light source prediction network is used to make a prediction based on the fused feature to obtain the light source color information. The light source color information can indicate the color temperature of the light source or the color cast of the image, so it can be used in the subsequent AWB processing.
For example, after the fused feature is processed by the light source prediction network, the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing.
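One possible, purely illustrative realization of the structure in FIG. 11 is sketched below in PyTorch; the layer sizes, the broadcast-and-concatenate fusion, and all names are assumptions rather than the claimed design:

```python
import torch
import torch.nn as nn

class AwbNet(nn.Module):
    """Sketch: first feature extraction + feature fusion + light source prediction."""
    def __init__(self, n_params=3):
        super().__init__()
        # First feature extraction network: convolutions acting as a
        # statistical operation over the channel image's pixels.
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Fusion + prediction head; predicts (r/g, b/g), the middle 1 is fixed.
        self.head = nn.Sequential(
            nn.Conv2d(64 + n_params, 64, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))

    def forward(self, image, shoot_params):
        f = self.features(image)  # first feature
        # Expand the shooting parameters into a multi-dimensional array
        # matching the spatial shape of the first feature.
        p = shoot_params[:, :, None, None].expand(-1, -1, f.shape[2], f.shape[3])
        fused = torch.cat([f, p], dim=1)  # feature fusion by concatenation
        rg_bg = self.head(fused)          # light source prediction
        ones = torch.ones_like(rg_bg[:, :1])
        return torch.cat([rg_bg[:, :1], ones, rg_bg[:, 1:]], dim=1)  # (r/g, 1, b/g)
```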
As can be seen from the above, the AWB neural network model predicts the light source color information by fusing the features of the channel image with the shooting parameters.
In the application scenario of the standalone electronic device shown in FIG. 1 above, the AWB neural network model can be configured in the electronic device, and a processor in the electronic device (for example, a CPU or an NPU) is used to perform the neural network model computation, so as to obtain the light source color information. Thus, when the electronic device has sufficient computing resources, its computing power is fully utilized for neural network computation, which improves processing efficiency and reduces white balance processing latency. The specific hardware implementation process has been described in detail above and will not be repeated here.
In the end-cloud application scenario shown in FIG. 3 above, the AWB neural network model can be configured in the cloud server of the end-cloud system. The electronic device can send the multi-channel image and the shooting parameters to the cloud server, which uses its processor (for example, a CPU or an NPU) to perform the neural network model computation, so as to obtain the light source color information; the cloud server then feeds the light source color information back to the electronic device. Thus, even when the computing power of the electronic device is not strong enough, the computing power of the cloud server can be used for the neural network model computation, ensuring the accuracy and stability of the white balance processing, so that the solution of the present application can be applied to different types of devices and improve the user experience. The specific implementation process has been described in detail above and will not be repeated here.
S605: Perform white balance processing on the multi-channel image according to the light source color information to obtain the target image, and display it on the display screen.
Specifically, after the AWB neural network model outputs the light source color information (for example, the image light source value), the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image, rather than statistical features, as the input of the AWB neural network model, providing the model with more color information. Shooting parameters, such as one or more of shutter speed, exposure time, exposure value, ISO, aperture size, and the like, are added as a further input to the AWB neural network model, providing shooting configuration information for light source estimation and a reference for the conditions under which the RAW image was captured. Using the shooting parameters as an input to the neural network model helps the network improve the accuracy of light source prediction, improves the ability of the AWB neural network model to distinguish different light source scenes, and ensures good light source estimation accuracy.
For example, the AWB neural network model in the embodiments of the present application can be applied to all scenes, and a large amount of training data is used during model training, including data obtained in bright-light scenes and data obtained in dark-light scenes. With massive data, it is difficult for a neural network to achieve a high-precision fit across all scenes; the added shooting parameters can provide prior information about the shooting scene and help the neural network distinguish bright-light scenes from dark-light scenes, thereby improving the light source estimation accuracy for both types of scenes.
It should be noted that the above is only one example. The shooting parameters can be used not only to distinguish bright-light scenes from dark-light scenes, but also to distinguish other categories of scenes, for example, outdoor from indoor, or daytime from nighttime. Therefore, adding the shooting parameters as an input to the neural network can effectively improve the model's light source estimation accuracy in these categories of scenes, thereby improving the overall light source estimation accuracy.
In addition, which of the shooting parameters (shutter speed, exposure time, exposure value, ISO, aperture size, etc.) are selected as model inputs can be decided based on the information actually obtainable by the electronic device. Any one or more of the above-mentioned shooting parameters can provide a reference for the shooting conditions of the image and help improve the accuracy of the network; in practice, the selection should be made flexibly according to the hardware and software conditions.
Therefore, implementing the present application helps improve the white balance accuracy of electronic devices, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
Referring to FIG. 12, FIG. 12 is a schematic flowchart of another image automatic white balance method provided by an embodiment of the present application. The method can be applied to an electronic device. The main difference between this method and the method described above with reference to FIG. 10 is that the computation of the neural network model also uses scene semantic information, to further improve the accuracy of light source color information prediction. The method includes but is not limited to the following steps:
S701: Shoot at least one frame of an original RAW domain image.
S702: Acquire the shooting parameters used when shooting the RAW image.
S703: Process the RAW image to obtain a multi-channel image.
The implementation of the above S701-S703 may refer to the description of steps S601-S603 above, and will not be repeated here.
S704: Extract scene semantic information of the multi-channel image.
The color of the light source may differ across shooting scenes. For example, in an indoor portrait shooting scene, the light source may be an incandescent lamp; in an outdoor landscape shooting scene, the light source may be the sun or street lights. To further improve the accuracy of light source color information prediction, the embodiments of the present application may use scene semantic information to provide a shooting-scene reference for light source color estimation.
In the embodiments of the present application, the scene semantic information represents semantic features, conveyed by the image, that are related to the shooting scene. In a specific implementation, various types of shooting scenes can be defined.
For example, shooting scenes can be classified by light source type, for example, into cold light source scenes, warm light source scenes, single light source scenes, multiple light source scenes, and so on.
For another example, shooting scenes can be classified by image content, for example, into portrait shooting scenes, non-portrait shooting scenes, object shooting scenes, landscape shooting scenes, and so on.
In addition, a shooting scene can also be a combination of the above scene types. Other types of shooting scenes can also be defined based on the needs of the actual application, which is not limited in the embodiments of the present application.
Specifically, the scene semantic information can be extracted from the multi-channel image by one or more preset extraction algorithms.
For example, the preset extraction algorithm may be one or a combination of a scene classification algorithm, an image scene segmentation algorithm, an object detection algorithm, a portrait segmentation algorithm, a face detection algorithm, a human body detection algorithm, a skin color segmentation algorithm, and the like.
For example, a scene classification algorithm can be used to classify faces versus non-faces, single light sources versus multiple light sources, light source color temperatures, or indoor versus outdoor scenes, and so on.
For another example, an image scene segmentation algorithm can be used to segment the picture and generate a mask map; optionally, a scene classification algorithm, an object detection algorithm, a face detection algorithm, a skin color segmentation algorithm, or other techniques can also be used to generate the mask map. The mask map can provide the AWB neural network model of the present application with more shooting-scene-related information than a single frame alone, thereby increasing the AWB neural network's attention to different shooting scenes, helping the network fit and converge, and achieving higher prediction accuracy.
For another example, the scene semantic information of the image may be extracted without scene segmentation: an object detection algorithm is used instead, and the generated object category boxes are turned into a scene category mask map that is fed into the AWB neural network, as in the sketch below. In this way, object detection replaces scene segmentation for extracting scene semantic information, which simplifies scene information extraction, increases the computation speed, reduces the computational complexity, and reduces the performance overhead.
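A rough sketch of this detection-based alternative is shown below: object category boxes are rasterized into a scene category mask map. The box format and the class encoding are assumptions for illustration:

```python
import numpy as np

def boxes_to_mask(boxes, height, width):
    """boxes: iterable of (x0, y0, x1, y1, class_id); returns an H x W mask map."""
    mask = np.zeros((height, width), dtype=np.uint8)  # 0 = background
    for x0, y0, x1, y1, class_id in boxes:
        mask[y0:y1, x0:x1] = class_id  # later boxes overwrite earlier ones
    return mask

# e.g. a detected face (assumed class 1) and a sky region (assumed class 2):
mask = boxes_to_mask([(10, 10, 60, 60, 1), (0, 0, 200, 40, 2)], 120, 200)
```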
It should be noted that, in the above embodiments, the scene semantic information serving as an auxiliary input is not necessarily in the form of a mask map; it can also take other forms. For example, after the image is processed by a scene classification algorithm, the output may be a set of classification confidences (a vector), which is fed to the neural network model in vector form.
S705: Input the multi-channel image, the scene semantic information, and the shooting parameters into the neural network model to obtain the light source color information.
In other words, the neural network model can obtain the light source color information required in white balance processing according to the shooting parameters, the scene semantic information, and the multi-channel image.
In terms of type, the neural network model described in the embodiments of the present application may be a single neural network model, or a combination of two or more neural network models.
Referring to FIG. 13, the neural network model may be the AWB neural network model shown in FIG. 13. The AWB neural network model specifically includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network.
The first feature extraction network is used to perform feature extraction on the channel image corresponding to the RAW image to obtain a first feature; the first feature characterizes the color information of the channel image.
In an optional embodiment, the first feature extraction network may include one or more small convolution kernels, and a statistical operation over the pixels of the channel image is implemented through convolution, so as to obtain the first feature.
The second feature extraction network is used to perform feature extraction on the scene semantic information to obtain a second feature; the second feature characterizes the scene information corresponding to the channel image.
In an optional embodiment, the second feature extraction network may include one or more large convolution kernels, and analysis/perception of the scene information of the channel image is implemented through convolution, so as to obtain the second feature.
It should be noted that the so-called "large convolution kernel" and "small convolution kernel" are relative to each other conceptually. That is, in an optional solution, the convolution kernels in the second feature extraction network can be made larger in scale than those in the first feature extraction network, so as to perceive a larger range of the image and thereby obtain more accurate scene information.
The feature fusion network is used to fuse the first feature, the second feature, and the shooting parameters to obtain a fused feature. The fusion method is not limited to one or a combination of operations such as concat processing, conv2d processing, elementwise multiply processing, and elementwise add processing. For example, the above three streams of information (the first feature, the second feature, and the shooting parameters) can be weighted to obtain the fused feature.
It should be noted that, during fusion in the feature fusion network, the shooting parameters can be expanded into the form of a multi-dimensional array to match the array forms of the first feature and the second feature, so that the three streams of data have a consistent mathematical form, which facilitates the data fusion processing.
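A minimal sketch of this three-stream fusion, choosing concatenation among the operations listed above (shapes and names are assumptions):

```python
import torch

def fuse(first_feat, second_feat, shoot_params):
    """first_feat, second_feat: (B, C, H, W) tensors; shoot_params: (B, P)."""
    b, _, h, w = first_feat.shape
    # Expand the shooting parameters into a multi-dimensional array so that
    # all three streams of data share a consistent mathematical form.
    p = shoot_params.view(b, -1, 1, 1).expand(b, shoot_params.shape[1], h, w)
    return torch.cat([first_feat, second_feat, p], dim=1)
```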
The light source prediction network is used to make a prediction based on the fused feature to obtain the light source color information. The light source color information can indicate the color temperature of the light source or the color cast of the image, so it can be used in the subsequent AWB processing.
For example, after the fused feature is processed by the light source prediction network, the light source prediction network outputs the image light source value (r/g, 1, b/g), which can be used in the subsequent AWB processing.
As can be seen from the above, the AWB neural network model predicts the light source color information by fusing the features of the channel image, the features of the scene semantic information, and the shooting parameters.
In the application scenario of the standalone electronic device shown in FIG. 1 above, the AWB neural network model can be configured in the electronic device, and a processor in the electronic device (for example, a CPU or an NPU) is used to perform the neural network model computation, so as to obtain the light source color information. The specific hardware implementation process has been described in detail above and will not be repeated here.
In the end-cloud application scenario shown in FIG. 3 above, the AWB neural network model can be configured in the cloud server of the end-cloud system. The electronic device can send the multi-channel image, the scene semantic information extracted from the image, and the shooting parameters to the cloud server, which uses its processor (for example, a CPU or an NPU) to perform the neural network model computation, so as to obtain the light source color information; the cloud server then feeds the light source color information back to the electronic device. The specific implementation process has been described in detail above and will not be repeated here.
S706: Perform white balance processing on the multi-channel image according to the light source color information to obtain the target image, and display it on the display screen. For details, refer to the description of step S605 above, which will not be repeated here.
It can be seen that the embodiments of the present application use the multi-channel image corresponding to the RAW image, rather than statistical features, as the input of the AWB neural network model, providing the model with more color information. Scene semantic information and shooting parameters are added as further inputs to the AWB neural network model, providing more effective prior knowledge (shooting configuration information and scene information) for light source estimation; this greatly enhances the AWB neural network model's ability to distinguish different light source scenes, improves the overall light source estimation accuracy, and effectively helps the neural network converge and fit.
Among these, the scene semantic information can, to a large extent, provide prior semantic information for the image, help the AWB neural network distinguish different scenes, and thereby improve the overall accuracy of the AWB neural network.
For example, during model training with massive training data, it is difficult for a neural network to achieve a high-precision fit across all scenes. For instance, with faces under different light source conditions, the network output may be unstable, affecting the perceived skin tone; if face detection information is then added as scene semantic information input to the neural network, the network will pay more attention to the face region during training, thereby improving its fitting accuracy in face scenes.
For another example, if the neural network does not perform well in scenes such as blue sky or grass, image segmentation can be introduced, and the segmented sky and grass regions can be fed into the neural network as scene information; the network will then pay more attention to sky and grass scenes, thereby improving the light source estimation accuracy in those scenes.
It should be noted that the embodiments of the present application provide many forms of scene semantic information. In practical applications, which types of scene semantic information to adopt can be decided according to the needs of AWB in different scenes; the present application places no particular limitation on this, including on the specific content of the scene semantic information and on how it is obtained. For example, one or more of extraction techniques such as image segmentation, instance segmentation, face detection, human body detection, skeleton detection, and scene classification can be used to obtain the scene semantic information as the input of the AWB neural network.
Therefore, implementing the present application can improve the white balance accuracy of electronic devices shooting in all scenes, the stability of AWB for single-frame photos and video scenes, and the stability of its tendencies in ambiguous scenes such as those with multiple light sources.
To better understand the methods provided by the above embodiments of the present application, a more detailed embodiment is described below. Referring to FIG. 14, the method can be applied to an electronic device and includes but is not limited to the following steps:
S801: Shoot at least one RAW image.
S802: Acquire the shooting parameters used for shooting the RAW image.
When the user takes a photo on the interactive interface of the terminal, the mobile phone captures one frame of a RAW image in BAYER format during the photographing action, and at the same time obtains the shooting parameters corresponding to the capture of that picture.
Parameters such as exposure value, shutter time, aperture size, and ISO sensitivity can be selected as the shooting parameters. Because the pictures acquired by the mobile phone's sensor under the same environment but different parameter configurations will show differences in color characteristics, the shooting parameters describe the conditions under which the image was captured and provide a reference for the light source estimation algorithm.
S803: Preprocess the RAW image to obtain a color three-channel image, such as an RGB three-channel image; in an RGB three-channel image, every pixel has red, green, and blue components.
Referring to FIG. 15, the preprocessing of the RAW image can be performed, for example, by the ISP of the electronic device, and the preprocessing includes all the image processing steps involved in generating the color three-channel image.
Specifically, FIG. 15 shows an example of a preprocessing process, which may include black level correction (BLC) and lens shade correction (LSC). BLC reduces the influence of dark current on the image signal, and LSC eliminates the influence of vignetting on the image. Optionally, the process also includes image down-sampling and noise reduction. A specific implementation process is described as follows:
The RAW image can first be down-sampled to a size suitable for the network input, to speed up subsequent computation. Simple noise reduction is then performed on the down-sampled image (the noise reduction should avoid affecting the image colors as far as possible). BLC and LSC are then performed to eliminate the level offset of the image sensor and the brightness and color non-uniformity caused by the imaging of the camera's convex lens. The RAW image after the above processing is in the Bayer format, and a demosaicing operation is needed to obtain the color three-channel image. So as not to affect the colors, the demosaicing operation can be simplified to averaging the green channels and rearranging the red, blue, and green components, thereby obtaining the color three-channel image.
With this simplified demosaicing operation, the length and width of the color three-channel image are half those of the down-sampled RAW image (see FIG. 9 above), which increases the speed of subsequent algorithms.
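The simplified demosaicing just described (average the two green samples, rearrange R/G/B) might be sketched as follows; this assumes an RGGB layout and is an illustration, not the ISP's actual pipeline:

```python
import numpy as np

def simple_demosaic(raw):
    """RGGB Bayer mosaic (H x W) -> RGB image of half the width and height."""
    raw = raw.astype(np.float32)
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2] + raw[1::2, 0::2]) / 2.0  # average the green channel
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)  # shape (H/2, W/2, 3)
```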
It should be noted that the preprocessing process may also include other processing algorithms, which is not limited in other embodiments of the present application.
S804: Extract scene semantic information of the multi-channel image.
S805: Input the multi-channel image, the scene semantic information, and the shooting parameters into the neural network model to obtain the light source color information.
The specific implementation of the above S804-S805 may refer to the description of S704-S705, and will not be repeated here.
S806: Perform white balance processing on the image using the light source color information.
Specifically, after the AWB neural network model outputs the light source color information (for example, the image light source value), the electronic device can use it, through its own configured ISP, to perform white balance processing on the channel image, thereby correcting the image color cast caused by the color temperature of the light source, so that the colors of objects in the image are close to their original colors and the overall image conforms to the visual and cognitive habits of the human eye.
S807: Further perform image enhancement processing on the white-balanced image to obtain the final target image for display.
Referring to FIG. 15, the image enhancement processing can be performed, for example, by the ISP of the electronic device, or by other components of the electronic device, for example, by a field programmable gate array (FPGA) or a digital signal processor (DSP).
In the embodiments of the present application, the white-balanced image can also be post-processed by image enhancement algorithms to further improve the image quality; the final target image for display is thus obtained and output to the display screen of the electronic device for display.
The image enhancement algorithms may include, for example, operations such as gamma correction, contrast enhancement, dynamic range enhancement, or image sharpening.
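For instance, a hedged sketch of one such enhancement, gamma correction, is shown below (the exponent 2.2 is a common but assumed display gamma):

```python
import numpy as np

def gamma_correct(rgb, gamma=2.2):
    """Apply display gamma to a linear-light image with values in [0, 1]."""
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)
```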
需要说明的是,后处理过程还可以根据实际应用需要采用其他的处理算法,本申请其他实施例对此不作限制。It should be noted that the post-processing process may also adopt other processing algorithms according to actual application needs, which is not limited in other embodiments of the present application.
为了更好理解本申请方案,下面以图2所示的电子设备10的结构为例对方案做一种示例性的描述:In order to better understand the solution of the present application, the following takes the structure of the electronic device 10 shown in FIG. 2 as an example to describe the solution as an example:
在用户使用电子设备10执行拍摄时,CPU131控制摄像头111采集拍摄环境的光信号,图像传感器112将摄像头111捕捉到的光信号转化为数字信号,从而获得一张或多张RAW图像,RAW图像进一步被送到ISP12。ISP12执行预处理将所述RAW图像处理为彩色三通道图像,并提取所述彩色三通道图像的场景语义信息。所述彩色三通道图像和所述场景语义信息被进一步输入到NPU132,以及,CPU131对摄像头111、图像传感器112的控制参数(快门,曝光时间,光圈大小等)也被送入到NPU132。NPU132根据输入数据,执行AWB神经网络模型的计算,获得光源颜色值(r/g,1,b/g);并将光源颜色值返回给ISP12。ISP12根据光源颜色值执行白平衡处理,并对白平衡处理后的图像采用图像增强算法进一步优化,获得目标图像。目标图像通过CPU131进一步被送到显示装置14进行显示。When the user uses the electronic device 10 to perform shooting, the CPU 131 controls the camera 111 to collect light signals from the shooting environment, and the image sensor 112 converts the light signals captured by the camera 111 into digital signals, thereby obtaining one or more RAW images. Was sent to ISP12. The ISP12 performs preprocessing to process the RAW image into a color three-channel image, and extracts scene semantic information of the color three-channel image. The color three-channel image and the scene semantic information are further input to the NPU 132, and the control parameters (shutter, exposure time, aperture size, etc.) of the CPU 131 for the camera 111 and the image sensor 112 are also input to the NPU 132. According to the input data, the NPU132 executes the calculation of the AWB neural network model to obtain the light source color value (r/g, 1, b/g); and returns the light source color value to the ISP12. ISP12 performs white balance processing according to the color value of the light source, and uses image enhancement algorithms to further optimize the white balance processed image to obtain the target image. The target image is further sent to the display device 14 through the CPU 131 for display.
It can be seen that, compared with the embodiment shown in FIG. 12, this embodiment of the present application, on the basis of achieving better AWB, further provides refined implementations of the image preprocessing process and of the image post-processing stage. The introduction of the preprocessing process not only facilitates fast and efficient generation of multi-channel images, enabling the implementation of the AWB method of the present application, but also helps improve image quality (for example, by reducing the influence of dark current, reducing noise, and eliminating vignetting) and the computing speed of the neural network algorithm. The introduction of the post-processing process can further improve image quality, meet users' application needs, and enhance the user's viewing experience.
For a more comprehensive understanding of the solution of the present application, the AWB solution is described below more explicitly from the perspective of an application program on an electronic device (such as a mobile phone). Referring to FIG. 17 and FIG. 18, the solution includes, but is not limited to, the following steps:
S901: Detect an operation by which the user instructs the camera to shoot.
The operation instructing shooting may be, for example, a touch, a click, voice control, a key press, a remote control, or any other manner of triggering the electronic device to shoot. For example, the operation by which the user instructs shooting may include pressing the shooting button in the camera application of the electronic device, instructing the electronic device by voice to shoot, instructing the electronic device to shoot through a shortcut key, or other operations by which the user instructs the electronic device to shoot. This application imposes no specific restrictions.
In a possible implementation, before step S901, the method further includes: detecting an operation by the user to open the camera; and, in response to the operation, displaying a shooting interface on the display screen of the electronic device.
For example, after detecting that the user has tapped the icon of the camera application (APP) on the desktop, the electronic device may start the camera application and display the shooting interface.
(a) of FIG. 18 shows a graphical user interface (GUI) of a shooting interface 91 of a mobile phone. The shooting interface 91 includes a shooting control 93 and other shooting options. After the electronic device detects that the user taps the shooting control 93, the mobile phone executes the shooting process.
Exemplarily, the shooting interface 91 may further include a viewfinder frame 92. After the electronic device starts the camera, in the preview state, a preview image may be displayed in the viewfinder frame 92 in real time. It is understandable that the size of the viewfinder frame may differ between the photo mode and the video mode. For example, the viewfinder frame may be the viewfinder frame of the photo mode; in the video mode, the viewfinder frame may occupy the entire display screen. In the preview state, that is, after the user opens the camera and before the photo/video button is pressed, the preview image may be displayed in the viewfinder frame in real time.
S902: In response to the operation, display the target image on the display screen.
The target image is obtained after white balance processing implemented by using a neural network model, and the neural network model is used to obtain, according to the input data, the light source color information required for the white balance processing.
Exemplarily, as shown in FIG. 18, in response to the user's instruction, the mobile phone executes the shooting process in the background, including: shooting with the camera to obtain a RAW image; performing preprocessing through the ISP to convert the RAW image into a color three-channel image; computing with the AWB neural network model according to the input data to obtain the light source color information, and implementing white balance processing based on the light source color information; subsequently, image enhancement algorithms may further be used for optimization to obtain the target image. The target image is displayed on the display screen. For example, (b) of FIG. 18 shows a GUI of an album-based display interface 94, on which the target image 95 may be displayed.
In an embodiment, the input data of the model includes the shooting parameters and the multi-channel image. The structure of the model and the computation process may be understood with reference to the description of the foregoing embodiment of FIG. 11, and will not be repeated here.
In yet another manner, the input data of the model includes the shooting parameters, the multi-channel image, and the scene semantic information extracted from the multi-channel image. The structure of the model and the computation process may be understood with reference to the description of the foregoing embodiment of FIG. 13, and will not be repeated here.
It should be understood that the specific implementation of the above method may be understood with reference to the descriptions of the foregoing embodiments of FIG. 8 to FIG. 16; that is, the expansions, definitions, explanations, and descriptions of the relevant content in FIG. 8 to FIG. 16 also apply to the same content in FIG. 17 and FIG. 18, and will not be repeated here.
The software system architecture of an electronic device that can be used to implement the methods shown in FIG. 17 and FIG. 18 is further described below. The software system may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture. The following description takes the Android system with a layered architecture as an example. Referring to FIG. 19, FIG. 19 is a block diagram of a possible software structure of the electronic device in an embodiment of the present application.
As shown in the figure, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in the figure, the application packages may include applications such as a camera APP, an image beautification APP, and an album APP.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions. As shown in FIG. 19, the application framework layer may include a window manager, a content provider, a resource manager, a view system, and so on. Among them:
The window manager is used to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include image data, video data, and so on.
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to construct the display interface of an application.
For example, a shooting interface of a camera APP presented through the view system is shown in (a) of FIG. 18. The shooting interface 91 includes a shooting control 93, a preview frame 92, and some other related controls, such as an image browsing control and a front/rear camera switching control. The preview frame 92 is used to preview the scene image to be shot.
When the user taps or touches the front/rear camera switching control, the electronic device can be instructed to select the front camera or the rear camera for shooting.
When the user taps or touches the shooting control 93, the electronic device drives the camera device to initiate a shooting operation and instructs the lower-level system library to process the image and save it to the album.
When the user taps or touches the image browsing control, the electronic device may invoke the album APP and display the image processed by the automatic white balance method proposed in this application.
For example, a display interface of an album APP presented through the view system is shown in (b) of FIG. 18. The target image 95 may be displayed on the display interface 94.
The Android runtime is responsible for the scheduling and management of the Android system and may include a core library and a virtual machine. The core library consists of two parts: one part is the function library that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include multiple functional modules, for example, a surface manager, media libraries, and a graphics engine.
The surface manager is used to manage the display subsystem and provides layer composition functions for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library can support multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver. The camera driver can be used to drive the camera of the electronic device to shoot, and the display driver can be used to present the processed image on the display panel of the display screen.
The graphics engine is a drawing engine for image processing. In the embodiments of this application, the graphics engine may be used, for example, to: convert the RAW image into a color three-channel image; extract scene semantic information from the color three-channel image; input the color three-channel image, the shooting parameters, and the scene semantic information into the neural network to obtain the light source color information; and perform white balance processing on the color three-channel image according to the light source color information to obtain an image for display.
It should be noted that the training process of the AWB neural network model may take multiple forms; for example, FIG. 20 shows two exemplary training processes.
One training process for the AWB neural network model may be as follows: the training data includes annotations of the light source color of the images, multi-channel images obtained by preprocessing RAW images, shooting parameters, and optionally scene semantic information. After the training data is input into the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the annotated light source color, and is backpropagated through the model to update the model parameters, thereby training the model. When extensive training makes the model meet the application targets, the target model can be output.
Another training process for the AWB neural network model may be as follows: the training data includes annotations of the light source color of the images, target images obtained by preprocessing RAW images and applying image enhancement algorithms, shooting parameters, and optionally scene semantic information. After the training data is input into the model, the model outputs light source color information. A loss function is determined by comparing the output light source color information with the annotated light source color, and is backpropagated through the model to update the model parameters, thereby training the model. When extensive training makes the model meet the application targets, the target model can be output.
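As a minimal illustration common to both variants, the sketch below shows one training step, assuming an angular-error loss between the predicted and annotated illuminant colors (the application does not fix the form of the loss function) and a model with the hypothetical signature model(images, params); PyTorch is used here purely for illustration.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target):
    """Angle between predicted and annotated illuminants (N x 3 tensors)."""
    cos = F.cosine_similarity(pred, target, dim=1).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def train_step(model, optimizer, images, params, labels):
    """One parameter update on a batch of annotated training data."""
    optimizer.zero_grad()
    pred = model(images, params)       # model outputs light source color info
    loss = angular_loss(pred, labels)  # compare with the annotated illuminant
    loss.backward()                    # backpropagate the loss through the model
    optimizer.step()                   # update the model parameters
    return loss.item()
```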
In addition, it should be noted that, in a possible implementation, the images used to train the AWB neural network model may not be single-frame images but annotated video sequences. Network structures such as LSTM and RNN may be introduced into the AWB neural network model, and time-domain-related strategies may also be adopted during training. In other words, video sequences can be used as training data, and the AWB neural network model can take the frames preceding and following the current image as additional model input. By training on video sequences, adding the input of consecutive preceding and following frames, introducing structures such as LSTM and RNN, and adding time-domain-related training strategies, the stability of the light source estimation of the AWB neural network model can be increased, and the probability of white balance jitter under the same light source can be reduced. This allows the method to be extended to video functions, increases the stability of the white balance, and improves the user experience.
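As one hedged sketch of such a temporal variant, the model below extracts per-frame features with a small CNN and aggregates them with an LSTM before predicting the illuminant for the current frame; the topology and layer sizes are illustrative assumptions, not a structure prescribed by this application.

```python
import torch
import torch.nn as nn

class TemporalAWB(nn.Module):
    """Illuminant estimation over a short frame sequence (sketch)."""

    def __init__(self, feat_dim=64):
        super().__init__()
        # Per-frame feature extractor, kept deliberately small here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # LSTM aggregates features across consecutive frames.
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 3)  # predicts e.g. (r/g, 1, b/g)

    def forward(self, frames):
        # frames: N x T x 3 x H x W (preceding frames plus the current frame)
        n, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(n, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # estimate for the current frame
```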
In addition, in the embodiments of the present application, the number of cameras configured in the electronic device is not limited. In a scenario with two or more cameras, the type of each camera is not limited. For example, the cameras being of "different types" may mean cameras with different magnifications (shooting magnification or zoom magnification) or different focal lengths, such as a conventional camera, a main camera, a telephoto camera, a wide-angle camera, a medium-telephoto camera, an ultra-telephoto camera, or an ultra-wide-angle camera. As another example, "different types" may mean that the image sensors corresponding to the cameras differ: for instance, the image sensor of the wide-angle camera may be an RGGB module, the image sensor of the main camera may be a RYYB module, and the image sensor of the telephoto camera may be an RGGB module.
Today's electronic devices such as mobile phones are equipped with multiple cameras. When shooting, the user may zoom in or out or select a camera, and the image sensors or camera types corresponding to the multiple cameras differ, so the value ranges of RAW images captured in the same scene may differ considerably (image sensor devices of the same type may differ less). In the case of two or more cameras, the automatic white balance method (or the way of obtaining the image light source information) described in this application can be adjusted and adapted in multiple ways.
For example, a shooting scenario is shown in FIG. 23; the user frames the scene when taking a photo. The user can zoom the viewfinder device (the mobile phone screen) in or out to bring the scene closer or push it farther away. Example effects for several target images are shown in (1), (2), and (3) of FIG. 23. For (1), when the user needs to capture distant details, the picture must be zoomed in; when the magnification reaches 10 times (10x) or above, the focal length of the main camera is not enough to provide a sufficiently clear result, so the device switches to the telephoto lens for framing and shooting. The telephoto lens may use an RGGB module, and its sensitivity and spectral response curve will differ from those of the main camera. For (2), in general shooting, if the framing is in the range of 1x to 10x, the focal length of the main camera is sufficient to provide a clear result, and the RAW image collected by the main camera is cropped according to the focal length to achieve the zoom-in effect. The main camera may use a RYYB module, which has better sensitivity, and its spectral response curve will differ from that of an RGGB module. For (3), if the framing is below 1x, the focal length of the main camera is not enough to provide a larger field of view (FOV); if there is a wide-angle lens, the camera device switches to the wide-angle camera to provide a larger viewing angle. The wide-angle camera may use an RGGB module, or a photosensitive module different from those of the main camera or the telephoto camera, and its sensitivity and spectral response will differ from those of the above two cameras.
Referring to FIG. 21, FIG. 21 shows a possible shooting process. This scenario takes as an example an electronic device configured with a first camera and a second camera, which may be cameras of different types. The two cameras share one neural network model: in this scenario, the electronic device is configured with a first AWB neural network model, which may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera).
As shown in FIG. 21, in actual shooting, if the user selects the first camera for shooting, the obtained RAW image of the first camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the first camera, serves as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
If the user selects the second camera for shooting, the obtained RAW image of the second camera is preprocessed into a multi-channel image. In addition, the electronic device performs an image migration operation on this multi-channel image, that is, migrates its image colors to image colors that conform to the shooting characteristics of the first camera. Specifically, based on the difference between the second camera and the first camera, a color migration operation may be performed on the multi-channel image corresponding to the second camera to obtain a migrated image that fits the photosensitive characteristics of the image sensor corresponding to the first camera. Then, the migrated image, together with the shooting parameters of the second camera, serves as the input of the first AWB neural network, and a light source color value (or gain value) conforming to the shooting characteristics of the first camera is calculated. On this basis, a further migration operation is performed on this light source color value (or gain value), so as to migrate it to the light source color value (or gain value) corresponding to the second camera.
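As a hedged sketch of these two migration operations, assume the inter-camera difference can be approximated by a fixed 3x3 color transform M calibrated offline from paired captures; this is our own simplifying assumption, as the application does not prescribe the form of the color migration.

```python
import numpy as np

# Hypothetical calibrated transform mapping the second camera's color
# space to the first camera's (identity used here as a stand-in).
M = np.eye(3)

def migrate_image(image_cam2):
    """Map an H x W x 3 image from camera 2's color space to camera 1's."""
    return np.clip(image_cam2 @ M.T, 0.0, 1.0)

def migrate_gains(gains_cam1):
    """Map per-channel white balance gains predicted for camera 1 back to
    camera 2 by pushing the underlying illuminant through M's inverse."""
    illuminant_cam1 = 1.0 / np.asarray(gains_cam1)
    illuminant_cam2 = np.linalg.inv(M) @ illuminant_cam1
    return 1.0 / illuminant_cam2
```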
It should be noted that, when training the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera) and the shooting parameters of the first camera, among others, can be used as training data. Image data and shooting parameters collected by the second camera or other cameras can also be used, but the collected image data must first be migrated to the first camera before it can participate in the training.
It should also be noted that, when the first AWB neural network model is trained on data collected by the second camera, the roles of the first camera and the second camera in the embodiment of FIG. 21 above are effectively swapped, and the implementation is similar to the above process, which will not be repeated here.
Referring to FIG. 22, FIG. 22 shows another possible shooting process. This scenario also takes as an example an electronic device configured with a first camera and a second camera, which may be cameras of different types. The two cameras correspond to different neural network models: for example, the first camera corresponds to a first AWB neural network model and the second camera corresponds to a second AWB neural network model. The first AWB neural network model may be trained on data collected by the first camera (or a device of the same model, or a device similar to the first camera), and the second AWB neural network model may be trained on data collected by the second camera (or a device of the same model, or a device similar to the second camera).
As shown in FIG. 22, in actual shooting, if the user selects the first camera for shooting, the obtained RAW image of the first camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the first camera, serves as the input of the first AWB neural network, and the light source color value (or gain value) corresponding to the first camera is calculated.
If the user selects the second camera for shooting, the obtained RAW image of the second camera is preprocessed into a multi-channel image, which, together with the shooting parameters of the second camera, serves as the input of the second AWB neural network, and the light source color value (or gain value) corresponding to the second camera is calculated.
It should be noted that, when training the first AWB neural network model, the image data collected by the first camera (or a device of the same model, or a device similar to the first camera) and the shooting parameters of the first camera, among others, can be used as training data. Image data and shooting parameters collected by the second camera or other cameras can also be used, but the collected image data must first be migrated to the first camera before it can participate in the training.
Similarly, when training the second AWB neural network model, the image data collected by the second camera (or a device of the same model, or a device similar to the second camera) and the shooting parameters of the second camera, among others, can be used as training data. Image data and shooting parameters collected by the first camera or other cameras can also be used, but the collected image data must first be migrated to the second camera before it can participate in the training.
The embodiments of FIG. 21 and FIG. 22 above are only intended to explain the solution of the present application, not to limit it. For example, in practical applications the above processes can be similarly extended to the case of multiple (more than two) cameras; or, in practical applications, the training and use of the models may also make use of scene semantic information, and the specific implementation may be combined with the descriptions of the foregoing embodiments of FIG. 12 and FIG. 13, which will not be repeated here.
Based on the same application concept, an embodiment of the present application further provides an apparatus for realizing automatic white balance of an image. Referring to FIG. 24, FIG. 24 is a schematic structural diagram of an apparatus for automatic white balance of an image provided by an embodiment of the present application. The apparatus includes: a parameter acquisition module 1001, an image acquisition module 1002, and a processing module 1003. In an example, the above functional modules may run in a processor of an electronic device having a camera (which may be referred to as a first camera, for example). Among them:
The parameter acquisition module 1001 is configured to acquire the shooting parameters used when the first camera captures an original RAW domain image.
The image acquisition module 1002 is configured to acquire a multi-channel image corresponding to the original RAW domain image.
The processing module 1003 is configured to input the input data into the first neural network model to obtain the first gain value of the white balance, where the input data includes at least the shooting parameters of the first camera and the multi-channel image; and is further configured to perform first processing on the multi-channel image to obtain the target image, where the first processing includes white balance processing based on the multi-channel image and the first gain value.
In some possible embodiments, the shooting parameters include at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
In some possible embodiments, the first neural network model predicts the first gain value by fusing the shooting parameters of the first camera and the image features of the multi-channel image.
In some possible embodiments, the processing module is specifically configured to: obtain the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera and the multi-channel image; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the processing module is specifically configured to: send the shooting parameters of the first camera and the multi-channel image to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the first neural network model includes a first feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature; and perform prediction according to the fused feature through the light source prediction network to obtain the first gain value.
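For illustration, the following is a minimal sketch of this three-part structure (feature extraction, fusion with the shooting parameters, light source prediction); the layer sizes and the PyTorch framing are our own assumptions, since the application does not fix a concrete network topology.

```python
import torch
import torch.nn as nn

class AWBNet(nn.Module):
    """First feature extraction -> fusion with shooting parameters -> prediction."""

    def __init__(self, n_params=4, feat_dim=64):
        super().__init__()
        # First feature extraction network: image features.
        self.extract = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Feature fusion network: concatenate parameters with image features.
        self.fuse = nn.Sequential(nn.Linear(feat_dim + n_params, feat_dim), nn.ReLU())
        # Light source prediction network: outputs the gain/illuminant estimate.
        self.predict = nn.Linear(feat_dim, 3)

    def forward(self, image, params):
        # image: N x 3 x H x W; params: N x n_params (exposure value, shutter, ...)
        first_feature = self.extract(image)
        fused = self.fuse(torch.cat([first_feature, params], dim=1))
        return self.predict(fused)
```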
In some possible embodiments, the input data further includes scene semantic information represented by the multi-channel image; the first neural network model specifically predicts the gain value by fusing the shooting parameters of the first camera, the image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
In some possible embodiments, the processing module is specifically configured to: extract scene semantic information from the multi-channel image; obtain the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the processing module is specifically configured to: send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server; receive the first gain value from the server, where the first gain value is obtained through a first neural network model configured on the server; perform white balance processing on the multi-channel image using the first gain value; and post-process the white-balanced image to obtain the target image.
In some possible embodiments, the first neural network model includes a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; the processing module is specifically configured to: perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature; perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature; fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature; and perform prediction according to the fused feature through the light source prediction network to obtain the first gain value.
In some possible embodiments, the processing module is specifically configured to: perform at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
In some possible embodiments, the image acquisition module is specifically configured to: preprocess the original RAW domain image to obtain the multi-channel image, where the preprocessing includes demosaicing.
In some possible embodiments, the multi-channel image is a three-channel image or a four-channel image.
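For illustration, the sketch below shows one common way such multi-channel images can be obtained from an RGGB Bayer RAW image: packing the 2x2 mosaic into four half-resolution channels, optionally collapsed to three channels by averaging the green planes. The RGGB layout is an assumption for the example; the application does not mandate this particular preprocessing.

```python
import numpy as np

def pack_rggb(raw):
    """Pack a single-channel RGGB Bayer RAW image (H x W, H and W even)
    into a half-resolution four-channel image (H/2 x W/2 x 4)."""
    r  = raw[0::2, 0::2]  # red sites
    g1 = raw[0::2, 1::2]  # green sites on red rows
    g2 = raw[1::2, 0::2]  # green sites on blue rows
    b  = raw[1::2, 1::2]  # blue sites
    return np.stack([r, g1, g2, b], axis=-1)

def to_three_channel(packed):
    """Collapse the packed image to three channels by averaging the greens."""
    r, g1, g2, b = np.moveaxis(packed, -1, 0)
    return np.stack([r, (g1 + g2) / 2.0, b], axis=-1)
```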
It should be noted that, through the detailed descriptions of the foregoing embodiments of FIG. 8 to FIG. 16, those skilled in the art can clearly understand the implementation methods of the functional modules included in this apparatus, so for brevity of the description, details are not repeated here.
Based on the same inventive concept, an embodiment of the present application further provides another electronic device, which includes a camera, a display screen, a memory, and a processor, where: the camera is used to capture images; the display screen is used to display images; the memory is used to store a program; and the processor is used to execute the program stored in the memory and, when executing the program stored in the memory, is specifically used to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, or FIG. 17.
Based on the same inventive concept, an embodiment of the present application further provides yet another electronic device, which includes at least two cameras, a memory, and a processor, where the at least two cameras include a first camera and a second camera, and where: the at least two cameras are both used to capture images; the memory is used to store a program; and the processor is used to execute the program stored in the memory and, when executing the program stored in the memory, can be used to execute the method steps described in any of the method embodiments of FIG. 21 or FIG. 22, or to execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, or FIG. 17.
An embodiment of the present application further provides a chip, which includes a transceiver unit and a processing unit. The transceiver unit may be an input/output circuit or a communication interface; the processing unit is a processor, a microprocessor, or an integrated circuit integrated on the chip. The chip can execute the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
An embodiment of the present application further provides a computer-readable storage medium storing instructions that, when executed, perform the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
An embodiment of the present application further provides a computer program product containing instructions that, when executed, perform the method steps described in any of the method embodiments of FIG. 8, FIG. 10, FIG. 12, FIG. 14, FIG. 17, FIG. 21, or FIG. 22 above.
It should be understood that, in the various method embodiments of the present application, the numbering of the above processes does not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the above embodiments, the description of each embodiment has its own focus. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here. In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that any modifications, variations, or equivalent replacements of some of the technical features based on the technical solutions described in the above embodiments shall fall within the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (43)

1. A method for automatic image white balance, wherein the method is applied to an electronic device comprising a first camera, and comprises:
    acquiring shooting parameters used when the first camera captures an original RAW domain image;
    acquiring a multi-channel image corresponding to the original RAW domain image;
    inputting input data into a first neural network model to obtain a first gain value of white balance, wherein the input data comprises at least the shooting parameters of the first camera and the multi-channel image;
    performing first processing on the multi-channel image to obtain a target image;
    wherein the first processing comprises white balance processing based on the multi-channel image and the first gain value.
2. The method according to claim 1, wherein the shooting parameters comprise at least one of exposure value, shutter time, aperture size, or ISO sensitivity.
3. The method according to claim 1 or 2, wherein the first neural network model predicts the first gain value by fusing the shooting parameters of the first camera and image features of the multi-channel image.
4. The method according to claim 3, wherein the first processing specifically comprises:
    obtaining the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera and the multi-channel image;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
5. The method according to claim 3, wherein the first processing specifically comprises:
    sending the shooting parameters of the first camera and the multi-channel image to a server;
    receiving the first gain value from the server, wherein the first gain value is obtained through a first neural network model configured on the server;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
6. The method according to any one of claims 3 to 5, wherein the first neural network model comprises a first feature extraction network, a feature fusion network, and a light source prediction network; and correspondingly, the process of obtaining the first gain value through the first neural network model specifically comprises:
    performing feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    fusing the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature;
    performing prediction according to the fused feature through the light source prediction network to obtain the first gain value.
7. The method according to claim 1 or 2, wherein the input data further comprises scene semantic information represented by the multi-channel image; and the first neural network model specifically predicts the first gain value by fusing the shooting parameters of the first camera, image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
8. The method according to claim 7, wherein the first processing specifically comprises:
    extracting scene semantic information from the multi-channel image;
    obtaining the first gain value through the first neural network model configured in the electronic device, according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
9. The method according to claim 7, wherein the processing specifically comprises:
    sending the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server;
    receiving the first gain value from the server, wherein the first gain value is obtained through a first neural network model configured on the server;
    performing white balance processing on the multi-channel image by using the first gain value;
    performing post-processing on the white-balanced image to obtain the target image.
10. The method according to any one of claims 7 to 9, wherein the first neural network model comprises a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network; and correspondingly, the process of obtaining the first gain value through the first neural network specifically comprises:
    performing feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    performing feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature;
    fusing the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature;
    performing prediction according to the fused feature through the light source prediction network to obtain the first gain value.
11. The method according to claim 8, wherein extracting scene semantic information from the multi-channel image comprises:
    performing at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
12. The method according to any one of claims 1 to 11, wherein acquiring the multi-channel image corresponding to the original RAW domain image comprises:
    preprocessing the original RAW domain image to obtain the multi-channel image, wherein the preprocessing comprises demosaicing.
13. The method according to any one of claims 1 to 12, wherein the multi-channel image is a three-channel image or a four-channel image.
14. A method for automatic image white balance, wherein the method is applied to an electronic device comprising at least two cameras, the at least two cameras comprising a first camera and a second camera, and the method comprises:
    selecting a target camera from the at least two cameras according to a shooting instruction of a user, wherein the shooting instruction comprises a shooting magnification;
    when the target camera is the second camera, acquiring shooting parameters used when the second camera captures a second original RAW domain image and a second multi-channel image corresponding to the second original RAW domain image;
    performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera;
    inputting at least the shooting parameters of the second camera and the migrated image into a first neural network model to obtain a first gain value of white balance, wherein the first neural network model is associated with the first camera;
    obtaining a second gain value corresponding to the second camera according to the first gain value;
    performing first processing on the second multi-channel image to obtain a second target image;
    wherein the first processing comprises white balance processing based on the second multi-channel image and the second gain value.
15. The method according to claim 14, wherein performing color migration on the second multi-channel image to obtain a migrated image that fits the first camera comprises:
    based on a difference between the second camera and the first camera, performing a color migration operation on the second multi-channel image to obtain a migrated image that fits photosensitive characteristics of an image sensor corresponding to the first camera.
16. The method according to claim 14 or 15, wherein, when the target camera is the first camera, the method further comprises:
    acquiring shooting parameters used when the first camera captures a first original RAW domain image and a first multi-channel image corresponding to the first original RAW domain image;
    inputting at least the shooting parameters of the first camera and the first multi-channel image into the first neural network model to obtain a third gain value of white balance;
    performing white balance processing according to the first multi-channel image and the third gain value to obtain a first target image.
17. The method according to any one of claims 14-16, wherein the first camera and the second camera have different magnifications, or the first camera and the second camera correspond to different image sensors.
18. The method according to claim 17, wherein the first camera and the second camera are of different camera types, the camera types comprising a main camera, a telephoto camera, a wide-angle camera, a medium telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
19. The method according to claim 18, wherein when the first camera and the second camera are two of the main camera, the telephoto camera, and the wide-angle camera, at least one of the following holds:
    the image sensor corresponding to the telephoto camera comprises an RGGB module;
    the image sensor corresponding to the main camera comprises an RYYB module;
    the image sensor corresponding to the wide-angle camera comprises an RGGB module;
    the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera;
    the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
20. The method according to any one of claims 14-19, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
21. The method according to any one of claims 14-20, wherein the multi-channel image is a three-channel image or a four-channel image.
22. A method for image auto white balance, wherein the method is applied to an electronic device comprising a first camera and a second camera, and the method comprises:
    selecting one of the first camera and the second camera as a target camera according to a shooting instruction of a user, the shooting instruction comprising a shooting magnification;
    acquiring shooting parameters used when the target camera captures an original RAW domain image, and a multi-channel image corresponding to the original RAW domain image;
    determining a neural network model corresponding to the target camera, wherein the first camera is associated with a first neural network model, and the second camera is associated with a second neural network model;
    inputting input data into the neural network model to obtain a gain value of white balance, wherein the input data comprises at least the shooting parameters of the target camera and the multi-channel image;
    performing first processing on the multi-channel image to obtain a target image, wherein the first processing comprises white balance processing based on the multi-channel image and the gain value.
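A minimal sketch of this per-camera model selection. The magnification thresholds, the dictionary layout, and the placeholder predictors are assumptions for illustration; the claim only requires that each camera be associated with its own model.

```python
def predictor_stub(name):
    # stand-in for a trained neural network associated with one camera
    return lambda params, image: {"camera": name, "gain": (1.8, 1.0, 1.4)}

MODELS = {
    "first_camera": predictor_stub("first_camera"),    # e.g. the main camera
    "second_camera": predictor_stub("second_camera"),  # e.g. the telephoto camera
}

def select_target_camera(magnification):
    # assumed routing rule: high zoom factors go to the second (telephoto) camera
    return "second_camera" if magnification >= 3.0 else "first_camera"

camera = select_target_camera(5.0)
gain = MODELS[camera](params={"iso": 100}, image=None)["gain"]
```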
23. The method according to claim 22, wherein the first camera and the second camera have different magnifications, or the first camera and the second camera correspond to different image sensors.
24. The method according to claim 23, wherein the first camera and the second camera are of different camera types, the camera types comprising a main camera, a telephoto camera, a wide-angle camera, a medium telephoto camera, an ultra-telephoto camera, and an ultra-wide-angle camera.
25. The method according to claim 24, wherein when the first camera and the second camera are two of the main camera, the telephoto camera, and the wide-angle camera, at least one of the following holds:
    the image sensor corresponding to the telephoto camera comprises an RGGB module;
    the image sensor corresponding to the main camera comprises an RYYB module;
    the image sensor corresponding to the wide-angle camera comprises an RGGB module;
    the shooting magnification of the telephoto camera is greater than the shooting magnification of the main camera;
    the shooting magnification of the main camera is greater than the shooting magnification of the wide-angle camera.
26. The method according to any one of claims 22-25, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
27. An apparatus for implementing image auto white balance, comprising:
    a parameter acquisition module, configured to acquire shooting parameters used when the first camera captures an original RAW domain image;
    an image acquisition module, configured to acquire a multi-channel image corresponding to the original RAW domain image;
    a processing module, configured to input input data into a first neural network model to obtain a first gain value of white balance, the input data comprising at least the shooting parameters of the first camera and the multi-channel image; and further configured to perform first processing on the multi-channel image to obtain a target image, wherein the first processing comprises white balance processing based on the multi-channel image and the first gain value.
28. The apparatus according to claim 27, wherein the shooting parameters comprise at least one of an exposure value, a shutter time, an aperture size, or an ISO sensitivity.
29. The apparatus according to claim 27 or 28, wherein the first neural network model implements the prediction of the first gain value by fusing the shooting parameters of the first camera and image features of the multi-channel image.
30. The apparatus according to claim 29, wherein the processing module is specifically configured to:
    obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera and the multi-channel image;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
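A minimal sketch of the tail of this on-device pipeline: applying the predicted gain per channel, then one post-processing step. The claim leaves the post-processing unspecified; gamma encoding here is just an assumed example of such a step.

```python
import numpy as np

def first_processing(multichannel, gain, gamma=2.2):
    """White balance with the predicted gain, then simple post-processing.
    `multichannel` is assumed to be a float image in [0, 1] with shape (H, W, 3)."""
    balanced = np.clip(multichannel * np.asarray(gain), 0.0, 1.0)
    return balanced ** (1.0 / gamma)  # post-processing -> target image
```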
31. The apparatus according to claim 29, wherein the processing module is specifically configured to:
    send the shooting parameters of the first camera and the multi-channel image to a server;
    receive the first gain value from the server, the first gain value being obtained through a first neural network model configured on the server;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
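A sketch of the device side of this server variant. The claim only says the model runs on a server; the endpoint URL, payload layout, and response field below are entirely hypothetical.

```python
import requests

def remote_gain(shooting_params, multichannel_image_png):
    """Request the first gain value from a server-hosted model (illustrative)."""
    resp = requests.post(
        "https://example.com/awb/predict",      # placeholder URL, not a real API
        data={"params": shooting_params},
        files={"image": multichannel_image_png},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()["gain"]                  # assumed response field
```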
32. The apparatus according to any one of claims 29-31, wherein the first neural network model comprises a first feature extraction network, a feature fusion network, and a light source prediction network;
    the processing module is specifically configured to:
    perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    fuse the shooting parameters of the first camera and the first feature through the feature fusion network to obtain a fused feature;
    perform prediction through the light source prediction network according to the fused feature to obtain the first gain value.
33. The apparatus according to claim 27 or 28, wherein the input data further comprises scene semantic information represented by the multi-channel image; the first neural network model implements the prediction of the gain value by fusing the shooting parameters of the first camera, image features of the multi-channel image, and the scene semantic information represented by the multi-channel image.
34. The apparatus according to claim 33, wherein the processing module is specifically configured to:
    extract scene semantic information from the multi-channel image;
    obtain the first gain value through the first neural network model configured in the electronic device according to the shooting parameters of the first camera, the multi-channel image, and the scene semantic information;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
35. The apparatus according to claim 33, wherein the processing module is specifically configured to:
    send the shooting parameters of the first camera, the multi-channel image, and the scene semantic information to a server;
    receive the first gain value from the server, the first gain value being obtained through a first neural network model configured on the server;
    perform white balance processing on the multi-channel image by using the first gain value;
    perform post-processing on the white-balanced image to obtain the target image.
36. The apparatus according to any one of claims 33-35, wherein the first neural network model comprises a first feature extraction network, a second feature extraction network, a feature fusion network, and a light source prediction network;
    the processing module is specifically configured to:
    perform feature extraction on the multi-channel image through the first feature extraction network to obtain a first feature;
    perform feature extraction on the scene semantic information through the second feature extraction network to obtain a second feature;
    fuse the shooting parameters, the first feature, and the second feature through the feature fusion network to obtain a fused feature;
    perform prediction through the light source prediction network according to the fused feature to obtain the first gain value.
37. The apparatus according to claim 34, wherein the processing module is specifically configured to:
    perform at least one of object detection, scene classification, image scene segmentation, portrait segmentation, or face detection on the multi-channel image to obtain the scene semantic information.
38. The apparatus according to any one of claims 27-37, wherein the image acquisition module is specifically configured to:
    preprocess the original RAW domain image to obtain the multi-channel image, the preprocessing comprising demosaicing.
39. The apparatus according to any one of claims 27-38, wherein the multi-channel image is a three-channel image or a four-channel image.
40. An electronic device, comprising a camera, a memory, and a processor, wherein the camera is configured to capture images, the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the processor executes the program stored in the memory, the processor is specifically configured to perform the method according to any one of claims 1 to 13.
41. An electronic device, comprising at least two cameras, a memory, and a processor, the at least two cameras comprising a first camera and a second camera, wherein each of the at least two cameras is configured to capture images, the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the processor executes the program stored in the memory, the processor is specifically configured to perform the method according to any one of claims 14-21 or 22-26.
42. A computer-readable storage medium, wherein the computer-readable storage medium stores program instructions, and when the program instructions are run by a processor, the method according to any one of claims 1-13, 14-21, or 22-26 is implemented.
43. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1-13, 14-21, or 22-24.
PCT/CN2021/085966 2020-04-10 2021-04-08 Image auto white balance method and apparatus WO2021204202A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010280949.7 2020-04-10
CN202010280949 2020-04-10
CN202010817963.6A CN113518210B (en) 2020-04-10 2020-08-14 Method and device for automatic white balance of image
CN202010817963.6 2020-08-14

Publications (1)

Publication Number Publication Date
WO2021204202A1 (en)

Family

ID=78022779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/085966 WO2021204202A1 (en) 2020-04-10 2021-04-08 Image auto white balance method and apparatus

Country Status (1)

Country Link
WO (1) WO2021204202A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109151426A (en) * 2017-06-28 2019-01-04 杭州海康威视数字技术股份有限公司 A kind of white balance adjustment method, device, camera and medium
CN107343190A (en) * 2017-07-25 2017-11-10 广东欧珀移动通信有限公司 White balance adjusting method, apparatus and terminal device
CN107578390A (en) * 2017-09-14 2018-01-12 长沙全度影像科技有限公司 A kind of method and device that image white balance correction is carried out using neutral net
CN110647930A (en) * 2019-09-20 2020-01-03 北京达佳互联信息技术有限公司 Image processing method and device and electronic equipment

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220394223A1 (en) * 2021-06-08 2022-12-08 Black Sesame International Holding Limited Neural network based auto-white-balancing
US11606544B2 (en) * 2021-06-08 2023-03-14 Black Sesame Technologies Inc. Neural network based auto-white-balancing
CN114222075A (en) * 2022-01-28 2022-03-22 广州华多网络科技有限公司 Mobile terminal image processing method and device, equipment, medium and product thereof
CN114638951A (en) * 2022-03-29 2022-06-17 北京有竹居网络技术有限公司 House model display method and device, electronic equipment and readable storage medium
CN114638951B (en) * 2022-03-29 2023-08-15 北京有竹居网络技术有限公司 House model display method and device, electronic equipment and readable storage medium
CN115334234A (en) * 2022-07-01 2022-11-11 北京讯通安添通讯科技有限公司 Method and device for supplementing image information by taking pictures in dark environment
CN115334234B (en) * 2022-07-01 2024-03-29 北京讯通安添通讯科技有限公司 Method and device for taking photo supplementary image information in dim light environment
WO2024055764A1 (en) * 2022-09-14 2024-03-21 华为技术有限公司 Image processing method and apparatus
CN116709003A (en) * 2022-10-09 2023-09-05 荣耀终端有限公司 Image processing method and electronic equipment

Similar Documents

Publication Publication Date Title
WO2021204202A1 (en) Image auto white balance method and apparatus
CN113518210B (en) Method and device for automatic white balance of image
TWI805869B (en) System and method for computing dominant class of scene
WO2019237992A1 (en) Photographing method and device, terminal and computer readable storage medium
US8666191B2 (en) Systems and methods for image capturing
CN110505411B (en) Image shooting method and device, storage medium and electronic equipment
US10074165B2 (en) Image composition device, image composition method, and recording medium
US11158027B2 (en) Image capturing method and apparatus, and terminal
JP2021530911A (en) Night view photography methods, devices, electronic devices and storage media
WO2022076116A1 (en) Segmentation for image effects
WO2020082382A1 (en) Method and system of neural network object recognition for image processing
US11825179B2 (en) Auto exposure for spherical images
JP2021005846A (en) Stacked imaging device, imaging device, imaging method, learning method, and image readout circuit
CN116744120B (en) Image processing method and electronic device
Yang et al. Personalized exposure control using adaptive metering and reinforcement learning
TW202223834A (en) Camera image or video processing pipelines with neural embedding and neural network training system
US20230419505A1 (en) Automatic exposure metering for regions of interest that tracks moving subjects using artificial intelligence
WO2021154807A1 (en) Sensor prioritization for composite image capture
WO2023160220A1 (en) Image processing method and electronic device
US11671714B1 (en) Motion based exposure control
CN111382753A (en) Light field semantic segmentation method and system, electronic terminal and storage medium
US20230370727A1 (en) High dynamic range (hdr) image generation using a combined short exposure image
CN116708996B (en) Photographing method, image optimization model training method and electronic equipment
CN116055855B (en) Image processing method and related device
CN116051368B (en) Image processing method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21784553; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21784553; Country of ref document: EP; Kind code of ref document: A1)