WO2021051996A1 - An image processing method and apparatus - Google Patents
An image processing method and apparatus
- Publication number: WO2021051996A1
- Application: PCT/CN2020/103377 (CN2020103377W)
- Authority: WO — WIPO (PCT)
- Prior art keywords: image, deep learning, images, learning network, raw
- Prior art date
Classifications
- G06T5/70—Denoising; Smoothing
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/045—Combinations of networks
- G06T3/4015—Image demosaicing, e.g. colour filter arrays [CFA] or Bayer patterns
- G06T3/4053—Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
- G06T5/73—Deblurring; Sharpening
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T7/90—Determination of colour characteristics
- G06N3/08—Learning methods
- G06T2207/10024—Color image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
- H04N23/12—Cameras or camera modules generating image signals from different wavelengths with one sensor only
- H04N23/843—Camera processing pipelines: demosaicing, e.g. interpolating colour pixel values
- H04N25/131—Arrangement of colour filter arrays [CFA] including elements passing infrared wavelengths
- H04N25/134—Arrangement of colour filter arrays [CFA] based on three different wavelength filter elements
- H04N25/135—Arrangement of colour filter arrays [CFA] based on four or more different wavelength filter elements
Description
- This application relates to the field of artificial intelligence, and in particular to an image processing method and device in computer vision technology.
- The image captured by a camera is an unprocessed RAW image.
- Converting a RAW image into red, green, blue (RGB) or another displayable color image requires a series of image processing operations.
- In an image signal processing (ISP) pipeline, a variety of image processing operations are performed sequentially in a certain order.
- Because these image processing operations affect each other, running multiple modules serially causes errors to accumulate gradually, reducing image quality.
- The embodiments of the present application provide an image processing method and device that reduce the accumulation of errors caused by serial operation of multiple modules and improve image quality.
- The first aspect of the present application provides an image processing method, which includes: acquiring multiple frames of original RAW images; preprocessing the multiple frames of RAW images to obtain a first intermediate image, where the preprocessing includes channel splitting and pixel rearrangement, and the first intermediate image includes sub-images belonging to multiple channels, each channel's sub-image containing only one color component; and processing the first intermediate image based on a first deep learning network to obtain a first target image.
- The functions of the first deep learning network include demosaicing (DM) and noise reduction. The method further includes performing at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
- Both demosaicing and noise reduction are operations related to detail restoration: performing demosaicing first degrades the subsequent noise reduction, and performing noise reduction first degrades the subsequent demosaicing. In the embodiments of this application, both demosaicing and noise reduction are implemented by the same deep learning network, which avoids the accumulation of errors caused by mutual interference when multiple processing steps run serially and improves the effect of image detail restoration. Furthermore, N frames of RAW images are input simultaneously, so the effective information of the multiple frames is combined to recover image details better; and before the images are input to the deep learning network for detail recovery, the N frames are preprocessed by channel splitting and pixel rearrangement, which improves the processing effect of the deep learning network. A sketch of this preprocessing step follows.
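- As an illustration, the following Python/NumPy sketch shows one way channel splitting and pixel rearrangement could be implemented for a Bayer-style RAW frame; the function name, the `block` parameter, and the frame stacking are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def split_channels(raw: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a single RAW mosaic of shape (H, W) into block*block
    sub-images, one per position in the smallest repeating unit:
    4 channels for RGGB (block=2), 16 for a Quad pattern (block=4).
    Output shape: (block*block, H//block, W//block)."""
    h, w = raw.shape
    assert h % block == 0 and w % block == 0
    sub = raw.reshape(h // block, block, w // block, block)
    # Group pixels that share the same offset inside the repeating unit,
    # so each sub-image contains only one color component.
    return sub.transpose(1, 3, 0, 2).reshape(block * block, h // block, w // block)

# Example: 4 frames of RGGB RAW -> a first intermediate image of 16 sub-images
frames = [np.random.randint(0, 1024, (8, 8), dtype=np.uint16) for _ in range(4)]
first_intermediate = np.stack([split_channels(f) for f in frames])  # (4, 4, 4, 4)
```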
- The function of the first deep learning network further includes super-resolution (SR) reconstruction: the RAW image has a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
- Demosaicing, noise reduction, and SR reconstruction are all critical to detail restoration. If DM and SR are performed first, they amplify the image noise or destroy the noise profile of the original image, which degrades the subsequent noise reduction; if noise reduction is performed first, the detail lost during noise reduction cannot be restored, which degrades DM, SR, and other processing.
- In the embodiments of this application, the three functions of demosaicing, noise reduction, and SR reconstruction are realized simultaneously by training a single deep learning network, so the detail-restoration processing no longer follows a fixed serial order; this avoids the mutual interference between different processing steps caused by serial multi-module operation, and also avoids the resulting error accumulation.
- The function of the first deep learning network further includes at least one of dead pixel correction or phase point compensation.
- Dead pixel correction and phase point compensation are also algorithms related to detail restoration.
- The embodiments of the present application realize demosaicing, noise reduction, dead pixel correction, and phase point compensation through the same deep learning network, which avoids the error accumulation caused by mutual interference when multiple different processing steps are performed serially and improves the effect of image detail restoration.
- The preprocessing further includes at least one of dead pixel correction or phase point compensation.
- Dead pixels and phase points can be calibrated on the production line, so dead pixel correction and phase point compensation can be implemented in the preprocessing, which reduces the computational complexity of the deep learning network.
- The function of the first deep learning network further includes sharpening.
- The embodiments of the application realize demosaicing, noise reduction, sharpening, dead pixel correction, and phase point compensation through the same deep learning network, avoiding the error accumulation caused by mutual interference when multiple different processes run serially and improving the effect of image detail restoration.
- The method further includes: sharpening the second target image to obtain a third target image; and sending the third target image to the display screen or the memory.
- Because brightness and color enhancement may affect the sharpness of image edges, sharpening need not be integrated into the first deep learning network; sharpening the image after brightness and color enhancement, according to actual needs, can improve the image processing effect.
- The format of the RAW image includes: a Bayer image in RGGB format, an image in RYYB format, or an image in XYZW format, where an image in XYZW format is an image containing four color components and X, Y, Z, and W each represent a color component.
- The Bayer image in RGGB format, the image in RYYB format, and the image in XYZW format may adopt a Quad arrangement, whose minimum repeating unit includes 16, 24, or 32 pixels.
- When the RAW image is an RYYB image or an image containing four different color components, before brightness enhancement and color enhancement are performed on the first target image to obtain the second target image, the method further includes: performing color conversion on the first target image to obtain an RGB color image. Performing brightness enhancement and color enhancement on the first target image then specifically includes: performing at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
- The image containing four different color components includes an RGBIR image or an RGBW image.
- the function of the first deep learning network further includes: image alignment.
- The constructed training data consists of multiple unaligned frames with inter-frame differences, so that the trained deep learning network has the ability to align images.
- Image registration and motion compensation therefore need not be performed in advance: the unaligned N frames of RAW images are input directly to the network, and the network itself realizes the alignment and fusion of the multi-frame data.
- the preprocessing further includes: image alignment.
- The preprocessing specifically includes: performing channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multi-frame sub-images belonging to M channels, where the number of sub-image frames in each channel equals the number of frames of RAW images; and separately aligning the multi-frame sub-images in each channel.
- Separately aligning the multi-frame sub-images in each channel specifically includes: aligning the multi-frame sub-images in a first channel, where the first channel is any one of the M channels; and aligning the other channels based on the alignment used for the first channel.
- Channel splitting and pixel rearrangement are performed first, then one channel is selected for alignment, and the other channels are aligned with the same alignment method, which reduces the amount of computation required for image alignment. A sketch of this reuse is shown below.
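- A minimal sketch of this idea, assuming OpenCV's phase correlation as a stand-in for whatever registration the product actually uses: the translation is estimated once on the first channel, and the same warp is reused for the remaining channels. The names and the pure-translation motion model are assumptions.

```python
import cv2
import numpy as np

def align_channels(channels: np.ndarray, ref_frame: int = 0) -> np.ndarray:
    """channels: (N, M, h, w) = N frames of M channel sub-images each.
    Estimate one global shift per frame on channel 0 and apply it to all
    M channels of that frame."""
    n, m, h, w = channels.shape
    out = channels.astype(np.float32).copy()
    ref = out[ref_frame, 0]
    for f in range(n):
        if f == ref_frame:
            continue
        (dx, dy), _ = cv2.phaseCorrelate(ref, out[f, 0])  # estimate once
        # Shift frame f back onto the reference (the sign convention should be
        # validated against the registration method actually used).
        t = np.float32([[1, 0, -dx], [0, 1, -dy]])
        for c in range(m):  # reuse the same transform for every channel
            out[f, c] = cv2.warpAffine(out[f, c], t, (w, h))
    return out
```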
- the number of channels obtained by channel splitting is related to the format of the RAW image, and the number of channels is equal to the number of pixels included in the smallest repeating unit of the RAW image.
- The brightness enhancement or color enhancement includes at least one of the following: black level correction (BLC), automatic white balance (AWB), lens shading correction (LSC), tone mapping, color correction, contrast increase, or gamma correction.
- The preprocessing specifically includes: performing at least one of black level correction (BLC), automatic white balance (AWB), or lens shading correction (LSC) on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; performing channel splitting and pixel rearrangement on the multiple frames of first preprocessed RAW images to obtain multi-frame sub-images belonging to M channels, where the number of sub-image frames in each channel equals the number of frames of RAW images; and aligning the multi-frame sub-images in each channel.
- The embodiments of the application first perform one or more of BLC, AWB, and LSC on the input N frames of RAW images, and then perform image registration, channel splitting, and pixel rearrangement, which improves the image detail restoration effect of the deep learning network.
- the number of channels to which the sub-images included in the first intermediate image belong is equal to the number of pixels included in the smallest repeating unit of the RAW image.
- When the RAW image is an RGGB, RYYB, or XYZW format image whose minimum repeating unit contains 4 pixels, the first intermediate image includes sub-images belonging to 4 channels; when the RAW image is a Quad-arranged image whose minimum repeating unit contains 16 pixels, the first intermediate image includes sub-images belonging to 16 channels.
- The preprocessing further includes: estimating at least one of a noise intensity distribution map or a sharpening intensity map of the image. The first deep learning network is specifically configured to implement at least one of the following: controlling the noise reduction degree of different areas of the first intermediate image based on the noise intensity distribution map; controlling the sharpening intensity of different areas of the first intermediate image based on the sharpening intensity map.
- In this way, the embodiments of the present application can effectively control the noise reduction intensity of each region according to its noise characteristics, or adaptively control the sharpening intensity of each region.
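- The patent does not fix the mechanism, but one plausible realization (an assumption here) is to concatenate the estimated maps to the network input so the network can modulate its behavior per region; a PyTorch-style sketch:

```python
import torch

def with_control_maps(first_intermediate: torch.Tensor,
                      noise_map: torch.Tensor,
                      sharpen_map: torch.Tensor) -> torch.Tensor:
    """first_intermediate: (B, C, h, w) channel-split sub-images.
    noise_map, sharpen_map: (B, 1, h, w) per-region intensities estimated
    during preprocessing. Concatenated as extra input channels, they let
    the network apply stronger denoising or sharpening where the maps
    indicate it."""
    return torch.cat([first_intermediate, noise_map, sharpen_map], dim=1)
```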
- The first deep learning network includes: multiple residual network convolution modules, at least one up-sampling convolution block, and a second feature fusion convolution module. The output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
- The number of up-sampling convolution blocks is related to the format of the RAW image, the size of the RAW image, and the size of the first target image.
- The first deep learning network further includes a feature extraction convolution module and a first feature fusion module; the output of the multiple residual network convolution modules is the input of the first feature fusion module.
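- The following PyTorch sketch assembles the named modules in the described order (feature extraction, residual convolution modules, first feature fusion, up-sampling convolution blocks, second feature fusion with 3 output channels). Channel widths, block counts, and the global skip connection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class DetailRecoveryNet(nn.Module):
    """Feature extraction -> residual convolution modules -> first feature
    fusion -> up-sampling convolution blocks -> second feature fusion
    (3 output channels for an RGB result)."""
    def __init__(self, in_ch: int = 4, feat: int = 64,
                 n_res: int = 8, n_up: int = 2, out_ch: int = 3):
        super().__init__()
        self.extract = nn.Conv2d(in_ch, feat, 3, padding=1)
        self.res_blocks = nn.Sequential(*[ResBlock(feat) for _ in range(n_res)])
        self.fuse1 = nn.Conv2d(feat, feat, 3, padding=1)  # first feature fusion
        ups = []
        for _ in range(n_up):  # each block doubles resolution (zoom-dependent count)
            ups += [nn.Conv2d(feat, feat * 4, 3, padding=1),
                    nn.PixelShuffle(2), nn.ReLU(inplace=True)]
        self.upsample = nn.Sequential(*ups)
        self.fuse2 = nn.Conv2d(feat, out_ch, 3, padding=1)  # second feature fusion

    def forward(self, x):
        f = self.extract(x)
        f = f + self.fuse1(self.res_blocks(f))  # global skip (an assumption)
        return self.fuse2(self.upsample(f))

out = DetailRecoveryNet()(torch.randn(1, 4, 32, 32))
print(out.shape)  # torch.Size([1, 3, 128, 128])
```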
- The training data of the first deep learning network includes multiple frames of low-quality input images and one frame of high-quality target image, where the low-quality input images are simulated based on the high-quality target image.
- At least mosaic and noise processing are performed on the high-quality target image to obtain the low-quality input images.
- The method is applied to the following scenes: dark light scene, zoom mode, high dynamic range (HDR) scene, and night scene mode.
- The multiple frames of RAW images are multiple frames of short-exposure RAW images,
- and the training data of the first deep learning network includes multiple frames of short-exposure training images.
- The short-exposure training image is obtained as follows: reverse gamma correction is performed on a well-exposed high-quality image to obtain a reverse-gamma-corrected image; each pixel value of the reverse-gamma-corrected image is divided by a number to obtain the short-exposure training image.
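- A minimal sketch of this simulation, assuming a display gamma of 2.2 and images normalized to [0, 1]; the exposure ratio stands in for the "number" the pixel values are divided by:

```python
import numpy as np

def make_short_exposure(img: np.ndarray, ratio: float, gamma: float = 2.2) -> np.ndarray:
    """img: well-exposed high-quality image in [0, 1].
    Reverse gamma correction brings it back to (approximately) linear light;
    dividing every pixel by `ratio` then simulates a short exposure."""
    linear = np.power(img, gamma)  # reverse gamma correction
    return linear / ratio          # "divided by a number" in the text

short = make_short_exposure(np.random.rand(64, 64, 3), ratio=8.0)
```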
- When the method is applied to a dark light scene, the number of frames of input RAW images is increased; when the method is applied to zoom mode, the number of up-sampling convolution blocks in the first deep learning network is related to the zoom factor.
- The first deep learning network is a target deep learning network selected from a deep learning network resource pool according to first indication information. The first indication information is indication information related to the application scenario selected by the user on the application (APP) interface; or indication information related to the application scenario obtained by analyzing the characteristics of the preview image captured by the camera; or the magnification information carried by the input multiple frames of RAW images.
- The second aspect of the present application provides an image processing method.
- The method includes: selecting a target deep learning network from a deep learning network resource pool based on first indication information, where the deep learning network resource pool includes multiple deep learning networks with different functions; and processing the input data based on the target deep learning network to obtain a first output image.
- The first indication information is indication information related to the application scenario selected by the user on the application (APP) interface; or indication information related to the application scenario obtained by analyzing the characteristics of the preview image captured by the camera; or the magnification information carried by the input multiple frames of RAW images.
- The deep learning networks in the deep learning network resource pool each include at least two of the following image processing functions: demosaicing, noise reduction, super-resolution (SR) reconstruction, dead pixel correction, phase point compensation, and sharpening.
- application scenarios applicable to the deep learning network in the deep learning network resource pool include: zoom scenes with different magnifications, HDR scenes, dark light scenes, or night scene modes.
- The multiple frames of RAW images are multiple frames of short-exposure RAW images,
- and the training data of the target deep learning network includes multiple frames of short-exposure training images.
- The short-exposure training image is obtained as follows: reverse gamma correction is performed on a well-exposed high-quality image to obtain a reverse-gamma-corrected image; each pixel value of the reverse-gamma-corrected image is divided by a number to obtain the short-exposure training image.
- When the method is applied to a dark light scene, the number of frames of input RAW images is increased; when the method is applied to zoom mode, the number of up-sampling convolution blocks in the target deep learning network is related to the zoom factor.
- The third aspect of the present application provides an image processing device. The device includes: a preprocessing module, configured to preprocess multiple frames of RAW images to obtain a first intermediate image, where the preprocessing includes channel splitting and pixel rearrangement, and the first intermediate image includes sub-images belonging to multiple channels, each channel's sub-image containing only one color component; and a first deep learning network, configured to process the first intermediate image to obtain a first target image.
- The functions of the first deep learning network include demosaicing (DM) and noise reduction. The device further includes an enhancement module, configured to perform at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
- The function of the first deep learning network further includes super-resolution (SR) reconstruction: the RAW image has a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
- Demosaicing, noise reduction, and SR reconstruction are all critical to detail restoration. If DM and SR are performed first, they amplify the image noise or destroy the noise profile of the original image, which degrades the subsequent noise reduction; if noise reduction is performed first, the detail lost during noise reduction cannot be restored, which degrades DM, SR, and other processing.
- In the embodiments of this application, the three functions of demosaicing, noise reduction, and SR reconstruction are realized simultaneously by training a single deep learning network, so the detail-restoration processing no longer follows a fixed serial order; this avoids the mutual interference between different processing steps caused by serial multi-module operation, and also avoids the resulting error accumulation.
- The function of the first deep learning network further includes at least one of dead pixel correction or phase point compensation; or, the preprocessing further includes at least one of dead pixel correction or phase point compensation.
- the function of the first deep learning network further includes: sharpening.
- The device further includes: a sharpening module, configured to sharpen the second target image to obtain a third target image; and a sending interface, configured to send the third target image to the display screen or the memory.
- When the RAW image is an RYYB image or an image containing four different color components,
- the device further includes: a color conversion module, configured to perform color conversion on the first target image to obtain an RGB color image;
- and the enhancement module is specifically configured to perform at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
- the function of the first deep learning network further includes: image alignment, or the preprocessing further includes: image alignment.
- the preprocessing further includes image alignment
- In this case, the preprocessing module is specifically configured to: perform channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multi-frame sub-images belonging to M channels, where the number of sub-image frames in each channel equals the number of frames of RAW images; align the multi-frame sub-images in a first channel, where the first channel is any one of the M channels; and align the other channels based on the alignment used for the first channel.
- The enhancement module is specifically configured to implement at least one of the following: black level correction (BLC), automatic white balance (AWB), lens shading correction (LSC), tone mapping, color correction, contrast increase, or gamma correction.
- The preprocessing module is specifically configured to: perform at least one of black level correction (BLC), automatic white balance (AWB), or lens shading correction (LSC) on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; perform channel splitting and pixel rearrangement on the multiple frames of first preprocessed RAW images to obtain multi-frame sub-images belonging to M channels, where the number of sub-image frames in each channel equals the number of frames of RAW images; and align the multi-frame sub-images in each channel.
- The format of the RAW image includes: a Bayer image in RGGB format, an image in RYYB format, or an image in XYZW format, where an image in XYZW format is an image containing four color components and X, Y, Z, and W each represent a color component.
- The Bayer image in RGGB format, the image in RYYB format, and the image in XYZW format may adopt a Quad arrangement, whose minimum repeating unit includes 16, 24, or 32 pixels.
- the number of channels to which the sub-images included in the first intermediate image belong is equal to the number of pixels included in the smallest repeating unit of the RAW image.
- When the RAW image is a red, green, green, blue (RGGB) format image, a red, yellow, yellow, blue (RYYB) format image, or an XYZW format image whose minimum repeating unit contains 4 pixels, the first intermediate image includes sub-images belonging to 4 channels; when the RAW image is a Quad-arranged image whose minimum repeating unit contains 16 pixels, the first intermediate image includes sub-images belonging to 16 channels. An XYZW image is an image containing four color components, where X, Y, Z, and W each represent a color component.
- The preprocessing module is further configured to estimate at least one of a noise intensity distribution map or a sharpening intensity map of the image; the first deep learning network is specifically configured to implement at least one of the following: controlling the noise reduction degree of different areas of the first intermediate image based on the noise intensity distribution map; controlling the sharpening intensity of different areas of the first intermediate image based on the sharpening intensity map.
- The first deep learning network includes: multiple residual network convolution modules, at least one up-sampling convolution block, and a second feature fusion convolution module. The output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
- The multiple frames of RAW images are multiple frames of short-exposure RAW images; when the device is applied to a dark light scene, the number of input RAW image frames is increased.
- When the device is applied to zoom mode, the number of up-sampling convolution blocks in the first deep learning network is related to the zoom factor.
- the device further includes a deep learning network resource pool, and the deep learning network resource pool includes multiple deep learning networks with different functions.
- The first deep learning network is a target deep learning network selected from the deep learning network resource pool according to first indication information. The first indication information is indication information related to the application scenario selected by the user on the application (APP) interface; or indication information related to the application scenario obtained by analyzing the characteristics of the preview image captured by the camera; or the magnification information carried by the input multiple frames of RAW images.
- The fourth aspect of the present application provides a method for training a deep learning network. The method includes: obtaining training data, where the training data includes multiple frames of independent low-quality input data and one frame of high-quality target data,
- and the low-quality input data is simulated based on the high-quality target data; and training a basic network architecture based on the training data to obtain a deep learning network with a target function, where the target function is related to the difference between the low-quality input data and the high-quality target data.
- obtaining the training data includes: obtaining the training data by using an artificial synthesis method.
- Obtaining training data includes: downloading an open data set through the Internet and selecting high-quality images from it as high-quality target images; or using a high-quality camera to shoot high-quality images that meet preset conditions, the preset conditions being set according to user needs; performing reverse gamma correction on the high-quality image to obtain a reverse-gamma-corrected high-quality image; and down-sampling the reverse-gamma-corrected high-quality image to obtain the high-quality target image.
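- A sketch of this target-preparation step under the same assumptions as before (images in [0, 1], gamma 2.2); the down-sampling factor is illustrative:

```python
import cv2
import numpy as np

def make_target(img: np.ndarray, scale: int = 2, gamma: float = 2.2) -> np.ndarray:
    """Prepare a high-quality target: reverse gamma correction back to
    (approximately) linear light, then down-sample; area averaging also
    suppresses residual noise in the downloaded or captured image."""
    linear = np.power(img.astype(np.float32), gamma)
    return cv2.resize(linear, None, fx=1 / scale, fy=1 / scale,
                      interpolation=cv2.INTER_AREA)
```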
- Acquiring the training data includes: performing a quality reduction operation on the acquired high-quality target image to obtain the low-quality input image.
- Performing a quality reduction operation on the acquired high-quality target image includes at least one of: down-sampling, Gaussian blur, adding noise, mosaic processing, adding phase points, or adding dead pixels.
- The quality reduction operation is related to the target function of the deep learning network.
- For example, acquiring training data may include: down-sampling, adding noise, and mosaic processing the acquired high-quality target image to obtain the low-quality input image.
- Acquiring training data may also include: down-sampling, Gaussian blur, adding noise, and mosaic processing the acquired high-quality target image to obtain the low-quality input image.
- Acquiring training data may also include: down-sampling, Gaussian blur, adding noise, mosaic processing, and adding dead pixels to the acquired high-quality target image to obtain the low-quality input image.
- Multiple frames of low-quality input images are obtained by separately performing quality reduction operations on the same frame of high-quality target image, so the multiple frames of low-quality input images are constructed independently. A sketch of such a degradation pipeline follows.
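- The sketch below strings together one of the listed degradation combinations (down-sample, Gaussian blur, add noise, mosaic with an RGGB layout); the parameters and the RGGB choice are illustrative assumptions:

```python
import cv2
import numpy as np

def degrade(hq: np.ndarray, rng: np.random.Generator, scale: int = 2,
            blur_sigma: float = 1.0, noise_sigma: float = 0.02) -> np.ndarray:
    """hq: high-quality RGB target in [0, 1], shape (H, W, 3).
    Returns one low-quality mosaicked input frame."""
    lq = cv2.resize(hq, None, fx=1 / scale, fy=1 / scale,
                    interpolation=cv2.INTER_AREA)                  # down-sample
    lq = cv2.GaussianBlur(lq, (0, 0), blur_sigma)                  # Gaussian blur
    lq = np.clip(lq + rng.normal(0, noise_sigma, lq.shape), 0, 1)  # add noise
    mosaic = np.empty(lq.shape[:2], dtype=np.float32)  # mosaic: keep one color
    mosaic[0::2, 0::2] = lq[0::2, 0::2, 0]             # R   per pixel, following
    mosaic[0::2, 1::2] = lq[0::2, 1::2, 1]             # G   the RGGB Bayer layout
    mosaic[1::2, 0::2] = lq[1::2, 0::2, 1]             # G
    mosaic[1::2, 1::2] = lq[1::2, 1::2, 2]             # B
    return mosaic

rng = np.random.default_rng(0)
hq = np.random.rand(128, 128, 3).astype(np.float32)
inputs = [degrade(hq, rng) for _ in range(4)]  # 4 independent low-quality frames
```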
- The loss function of the deep learning network includes the L1 loss or L2 loss function, or a combination of L1 loss, structural similarity (SSIM), and adversarial loss, or a combination of L2 loss, SSIM, and adversarial loss.
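- A sketch of such a combined loss in PyTorch; the SSIM here is a simplified global variant (real implementations use local windows), and the weights and the non-saturating adversarial term are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def ssim_global(x: torch.Tensor, y: torch.Tensor,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Simplified whole-image SSIM for inputs in [0, 1], shape (B, C, H, W)."""
    mx, my = x.mean(dim=(2, 3)), y.mean(dim=(2, 3))
    vx, vy = x.var(dim=(2, 3)), y.var(dim=(2, 3))
    cov = ((x - mx[..., None, None]) * (y - my[..., None, None])).mean(dim=(2, 3))
    s = ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return s.mean()

def total_loss(pred, target, disc_fake_logits=None, w_ssim=0.2, w_adv=0.01):
    """L1 + SSIM (+ adversarial) combination. disc_fake_logits is the
    discriminator output on the generated image when a GAN branch is used;
    without it, this degenerates to L1 + SSIM."""
    loss = F.l1_loss(pred, target) + w_ssim * (1.0 - ssim_global(pred, target))
    if disc_fake_logits is not None:
        # non-saturating generator loss: push D(pred) toward "real"
        loss = loss + w_adv * F.binary_cross_entropy_with_logits(
            disc_fake_logits, torch.ones_like(disc_fake_logits))
    return loss
```

- During training, this loss would then be minimized with an optimizer such as the one named next.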
- The training method of the deep learning network includes the adaptive moment estimation (Adam) method.
- The fifth aspect of the present application provides a device for adaptively selecting a deep learning network.
- The device includes a receiving interface, an artificial intelligence (AI) controller, and a deep learning network resource pool,
- where the deep learning network resource pool includes multiple deep learning networks with different functions.
- The receiving interface is configured to receive first indication information, which indicates the currently applicable application scenario; the AI controller is configured to select, from the deep learning network resource pool based on the first indication information, a target deep learning network corresponding to the first indication information.
- the device further includes a processor, configured to process the input image based on the target deep learning network to obtain the first output image.
- The first indication information is indication information related to the application scenario selected by the user on the application (APP) interface; or indication information related to the application scenario obtained by analyzing the characteristics of the preview image captured by the camera; or the magnification information carried by the input multiple frames of RAW images.
- In this way, the most suitable deep learning network can be selected or enabled from the deep learning network resource pool according to the user's needs, the characteristics of the input data, or the parameters carried by the input data. This satisfies the needs of different users and different scenarios to the greatest extent, provides the best deep learning network and the best image processing effect in each scenario, optimizes user experience, and improves the image processing performance and competitiveness of mobile terminals or image processors.
- the receiving interface is also used to receive input images or control signals.
- The deep learning networks in the deep learning network resource pool each include at least two of the following image processing functions: demosaicing, noise reduction, super-resolution (SR) reconstruction, dead pixel correction, phase point compensation, or sharpening.
- application scenarios applicable to the deep learning network in the deep learning network resource pool include: zoom scenes with different magnifications, HDR scenes, dark light scenes, or night scene modes.
- the deep learning network in the deep learning network resource pool is implemented by software codes or software modules, and the deep learning network resource pool is stored in a memory.
- The AI controller reads the target deep learning network from the deep learning network resource pool based on the first indication information and loads it into the processor; the processor runs the target deep learning network to realize the function corresponding to the target deep learning network.
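- A minimal sketch of that lookup; the pool keys, file paths, and the use of torch.load on whole serialized modules are illustrative assumptions:

```python
import torch

# Hypothetical mapping from first indication information to stored networks.
NETWORK_POOL = {
    "zoom_2x": "pool/zoom_2x.pt",
    "zoom_4x": "pool/zoom_4x.pt",
    "hdr":     "pool/hdr.pt",
    "dark":    "pool/dark.pt",
    "night":   "pool/night.pt",
}

def select_network(indication: str) -> torch.nn.Module:
    """AI-controller step: map the indication (user choice on the APP
    interface, preview-image analysis result, or magnification carried by
    the RAW frames) to a network in the pool and load it for the processor."""
    net = torch.load(NETWORK_POOL[indication])
    net.eval()
    return net
```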
- the deep learning network is implemented by an artificial intelligence AI engine, and the AI engine is a hardware module or a dedicated hardware circuit.
- the device further includes: hardware computing resources, and the hardware computing resources include at least one of addition, subtraction, multiplication, division, exponential operation, logarithmic operation, or size comparison.
- the hardware computing resources can be multiplexed by multiple deep learning networks.
- the device further includes a preprocessing module for channel splitting and pixel rearrangement of the initially input RAW image to obtain sub-images belonging to multiple channels, and the sub-images of each channel Contains only one color component.
- the preprocessing module is further used to analyze the characteristics of the preview image obtained by the camera, and send the characteristic signal to the AI controller.
- The receiving interface is configured to obtain multiple frames of short-exposure RAW images, and the training data of the target deep learning network includes multiple frames of short-exposure training images.
- The short-exposure training image is obtained as follows: reverse gamma correction is performed on a well-exposed high-quality image to obtain a reverse-gamma-corrected image; each pixel value of the reverse-gamma-corrected image is divided by a number to obtain the short-exposure training image.
- When the device is applied to a dark light scene, the number of frames of input RAW images is increased; when the device is applied to zoom mode, the number of up-sampling convolution blocks in the target deep learning network is related to the zoom factor.
- The sixth aspect of the present application provides an image processing device, which includes a receiving interface and a processor on which a first deep learning network runs, where the functions of the first deep learning network include demosaicing (DM) and noise reduction. The receiving interface is configured to receive multiple frames of RAW images obtained by the camera; the processor is configured to call the software code stored in the memory to execute the method in the first aspect or any one of its possible implementation manners.
- The seventh aspect of the present application provides an image processing device. The device includes a receiving interface and a processor, where the receiving interface is configured to obtain the first indication information, and the processor is configured to call the software code stored in the memory to perform the method in the second aspect or any one of its possible implementations.
- the device further includes a memory for storing the deep learning network resource pool.
- An eighth aspect of the present application provides an image processing device. The device includes a receiving interface and a processor, where the receiving interface is configured to obtain training data, the training data includes multiple frames of independent low-quality input data and one frame of high-quality target data, and the low-quality input data is simulated based on the high-quality target data; the processor is configured to call the software code stored in the memory to execute the method in the fourth aspect or any one of its possible implementation manners.
- A ninth aspect of the present application provides a computer-readable storage medium.
- Instructions are stored in the computer-readable storage medium; when run on a computer or processor, they cause the computer or processor to execute the method in the first aspect or any one of its possible implementation manners.
- The tenth aspect of the present application provides a computer-readable storage medium with instructions stored in it; when run on a computer or processor, they cause the computer or processor to execute the method in the second aspect or any one of its possible implementations.
- The eleventh aspect of the present application provides a computer-readable storage medium with instructions stored in it; when run on a computer or processor, they cause the computer or processor to execute the method in the fourth aspect or any one of its possible implementations.
- The twelfth aspect of the present application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method in the first aspect or any one of its possible implementations.
- The thirteenth aspect of this application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method in the second aspect or any one of its possible implementations.
- The fourteenth aspect of the present application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method in the fourth aspect or any one of its possible implementations.
- FIG. 1 is a schematic structural diagram of an exemplary terminal provided by an embodiment of the application.
- Fig. 2 is a hardware architecture diagram of an exemplary image processing apparatus provided by an embodiment of the application.
- FIG. 3 is a schematic flowchart of an exemplary image processing method provided by an embodiment of the application.
- Fig. 4a is an exemplary Bayer image in RGGB format provided by an embodiment of the application.
- FIG. 4b is an exemplary RGBIR image provided by an embodiment of the application.
- FIG. 5 is an image of an exemplary Quad arrangement provided by an embodiment of the application.
- FIG. 6a is an exemplary schematic diagram of performing channel splitting and pixel rearrangement on a Bayer image in RGGB format to obtain a first intermediate image according to an embodiment of the application;
- Fig. 6b is an exemplary schematic diagram of performing channel splitting and pixel rearrangement on Quad arranged images to obtain a first intermediate image according to an embodiment of the application;
- FIG. 7 is an exemplary image processing framework provided by an embodiment of this application.
- FIG. 8 is another exemplary image processing framework provided by an embodiment of the application.
- FIG. 9 is another exemplary image processing framework provided by an embodiment of the application.
- FIG. 10 is a schematic structural diagram of an exemplary deep learning network provided by an embodiment of this application.
- FIG. 11 is a schematic diagram of the processing effect of an exemplary detail recovery network provided by an embodiment of this application.
- FIG. 12 is a structural diagram of an exemplary feature extraction convolution block provided by an embodiment of the application.
- FIG. 13 is a structural diagram of an exemplary residual network convolution block provided by an embodiment of the application.
- Fig. 14a is a structural diagram of an exemplary feature fusion module 1 provided by an embodiment of the application.
- FIG. 14b is a structural diagram of an exemplary feature fusion module 2 provided by an embodiment of the application.
- FIG. 15 is a structural diagram of an exemplary up-sampling convolution block provided by an embodiment of the application.
- FIG. 16 is a flowchart of an exemplary method for adaptively selecting a deep learning network provided by an embodiment of this application.
- Fig. 17 is an exemplary device for adaptively selecting a deep learning network provided by an embodiment of the application.
- "At least one (item)" refers to one or more, and "multiple" refers to two or more.
- "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B can be singular or plural.
- The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
- "At least one of the following items" or similar expressions refer to any combination of these items, including any combination of a single item or multiple items.
- For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
- Since the image obtained by a mobile phone camera is a RAW image, it usually needs to be converted into an RGB color image before it can be displayed on a display device.
- The image format finally displayed on the display device can also be another format, such as a YUV or YCbCr color image.
- The embodiments of the present application take an RGB image as the image finally displayed on the display device as an example. Converting RAW images to RGB images requires a series of image processing operations such as detail restoration, color restoration, and brightness restoration. Among them, the processing related to detail restoration includes demosaicking (DM), dead pixel correction, noise reduction, sharpening, and super resolution (SR) reconstruction, among others.
- SR reconstruction processing is only needed in certain scenarios, such as zoom.
- Operations such as DM, dead pixel correction, and SR reconstruction usually require pixel filling or interpolation, while sharpening strengthens and highlights the edges and texture of the image. If DM, dead pixel correction, and SR reconstruction are performed first, they amplify the image noise or destroy the noise profile of the original image, which degrades the subsequent noise reduction; if noise reduction is performed first, the detail loss caused by the noise reduction cannot be restored, which degrades DM, dead pixel correction, SR reconstruction, and other processing. Therefore, multi-module serial operation causes errors to accumulate gradually.
- The embodiments of the present application therefore propose a deep-learning-based image processing framework, method, and device that integrate multiple detail-restoration processing steps into one deep learning network, so that multiple image processing functions are realized by a single network, reducing the mutual influence between different image processing steps and reducing error accumulation.
- For example, processing such as demosaicing, noise reduction, and super-resolution reconstruction may be integrated into one deep learning network.
- Processing such as dead pixel correction and sharpening may also be integrated into the deep learning network.
- The image processing framework provided by the embodiments of the present application greatly improves the resolution, clarity, and visual effect of the image while suppressing moiré, halo, and overshoot artifacts, and is suitable for various shooting scenarios such as zoom, high dynamic range (HDR), and night scene mode. Furthermore, in the embodiments of the present application, multiple frames of consecutive images are used as input simultaneously, and the effective information of the multiple frames is fused to better restore image details.
- The image processing framework and image processing method provided in the embodiments of this application are applicable to various terminals.
- Accordingly, the image processing apparatus provided in the embodiments of this application can be various types of terminal products, such as smartphones, tablet computers, smart glasses, wearable devices, cameras, and video cameras. FIG. 1 is a schematic diagram of the architecture of an exemplary terminal 100 provided in an embodiment of the present application.
- the terminal 100 may include an antenna system 110, a radio frequency (RF) circuit 120, a processor 130, a memory 140, a camera 150, an audio circuit 160, a display screen 170, one or more sensors 180, a wireless transceiver 190, and so on.
- the antenna system 110 may be one or more antennas, and may also be an antenna array composed of multiple antennas.
- The radio frequency circuit 120 may include one or more analog radio frequency transceivers and one or more digital radio frequency transceivers, and the RF circuit 120 is coupled to the antenna system 110. It should be understood that in the various embodiments of the present application, coupling refers to mutual connection in a specific manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, or buses.
- the radio frequency circuit 120 can be used for various types of cellular wireless communications.
- the processor 130 may include a communication processor, and the communication processor may be used to control the RF circuit 120 to receive and send a signal through the antenna system 110, and the signal may be a voice signal, a media signal, or a control signal.
- the processor 130 may include various general processing devices, such as a general central processing unit (Central Processing Unit, CPU), a system on chip (System on Chip, SOC), a processor integrated on the SOC, and a separate processor chip. Or a controller, etc.; the processor 130 may also include a dedicated processing device, such as an application specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA), or a digital signal processor (Digital Signal Processor).
- the processor 130 may be a processor group composed of multiple processors, and the multiple processors are coupled to each other through one or more buses.
- the processor may include an analog-to-digital converter (ADC) and a digital-to-analog converter (Digital-to-Analog Converter, DAC) to realize signal connection between different components of the device.
- the memory 140 is coupled to the processor 130. Specifically, the memory 140 may be coupled to the processor 130 through one or more memory controllers.
- the memory 140 may be used to store computer program instructions, including a computer operating system (Operating System, OS) and various user applications.
- the memory 140 may also be used to store user data, such as calendar information, contact information, acquired image information, audio information, or other media files.
- the processor 130 may read computer program instructions or user data from the memory 140, or store computer program instructions or user data in the memory 140, so as to implement related processing functions.
- the memory 140 may be a non-volatile memory, such as an EMMC (Embedded Multi Media Card), a UFS (Universal Flash Storage), a read-only memory (Read-Only Memory, ROM), or another type of static storage device that can store static information and instructions; it may also be a volatile memory, such as a Random Access Memory (RAM), or another type of dynamic storage device that can store information and instructions; it may also be an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, or other magnetic storage devices.
- the camera 150 is used to collect images or videos, and can be triggered to be turned on by an application program instruction to realize a photographing or camera function, such as taking pictures or videos of any scene.
- the camera may include imaging lenses, filters, image sensors and other components. The light emitted or reflected by the object enters the imaging lens, passes through the filter, and finally converges on the image sensor.
- the imaging lens is mainly used to converge and image the light emitted or reflected by all objects in the camera's field of view (also called the scene to be shot, the target scene, or the scene image that the user expects to shoot);
- the filter is mainly used to filter out unnecessary light waves (for example, light waves other than visible light, such as infrared);
- the image sensor is mainly used to photoelectrically convert the received optical signal into an electrical signal and input it to the processor 130 for subsequent processing.
- the camera may be located in front of the terminal device or on the back of the terminal device. The specific number and arrangement of the cameras can be flexibly determined according to the requirements of the designer or manufacturer's strategy, which is not limited in this application.
- the audio circuit 160 is coupled with the processor 130.
- the audio circuit 160 may include a microphone 161 and a speaker 162.
- the microphone 161 may receive sound input from the outside, and the speaker 162 may play audio data.
- the terminal 100 may have one or more microphones and one or more speakers, and the embodiment of the present application does not limit their number.
- the display screen 170 is used to display information input by the user and various menus provided to the user; these menus are associated with specific internal modules or functions.
- the display screen 170 can also accept user input, such as enabling or disabling operations and other control information.
- the display screen 170 may include a display panel 171 and a touch panel 172.
- the display panel 171 may adopt a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display, a light-emitting diode (Light Emitting Diode, LED) display device, a cathode ray tube (Cathode Ray Tube, CRT), etc.
- the touch panel 172, also known as a touch screen or touch-sensitive screen, can collect the user's contact or non-contact operations on or near it (for example, operations performed by the user on or near the touch panel 172 with a finger, a stylus, or any other suitable object or accessory; operations near the touch panel 172 may also include somatosensory operations; the operations include single-point control operations, multi-point control operations, and other types of operations), and drive the corresponding connection device according to a preset program.
- the touch panel 172 may include two parts: a touch detection device and a touch controller.
- the touch detection device detects the signal brought by the user's touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into information that the processor 130 can process, and sends it to the processor 130; it can also receive and execute commands sent by the processor 130.
- the touch panel 172 can cover the display panel 171, and the user can perform operations on or near the touch panel 172 according to the content displayed on the display panel 171 (the displayed content includes, but is not limited to, a soft keyboard, a virtual mouse, virtual keys, icons, etc.).
- after the touch panel 172 detects an operation on or near it, the operation is transmitted to the processor 130 through the I/O subsystem 10 to determine the user input, and the processor 130 then provides corresponding visual output on the display panel 171 through the I/O subsystem 10 according to the user input.
- although the touch panel 172 and the display panel 171 are shown as two independent components used to implement the input and output functions of the terminal 100, in some embodiments the touch panel 172 and the display panel 171 may be integrated to realize the input and output functions of the terminal 100.
- the sensor 180 may include an image sensor, a motion sensor, a proximity sensor, an environmental noise sensor, a sound sensor, an accelerometer, a temperature sensor, a gyroscope, or other types of sensors, and various combinations of them.
- the processor 130 drives the sensor 180 to receive various information such as audio signals, image signals, and motion information through the sensor controller 12 in the I/O subsystem 10, and the sensor 180 transmits the received information to the processor 130 for processing.
- the wireless transceiver 190 can provide wireless connection capabilities to other devices.
- the other devices can be peripheral devices such as wireless headsets, Bluetooth headsets, wireless mice, and wireless keyboards, or wireless networks, such as a wireless fidelity (Wireless Fidelity, WiFi) network, a wireless personal area network (Wireless Personal Area Network, WPAN), or another wireless local area network (Wireless Local Area Network, WLAN).
- the wireless transceiver 190 may be a Bluetooth-compatible transceiver used to wirelessly couple the processor 130 to peripheral devices such as a Bluetooth headset or a wireless mouse, or it may be a WiFi-compatible transceiver used to wirelessly couple the processor 130 to a wireless network or other devices.
- the terminal 100 may also include other input devices 14, which are coupled to the processor 130 to receive various user inputs, such as receiving inputted numbers, names, addresses, and media selections, etc.
- the other input devices 14 may include keyboards, physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click scroll wheels, and optical mice (an optical mouse is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by a touch screen).
- the terminal 100 may also include the aforementioned I/O subsystem 10; the I/O subsystem 10 may include other input device controllers 11 for receiving signals from the other input devices 14 or for sending control or driving information of the processor 130 to the other input devices 14; the I/O subsystem 10 may also include the aforementioned sensor controller 12 and display controller 13, which are used to implement the exchange of data and control information between the sensor 180 and the processor 130 and between the display screen 170 and the processor 130, respectively.
- the terminal 100 may further include a power source 101 to supply power to the other components of the terminal 100, including components 110-190; the power source may be a rechargeable or non-rechargeable lithium-ion battery or a nickel-metal hydride battery.
- when the power supply 101 is a rechargeable battery, it can be coupled with the processor 130 through a power management system, so that functions such as charging, discharging, and power consumption adjustment can be managed through the power management system.
- terminal 100 in FIG. 1 is only an example, and does not limit the specific form of the terminal 100.
- the terminal 100 may also include other existing components that are not shown in FIG. 1 or that may be added in the future.
- the RF circuit 120, the processor 130, and the memory 140 may be partially or completely integrated on one chip, or may be three independent chips.
- the RF circuit 120, the processor 130, and the memory 140 may include one or more integrated circuits arranged on a printed circuit board (PCB).
- FIG. 2 is an exemplary hardware architecture diagram of an image processing apparatus provided by an embodiment of this application.
- the image processing apparatus 200 may be, for example, a processor chip.
- the hardware architecture shown in FIG. 2 may be an exemplary architecture diagram of the processor 130 in FIG. 1, and the image processing method and image processing framework provided by the embodiments of the present application may be applied to the processor chip.
- the device 200 includes: at least one CPU, a memory, a microcontroller (Microcontroller Unit, MCU), a GPU, an NPU, a memory bus, a receiving interface, a transmitting interface, and so on.
- the device 200 may also include an application processor (AP), a decoder, and a dedicated video or image processor.
- the parts of the device 200 are coupled through connectors, which include various interfaces, transmission lines, or buses; these interfaces are usually electrical communication interfaces, but may also be mechanical interfaces or interfaces in other forms, which is not limited in this embodiment.
- the CPU can be a single-CPU processor or a multi-CPU processor; alternatively, the CPU can be a processor group composed of multiple processors that are coupled to each other through one or more buses.
- the receiving interface may be a data input interface of the processor chip.
- the receiving interface and the transmitting interface may be High Definition Multimedia Interface (HDMI), V-By-One Interface, Embedded Display Port (eDP), Mobile Industry Processor Interface (MIPI) or Display Port (DP), etc.
- in an optional case, the above-mentioned parts are integrated on the same chip; in another optional case, the CPU, GPU, decoder, receiving interface, and transmitting interface are integrated on one chip, and each part of the chip accesses an external memory through a bus.
- the dedicated video/graphics processor can be integrated with the CPU on the same chip, or it can exist as a separate processor chip.
- the dedicated video/graphics processor can be a dedicated ISP.
- the NPU can also be used as an independent processor chip.
- the NPU is used to implement various neural network or deep learning related operations.
- the image processing method and image processing framework provided in the embodiments of the present application may be implemented by a GPU or an NPU, or may be implemented by a dedicated graphics processor.
- the chip involved in the embodiments of this application is a system manufactured on the same semiconductor substrate by an integrated circuit process, also called a semiconductor chip; it is a collection of integrated circuits formed, using an integrated circuit process, on the surface of a substrate (usually a semiconductor material such as silicon), and its outer layer is usually encapsulated by a semiconductor packaging material.
- the integrated circuit may include various types of functional devices; each type of functional device includes transistors such as logic gate circuits, Metal-Oxide-Semiconductor (MOS) transistors, bipolar transistors, or diodes, and may also include components such as capacitors, resistors, or inductors. Each functional device can work independently or under the action of necessary driver software, and can realize various functions such as communication, calculation, or storage.
- FIG. 3 is a schematic flowchart of an image processing method provided by an embodiment of this application.
- the image processing method includes:
- a RAW image is an unprocessed original image obtained by a camera.
- Each pixel of the RAW image only represents the intensity of one color.
- the camera may use a Complementary Metal-Oxide Semiconductor (CMOS) image sensor or a Charge-Coupled Device (CCD) image sensor.
- the color format of the RAW image is determined by the color filter array (CFA) placed in front of the sensor.
- the RAW image can be acquired in various CFA formats
- the RAW image can be a Bayer image in RGGB format; as shown in Figure 4a, each grid represents a pixel, R represents a red pixel, G represents a green pixel, and B represents a blue pixel.
- the RAW image can also be an image in red yellow yellow blue (RYYB) format, or an image in XYZW format, where the XYZW format represents an image format containing 4 components and X, Y, Z, and W each represent a component, such as a Bayer image arranged in red green blue infrared (RGBIR) or a Bayer image arranged in red green blue white (RGBW); Figure 4b shows an exemplary RGBIR image.
- the RAW image may also be an image in a Quad arrangement as shown in FIG. 5.
- the input is N frames of RAW images, where the length and width of each input RAW image are h and w respectively, and N is a positive integer.
- N can be 4 or 6, etc.
- the N frames of images may be acquired continuously, and the time intervals between successive acquisitions may be equal or unequal.
- alternatively, the N frames of images may not be continuous; for example, they may be the 1st, 3rd, 5th, and 7th frames of continuously acquired multi-frame images.
- if the execution body of the image processing is the processor chip shown in FIG. 2, the RAW image may be obtained through the receiving interface, where the RAW image is taken by the camera of the terminal; if the execution body of the image processing is the terminal shown in FIG. 1, the RAW image may be obtained through the camera 150.
- the preprocessing includes channel splitting and pixel rearrangement
- the first intermediate image includes sub-images belonging to multiple channels, wherein each sub-image contains only one color component.
- taking the RGGB format as an example, FIG. 6a is a schematic diagram of the first intermediate image obtained by channel splitting and pixel rearrangement of a Bayer image in RGGB format.
- the smallest repeating unit of the Bayer image in the RGGB format includes R, G, G, and B.
- the 4 pixels R, G, G, B in each smallest repeating unit in the RAW image are split and rearranged to obtain four different sub-images.
- a w*h RAW image is split into four frames of w/2*h/2 sub-images, and N frames of w*h RAW images are split into 4*N frames of w/2*h/2 sub-images. That is, when the input is N frames of Bayer images in RGGB format, the first intermediate image includes 4*N frames of w/2*h/2 sub-images belonging to 4 channels, where each channel contains N frames of sub-images and each frame of sub-image contains only one color component.
- the 4*N frames of sub-images include N frames of R sub-images belonging to the first channel, N frames of G sub-images belonging to the second channel, N frames of G sub-images belonging to the third channel, and N frames of B sub-images belonging to the fourth channel.
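- the channel splitting and pixel rearrangement described above can be sketched in a few lines of code. The following is a minimal numpy sketch, assuming an RGGB layout with R at the top-left of each 2x2 unit; the function name and test shapes are illustrative, not from this application.

```python
import numpy as np

def split_rggb(raw: np.ndarray) -> np.ndarray:
    """Split one h*w RGGB Bayer frame into 4 sub-images of size h/2 * w/2.

    Channel order: R, G (red rows), G (blue rows), B. Assumes h and w are even.
    """
    r  = raw[0::2, 0::2]  # red samples
    g1 = raw[0::2, 1::2]  # green samples sharing rows with red
    g2 = raw[1::2, 0::2]  # green samples sharing rows with blue
    b  = raw[1::2, 1::2]  # blue samples
    return np.stack([r, g1, g2, b], axis=0)

# N frames of h*w RAW yield 4*N sub-images in 4 channels:
frames = [np.random.randint(0, 1024, (6, 6), dtype=np.uint16) for _ in range(4)]
sub = np.stack([split_rggb(f) for f in frames])  # shape (N, 4, h/2, w/2)
print(sub.shape)  # (4, 4, 3, 3)
```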
- similarly, if the RAW image is in another format whose smallest repeating unit contains four pixels (such as RYYB), the first intermediate image also includes sub-images belonging to 4 channels; if the number of frames of the input RAW image is N, the first intermediate image contains 4*N frames of sub-images, and the number of sub-images in each channel is equal to the number N of RAW image frames.
- as shown in the figure, the first intermediate image can also be obtained by channel splitting and pixel rearrangement of a Quad-arranged image.
- the smallest repeating unit of a Quad-arranged image includes 4 pixels each of R, G, G, and B, 16 pixels in total; a frame of w*h Quad-arranged image is channel-split and pixel-rearranged into 16 frames of w/4*h/4 sub-images, where each frame of sub-image belongs to one channel, and N frames of Quad-arranged images are split into 16*N frames of sub-images.
- that is, the first intermediate image includes 16*N sub-images belonging to 16 channels, where each channel contains N frames of sub-images and each frame of sub-image contains only one color component.
- in an optional case, the number of pixels of each of R, G, G, and B included in the smallest repeating unit of the Quad-arranged image can also be 6, 8, or another number; correspondingly, the first intermediate image includes sub-images belonging to 24 channels or 32 channels. It should be understood that the number of channels of the first intermediate image is equal to the number of pixels contained in the smallest repeating unit of the RAW image.
- the preprocessing may also include image registration and motion compensation.
- image registration can remove the changes between multiple frames of images caused by camera movement; however, if there are moving objects in the captured scene, the background areas of the multiple frames are aligned after image registration, but the moving objects are not, and the misalignment caused by object movement needs to be compensated.
- one frame of the N frames of images is selected as the reference frame, for example, the first frame of image can be used as the reference frame, and the other frame images are all registered with the reference frame to realize the alignment of multiple frames of images.
- in an optional case, the RAW image is first split into channels to obtain sub-images of multiple channels; one of the channels is aligned first, and then the other channels are aligned in the same way.
- image registration and motion compensation may be performed first to realize the alignment of the multi-frame RAW image, and then the RAW image can be split into channels.
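- the application does not prescribe a specific registration algorithm; the following is a minimal sketch of reference-frame alignment using OpenCV's ECC method, assuming OpenCV 4.x, single-channel frames, and approximately affine global motion, with per-object motion compensation left out.

```python
import cv2
import numpy as np

def align_to_reference(frames: list[np.ndarray], ref_idx: int = 0) -> list[np.ndarray]:
    """Register each frame to frames[ref_idx] with an affine ECC model."""
    ref = frames[ref_idx].astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 50, 1e-6)
    aligned = []
    for i, f in enumerate(frames):
        if i == ref_idx:
            aligned.append(f)
            continue
        warp = np.eye(2, 3, dtype=np.float32)
        # estimate the warp that maps the current frame onto the reference
        cv2.findTransformECC(ref, f.astype(np.float32), warp,
                             cv2.MOTION_AFFINE, criteria, None, 5)
        h, w = ref.shape
        aligned.append(cv2.warpAffine(f, warp, (w, h),
                                      flags=cv2.INTER_LINEAR + cv2.WARP_INVERSE_MAP))
    return aligned
```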
- in an optional case, the constructed training data consists of unaligned frames, so that the trained deep learning network itself has the ability to fuse multiple unaligned images.
- the preprocessing may include estimating the noise magnitude in each area of the image to obtain a noise intensity distribution map that reflects the noise intensity of different regions; the noise intensity distribution map and the aligned, channel-split image data are input into the first deep learning network together, so that the first deep learning network can adaptively control the noise reduction intensity of each region according to its noise characteristics.
- a sharpening intensity map can also be obtained during preprocessing; the sharpening intensity map contains the sharpening intensity for different regions, and it is input into the first deep learning network together with the aligned, channel-split image data, so that the first deep learning network can adaptively control the sharpening intensity of each region.
- alternatively, a noise distribution map and a sharpening intensity map can be obtained at the same time during preprocessing, and the noise distribution map, the sharpening intensity map, and the image data to be processed are input into the first deep learning network together.
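- feeding such guidance maps to the network can be as simple as concatenating them with the image channels; the sketch below assumes the maps have already been estimated and share the sub-images' spatial size (all names are illustrative).

```python
import numpy as np

def build_network_input(sub_images: np.ndarray,
                        noise_map: np.ndarray,
                        sharpen_map: np.ndarray) -> np.ndarray:
    """Stack per-region guidance maps with the aligned sub-images.

    sub_images:  (C, H, W) aligned, channel-split image data
    noise_map:   (H, W) estimated noise intensity per region
    sharpen_map: (H, W) desired sharpening intensity per region
    """
    return np.concatenate([sub_images,
                           noise_map[None, ...],
                           sharpen_map[None, ...]], axis=0)
```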
- the first deep learning network can implement at least two image processing functions related to detail restoration, and the first target image can be an RGB color image with rich details and low noise.
- if the input RAW image is in RYYB format, the first target image obtained after the first deep learning network processing is a color image with three channels of RYB;
- if the input RAW image is in XYZW format,
- the first target image obtained after processing by the first deep learning network is a color image with four channels of XYZW.
- the image processing method further includes: performing color conversion on the first target image to obtain an RGB color image.
- the first deep learning network may include demosaicing and noise reduction functions; it can also be said that after the input image is processed by the deep learning network, demosaicing and noise reduction are achieved simultaneously. Demosaicing and noise reduction are both critical processes for detail restoration, and whichever of the two is performed first affects the restoration of image details.
- the embodiment of the present application therefore implements demosaicing and noise reduction in the same deep learning network, avoiding the accumulation of errors caused by performing the two operations serially.
- in this case, the first target image output by the first deep learning network is an RGB color image that has been denoised and demosaiced.
- the first deep learning network may include demosaicing, noise reduction, and SR reconstruction functions; it can also be said that after the input image is processed by the deep learning network, demosaicing, noise reduction, and SR reconstruction are achieved simultaneously.
- super-resolution refers to obtaining high-resolution images from low-resolution images: for example, one frame of high-resolution image can be obtained based on one frame of low-resolution image, or based on multiple frames of low-resolution images. For scenes with super-resolution requirements, demosaicing, noise reduction, and SR reconstruction are all critical processes for detail restoration.
- if demosaicing or SR reconstruction is performed first, the noise of the image will be amplified or the noise shape of the original image will be destroyed, affecting the noise reduction effect; if noise reduction is performed first, the detail loss caused by noise reduction cannot be restored, affecting the effect of DM, SR reconstruction, and other processing.
- the embodiment of this application therefore trains a deep learning network that can realize DM, SR reconstruction, and noise reduction at the same time; since multiple functions are implemented by the same deep learning network, there is no sequential processing order, which avoids the mutual influence between different processes brought about by the serial operation of multiple modules and also avoids the resulting accumulation of errors.
- the first target image output by the first deep learning network is an RGB color image that has been processed by denoising, demosaicing, and SR reconstruction. The resolution of the image after SR reconstruction is higher than the resolution of the image before SR reconstruction.
- the first deep learning network may include demosaicing, noise reduction, SR reconstruction, and dead pixel correction functions.
- dead pixels can refer to invalid or erroneous pixels in the image caused by defects in the photosensitive component, such as dots that are much brighter than their surroundings, dots that are much darker than their surroundings, or dots that are not noticeably brighter or darker than their surroundings but have incorrect pixel values.
- the first deep learning network may include demosaicing, noise reduction, SR reconstruction, dead pixel correction, and sharpening functions.
- the first deep learning network may include functions of demosaicing, noise reduction, SR reconstruction, dead pixel correction, sharpening, and phase point compensation.
- a phase point is a pixel point that contains phase information but does not contain effective pixel information. During display, it is necessary to obtain the pixel value corresponding to the phase point according to the pixel points around the phase point.
- the first deep learning network may include demosaicing, noise reduction, and dead pixel correction functions.
- the first deep learning network may include demosaicing, noise reduction, and sharpening functions.
- the first deep learning network may include demosaicing, noise reduction, dead pixel correction, and sharpening functions.
- the first deep learning network may include demosaicing, noise reduction, dead pixel correction, sharpening function, and phase point compensation function.
- the dead pixels and phase points can be calibrated on the production line, and dead pixel correction and phase point compensation are then performed in the preprocessing according to the dead pixel positions and phase point positions calibrated on the production line; the image without dead pixels and phase points is then input into the first deep learning network for detail reconstruction.
- the location detection of dead pixels and phase points, as well as dead point correction and phase point compensation can be implemented in preprocessing.
- the first deep learning network runs on the NPU or GPU in FIG. 2; alternatively, the deep learning network can also run partly on the NPU and partly on the GPU; optionally, the operation of the first deep learning network may also involve the control function of the CPU or MCU.
- the processing of brightness enhancement or color enhancement includes at least one of the following: black level correction (Black Level Correction, BLC), auto-white balance (Auto-White Balance, AWB), lens shading correction (Lens Shading Correction, LSC), tone mapping, color correction, contrast increase, or gamma correction, etc.
- the brightness enhancement and the color enhancement can be implemented using a serial module, or a neural network.
- one or more of BLC, AWB, and LSC can be implemented in preprocessing.
- BLC, AWB, and LSC can be performed on the input N frames of RAW images.
- the preprocessing specifically includes: performing at least one of black level correction (BLC), auto-white balance (AWB), or lens shading correction (LSC) on the multiple frames of RAW images to obtain multiple frames of first-preprocessed RAW images;
- performing channel splitting and pixel rearrangement on the multiple frames of first-preprocessed RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of sub-images in each channel is the same as the number of frames of the RAW images; and aligning the multiple frames of sub-images in each channel.
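- a minimal sketch of this first preprocessing for one RGGB frame is given below; the black level, white-balance gains, and shading-gain map are assumed to come from calibration, and the function and parameter names are illustrative.

```python
import numpy as np

def first_preprocess(raw: np.ndarray, black_level: float,
                     wb_gains: np.ndarray, lsc_gain: np.ndarray) -> np.ndarray:
    """Apply BLC, AWB, and LSC to one RGGB Bayer frame.

    wb_gains: per-channel gains in R, G1, G2, B order
    lsc_gain: per-pixel lens-shading gain map, same shape as raw
    """
    out = np.clip(raw.astype(np.float32) - black_level, 0, None)  # BLC
    gain = np.empty_like(out)
    gain[0::2, 0::2] = wb_gains[0]   # R
    gain[0::2, 1::2] = wb_gains[1]   # G1
    gain[1::2, 0::2] = wb_gains[2]   # G2
    gain[1::2, 1::2] = wb_gains[3]   # B
    return out * gain * lsc_gain     # AWB, then LSC
```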
- in an optional case, sharpening may not be integrated in the first deep learning network; after brightness and color enhancement, the image can be sharpened according to actual needs.
- optionally, the image processing method further includes: displaying or storing the second target image; before storage, encoding or compression processing may be performed first; the second target image can also be sent to other devices. The embodiment of the present application does not limit the destination of the obtained second target image.
- the embodiment of the application integrates the processing related to detail restoration into the same deep learning network, which avoids the mutual influence between different processing steps when multiple processing steps are performed serially, reduces the resulting accumulation of errors, and improves the resolution and clarity of the image. Further, the embodiment of the present application inputs N frames of RAW images at the same time and merges the effective information of the multiple frames, which helps to better restore image details; on the other hand, because there may be differences between the multiple frames, the N frames of images undergo preprocessing such as channel splitting, pixel rearrangement, and alignment before being input into the deep learning network for detail recovery, which improves the processing effect of the deep learning network.
- FIG. 7 shows an image processing framework provided by an embodiment of this application.
- the image processing framework shown in FIG. 7 can be used to implement the image processing method shown in FIG. 3.
- the image processing framework includes: a preprocessing module, a detail restoration deep learning network, and a brightness and color enhancement module.
- the image processing framework also includes a display screen and a memory.
- the preprocessing module, the detail restoration deep learning network, and the brightness and color enhancement module are implemented by the processor; these modules can be implemented by a software module on the processor, by a dedicated hardware circuit on the processor, or by a combination of software and hardware.
- the preprocessing module and the brightness and color enhancement module are implemented by the GPU, ISP, or CPU in the processor, and the deep learning network is implemented by the NPU in the processor; alternatively, the deep learning network can also be jointly implemented by the GPU and the NPU.
- the preprocessing module and the deep learning network are implemented by the Application Processor (AP), and the brightness and color enhancement module is implemented by the Display Driving Integrated Circuit (DDIC). DDIC is used to drive the display screen.
- the brightness and color enhancement module shown in FIG. 7 may also be referred to as an enhancement module, and the enhancement module is used to implement at least one of brightness enhancement or color enhancement.
- the input of the image processing framework is N frames of RAW images.
- the N frames of RAW images can be Bayer images in RGGB format, Quad array images or other CFA format RAW images containing three color components of R, G, and B.
- the preprocessing module is used to preprocess the input N frames of RAW images to obtain the first intermediate image.
- the first intermediate image output by the preprocessing module consists of 4N frames of sub-images belonging to 4 channels, where the sub-images of each channel contain only one color component.
- the 4N frames of sub-images include N frames of sub-images of R, G, G, and B components, and the sub-images of each component belong to one channel.
- if the input is a Quad-arranged image whose minimum repeating unit contains four pixels of each of the R, G, G, and B components, the first intermediate image output by the preprocessing module consists of 16N frames of sub-images belonging to 16 channels, where the sub-images of each channel contain only one color component; the 16N frames of sub-images include 4N frames of sub-images for each of the R, G, G, and B components, and each frame of sub-image belongs to one channel. It should be understood that the number of frames of the first intermediate image output by the preprocessing module is related to the number of pixels included in the smallest repeating unit of the input RAW image.
- the detail recovery deep learning network is an exemplary network of the first deep learning network in the foregoing method embodiment.
- the detail recovery deep learning network is used to recover the details of the preprocessed image; specifically, it is used to implement step 303. For details, please refer to the description of step 303 in the method embodiment, which will not be repeated here.
- in an optional case, dead pixel correction and phase point compensation are implemented by the preprocessing module, and demosaicing, noise reduction, and SR reconstruction are implemented by the detail recovery deep learning network; in another optional case, the functions of demosaicing, noise reduction, dead pixel correction, sharpening, and phase point compensation are all implemented by the detail recovery deep learning network.
- the brightness and color enhancement module is used to perform brightness enhancement and color enhancement on the image output by the deep learning network for detail restoration. It should be understood that brightness enhancement and color enhancement can be implemented by the same module or by different modules, that is, the brightness enhancement module and the color enhancement module can be two different modules. In an optional situation, brightness enhancement and color enhancement may be implemented by multiple modules, for example, each processing related to brightness enhancement or color enhancement corresponds to a module.
- the brightness and color enhancement module is used to implement step 304; for details, please refer to the description of step 304 in the method embodiment, which will not be repeated here.
- the image processed by the image processing framework can be sent to the display screen or stored in the memory.
- the image processing framework shown in FIG. 8 can also be used to implement the image processing method shown in FIG. 3.
- the image processing framework includes: a preprocessing module, a detail restoration deep learning network, a brightness and color enhancement module, and a sharpening module.
- the image processing framework also includes a display screen and a memory.
- the sharpening module is placed after the brightness and color enhancement module because brightness enhancement and color enhancement may affect the sharpness of image edges; therefore, after brightness and color enhancement, the image is sharpened according to actual needs.
- the brightness and color enhancement module shown in FIG. 8 may also be referred to as an enhancement module, and the enhancement module is used to implement at least one of brightness enhancement or color enhancement.
- the image processing framework shown in FIG. 9 can also be used to implement the image processing method shown in FIG. 3.
- the image processing framework includes: a preprocessing module, a detail restoration deep learning network, a color conversion module, and a brightness and color enhancement module.
- the image processing framework also includes a display screen and a memory.
- the input of the image processing framework is N frames of RAW images
- the N frames of RAW images can be in RYYB format or XYZW format
- if the input is in RYYB format, the first intermediate image output by the preprocessing module includes 4N frames of sub-images;
- the 4N frames of sub-images include N frames of sub-images for each of the R, Y, Y, and B components;
- the image obtained after detail recovery deep learning network processing is a color image with three channels of RYB.
- if the input is in XYZW format, the first intermediate image output by the preprocessing module also includes 4N frames of sub-images;
- the 4N frames of sub-images include N frames of sub-images for each of the X, Y, Z, and W components;
- the image obtained after detail recovery deep learning network processing is a color image with four channels of XYZW. Therefore, in the above two cases, a color conversion module is placed after the detail recovery deep learning network to convert the RYB or XYZW color image into an RGB color image.
- after the image is converted into an RGB color image, it is processed by the brightness and color enhancement module and then sent to the display screen or stored in the memory.
- a sharpening module can be added after the brightness and color enhancement module of the image processing frame shown in FIG. 9. It should be understood that the brightness and color enhancement module shown in FIG. 9 may also be referred to as an enhancement module, and the enhancement module is used to implement at least one of brightness enhancement or color enhancement.
- FIG. 10 is a schematic structural diagram of an exemplary deep learning network provided by an embodiment of this application. It should be understood that FIG. 10 uses 2x zoom as an example to describe the structure of the deep learning network; other forms of network structure exist, and the embodiment of the present application does not limit the specific form of the network structure. It should be understood that if the length and width of the output image of the deep learning network are twice the length and width of the input image, the magnification of the deep learning network is 2x; if they are four times the length and width of the input image, the magnification of the deep learning network is 4x.
- 2x zoom means that the length and width of the final output image are respectively 2 times the length and width of the original input image.
- it should be noted that the original input image is different from the input image of the deep learning network: the input image of the deep learning network is obtained by preprocessing the original input image.
- FIG. 11 is a schematic diagram of the processing effect of an exemplary detail recovery network provided by an embodiment of this application.
- the detail recovery network is a deep learning network with 2x zoom.
- the original input image size is 4 frames of 6*6 RAW images. After the original input image is preprocessed, the input image of the detail recovery network is obtained.
- the input image of the detail recovery network is obtained by channel splitting and pixel rearrangement of the original input RAW images into 3*3 sub-images of the four components R, G, G, and B: one frame of 6*6 RAW image yields 4 frames of 3*3 sub-images after channel splitting and pixel rearrangement, so 4 frames of 6*6 RAW images are split into a total of 16 sub-images (only 8 frames are shown in the figure); after processing by the detail recovery network, the output image is a 12*12 RGB color image.
- the deep learning network includes: a feature extraction convolution module, multiple residual network convolution modules, a feature fusion module 1, two up-sampling convolution blocks, and a feature fusion convolution module 2.
- the feature extraction convolution block includes a first convolution layer Conv(k3n64s1), a first activation function layer (PReLU), a second convolution layer Conv(k3n128s1), and a second activation function layer (PReLU), where k represents the size of the convolution kernel, n represents the number of channels of the feature map after convolution, and s represents the convolution stride. It should be understood that k, n, and s in the subsequent structure diagrams shown in Figures 13 to 15 have the same physical meaning.
- that is, the convolution kernel size of the first convolutional layer shown in Figure 12 is 3, the number of channels of the feature map after convolution is 64, and the convolution stride is 1; the convolution kernel size of the second convolutional layer is 3, the number of channels of the feature map after convolution is 128, and the convolution stride is 1.
- the embodiment of the present application only provides an exemplary structure of the feature extraction convolution block, and other structures may also be used.
- for example, the number of convolutional layers and activation function layers need not be two, and the number of convolutional layers and the values of k, n, and s are all optional.
- the detail recovery network may not include a feature extraction convolution module, or may include multiple feature extraction convolution modules.
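- expressed in PyTorch, the feature extraction convolution block of Figure 12 could look like the following sketch; padding=1 is an assumption made so the spatial size is preserved, and the number of input channels depends on the CFA format and the number of input frames.

```python
import torch.nn as nn

class FeatureExtraction(nn.Module):
    """Conv(k3n64s1)-PReLU-Conv(k3n128s1)-PReLU, as in Figure 12."""
    def __init__(self, in_channels: int = 4):
        # in_channels: channels of the preprocessed input (e.g. 4 per RGGB frame)
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)
```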
- the residual network convolution block includes a first convolution layer Conv (k3n128s1), an activation function layer (PReLU), and a second convolution layer Conv (k3n128s1).
- the number of residual network convolution blocks can be set to 6, for example.
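- the residual network convolution block can be sketched in the same way; the skip connection around the two convolutions is implied by the term "residual", but its exact wiring is an assumption here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv(k3n128s1)-PReLU-Conv(k3n128s1) with an assumed skip connection."""
    def __init__(self, channels: int = 128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, 1, 1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(channels, channels, 3, 1, 1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))  # residual skip
```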
- the feature fusion module 1 includes a convolutional layer Conv (k3n128s1)
- the feature fusion module 2 includes a convolutional layer Conv(k3n3s1); that is, the convolution kernel size of the convolutional layer of feature fusion module 1 is 3, the number of channels of the feature map is 128, and the convolution stride is 1, while the convolution kernel size of the convolutional layer of feature fusion module 2 is 3, the number of channels of the feature map is 3, and the convolution stride is 1.
- the image data output by the feature fusion module 2 is the output data of the detail recovery network
- the number of feature channels of feature fusion module 2 is 3; its k and s values, as well as the k, n, and s values of feature fusion module 1, are all optional.
- the detail recovery network may not include the feature fusion module 1, or may include multiple feature fusion modules 1.
- in an optional case (for example, when the output is an XYZW color image), the number of feature channels of feature fusion module 2 is 4, that is, the image output by the deep learning network contains 4 channels.
- FIG. 15 is a structural diagram of an exemplary up-sampling convolution block provided in this embodiment of the application. Since the deep learning network shown in Figure 10 is a 2x zoom network, two up-sampling convolutional blocks are required.
- the up-sampling convolutional block includes a convolutional layer Conv(k3n256s1), a pixel shuffle layer (PixelShuffler), and an activation function layer (PReLU).
- PixelShufflerX2 shown in Figure 15 means that the pixel shuffle layer is a 2 times up-sampled pixel shuffle layer.
- an up-sampling convolutional block with a magnification of 4 contains one 4-fold up-sampling pixel shuffle layer, or two 2-fold up-sampling pixel shuffle layers.
- for other magnifications, the structure of the deep learning network also needs to be adjusted accordingly.
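- the up-sampling convolutional block and the assembled 2x-zoom network can be sketched as follows, reusing the FeatureExtraction and ResidualBlock classes above; the exact wiring between modules (and any global skip connection) is an assumption, not taken from Figure 10.

```python
import torch.nn as nn
# FeatureExtraction and ResidualBlock are the sketches defined above.

class UpsampleBlock(nn.Module):
    """Conv(k3n256s1) + 2x pixel shuffle + PReLU, as in Figure 15."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, 1, 1),
            nn.PixelShuffle(2),  # 256 -> 64 channels, doubles H and W
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)

class DetailRecoveryNet(nn.Module):
    """2x-zoom detail recovery network in the spirit of Figure 10."""
    def __init__(self, in_channels: int = 16, num_res_blocks: int = 6):
        # in_channels = 4 channels x N frames, e.g. 16 for N = 4 RGGB frames
        super().__init__()
        self.extract = FeatureExtraction(in_channels)
        self.res = nn.Sequential(*[ResidualBlock(128) for _ in range(num_res_blocks)])
        self.fuse1 = nn.Conv2d(128, 128, 3, 1, 1)  # feature fusion module 1
        self.up1 = UpsampleBlock(128)              # sub-image size -> 2x
        self.up2 = UpsampleBlock(64)               # 2x -> 4x (2x the RAW size)
        self.fuse2 = nn.Conv2d(64, 3, 3, 1, 1)     # feature fusion module 2, RGB out

    def forward(self, x):
        f = self.extract(x)
        y = self.fuse1(self.res(f))
        return self.fuse2(self.up2(self.up1(y)))
```

- note that the sub-images entering the network are half the RAW image's side length, so two 2x up-sampling blocks take the output to twice the original RAW size, consistent with the 6*6 input / 12*12 output example of Figure 11.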
- a training data set can be formed by collecting a large number of paired low-quality input images and high-quality target images to train the network.
- the low-quality images serve as the input of the deep learning network, and the high-quality target images serve as the targets that a deep learning network meeting the requirements should produce.
- if you want to train a deep learning network with demosaicing, noise reduction, and SR reconstruction functions, the training data to be constructed includes: multiple frames of noisy, mosaiced, low-resolution RAW images and one frame of noise-free, demosaiced, high-resolution color image.
- if you want to train a deep learning network with demosaicing, noise reduction, SR reconstruction, and dead pixel correction functions, the training data to be constructed includes: multiple frames of RAW images with noise, mosaic, low resolution, and dead pixels, and one frame of color image with no noise, no mosaic, high resolution, and no dead pixels. If you want to train a deep learning network that includes demosaicing, noise reduction, SR reconstruction, and sharpening functions, the training data to be constructed includes: multiple frames of noisy, mosaiced, blurred, low-resolution RAW images and one frame of noise-free, demosaiced, sharp, high-resolution color image.
- if you want to further include dead pixel correction, the training data to be constructed includes: multiple frames of RAW images with noise, mosaic, blur, dead pixels, and low resolution, and one frame of high-resolution color image with no noise, no mosaic, sharpened, and no dead pixels.
- the constructed training data is related to the function of the deep learning network, and will not be listed here.
- the embodiments of this application provide two exemplary solutions for obtaining high-quality images: first, download a certain number of open data sets from the Internet and select images of very good quality from them; second, use a high-quality camera under strictly controlled lighting conditions to shoot high-quality images that meet preset conditions.
- the preset conditions can be set according to specific requirements. It should be understood that the high-quality image captured and output by the camera is an RGB color image that has been processed to match the characteristics of the human eye.
- the high-quality images in the training data set can all be obtained by the first scheme, or all by the second scheme, or the images obtained by the two schemes can be mixed in a certain proportion.
- embodiments of the present application provide an exemplary solution for obtaining low-quality images.
- a series of quality reduction operations are performed on the high-quality images obtained above to obtain the low-quality input images, for example, RAW images with noise, mosaic, blur, dead pixels, and low resolution. If the deep learning network is a 2x zoom network, the high-quality image is down-sampled by 2x, and the blur intensity of the Gaussian blur can be randomly selected. It should be understood that performing the above operations on one frame of high-quality image yields one frame of low-quality image; to obtain multiple frames of low-quality images, the above operations are performed multiple times on the same frame of high-quality image.
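- a possible degradation pipeline for one frame is sketched below; the application names the degradations (down-sampling, Gaussian blur of random strength, mosaicing, noise, dead pixels) but not their exact models, so the Gaussian noise model, the parameter ranges, and the RGB channel order are assumptions.

```python
import cv2
import numpy as np

def degrade(hq_rgb: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Simulate one low-quality RGGB RAW frame from a high-quality RGB image."""
    img = hq_rgb.astype(np.float32)
    # 2x down-sampling (for a 2x-zoom network)
    img = cv2.resize(img, (img.shape[1] // 2, img.shape[0] // 2),
                     interpolation=cv2.INTER_AREA)
    # Gaussian blur of randomly selected strength
    img = cv2.GaussianBlur(img, (5, 5), sigmaX=rng.uniform(0.5, 2.0))
    # mosaic: keep one color per pixel following the RGGB CFA (assuming RGB order)
    h, w, _ = img.shape
    raw = np.empty((h, w), np.float32)
    raw[0::2, 0::2] = img[0::2, 0::2, 0]  # R
    raw[0::2, 1::2] = img[0::2, 1::2, 1]  # G
    raw[1::2, 0::2] = img[1::2, 0::2, 1]  # G
    raw[1::2, 1::2] = img[1::2, 1::2, 2]  # B
    # additive noise (a simple Gaussian stand-in for sensor noise)
    raw += rng.normal(0.0, rng.uniform(1.0, 10.0), raw.shape)
    # a few random dead pixels, stuck fully dark or fully bright
    ys, xs = rng.integers(0, h, 20), rng.integers(0, w, 20)
    raw[ys, xs] = rng.choice([0.0, 255.0], 20)
    return np.clip(raw, 0, 255)
```

- calling degrade several times on the same high-quality frame yields multiple independently degraded low-quality frames, which is what gives the trained network its multi-frame fusion ability.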
- a network trained with such training data can simultaneously have the functions of demosaicing, noise reduction, SR reconstruction, dead pixel removal, and sharpening. Since the low-quality input images are obtained by simulation from high-quality images, the low-quality input images and the high-quality target images are strictly aligned, which further improves the training effect.
- since the multiple frames of low-quality input images are constructed independently of each other, there are differences in noise, dead pixels, and local sharpness between different low-quality input images, so the trained network has the ability of multi-frame fusion.
- in the embodiments of this application, high-quality images are obtained first, and low-quality images are obtained by degrading them, so the low-quality input images in the constructed training data are strictly aligned with the high-quality target images; further, the network is trained on the constructed training data, and the resulting deep learning network can realize a variety of processing related to image detail recovery. Because there are certain differences in noise, dead pixels, and local clarity between the input multiple frames of low-quality images,
- the deep learning network obtained by training also has the ability of multi-frame fusion. Processing images based on this deep learning network can realize the functions related to image detail restoration at the same time and convert the input RAW images into a high-resolution RGB color image with high definition, low noise, and clear details.
- the processing related to detail restoration is implemented by one deep learning network instead of a serial processing sequence, which avoids the mutual influence between multiple processing steps and eliminates the errors accumulated in the process of converting low-quality RAW images into high-quality RGB color images.
- because the training input is multiple frames of low-quality images and the output is one frame of high-quality image, the trained deep learning network also has multi-frame fusion capability; based on this, when image processing is performed, multiple frames of low-quality RAW images are input, and the deep learning network can merge the effective information of the multiple frames to further improve the quality of the output image.
- the loss function is an important equation used to measure the difference between the predicted value and the target value. Since it is hoped that the output of the deep neural network is as close as possible to the truly desired value, the current predicted value of the network can be compared with the desired target value, and the weight vectors of each layer of the neural network can then be updated according to the difference between the two (for example, if the predicted value of the network is too high, the weight vectors are adjusted to make it predict lower), adjusting continuously until the neural network can predict the truly desired target value. How the difference between the predicted value and the target value is measured is defined by the loss function or the objective function.
- L1 loss can be combined with structural similarity (Structural Similarity, SSIM) and adversarial loss as the loss function; alternatively, L2 loss can be combined with SSIM and adversarial loss as the loss function.
- an adaptive moment estimation (Adam) method may be used in the embodiments of the present application to optimize the network parameters, and when the loss drops to a relatively converged state, the training can be considered complete.
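- a combined loss of this kind might be written as follows; the weighting factors, the use of the third-party pytorch_msssim package for SSIM, and the binary-cross-entropy form of the adversarial term are assumptions, not specified by this application.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM implementation (an assumption)

def detail_loss(pred, target, disc_fake_logits=None,
                w_l1=1.0, w_ssim=0.1, w_adv=0.01):
    """L1 + SSIM (+ optional adversarial term) as a combined training loss."""
    loss = w_l1 * torch.mean(torch.abs(pred - target))          # L1 term
    loss = loss + w_ssim * (1.0 - ssim(pred, target, data_range=1.0))  # SSIM term
    if disc_fake_logits is not None:  # generator-side adversarial term
        loss = loss + w_adv * F.binary_cross_entropy_with_logits(
            disc_fake_logits, torch.ones_like(disc_fake_logits))
    return loss

# Adam ("adaptive moment estimation") can then optimize the network parameters:
# optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
```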
- the image processing framework and image processing method provided in the embodiments of the present application are applicable to multiple application scenarios or to multiple different photographing scenarios.
- dark light scene: this scene has high requirements for noise reduction, and multi-frame fusion technology is very important; therefore, in a dark light scene, the number of input image frames can be increased. For example, if 4 frames of images are input in a bright light scene, 6, 8, or 9 frames can be input in a dark light scene.
- zoom mode: for different zoom multiples, the structure of the deep learning network differs. Take 4x zoom as an example: unlike the 2x zoom network structure, the 4x zoom deep learning network requires 3 up-sampling convolutional blocks, and when generating training data, the high-quality images are down-sampled by 4x when being processed into low-quality images. It should be understood that 4x down-sampling means that the length and width of the down-sampled image are one quarter of the length and width of the original image, that is, the area of the down-sampled image is one sixteenth of the area of the original image.
- HDR scene: input multiple frames of short-exposure images, trying to ensure that highlight areas are not over-exposed, and then restore the image details, especially the dark details, based on the detail restoration network; further, the brightness enhancement module is used to enhance the brightness of the image output by the detail restoration network, restoring the dynamic range of the entire image and thereby achieving the HDR function.
- the input data is a multi-frame short exposure RAW image, for example, it can be 6 or 8 frames.
- some short-exposure training data needs to be added to the training data. This embodiment of the application provides a method for obtaining short-exposure training data:
- divide each pixel value of the first intermediate image by a number that represents the degree of exposure reduction relative to the reasonably exposed image; for example, dividing each pixel value by 2 means that the simulated short-exposure image has 1/2 the exposure time of the reasonably exposed original image, dividing by 4 means 1/4 the exposure time, and so on.
- the value of the number depends on the ratio of exposure reduction that may be selected when the image is actually captured. For example, the value can be 2, 4, 8, 16, and so on.
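- this simulation is essentially a one-line division; the sketch below draws the exposure-reduction factor from the values mentioned above.

```python
import numpy as np

def simulate_short_exposure(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Divide pixel values by an exposure-reduction factor (2, 4, 8, or 16)."""
    factor = rng.choice([2, 4, 8, 16])
    return img.astype(np.float32) / factor
```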
- the embodiment of the present application also provides a method for adaptively selecting a deep learning network. As shown in FIG. 16, the method includes:
- the deep learning network resource pool includes multiple deep learning networks with different functions
- exemplarily, the deep learning networks with different functions include: deep learning networks for a variety of different zoom scenarios, a deep learning network for HDR scenarios, a deep learning network for dark light scenarios, a deep learning network for night scene mode, a third detail restoration network, and so on.
- the multiple deep learning networks with different functions are obtained through training in advance, and are solidified or stored in the memory of the mobile terminal or the storage unit of the processor of the mobile terminal. In an optional situation, the deep learning network may also be trained in real time and continuously updated.
- in an optional case, the deep learning networks with different functions are implemented by software algorithms, and based on these software algorithms, hardware computing resources in the NPU or GPU are used to implement the processing functions of the deep learning network; it should be understood that the hardware resources may also be hardware resources other than the NPU or GPU.
- deep learning networks with different functions are solidified in different artificial intelligence AI engines.
- a deep learning network corresponds to an AI engine.
- the AI engine is a hardware module or a dedicated hardware circuit.
- the AI engine can share computing resources in the computing resource pool.
- the first indication information may be selected and sent by the user based on his own needs or the characteristics of the current scene; for example, the user selects an applicable or preferred application scene by touching a mode selection button on the APP interface of the application program, which sends the first indication information corresponding to that application scenario to the AI controller in the mobile terminal or the processor; further, the AI controller gates or enables the corresponding AI engine or deep learning network based on the first indication information, or reads the corresponding deep learning network based on the first indication information and loads it into the processor.
- alternatively, the first indication information is obtained by analyzing the characteristics of the preview image acquired by the current camera; the characteristics of the preview image are related to the current application scenario, and different application scenarios have different characteristics, so the current application scenario can be determined by analyzing the characteristics of the preview image, and the first indication information used to indicate the current application scenario is obtained accordingly.
- the AI controller selects a deep learning network suitable for the current application scenario from the deep learning network resource pool based on the first indication information. For example, if the characteristics of the current preview image match a dark light scene, the AI controller selects the dark light deep learning network as the target deep learning network and further controls the camera to capture multiple frames of reasonably exposed images as input; since a dark light scene requires a good noise reduction effect, the number of input image frames is appropriately increased. If the characteristics of the current preview image match an HDR scene, the AI controller selects the HDR deep learning network as the target deep learning network and further controls the camera to capture multiple short-exposure images as input.
- the multiple frames of images with different exposure times may include several images with longer exposure times and Several images with a short exposure time.
- In an optional case, the first indication information is carried by the input data. Exemplarily, the first indication information is a zoom factor carried by the input data; when the AI controller receives the zoom factor carried by the input data, it gates or enables the deep learning network corresponding to that zoom factor.
- Optionally, the first output image may be the target high-quality image that is finally output.
- In an optional case, the method further includes: performing brightness enhancement and color enhancement on the first output image to obtain a second output image.
- In an optional case, the method further includes: performing color gamut conversion or color format conversion on the second output image to obtain a target output image that can be displayed on the display screen.
- In an optional case, before the target deep learning network is selected, the method further includes: acquiring N frames of RAW images and preprocessing them to obtain the input image data fed into the deep learning network.
- Exemplarily, the preprocessing includes image registration, motion compensation, channel splitting, pixel rearrangement, and the like.
- In an optional case, the second output image may also be sharpened after the brightness and color enhancement.
- In the method for adaptively selecting a deep learning network provided by the embodiments of this application, the most suitable deep learning network can be selected or enabled from the deep learning network resource pool according to the user's needs, the characteristics of the input data, or parameters carried by the input data. This satisfies the needs of different users and different scenarios to the greatest extent, provides the optimal deep learning network and the best image processing effect in each scenario, optimizes the user experience, improves the image processing performance of mobile terminals or image processors, and enhances competitiveness.
- An embodiment of the present application further provides a device for adaptively selecting a deep learning network. The device includes a receiving interface, an artificial intelligence (AI) controller, and a deep learning network resource pool, where the deep learning network resource pool includes deep learning networks with multiple functions.
- The receiving interface is used to receive image data, indication information, or various control signals. For example, it may be used to receive the mode or scene indication information selected by the user on the application APP interface on the display screen of the mobile terminal, or to receive the image data acquired by the camera, and so on.
- The AI controller is coupled with the deep learning network resource pool, and based on first indication information it selects the target deep learning network corresponding to the first indication information from the deep learning network resource pool. Optionally, the first indication information may be indication information received from the user through the receiving interface, indication information related to the scene obtained by the device through characteristic analysis of the preview image acquired by the camera, or indication information carried by the input image data itself.
- Exemplarily, the AI controller may be implemented by a dedicated hardware circuit, by a general-purpose processor or CPU, or by a software module running on the processor.
- The deep learning network is implemented by an AI engine, which is a hardware module or a dedicated hardware circuit, or the deep learning network is implemented by software code or a software module. When the deep learning network is implemented by software code or a software module, the deep learning network resource pool is stored in the memory.
- Optionally, the device further includes a processor, which may be, for example, a GPU, NPU, ISP, general-purpose AP, or another intelligent processor. The processor processes the input image based on the target deep learning network to obtain the first output image.
- When the deep learning network is implemented as software, it runs on the processor. Exemplarily, the AI controller reads the target deep learning network from the deep learning network resource pool and loads it into the processor; the processor then runs the target deep learning network to realize the function corresponding to it. For example, the selected target deep learning network can be loaded into the detail recovery network shown in FIG. 17.
- Optionally, the device further includes hardware computing resources, which include addition, subtraction, multiplication, division, exponential operations, logarithmic operations, size comparison, and the like. The hardware computing resources can be multiplexed by multiple deep learning networks. Specifically, when the processor runs the target deep learning network, it calls on these hardware computing resources to process the input image according to the instructions of the target deep learning network, thereby realizing the functions corresponding to the target deep learning network.
- Optionally, the device further includes a preprocessing module, which preprocesses the initially input RAW images before the deep learning network. The preprocessing may include the preprocessing described in step 302.
- Optionally, the preprocessing module may also analyze the characteristics of the preview image obtained by the camera and send a characteristic signal to the AI controller, which selects the corresponding deep learning network from the deep learning network resource pool based on that signal. Analyzing the characteristics of the original RAW image may also be realized by a dedicated image characteristic analysis module or by a general-purpose processor.
- Optionally, the device further includes a color enhancement module and a brightness enhancement module; the color enhancement module performs color enhancement on the first output image output by the deep learning network, and the brightness enhancement module performs brightness enhancement on it. Color enhancement and brightness enhancement may also be implemented by the same module, and either may be implemented by a hardware module, a software module, or a software module combined with a hardware module.
- Optionally, the device further includes a color format conversion module for converting the image into an image format supported by the display screen or a target format specified by the user.
- It should be understood that the preprocessing module, the color enhancement and brightness enhancement modules, and the color format conversion module may all be implemented by the processor.
- The device for adaptively selecting a deep learning network provided by the embodiments of this application includes a deep learning network resource pool, and can select a suitable deep learning network according to the mode selected by the user, by adaptively analyzing the characteristics of the input image, or according to characteristic parameters carried by the input image. The image can thus be processed by the optimal deep learning network in a variety of application scenarios, achieving the best image processing effect in each of them, improving the user experience, raising the image processing performance of mobile terminals or image processors, and enhancing competitiveness.
- An embodiment of the present application further provides a computer-readable storage medium that stores instructions which, when run on a computer or a processor, cause the computer or the processor to execute one or more steps of any of the above methods. If the component modules of the above signal processing device are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
- The embodiments of the present application further provide a computer program product containing instructions which, when run on a computer or a processor, cause the computer or the processor to execute any of the methods provided in the embodiments of the present application.
- The technical solution of this application, in essence the part that contributes beyond the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device, or a processor therein, to execute all or part of the steps of the methods described in the various embodiments of the present application.
Abstract
An image processing method, framework, and apparatus, belonging to computer vision technology in the field of artificial intelligence. The method includes: acquiring multiple frames of RAW images; after performing preprocessing such as image alignment, channel splitting, and pixel rearrangement on the acquired frames, performing detail recovery on the images based on a deep learning network; and performing brightness enhancement and color enhancement on the image output by the deep learning network. The method fuses the various kinds of processing related to detail recovery into a single deep learning network, avoiding the mutual interference that arises when multiple processing steps are performed serially, and effectively fusing the useful information of multiple frames, which helps further improve the image processing effect.
Description
This application claims priority to the Chinese patent application No. 201910882529.3, entitled "An image processing method and apparatus", filed with the China National Intellectual Property Administration on September 18, 2019, the entire contents of which are incorporated herein by reference.
This application relates to the field of artificial intelligence, and in particular to an image processing method and apparatus in computer vision technology.
Taking photos has become one of the most commonly used functions of mobile terminals such as mobile phones, tablets, smart glasses, and wearable devices, and the ability to restore image detail and image sharpness can be regarded as the most important criteria for evaluating photo quality. However, mobile terminals are becoming ever thinner and the constraints on their volume ever stricter, so the physical components of mobile terminal cameras still lag behind those of SLR cameras. Images therefore need to be processed algorithmically to improve detail and sharpness as much as possible while preserving the thin and light character of the mobile terminal.
Normally, the image acquired by a camera is an unprocessed RAW image, and converting a RAW image into a displayable color image such as red-green-blue (RGB) requires a series of image processing operations. In a traditional image signal processing (ISP) model, the various image processing operations are performed serially in a fixed order; however, because those operations affect one another, a serial multi-module pipeline causes errors to accumulate step by step and degrades image quality.
Summary of the invention
The embodiments of this application provide an image processing method and apparatus for reducing the error accumulation caused by serial multi-module operation and improving image quality.
A first aspect of this application provides an image processing method, including: acquiring multiple frames of original RAW images; preprocessing the multiple frames of RAW images to obtain a first intermediate image, the preprocessing including channel splitting and pixel rearrangement, where the first intermediate image includes sub-images belonging to multiple channels and the sub-images of each channel contain only one color component; processing the first intermediate image based on a first deep learning network to obtain a first target image, the functions of the first deep learning network including demosaicing (DM) and noise reduction; and performing at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
Demosaicing and noise reduction are both operations related to detail recovery, and performing demosaicing first harms the noise reduction effect while denoising first harms the demosaicing effect. The embodiments of this application realize both demosaicing and noise reduction in the same deep learning network, avoiding the error accumulation caused by the mutual interference of serial processing and improving the detail recovery effect. Further, the embodiments feed N frames of RAW images at the same time, fusing the useful information of multiple frames, which helps better recover image detail. In addition, before the images are fed into the deep learning network for detail recovery, the N frames are preprocessed by channel splitting, pixel rearrangement, and the like, which improves the processing effect of the deep learning network.
In a possible implementation, the functions of the first deep learning network further include super-resolution (SR) reconstruction, the RAW images have a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
For scenarios that require super resolution, demosaicing, noise reduction, and SR are all critical to detail recovery. If DM and SR are performed first, the image noise is amplified or the noise profile of the original image is destroyed, harming noise reduction; if noise reduction is performed first, the detail lost to denoising can never be recovered, harming DM, SR, and similar processing. In the embodiments of this application, a single trained deep learning network realizes demosaicing, noise reduction, and SR reconstruction simultaneously; since there is no processing order among them, the mutual interference caused by serial multi-module operation and the resulting error accumulation are avoided.
In a possible implementation, the functions of the first deep learning network further include at least one of defective pixel correction or phase pixel compensation.
Defective pixel correction and phase pixel compensation are also algorithms related to detail recovery. Realizing demosaicing, noise reduction, defective pixel correction, and phase pixel compensation in the same deep learning network avoids the error accumulation caused by mutual interference among serial processing steps and improves the detail recovery effect.
In a possible implementation, the preprocessing further includes at least one of defective pixel correction or phase pixel compensation.
Since the positions of phase pixels are essentially fixed and defective pixel correction algorithms are mature, defective pixels and phase pixels can be calibrated on the production line, and defective pixel correction and phase pixel compensation can then be performed in the preprocessing, which simplifies the computational complexity of the deep learning network.
In a possible implementation, the functions of the first deep learning network further include sharpening.
Realizing demosaicing, noise reduction, sharpening, defective pixel correction, and phase pixel compensation in the same deep learning network avoids the error accumulation caused by the mutual interference of multiple serial processing steps and improves the detail recovery effect.
In a possible implementation, the method further includes: sharpening the second target image to obtain a third target image; and sending the third target image to a display screen or a memory.
Since brightness and color enhancement may affect the sharpness of image edges, sharpening need not be fused into the first deep learning network; sharpening the image after brightness and color enhancement, according to actual needs, can improve the image processing effect.
In a possible implementation, the formats of the RAW image include: a Bayer image in RGGB format, an image in RYYB format, and an image in XYZW format, where an image in XYZW format is an image containing four color components, and X, Y, Z, and W each represent one color component.
In a possible implementation, the RGGB-format Bayer image, the RYYB-format image, and the XYZW-format image use a Quad arrangement, and the number of pixels in the minimum repeating unit of the Quad arrangement is 16, 24, or 32.
In a possible implementation, the RAW image is an RYYB image or an image containing four different color components, and before the brightness enhancement and color enhancement are performed on the first target image to obtain the second target image, the method further includes: performing color conversion on the first target image to obtain an RGB color image; the performing of brightness enhancement and color enhancement on the first target image to obtain the second target image specifically includes: performing at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
Exemplarily, an image containing four different color components includes an RGBIR image or an RGBW image.
In a possible implementation, the functions of the first deep learning network further include image alignment.
When the deep learning network is trained, the constructed training data consists of multiple unaligned frames with differences among them, so the trained deep learning network has the ability to align images. Correspondingly, before the data is fed into the first deep learning network, image registration and motion compensation need not be performed in advance; the N unaligned frames of RAW images can be fed into the network directly, and the network itself aligns and fuses the multi-frame data.
It should be understood that both image registration and motion compensation serve to achieve image alignment.
In a possible implementation, the preprocessing further includes image alignment.
In a possible implementation, the preprocessing specifically includes: performing channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of RAW images; and aligning the multiple frames of sub-images in each channel separately.
In a possible implementation, aligning the multiple frames of sub-images in each channel separately specifically includes: aligning the multiple frames of sub-images in a first channel, the first channel being any one of the M channels; and aligning the other channels based on the alignment used for the first channel.
In the embodiments of this application, channel splitting and pixel rearrangement are performed first, one channel is then aligned, and the other channels are then aligned in the same way, which reduces the computation needed for image alignment.
Exemplarily, the number of channels obtained by channel splitting is related to the format of the RAW image and equals the number of pixels in the minimum repeating unit of the RAW image.
In a possible implementation, the brightness enhancement or color enhancement includes at least one of: black level correction (BLC), auto white balance (AWB), lens shading correction (LSC), tone mapping, color mapping, contrast increase, or gamma correction.
In a possible implementation, the preprocessing specifically includes: performing at least one of black level correction BLC, auto white balance AWB, or lens shading correction LSC on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; performing channel splitting and pixel rearrangement on the multiple frames of first preprocessed RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of RAW images; and aligning the multiple frames of sub-images in each channel.
In the embodiments of this application, one or more of BLC, AWB, and LSC are performed on the input N frames of RAW images first, followed by image registration, channel splitting, pixel rearrangement, and the like, which improves the image detail recovery effect of the deep learning network.
In a possible implementation, the number of channels to which the sub-images of the first intermediate image belong equals the number of pixels in the minimum repeating unit of the RAW image.
In a possible implementation, when the RAW image is a red-green-green-blue RGGB-format image, a red-yellow-yellow-blue RYYB-format image, or an XYZW-format image whose minimum repeating unit contains 4 pixels, the first intermediate image includes sub-images belonging to 4 channels; when the RAW image is a Quad-arranged image whose minimum repeating unit contains 16 pixels, the first intermediate image includes sub-images belonging to 16 channels.
In a possible implementation, the preprocessing further includes estimating at least one of a noise intensity region distribution map or a sharpening intensity map of the image; the first deep learning network is specifically used to realize at least one of: controlling the degree of noise reduction in different regions of the first intermediate image based on the noise intensity region distribution map; or controlling the sharpening intensity of different regions of the first intermediate image based on the sharpening intensity map.
The embodiments of this application can thus effectively control the noise reduction strength of each region according to its noise characteristics, or adaptively control the sharpening strength of each region.
In a possible implementation, the first deep learning network includes multiple residual network convolution modules, at least one upsampling convolution block, and a second feature fusion convolution module, where the output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
In a possible implementation, the number of upsampling convolution blocks is related to the format of the RAW image, the size of the RAW image, and the size of the first target image.
In a possible implementation, the first deep learning network further includes a feature extraction convolution module and a first feature fusion module, and the output of the multiple residual network convolution modules is the input of the first feature fusion module.
In a possible implementation, the training data of the first deep learning network includes multiple frames of low-quality input images and one frame of a high-quality target image, the low-quality input images being simulated from the high-quality target image.
In a possible implementation, at least mosaicking and noise addition are applied to the high-quality target image to obtain the low-quality input images.
In a possible implementation, the method is applied to the following scenarios: dark-light scenes, zoom mode, high-dynamic-range HDR scenes, and night mode.
In a possible implementation, when the method is applied to an HDR scene, the multiple frames of RAW images are multiple frames of short-exposure RAW images, and the training data of the first deep learning network includes multiple frames of short-exposure training images obtained as follows: performing inverse gamma correction on a reasonably exposed high-quality image to obtain an inverse-gamma-corrected image; and dividing every pixel value of the inverse-gamma-corrected image by a number to obtain the short-exposure training image.
In a possible implementation, when the method is applied to a dark-light scene, the number of input frames of RAW images is increased; when the method is applied to zoom mode, the number of upsampling convolution blocks in the first deep learning network is related to the zoom factor.
In a possible implementation, the first deep learning network is a target deep learning network selected from a deep learning network resource pool according to first indication information, where the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or the first indication information is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or the first indication information is magnification information carried by the input multiple frames of RAW images.
A second aspect of this application provides an image processing method, including: selecting a target deep learning network from a deep learning network resource pool based on first indication information, the deep learning network resource pool including deep learning networks with multiple different functions; and processing the input data based on the target deep learning network to obtain a first output image.
In a possible implementation, the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or it is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or it is magnification information carried by the input multiple frames of RAW images.
In a possible implementation, each deep learning network in the deep learning network resource pool includes at least two of the following image processing functions: demosaicing, noise reduction, super-resolution SR reconstruction, defective pixel removal, phase pixel compensation, and sharpening.
In a possible implementation, the application scenarios to which the deep learning networks in the resource pool apply include: zoom scenes at different magnifications, HDR scenes, dark-light scenes, or night mode.
In a possible implementation, when the method is applied to an HDR scene, the multiple frames of RAW images are multiple frames of short-exposure RAW images, and the training data of the target deep learning network includes multiple frames of short-exposure training images obtained as follows: performing inverse gamma correction on a reasonably exposed high-quality image to obtain an inverse-gamma-corrected image; and dividing every pixel value of the inverse-gamma-corrected image by a number to obtain the short-exposure training image.
In a possible implementation, when the method is applied to a dark-light scene, the number of input frames of RAW images is increased; when the method is applied to zoom mode, the number of upsampling convolution blocks in the target deep learning network is related to the zoom factor.
A third aspect of this application provides an image processing apparatus, including: a preprocessing module for preprocessing multiple frames of RAW images to obtain a first intermediate image, the preprocessing including channel splitting and pixel rearrangement, where the first intermediate image includes sub-images belonging to multiple channels and the sub-images of each channel contain only one color component; a first deep learning network for processing the first intermediate image to obtain a first target image, the functions of the first deep learning network including demosaicing DM and noise reduction; and an enhancement module for performing at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
In a possible implementation, the functions of the first deep learning network further include super-resolution SR reconstruction, the RAW images have a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
For scenarios that require super resolution, demosaicing, noise reduction, and SR are all critical to detail recovery. If DM and SR are performed first, the image noise is amplified or the noise profile of the original image is destroyed, harming noise reduction; if noise reduction is performed first, the detail lost to denoising can never be recovered, harming DM, SR, and similar processing. In the embodiments of this application, a single trained deep learning network realizes demosaicing, noise reduction, and SR reconstruction simultaneously; since there is no processing order among them, the mutual interference caused by serial multi-module operation and the resulting error accumulation are avoided.
In a possible implementation, the functions of the first deep learning network further include at least one of defective pixel correction or phase pixel compensation; or the preprocessing further includes at least one of defective pixel correction or phase pixel compensation.
In a possible implementation, the functions of the first deep learning network further include sharpening.
In a possible implementation, the apparatus further includes: a sharpening module for sharpening the second target image to obtain a third target image; and a sending interface for sending the third target image to a display screen or a memory.
In a possible implementation, the RAW image is an RYYB image or an image containing four different color components, and the apparatus further includes a color conversion module for performing color conversion on the first target image to obtain an RGB color image; the enhancement module is specifically configured to perform at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
In a possible implementation, the functions of the first deep learning network further include image alignment, or the preprocessing further includes image alignment.
In a possible implementation, the preprocessing further includes image alignment, and the preprocessing module is specifically configured to: perform channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of RAW images; align the multiple frames of sub-images in a first channel, the first channel being any one of the M channels; and align the other channels based on the alignment used for the first channel.
In a possible implementation, the enhancement module is specifically configured to realize at least one of: black level correction BLC, auto white balance AWB, lens shading correction LSC, tone mapping, color mapping, contrast increase, or gamma correction.
In a possible implementation, the preprocessing module is specifically configured to: perform at least one of black level correction BLC, auto white balance AWB, or lens shading correction LSC on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; perform channel splitting and pixel rearrangement on them to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of RAW images; and align the multiple frames of sub-images in each channel.
In a possible implementation, the formats of the RAW image include: a Bayer image in RGGB format, an image in RYYB format, and an image in XYZW format, where an XYZW-format image contains four color components and X, Y, Z, W each represent one color component.
In a possible implementation, the RGGB-format Bayer image, the RYYB-format image, and the XYZW-format image use a Quad arrangement, and the number of pixels in the minimum repeating unit of the Quad arrangement is 16, 24, or 32.
In a possible implementation, the number of channels to which the sub-images of the first intermediate image belong equals the number of pixels in the minimum repeating unit of the RAW image.
In a possible implementation, when the RAW image is a red-green-green-blue RGGB-format image, a red-yellow-yellow-blue RYYB-format image, or an XYZW-format image whose minimum repeating unit contains 4 pixels, the first intermediate image includes sub-images belonging to 4 channels; when the RAW image is a Quad-arranged image whose minimum repeating unit contains 16 pixels, the first intermediate image includes sub-images belonging to 16 channels; an XYZW image contains four color components, and X, Y, Z, W each represent one color component.
In a possible implementation, the preprocessing module is further configured to estimate at least one of a noise intensity region distribution map or a sharpening intensity map of the image; the first deep learning network is specifically used to realize at least one of: controlling the degree of noise reduction in different regions of the first intermediate image based on the noise intensity region distribution map; or controlling the sharpening intensity of different regions of the first intermediate image based on the sharpening intensity map.
In a possible implementation, the first deep learning network includes multiple residual network convolution modules, at least one upsampling convolution block, and a second feature fusion convolution module, where the output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
In a possible implementation, when the apparatus is applied to an HDR scene, the multiple frames of RAW images are multiple frames of short-exposure RAW images; when the apparatus is applied to a dark-light scene, the number of input frames of RAW images is increased; when the apparatus is applied to zoom mode, the number of upsampling convolution blocks in the first deep learning network is related to the zoom factor.
In a possible implementation, the apparatus further includes a deep learning network resource pool, which includes deep learning networks with multiple different functions.
In a possible implementation, the first deep learning network is a target deep learning network selected from a deep learning network resource pool according to first indication information, where the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or the first indication information is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or the first indication information is magnification information carried by the input multiple frames of RAW images.
A fourth aspect of this application provides a deep learning network training method, including: acquiring training data, the training data including multiple frames of independent low-quality input data and one frame of high-quality target data, the low-quality input data being simulated from the high-quality target data; and training a basic network architecture based on the training data to obtain a deep learning network with a target function, where the target function is related to the difference between the low-quality input data and the high-quality target data.
In a possible implementation, acquiring the training data includes acquiring it by artificial synthesis.
In a possible implementation, acquiring the training data includes: downloading open data sets from the network and selecting high-quality images from them as high-quality target images; or capturing, with a high-quality camera, high-quality images that satisfy preset conditions, the preset conditions being set according to user requirements; performing inverse gamma correction on the high-quality images to obtain inverse-gamma-corrected high-quality images; and downsampling the inverse-gamma-corrected high-quality images to obtain the high-quality target images.
In a possible implementation, acquiring the training data includes performing quality-degradation operations on the acquired high-quality target image to obtain the low-quality input images.
In a possible implementation, the quality-degradation operations include at least one of downsampling, Gaussian blur, noise addition, mosaicking, phase pixel addition, or defective pixel addition applied to the acquired high-quality target image.
In a possible implementation, the quality-degradation operations are related to the target function of the deep learning network.
In a possible implementation, when the functions of the deep learning network include demosaicing, noise reduction, and SR reconstruction, acquiring the training data includes: downsampling the acquired high-quality target image, adding noise, and mosaicking it to obtain the low-quality input images.
In a possible implementation, when the functions of the deep learning network include demosaicing, noise reduction, SR reconstruction, and sharpening, acquiring the training data includes: downsampling the acquired high-quality target image, applying Gaussian blur, adding noise, and mosaicking it to obtain the low-quality input images.
In a possible implementation, when the functions of the deep learning network include demosaicing, noise reduction, SR reconstruction, sharpening, and defective pixel removal, acquiring the training data includes: downsampling the acquired high-quality target image, applying Gaussian blur, adding noise, mosaicking it, and adding defective pixels to obtain the low-quality input images.
In a possible implementation, the multiple frames of low-quality input images are obtained by separately applying quality-degradation operations to the same frame of the high-quality target image; the multiple frames of low-quality input images are constructed independently.
In a possible implementation, the loss function of the deep learning network includes the L1 Loss or L2 Loss function, or L1 Loss combined with structural similarity (SSIM) and adversarial Loss, or L2 Loss combined with SSIM and adversarial Loss.
In a possible implementation, the training method of the deep learning network includes the adaptive moment estimation (Adam) method.
A fifth aspect of this application provides an apparatus for adaptively selecting a deep learning network, including: a receiving interface, an artificial intelligence AI controller, and a deep learning network resource pool including deep learning networks with multiple functions; the receiving interface is used to receive first indication information, which indicates the currently applicable application scenario; the AI controller is used to select the target deep learning network corresponding to the first indication information from the deep learning network resource pool based on the first indication information.
In a possible implementation, the apparatus further includes a processor for processing the input image based on the target deep learning network to obtain a first output image.
In a possible implementation, the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or it is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or it is magnification information carried by the input multiple frames of RAW images.
In the method for adaptively selecting a deep learning network provided by the embodiments of this application, the most suitable deep learning network can be selected or enabled from the deep learning network resource pool according to the user's needs, the characteristics of the input data, or parameters carried by the input data, satisfying the needs of different users and scenarios to the greatest extent; the optimal deep learning network and the best image processing effect can be provided in every scenario, optimizing the user experience, improving the image processing performance of mobile terminals or image processors, and enhancing competitiveness.
In a possible implementation, the receiving interface is further used to receive input images or control signals.
In a possible implementation, each deep learning network in the deep learning network resource pool includes at least two of the following image processing functions: demosaicing, noise reduction, super-resolution SR reconstruction, defective pixel removal, phase pixel compensation, or sharpening.
In a possible implementation, the application scenarios to which the deep learning networks in the resource pool apply include: zoom scenes at different magnifications, HDR scenes, dark-light scenes, or night mode.
In a possible implementation, the deep learning networks in the resource pool are implemented as software code or software modules, and the resource pool is stored in a memory.
In a possible implementation, the AI controller reads the target deep learning network out of the resource pool based on the first indication information and loads it into the processor; the processor runs the target deep learning network to realize its corresponding functions.
In a possible implementation, the deep learning network is implemented by an artificial intelligence AI engine, which is a hardware module or a dedicated hardware circuit.
In a possible implementation, the apparatus further includes hardware computing resources, including at least one of addition, subtraction, multiplication, division, exponential operations, logarithmic operations, or size comparison.
In a possible implementation, the hardware computing resources can be multiplexed by multiple deep learning networks.
In a possible implementation, the apparatus further includes a preprocessing module for performing channel splitting and pixel rearrangement on the initially input RAW images to obtain sub-images belonging to multiple channels, the sub-images of each channel containing only one color component.
In a possible implementation, the preprocessing module is further used to analyze the characteristics of the preview image acquired by the camera and send a characteristic signal to the AI controller.
In a possible implementation, when the apparatus is applied to an HDR scene, the receiving interface is used to acquire multiple frames of short-exposure RAW images, and the training data of the target deep learning network includes multiple frames of short-exposure training images obtained as follows: performing inverse gamma correction on a reasonably exposed high-quality image to obtain an inverse-gamma-corrected image; and dividing every pixel value of the inverse-gamma-corrected image by a number to obtain the short-exposure training image.
In a possible implementation, when the apparatus is applied to a dark-light scene, the number of input frames of RAW images is increased; when applied to zoom mode, the number of upsampling convolution blocks in the target deep learning network is related to the zoom factor.
A sixth aspect of this application provides an image processing apparatus, including a receiving interface and a processor on which a first deep learning network runs, the functions of the first deep learning network including demosaicing DM and noise reduction; the receiving interface is used to receive multiple frames of RAW images acquired by a camera; the processor is used to call software code stored in a memory to execute the method of the first aspect or any of its possible implementations.
A seventh aspect of this application provides an image processing apparatus, including a receiving interface and a processor; the receiving interface is used to acquire first indication information; the processor is used to call software code stored in a memory to execute the method of the second aspect or any of its possible implementations.
In a possible implementation, the apparatus further includes a memory for storing the deep learning network resource pool.
An eighth aspect of this application provides an image processing apparatus, including a receiving interface and a processor; the receiving interface is used to acquire training data including multiple frames of independent low-quality input data and one frame of high-quality target data, the low-quality input data being simulated from the high-quality target data; the processor is used to call software code stored in a memory to execute the method of the fourth aspect or any of its possible implementations.
A ninth aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the first aspect or any of its possible implementations.
A tenth aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the second aspect or any of its possible implementations.
An eleventh aspect of this application provides a computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the fourth aspect or any of its possible implementations.
A twelfth aspect of this application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the first aspect or any of its possible implementations.
A thirteenth aspect of this application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the second aspect or any of its possible implementations.
A fourteenth aspect of this application provides a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method of the fourth aspect or any of its possible implementations.
FIG. 1 is a schematic architectural diagram of an exemplary terminal according to an embodiment of this application;
FIG. 2 is a hardware architecture diagram of an exemplary image processing apparatus according to an embodiment of this application;
FIG. 3 is a schematic flowchart of an exemplary image processing method according to an embodiment of this application;
FIG. 4a is an exemplary Bayer image in RGGB format according to an embodiment of this application;
FIG. 4b is an exemplary RGBIR image according to an embodiment of this application;
FIG. 5 is an exemplary Quad-arranged image according to an embodiment of this application;
FIG. 6a is a schematic diagram of channel splitting and pixel rearrangement of an RGGB-format Bayer image to obtain a first intermediate image according to an embodiment of this application;
FIG. 6b is a schematic diagram of channel splitting and pixel rearrangement of a Quad-arranged image to obtain a first intermediate image according to an embodiment of this application;
FIG. 7 is an exemplary image processing framework according to an embodiment of this application;
FIG. 8 is another exemplary image processing framework according to an embodiment of this application;
FIG. 9 is another exemplary image processing framework according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of an exemplary deep learning network according to an embodiment of this application;
FIG. 11 is a schematic diagram of the processing effect of an exemplary detail recovery network according to an embodiment of this application;
FIG. 12 is a structural diagram of an exemplary feature extraction convolution block according to an embodiment of this application;
FIG. 13 is a structural diagram of an exemplary residual network convolution block according to an embodiment of this application;
FIG. 14a is a structural diagram of an exemplary feature fusion module 1 according to an embodiment of this application;
FIG. 14b is a structural diagram of an exemplary feature fusion module 2 according to an embodiment of this application;
FIG. 15 is a structural diagram of an exemplary upsampling convolution block according to an embodiment of this application;
FIG. 16 is a flowchart of an exemplary method for adaptively selecting a deep learning network according to an embodiment of this application;
FIG. 17 is an exemplary apparatus for adaptively selecting a deep learning network according to an embodiment of this application.
The terms "first", "second", and the like in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. Moreover, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the preceding and following associated objects. "At least one of the following (items)" or similar expressions refer to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
Because the image obtained by a mobile phone camera is a RAW image, it usually needs to be converted into an RGB color image before it can be shown on a display device. It should be understood that the image format finally displayed may also be another format, such as a YUV color image, a YCbCr color image, or a grayscale image; the embodiments of this application take an RGB image as the example of the finally displayed image. Converting a RAW image into an RGB image requires a series of image processing operations such as detail recovery, color recovery, and brightness recovery. The processing related to detail recovery includes demosaicing (DM), defective pixel correction, noise reduction, sharpening, and super-resolution (SR) reconstruction; it should be understood that SR reconstruction is needed only when the user requires zoom. However, operations such as DM, defective pixel correction, and SR reconstruction usually require pixel filling or interpolation, and sharpening needs to strengthen and emphasize image edges and textures. If DM, defective pixel correction, or SR reconstruction is performed first, the image noise is amplified or the noise profile of the original image is destroyed, which harms noise reduction; if noise reduction is performed first, the detail lost to denoising can never be recovered, which harms DM, defective pixel correction, SR reconstruction, and similar processing. Serial multi-module operation therefore causes errors to accumulate step by step.
On this basis, the embodiments of this application propose a deep-learning-based image processing framework, method, and apparatus that fuse multiple kinds of detail-recovery-related processing into one deep learning network; a single deep learning network can realize multiple image processing functions, reducing the mutual interference among different processing steps and reducing the accumulation of errors. Exemplarily, demosaicing, noise reduction, and super-resolution reconstruction may be fused into one deep learning network, and optionally defective pixel correction, sharpening, and other processing may also be fused into it. The image processing framework provided by the embodiments of this application greatly improves image resolution, sharpness, and visual effect, while suppressing moire, halo, overshoot, and similar artifacts, and is suitable for various shooting scenarios such as zoom, high dynamic range (HDR), and night mode. Further, the embodiments take multiple consecutive frames as input at the same time, fusing the useful information of the multiple frames to better recover image detail.
The image processing framework and method provided by the embodiments of this application are applicable to various terminals; correspondingly, the image processing apparatus provided by the embodiments may be a terminal product in many forms, such as a smartphone, tablet, smart glasses, wearable device, camera, or video camera. FIG. 1 is a schematic architectural diagram of an exemplary terminal 100 according to an embodiment of this application. The terminal 100 may include an antenna system 110, a radio frequency (RF) circuit 120, a processor 130, a memory 140, a camera 150, an audio circuit 160, a display screen 170, one or more sensors 180, a wireless transceiver 190, and the like.
The antenna system 110 may be one or more antennas, or an antenna array composed of multiple antennas. The RF circuit 120 may include one or more analog RF transceivers and may also include one or more digital RF transceivers, and the RF circuit 120 is coupled to the antenna system 110. It should be understood that in the embodiments of this application, coupling refers to interconnection in a specific manner, including direct connection or indirect connection through other devices, for example through various interfaces, transmission lines, or buses. The RF circuit 120 may be used for various types of cellular wireless communication.
The processor 130 may include a communication processor used to control the RF circuit 120 to receive and send signals through the antenna system 110; the signals may be voice signals, media signals, or control signals. The processor 130 may include various general-purpose processing devices, for example a general-purpose central processing unit (CPU), a system on chip (SOC), a processor integrated on an SOC, a separate processor chip, or a controller; it may also include dedicated processing devices, for example an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), a dedicated video or graphics processor, a graphics processing unit (GPU), or a neural-network processing unit (NPU). The processor 130 may be a processor group composed of multiple processors coupled to one another through one or more buses. The processor may include an analog-to-digital converter (ADC) and a digital-to-analog converter (DAC) to connect signals between different components of the apparatus. The processor 130 is used to process media signals such as images, audio, and video.
The memory 140 is coupled to the processor 130; specifically, the memory 140 may be coupled to the processor 130 through one or more memory controllers. The memory 140 may be used to store computer program instructions, including a computer operating system (OS) and various user applications; it may also be used to store user data, such as calendar information, contact information, acquired images, audio, or other media files. The processor 130 may read computer program instructions or user data from the memory 140, or store them into the memory 140, to realize the relevant processing functions. The memory 140 may be a non-volatile memory, for example an EMMC (embedded multimedia card), UFS (universal flash storage), or read-only memory (ROM), or another type of static storage device that can store static information and instructions; it may also be a volatile memory, for example a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; it may also be, without limitation, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compressed discs, laser discs, digital versatile discs, Blu-ray discs, and the like), magnetic disk storage media or other magnetic storage devices, or any other computer-readable storage medium that can carry or store program code in the form of instructions or data structures and can be accessed by a computer. The memory 140 may exist independently or be integrated with the processor 130.
The camera 150 is used to capture images or video and can be triggered to turn on by application program instructions to realize photographing or video functions, such as capturing pictures or videos of any scene. The camera may include components such as an imaging lens, an optical filter, and an image sensor. Light emitted or reflected by objects enters the imaging lens, passes through the optical filter, and finally converges on the image sensor. The imaging lens mainly converges the light emitted or reflected by all objects in the shooting field of view (which may also be called the scene to be shot or the target scene, and can also be understood as the scene image the user expects to capture); the optical filter mainly filters out superfluous light waves (for example, light waves other than visible light, such as infrared); the image sensor mainly performs photoelectric conversion on the received optical signal, converts it into an electrical signal, and inputs it to the processor 130 for subsequent processing. The camera may be located on the front or the back of the terminal device; the specific number and arrangement of cameras can be flexibly determined according to the requirements of the designer or the manufacturer's strategy, which is not limited in this application.
The audio circuit 160 is coupled to the processor 130. The audio circuit 160 may include a microphone 161 and a speaker 162; the microphone 161 can receive sound input from the outside, and the speaker 162 can play audio data. It should be understood that the terminal 100 may have one or more microphones and one or more earphones; the embodiments of this application do not limit the numbers of microphones and earphones.
The display screen 170 is used to display information input by the user and various menus of the information provided to the user; these menus are associated with specific internal modules or functions. The display screen 170 may also accept user input, for example control information such as enable or disable. Specifically, the display screen 170 may include a display panel 171 and a touch panel 172. The display panel 171 may be configured using a liquid crystal display (LCD), an organic light-emitting diode (OLED), a light-emitting diode (LED) display device, a cathode ray tube (CRT), or the like. The touch panel 172, also called a touch screen or touch-sensitive screen, can collect contact or non-contact operations by the user on or near it (for example, operations performed by the user on or near the touch panel 172 with a finger, stylus, or any other suitable object or accessory, which may also include somatosensory operations; the operations include single-point control, multi-point control, and other operation types) and drives the corresponding connection apparatus according to a preset program. Optionally, the touch panel 172 may include a touch detection apparatus and a touch controller. The touch detection apparatus detects the signals brought by the user's touch operation and passes them to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into information the processor 130 can process, sends it to the processor 130, and can receive and execute commands sent by the processor 130. Further, the touch panel 172 may cover the display panel 171; the user may operate on or near the touch panel 172 covering the display panel 171 according to the content displayed on the display panel 171 (the displayed content includes, but is not limited to, a soft keyboard, a virtual mouse, virtual keys, and icons). After detecting an operation on or near it, the touch panel 172 transmits it to the processor 130 through the I/O subsystem 10 to determine the user input, and the processor 130 then provides the corresponding visual output on the display panel 171 through the I/O subsystem 10 according to the user input. Although in FIG. 1 the touch panel 172 and the display panel 171 realize the input and output functions of the terminal 100 as two independent components, in some embodiments the touch panel 172 and the display panel 171 may be integrated to realize the input and output functions of the terminal 100.
The sensors 180 may include an image sensor, a motion sensor, a proximity sensor, an ambient noise sensor, a sound sensor, an accelerometer, a temperature sensor, a gyroscope, or other types of sensors, and various combinations of them. The processor 130 drives the sensors 180 through the sensor controller 12 in the I/O subsystem 10 to receive various information such as audio signals, image signals, and motion information; the sensors 180 pass the received information to the processor 130 for processing.
The wireless transceiver 190 can provide wireless connection capability to other devices, which may be peripheral devices such as wireless headsets, Bluetooth earphones, wireless mice, or wireless keyboards, or wireless networks, for example wireless fidelity (WiFi) networks, wireless personal area networks (WPAN), or other wireless local area networks (WLAN). The wireless transceiver 190 may be a Bluetooth-compatible transceiver used to wirelessly couple the processor 130 to peripheral devices such as Bluetooth earphones and wireless mice, or a WiFi-compatible transceiver used to wirelessly couple the processor 130 to a wireless network or other devices.
The terminal 100 may further include other input devices 14 coupled to the processor 130 to receive various user input, for example numbers, names, addresses, and media selections; the other input devices 14 may include keyboards, physical buttons (push buttons, rocker buttons, and the like), dials, slide switches, joysticks, click wheels, and optical mice (an optical mouse is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by a touch screen).
The terminal 100 may further include the aforementioned I/O subsystem 10, which may include an other-input-device controller 11 for receiving signals from the other input devices 14 or sending the processor 130's control or drive information to them; the I/O subsystem 10 may also include the aforementioned sensor controller 12 and display controller 13, which respectively exchange data and control information between the sensors 180 or the display screen 170 and the processor 130.
The terminal 100 may further include a power supply 101 to supply power to the other components of the terminal 100 including 110-190; the power supply may be a rechargeable or non-rechargeable lithium-ion or nickel-metal-hydride battery. Further, when the power supply 101 is a rechargeable battery, it may be coupled to the processor 130 through a power management system, so as to manage charging, discharging, power consumption adjustment, and other functions through the power management system.
It should be understood that the terminal 100 in FIG. 1 is merely an example and does not limit the specific form of the terminal 100; the terminal 100 may also include existing components not shown in FIG. 1 or other components that may be added in the future.
In an optional solution, the RF circuit 120, the processor 130, and the memory 140 may be partially or fully integrated on one chip, or be three chips independent of one another. The RF circuit 120, the processor 130, and the memory 140 may include one or more integrated circuits arranged on a printed circuit board (PCB).
FIG. 2 is a hardware architecture diagram of an exemplary image processing apparatus according to an embodiment of this application. The image processing apparatus 200 may be, for example, a processor chip; exemplarily, the hardware architecture shown in FIG. 2 may be an exemplary architecture of the processor 130 in FIG. 1, and the image processing method and framework provided by the embodiments of this application can be applied to this processor chip.
Referring to FIG. 2, the apparatus 200 includes at least one CPU, a memory, a microcontroller unit (MCU), a GPU, an NPU, a memory bus, a receiving interface, a sending interface, and the like. Although not shown in FIG. 2, the apparatus 200 may further include an application processor (AP), a decoder, and a dedicated video or image processor.
The above parts of the apparatus 200 are coupled through connectors; exemplarily, the connectors include various interfaces, transmission lines, or buses. These interfaces are usually electrical communication interfaces, but may also be mechanical interfaces or interfaces in other forms, which is not limited in this embodiment.
Optionally, the CPU may be a single-core (single-CPU) or multi-core (multi-CPU) processor; optionally, the CPU may be a processor group composed of multiple processors coupled to one another through one or more buses. The receiving interface may be a data input interface of the processor chip; in an optional case, the receiving interface and the sending interface may be a high-definition multimedia interface (HDMI), a V-By-One interface, an embedded display port (eDP), a mobile industry processor interface (MIPI), a Display Port (DP), or the like. For the memory, refer to the preceding description of the memory 140.
In an optional case, the above parts are integrated on the same chip; in another optional case, the CPU, GPU, decoder, receiving interface, and sending interface are integrated on one chip, and the parts inside the chip access external memory through a bus. The dedicated video/graphics processor may be integrated on the same chip as the CPU or exist as a separate processor chip; for example, the dedicated video/graphics processor may be a dedicated ISP. In an optional case, the NPU may also be an independent processor chip, used to realize various neural network or deep learning related operations. Optionally, the image processing method and framework provided by the embodiments of this application may be implemented by the GPU or the NPU, or by a dedicated graphics processor.
A chip involved in the embodiments of this application is a system manufactured on the same semiconductor substrate by an integrated circuit process, also called a semiconductor chip; it may be a collection of integrated circuits formed on the substrate (usually a semiconductor material such as silicon) using an integrated circuit process, whose outer layer is usually encapsulated by a semiconductor packaging material. The integrated circuits may include various functional devices, each of which includes transistors such as logic gate circuits, metal-oxide-semiconductor (MOS) transistors, bipolar transistors, or diodes, and may also include other components such as capacitors, resistors, or inductors. Each functional device can work independently or under the action of necessary driver software, and can realize various functions such as communication, computation, or storage.
FIG. 3 is a schematic flowchart of an image processing method according to an embodiment of this application.
The image processing method includes:
301. Acquire N frames of RAW images.
A RAW image is an unprocessed original image acquired by the camera, in which each pixel represents the intensity of only one color. Exemplarily, the camera may be a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor. The color format of the RAW image is determined by the color filter array (CFA) placed in front of the sensor, and RAW images may be acquired in various CFA formats. Exemplarily, a RAW image may be a Bayer image in RGGB format, as shown in FIG. 4a, where each cell represents one pixel, R denotes a red pixel, G a green pixel, and B a blue pixel; the minimum repeating unit of this Bayer image is a 2x2 array containing the four pixels R, G, G, B. Optionally, the RAW image may also be an image in red-yellow-yellow-blue (RYYB) format, or an image in XYZW format, where XYZW denotes an image format containing four components, X, Y, Z, W each representing one component, for example a Bayer image in red-green-blue-infrared (RGBIR) arrangement or in red-green-blue-white (RGBW) arrangement; FIG. 4b shows an exemplary RGBIR image. The RAW image may also be a Quad-arranged image as shown in FIG. 5. The length and width of the input RAW images are h and w respectively, and N is a positive integer, for example 4 or 6. Optionally, the N frames are acquired consecutively, with equal or unequal time intervals between them; optionally, the N frames may also be non-consecutive, for example the 1st, 3rd, 5th, and 7th frames of a consecutively acquired sequence.
It should be understood that if the execution body of the image processing is the processor chip shown in FIG. 2, the RAW images captured by the terminal's camera may be acquired through the receiving interface; if the execution body is the terminal shown in FIG. 1, the RAW images may be acquired through the camera 150.
302. Preprocess the input N frames of RAW images to obtain a first intermediate image.
Exemplarily, the preprocessing includes channel splitting and pixel rearrangement, and the first intermediate image includes sub-images belonging to multiple channels, each sub-image containing only one color component. FIG. 6a is a schematic diagram of channel splitting and pixel rearrangement of an RGGB-format Bayer image to obtain the first intermediate image. The minimum repeating unit of the RGGB Bayer image contains the four pixels R, G, G, B; the four pixels of each minimum repeating unit are split and separately rearranged into four different sub-images, so one w*h RAW frame is split into four w/2*h/2 sub-images, and N w*h RAW frames are split into 4*N w/2*h/2 sub-images. That is, when the input RAW images are N frames of RGGB Bayer images, the first intermediate image includes 4*N w/2*h/2 sub-images belonging to 4 channels, where each channel contains N sub-image frames and each sub-image contains only one color component; specifically, the 4*N sub-images include N R sub-images belonging to the first channel, N G sub-images belonging to the second channel, N G sub-images belonging to the third channel, and N B sub-images belonging to the fourth channel. It should be understood that when the input RAW images are RYYB or XYZW images, the first intermediate image likewise includes sub-images belonging to 4 channels; if the number of input RAW frames is N, the first intermediate image contains 4*N sub-images, and the number of sub-images per channel equals the number of RAW frames, N. FIG. 6b is a schematic diagram of channel splitting and pixel rearrangement of a Quad-arranged image: the minimum repeating unit of the Quad arrangement contains 16 pixels, four each of R, G, G, B, so one w*h Quad frame yields 16 w/4*h/4 sub-images after channel splitting and pixel rearrangement, with one sub-image per channel, and N Quad frames are split into 16*N sub-images. That is, when the input is N frames of Quad-arranged images whose minimum repeating unit contains 16 pixels, the first intermediate image includes 16*N sub-images belonging to 16 channels, each channel containing N sub-images and each sub-image containing only one color component. In an optional solution, the minimum repeating unit of a Quad-arranged image may also contain 6, 8, or another number of each of the R, G, G, B pixels; correspondingly, the first intermediate image includes sub-images belonging to 24 channels, or to 32 channels. It should be understood that the number of channels of the first intermediate image equals the number of pixels contained in the minimum repeating unit of the RAW image.
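To make the channel splitting and pixel rearrangement concrete, the following is a minimal NumPy sketch for RGGB Bayer frames; the function name and the assumption that each frame is a single h*w array with even dimensions are illustrative, not prescribed by this application.

```python
import numpy as np

def split_bayer_rggb(raw: np.ndarray) -> np.ndarray:
    """Split one h*w RGGB Bayer frame into 4 sub-images of size h/2 * w/2.

    Channel order follows the 2x2 minimum repeating unit: R, G, G, B.
    """
    assert raw.ndim == 2 and raw.shape[0] % 2 == 0 and raw.shape[1] % 2 == 0
    r  = raw[0::2, 0::2]   # top-left pixel of each 2x2 unit
    g1 = raw[0::2, 1::2]   # top-right
    g2 = raw[1::2, 0::2]   # bottom-left
    b  = raw[1::2, 1::2]   # bottom-right
    return np.stack([r, g1, g2, b], axis=0)       # shape (4, h/2, w/2)

# N frames -> (N*4, h/2, w/2): sub-images of the same component share a channel
frames = [np.zeros((6, 6)) for _ in range(4)]     # e.g. 4 frames of 6*6 RAW
stacked = np.concatenate([split_bayer_rggb(f) for f in frames], axis=0)
```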
Exemplarily, the preprocessing may also include image registration and motion compensation. Image registration can remove the changes between frames caused by camera motion; however, if there are moving objects in the shot scene, the background regions of the frames are aligned after registration but the moving objects are not, and the misalignment caused by object motion must be compensated. Exemplarily, one of the N frames is selected as the reference frame, for example the first frame, and all other frames are registered against the reference frame to align the multiple frames. In an optional case, if motion regions exist among the N RAW frames, the motion regions must additionally be compensated against the reference frame after registration to obtain N aligned frames. It should be understood that image registration and motion compensation together serve to align the multiple frames, and that in some cases complete alignment of multiple frames is difficult to truly achieve.
In an optional case, the RAW images are first channel-split into sub-images of multiple channels, one channel is aligned first, and the other channels are then aligned in the same way. In another optional case, image registration and motion compensation may instead be performed first to align the multiple RAW frames, which are then channel-split.
In an optional solution, when the deep learning network is trained, the constructed training data consists of multiple unaligned frames with differences among them, so that the trained network has the ability to fuse multiple unaligned frames. Optionally, before the data is fed into the first deep learning network, image registration and motion compensation may be skipped and the split, unaligned sub-images fed into the network directly; the network itself aligns and fuses the multi-frame data.
In an optional solution, the preprocessing may include estimating the noise magnitude of each image region to obtain a noise intensity distribution map that reflects the noise intensity distribution of different regions. This noise intensity distribution map is fed into the first deep learning network together with the aligned and split image data, so that the first deep learning network can adaptively control the noise reduction strength of each region according to its noise characteristics.
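A minimal sketch of such a region-wise noise estimate follows; the block size and the use of the local standard deviation as a noise proxy are illustrative assumptions, since the embodiment does not fix a particular estimator.

```python
import numpy as np

def noise_intensity_map(img: np.ndarray, block: int = 16) -> np.ndarray:
    """Estimate a per-block noise intensity map (one value per block).

    Uses the local standard deviation as a crude noise proxy; a real
    estimator could use any region-wise noise model instead.
    """
    h, w = img.shape
    hb, wb = h // block, w // block
    tiles = img[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return tiles.std(axis=(1, 3))   # shape (hb, wb), fed to the network
```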
In an optional solution, a sharpening intensity map containing the sharpening strengths for different regions may be obtained during preprocessing and fed into the first deep learning network together with the aligned and split image data, so that the first deep learning network can adaptively control the sharpening strength of each region.
In an optional solution, a noise distribution map and a sharpening intensity map may both be obtained during preprocessing and fed into the first deep learning network together with the image data to be processed.
303. Process the first intermediate image based on the first deep learning network to obtain a first target image.
The first deep learning network can realize at least two detail-recovery-related image processing functions, and the first target image may be an RGB color image with rich detail and low noise. In an optional case, when the input RAW images are in RYYB format, the first target image obtained after processing by the first deep learning network is a color image with the three channels R, Y, B; when the input RAW images are in XYZW format, the first target image is a color image with the four channels X, Y, Z, W. In these two cases, the image processing method further includes: performing color conversion on the first target image to obtain an RGB color image.
Exemplarily, the first deep learning network may include demosaicing and noise reduction functions; in other words, after the input image is processed by this network, demosaicing and denoising have effectively been realized simultaneously. Since demosaicing and noise reduction are both critical to detail recovery, and performing either one first harms the detail recovery effect, the embodiments of this application fuse demosaicing and noise reduction into the same deep learning network, avoiding the error accumulation caused by serial processing of the two operations. Correspondingly, the first target image output by the first deep learning network is a denoised, demosaicked RGB color image.
In an optional case, the first deep learning network may include demosaicing, noise reduction, and SR reconstruction functions; in other words, the input image processed by this network has effectively undergone demosaicing, denoising, and SR reconstruction simultaneously. Super resolution means obtaining a high-resolution image from low-resolution images; for example, one high-resolution frame may be obtained from a single low-resolution frame, or from multiple low-resolution frames. For scenarios that require super resolution, demosaicing, noise reduction, and SR reconstruction are all critical to detail recovery, and as noted above, performing DM and SR reconstruction first amplifies the image noise or destroys its original noise profile, harming noise reduction, while denoising first irrecoverably loses detail, harming DM, SR reconstruction, and similar processing. The embodiments of this application train one deep learning network that simultaneously realizes DM, SR reconstruction, and denoising; since the multiple functions are realized by the same network, there is no processing order, and the mutual interference of serial multi-module operation and the resulting error accumulation are avoided. Correspondingly, the first target image output by the first deep learning network is an RGB color image that has undergone denoising, demosaicing, and SR reconstruction, and the resolution of the image after SR reconstruction is higher than before SR reconstruction.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, SR reconstruction, and defective pixel correction functions. It should be understood that defective pixels may be invalid or erroneous pixels in the image caused by flaws in the photosensitive component, or flawed points in the image, for example points much brighter than their surroundings, points much darker than their surroundings, or points that are neither particularly bright nor particularly dark but whose pixel values are incorrect.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, SR reconstruction, defective pixel correction, and sharpening functions.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, SR reconstruction, defective pixel correction, sharpening, and phase pixel compensation functions. It should be understood that a phase pixel is a pixel containing phase information but no valid pixel information; for display, the pixel value corresponding to the phase pixel must be derived from the pixels surrounding it.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, and defective pixel correction functions.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, and sharpening functions.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, defective pixel correction, and sharpening functions.
Exemplarily, the first deep learning network may include demosaicing, noise reduction, defective pixel correction, sharpening, and phase pixel compensation functions.
In an optional solution, since the positions of phase pixels are essentially fixed and defective pixel correction algorithms are mature, defective pixels and phase pixels can be calibrated on the production line; defective pixel correction and phase pixel compensation are then performed in the preprocessing according to the calibrated defective pixel and phase pixel positions, and the image free of defective and phase pixels is fed into the first deep learning network for detail reconstruction. In an optional solution, the position detection of defective and phase pixels as well as defective pixel correction and phase pixel compensation may all be realized in the preprocessing.
In an optional case, the first deep learning network runs on the NPU or GPU in FIG. 2; optionally, the deep learning network may also run partly on the NPU and partly on the GPU; optionally, running the first deep learning network may also involve control by the CPU or MCU.
304. Perform at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
It should be understood that after the first deep learning network processes the input image, at least brightness enhancement or color enhancement, or both, must still be performed on the first target image. Exemplarily, the brightness or color enhancement processing includes at least one of: black level correction (BLC), auto white balance (AWB), lens shading correction (LSC), tone mapping, color mapping, contrast increase, or gamma correction. Optionally, brightness enhancement and color enhancement may be implemented by serial modules, or by one neural network.
In an optional solution, one or more of BLC, AWB, and LSC may be performed in the preprocessing; exemplarily, one or more of BLC, AWB, and LSC are first performed on the input N frames of RAW images, followed by image registration, channel splitting, pixel rearrangement, and the like. In this case, the preprocessing specifically includes: performing at least one of black level correction BLC, auto white balance AWB, or lens shading correction LSC on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; performing channel splitting and pixel rearrangement on them to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of RAW frames; and aligning the multiple frames of sub-images in each channel.
In an optional solution, since brightness and color enhancement may affect the sharpness of image edges, sharpening need not be fused into the first deep learning network; the image is sharpened according to actual needs after brightness and color enhancement.
Optionally, the image processing method further includes:
305. Send the second target image to the display screen for display, or store it in a storage unit.
Optionally, before the second target image is stored in the storage unit, it may first be encoded or compressed; optionally, the second target image may also be sent to another device. The embodiments of this application do not limit the destination of the obtained second target image.
The embodiments of this application fuse the detail-recovery-related processing into the same deep learning network, avoiding the mutual interference among multiple processing steps performed serially and reducing the error accumulation brought by that mutual interference, thereby improving image resolution and sharpness. Further, the embodiments feed N frames of RAW images at the same time, fusing the useful information of the multiple frames, which helps better recover image detail; on the other hand, since differences may exist among the frames, before the images are fed into the deep learning network for detail recovery, the N frames are preprocessed by channel splitting, pixel rearrangement, alignment, and the like, which improves the processing effect of the deep learning network.
FIG. 7 shows an image processing framework according to an embodiment of this application, which can be used to implement the image processing method shown in FIG. 3.
The image processing framework includes a preprocessing module, a detail recovery deep learning network, and a brightness and color enhancement module; optionally, the framework also includes a display screen and a memory. The preprocessing module, the detail recovery deep learning network, and the brightness and color enhancement module are implemented by the processor; these modules may be implemented as software modules on the processor, by dedicated hardware circuits on the processor, or by a combination of software and hardware. Exemplarily, the preprocessing module and the brightness and color enhancement module are implemented by the GPU, ISP, or CPU in the processor, and the deep learning network by the NPU in the processor; optionally, the deep learning network may also be implemented jointly by the GPU and the NPU. In a possible solution, the preprocessing module and the deep learning network are implemented by an application processor (AP), and the brightness and color enhancement module by a display driving integrated circuit (DDIC), which drives the display screen. It should be understood that the brightness and color enhancement module shown in FIG. 7 may also be called an enhancement module, used to realize at least one of brightness enhancement or color enhancement.
The input of the image processing framework is N frames of RAW images, which may be RGGB-format Bayer images, Quad-arranged images, or RAW images in another CFA format containing the three color components R, G, B.
The preprocessing module preprocesses the input N frames of RAW images to obtain the first intermediate image; for details, refer to the description of step 302 in the method embodiment, not repeated here. It should be understood that if the input is N frames of RGGB Bayer images, the first intermediate image output by the preprocessing module consists of 4N sub-images belonging to 4 channels, the sub-images of each channel containing only one color component; specifically, the 4N sub-images include N sub-images each for the R, G, G, B components, with each component's sub-images belonging to one channel. If the input is N frames of Quad-arranged images, the first intermediate image output by the preprocessing module consists of 16N sub-images belonging to 16 channels, the sub-images of each channel containing only one color component; specifically, since a minimum repeating unit of a Quad-arranged image contains four each of the R, G, G, B components, the 16N sub-images include 4N sub-images for each of the R, G, G, B components, with each sub-component's sub-images belonging to one channel. It should be understood that the number of frames of the first intermediate image output by the preprocessing module is related to the number of pixels in the minimum repeating unit of the input RAW image.
The detail recovery deep learning network is an exemplary instance of the first deep learning network in the foregoing method embodiment and is used to recover the detail of the preprocessed image. Specifically, the detail recovery deep learning network implements step 303; refer to the description of step 303 in the method embodiment, not repeated here. In an optional solution, defective pixel correction and phase pixel compensation are implemented by the preprocessing module, while demosaicing, noise reduction, and SR reconstruction are implemented by the detail recovery deep learning network; in an optional case, demosaicing, noise reduction, defective pixel correction, sharpening, and phase pixel compensation are all implemented by the detail recovery deep learning network.
The brightness and color enhancement module performs brightness enhancement and color enhancement on the image output by the detail recovery deep learning network. It should be understood that brightness enhancement and color enhancement may be realized by the same module or by different modules; that is, the brightness enhancement module and the color enhancement module may be two different modules. In an optional case, multiple modules may realize brightness and color enhancement, for example one module per brightness- or color-enhancement-related operation.
Exemplarily, the brightness and color enhancement module implements step 304; refer to the description of step 304 in the method embodiment, not repeated here.
The image processed by the image processing framework can be sent to the display screen for display or stored in the memory.
FIG. 8 shows another exemplary image processing framework according to an embodiment of this application, which can also be used to implement the image processing method shown in FIG. 3. This framework includes a preprocessing module, a detail recovery deep learning network, a brightness and color enhancement module, and a sharpening module; optionally, it also includes a display screen and a memory. Unlike the framework of FIG. 7, in FIG. 8 the sharpening module follows the brightness and color enhancement module, because brightness and color enhancement may affect the sharpness of image edges; the image is therefore sharpened according to actual needs after brightness and color enhancement. For the other parts, refer to the framework shown in FIG. 7. It should be understood that the brightness and color enhancement module shown in FIG. 8 may also be called an enhancement module, used to realize at least one of brightness enhancement or color enhancement.
FIG. 9 shows another exemplary image processing framework according to an embodiment of this application, which can also be used to implement the image processing method shown in FIG. 3. This framework includes a preprocessing module, a detail recovery deep learning network, a color conversion module, and a brightness and color enhancement module; optionally, it also includes a display screen and a memory.
The input of this framework is N frames of RAW images, which may be in RYYB or XYZW format. When the input RAW images are in RYYB format, the first intermediate image output by the preprocessing module includes 4N sub-images, specifically N sub-images each for the R, Y, Y, B components, and the image obtained after the detail recovery deep learning network is a color image with the three channels R, Y, B. When the input RAW images are in XYZW format, the first intermediate image includes 4N sub-images, specifically N sub-images each for the X, Y, Z, W components, and the image obtained after the detail recovery network is a color image with the four channels X, Y, Z, W. In both of these cases, a color conversion module follows the detail recovery deep learning network to convert the RYB or XYZW color image into an RGB color image. It should be understood that whenever the format of the input RAW image causes the detail recovery network's output not to be an RGB color image, a color conversion module must be added after the detail recovery network to convert the non-RGB image into an RGB color image. After the image is converted into an RGB color image and processed by the brightness and color enhancement module, it is sent to the display screen for display or stored in the memory.
In an optional solution, a sharpening module may be added after the brightness and color enhancement module of the framework shown in FIG. 9. It should be understood that the brightness and color enhancement module shown in FIG. 9 may also be called an enhancement module, used to realize at least one of brightness enhancement or color enhancement.
FIG. 10 is a schematic structural diagram of an exemplary deep learning network according to an embodiment of this application. It should be understood that FIG. 10 explains the network structure using 2x zoom as an example; other forms of network structure exist, and the embodiments of this application do not limit the specific form of the network structure. It should be understood that if the length and width of the network's output image are respectively twice those of the input image, the magnification of the network is 2x; if the length and width of the output image are respectively four times those of the input image, the magnification is 4x. 2x zoom means that the length and width of the finally output image are respectively twice those of the originally input image; it should be understood that the originally input image differs from the network's input image: generally, the network's input image is obtained by preprocessing the originally input image. FIG. 11 is a schematic diagram of the processing effect of an exemplary detail recovery network according to an embodiment of this application. The detail recovery network is a 2x-zoom deep learning network; the original input is 4 frames of 6*6 RAW images, which are preprocessed into the detail recovery network's input: sub-images of the four components R, G, G, B of size 3*3 obtained from the originally input RAW images by channel splitting and pixel rearrangement, where one 6*6 RAW frame yields 4 sub-images of 3*3, so the 4 frames of 6*6 RAW images yield 16 sub-images in total after splitting (only 8 are shown in the figure). After processing by the detail recovery network, the output image is a 12*12 RGB color image.
Referring to FIG. 10, the deep learning network includes a feature extraction convolution module, multiple residual network convolution modules, feature fusion module 1, two upsampling convolution blocks, and feature fusion convolution module 2.
FIG. 12 is a structural diagram of an exemplary feature extraction convolution block according to an embodiment of this application. The feature extraction convolution block includes a first convolution layer Conv(k3n64s1), a first activation function layer (PReLU), a second convolution layer Conv(k3n128s1), and a second activation function layer (PReLU), where k denotes the size of the convolution kernel, n the number of channels of the feature map after convolution, and s the convolution stride; it should be understood that k, n, and s have the same meaning in the structural diagrams of FIG. 13 to FIG. 15. That is, the first convolution layer in FIG. 12 has kernel size 3, 64 feature channels after convolution, and stride 1, and the second convolution layer has kernel size 3, 128 feature channels after convolution, and stride 1. It should be understood that the embodiments of this application provide only one exemplary structure of the feature extraction convolution block; other structures are possible, for example the numbers of convolution layers and activation function layers need not be 2, and the values of k, n, s in the convolution layers are all optional. In an optional case, the detail recovery network may contain no feature extraction convolution module, or contain multiple feature extraction convolution modules.
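As an illustration, the following is a minimal PyTorch sketch of the feature extraction convolution block of FIG. 12, written directly from the Conv(k3n64s1)/PReLU/Conv(k3n128s1)/PReLU description; the module name, the input channel count in_ch, and the padding choice (the figure does not specify padding) are assumptions.

```python
import torch.nn as nn

class FeatureExtractionBlock(nn.Module):
    """Conv(k3n64s1) -> PReLU -> Conv(k3n128s1) -> PReLU, as in FIG. 12."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            # padding=1 keeps spatial size; an assumption, not stated in the figure
            nn.Conv2d(in_ch, 64, kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)
```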
FIG. 13 is a structural diagram of an exemplary residual network convolution block according to an embodiment of this application. The residual network convolution block includes a first convolution layer Conv(k3n128s1), an activation function layer (PReLU), and a second convolution layer Conv(k3n128s1). It should be understood that the detail recovery network structure shown in FIG. 10 contains multiple residual network convolution modules; in other words, the residual network convolution block needs to be applied multiple times. Exemplarily, the number of residual network convolution blocks may be set to 6.
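A matching PyTorch sketch of the residual network convolution block of FIG. 13 follows; the skip connection is implied by the term "residual", and the padding choice is an illustrative assumption.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv(k3n128s1) -> PReLU -> Conv(k3n128s1) with a skip connection."""
    def __init__(self, ch: int = 128):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1)
        self.act = nn.PReLU()
        self.conv2 = nn.Conv2d(ch, ch, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))  # residual addition
```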
FIG. 14a and FIG. 14b are structural diagrams of exemplary feature fusion module 1 and feature fusion module 2 according to an embodiment of this application. Feature fusion module 1 includes one convolution layer Conv(k3n128s1), and feature fusion module 2 includes one convolution layer Conv(k3n3s1); that is, the convolution layer of feature fusion module 1 has kernel size 3, 128 feature channels, and stride 1, and the convolution layer of feature fusion module 2 has kernel size 3, 3 feature channels, and stride 1. It should be understood that since the image data output by feature fusion module 2 is the output data of the detail recovery network, when the output of the detail recovery network is RGB color data the number of feature channels of feature fusion module 2 is 3; the k, s of feature fusion module 2 and the k, n, s of feature fusion module 1 are all selectable. It should be understood that the detail recovery network may contain no feature fusion module 1, or may contain multiple feature fusion modules 1. In an optional case, when the input RAW image is in XYZW format, the number of feature channels of feature fusion module 2 is 4; that is, the image output by the deep learning network contains 4 channels.
FIG. 15 is a structural diagram of an exemplary upsampling convolution block according to an embodiment of this application. Since the deep learning network shown in FIG. 10 is a 2x-zoom network, two upsampling convolution blocks are needed. The upsampling convolution block includes a convolution layer Conv(k3n256s1), a pixel shuffle layer (PixelShuffler), and an activation function layer (PReLU). It should be understood that PixelShufflerX2 in FIG. 15 indicates a 2x-upsampling pixel shuffle layer; optionally, an upsampling convolution block with a magnification of 4 contains one 4x-upsampling pixel shuffle layer, or two 2x-upsampling pixel shuffle layers.
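A minimal PyTorch sketch of the upsampling convolution block of FIG. 15 follows; note that nn.PixelShuffle(2) turns 256 feature channels into 64 channels at twice the resolution. The input channel count and the padding are illustrative assumptions.

```python
import torch.nn as nn

class UpsampleBlock(nn.Module):
    """Conv(k3n256s1) -> PixelShuffler x2 -> PReLU, as in FIG. 15."""
    def __init__(self, in_ch: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, stride=1, padding=1),
            nn.PixelShuffle(2),   # (256, h, w) -> (64, 2h, 2w)
            nn.PReLU(),
        )

    def forward(self, x):
        return self.body(x)
```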
It should be understood that for different RAW images and different input/output image sizes, or for different zoom factors, the structure of the deep learning network must be adjusted accordingly; exemplarily, the number of upsampling convolution blocks will differ. Denote the length and width of the input RAW image as h0 and w0, the length and width of the channel-split sub-images as h1 and w1, and the length and width of the color image output by the deep learning network as h2 and w2; then the number of upsampling convolution blocks needed in the deep learning network is log2(r), where r = h2/h1 = w2/w1. When the input RAW image is RGGB, RYYB, or XYZW, h1/h0 = 1/2 and w1/w0 = 1/2; when the input RAW image is in Quad format, h1/h0 = 1/4 and w1/w0 = 1/4. If the input is a 10M RGGB image and the output a 10M RGB image, that is h0 = h2 and w0 = w2, then r = h2/h1 = w2/w1 = h0/h1 = w0/w1 = 2, so one upsampling convolution block is needed; if the input is a 10M Quad image and the output a 10M RGB image, then r = h2/h1 = w2/w1 = 4h2/h0 = 4w2/w0 = 4, and two upsampling convolution blocks are needed; if the input is a 40M XYZW-format image and the output a 10M XYZW four-channel color image, then r = h2/h1 = w2/w1 = 2h2/h0 = 2w2/w0 = 1, and in this case no upsampling convolution block is needed.
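The block-count rule can be checked with a few lines of Python; the helper below simply evaluates log2(r) for the three examples in the text.

```python
from math import log2

def upsample_blocks(h0: int, h2: int, split: int) -> int:
    """Number of upsampling blocks: log2(r) with r = h2/h1 and h1 = h0/split.

    `split` is 2 for RGGB/RYYB/XYZW (2x2 repeating unit) and 4 for Quad.
    """
    h1 = h0 / split
    return int(log2(h2 / h1))

print(upsample_blocks(h0=1000, h2=1000, split=2))  # 10M RGGB -> 10M RGB: 1
print(upsample_blocks(h0=1000, h2=1000, split=4))  # 10M Quad -> 10M RGB: 2
print(upsample_blocks(h0=2000, h2=1000, split=2))  # 40M XYZW -> 10M XYZW: 0
```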
To realize network training, training data must be acquired; for example, a training data set can be composed by collecting a large number of paired low-quality input images and high-quality target images, where the low-quality images are the images input to the deep learning network and the high-quality target image is the target image after processing by a deep learning network that meets the requirements. Exemplarily, to train a deep learning network with demosaicing, noise reduction, and SR reconstruction functions, the constructed training data includes: multiple frames of noisy, mosaicked, low-resolution RAW images and one frame of a noise-free, demosaicked, high-resolution color image. To train a deep learning network with demosaicing, noise reduction, SR reconstruction, and defective pixel removal functions, the constructed training data includes: multiple frames of noisy, mosaicked, low-resolution RAW images with defective pixels and one frame of a noise-free, demosaicked, high-resolution color image without defective pixels. To train a deep learning network with demosaicing, noise reduction, SR reconstruction, and sharpening functions, the constructed training data includes: multiple frames of noisy, mosaicked, blurry, low-resolution RAW images and one frame of a noise-free, demosaicked, sharp, high-resolution color image. To train a deep learning network with demosaicing, noise reduction, SR reconstruction, defective pixel removal, and sharpening functions, the constructed training data includes: multiple frames of noisy, mosaicked, blurry, low-resolution RAW images with defective pixels and one frame of a noise-free, demosaicked, sharpened, high-resolution color image without defective pixels. In short, the constructed training data is related to the functions of the deep learning network and is not enumerated further here.
However, in real shooting environments it is difficult to capture strictly aligned low-quality and high-quality images at the same time; therefore, we acquire the training data by artificial synthesis.
The embodiments of this application provide two exemplary ways of obtaining high-quality images: first, downloading a quantity of open data sets over the network and selecting images of very good quality from them; second, using a high-quality camera under strictly controlled light source conditions to capture high-quality images that meet preset conditions, which can be set according to specific requirements. It should be understood that the high-quality images captured and output by the camera are RGB color images processed to suit the characteristics of the human eye. Further, inverse gamma correction is applied to the obtained high-quality images so that the brightness range of the inverse-gamma-corrected high-quality images is closer to the brightness range of the RAW images acquired by the camera, and downsampling is then performed to obtain the high-quality target images (RGB color images) of the training data; downsampling can remove some tiny flaws and further improve image quality. It should be understood that the high-quality images in the training data set may all be images obtained the first way, all be images obtained the second way, or combine images from the two ways in a certain proportion.
Next, the embodiments of this application provide an exemplary way of obtaining low-quality images.
A series of quality-degradation operations is applied to the high-quality images obtained above to obtain the low-quality input images. For example, to obtain noisy, mosaicked, blurry, low-resolution RAW images with defective pixels, the high-quality image undergoes the following operations: downsampling, Gaussian blur, noise addition, mosaicking, and defective pixel addition, where, if the deep learning network is a 2x-zoom network, the high-quality image is downsampled by a factor of 2, and the blur strength of the Gaussian blur can be chosen randomly. It should be understood that applying these operations to one high-quality frame yields one low-quality frame; to obtain multiple low-quality frames, the operations are applied multiple times to the same high-quality frame. With training data constructed this way, differences in noise, defective pixels, mosaic, resolution, and sharpness (blur) exist between the input low-quality images and the output high-quality target image, so a network trained on such data can simultaneously provide demosaicing, noise reduction, SR reconstruction, defective pixel removal, and sharpening. Since the low-quality input images are simulated from the high-quality image, the low-quality input images and the high-quality target image are strictly aligned, which further improves the training effect of the network.
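A minimal sketch of this degradation pipeline follows; the parameter values (blur sigma range, noise level, defective pixel fraction) are illustrative assumptions, and mosaicking here means sampling the RGB image onto an RGGB pattern. The input is assumed to be a float RGB array with even dimensions and values in [0, 1].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(rgb: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """High-quality RGB -> one simulated low-quality RGGB RAW frame."""
    img = rgb[::2, ::2]                                  # 2x downsample (2x-zoom net)
    sigma = rng.uniform(0.5, 2.0)                        # random blur strength
    img = gaussian_filter(img, sigma=(sigma, sigma, 0))  # Gaussian blur
    img = img + rng.normal(0.0, 0.02, img.shape)         # add noise
    raw = np.empty(img.shape[:2])                        # mosaic onto RGGB
    raw[0::2, 0::2] = img[0::2, 0::2, 0]                 # R
    raw[0::2, 1::2] = img[0::2, 1::2, 1]                 # G
    raw[1::2, 0::2] = img[1::2, 0::2, 1]                 # G
    raw[1::2, 1::2] = img[1::2, 1::2, 2]                 # B
    dead = rng.random(raw.shape) < 1e-4                  # add defective pixels
    raw[dead] = rng.choice([0.0, 1.0], size=dead.sum())
    return np.clip(raw, 0.0, 1.0)
```

Calling degrade() several times on the same high-quality frame yields the independently constructed multi-frame low-quality input described below.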
Exemplarily, to obtain noisy, mosaicked, low-resolution RAW images, the high-quality image undergoes the following operations: downsampling, noise addition, and mosaicking. The embodiments of this application apply to the high-quality image the degradation corresponding to the low-quality images that need to be obtained, not enumerated here one by one.
In addition, because the multiple frames of low-quality input images are constructed independently, differences in noise, defective pixels, and local sharpness exist among different low-quality input images, so the trained network has the ability to fuse multiple frames.
The embodiments of this application first acquire high-quality images and simulate low-quality images by applying degradation to them, so the low-quality input images in the constructed training data are strictly aligned with the high-quality target images. Further, training the network on this constructed data yields a deep learning network that can realize multiple kinds of detail-recovery-related processing, and because certain differences in noise, defective pixels, and local sharpness exist among the multiple input low-quality frames, the trained deep learning network also has multi-frame fusion capability. Processing images based on this deep learning network can realize the detail-recovery-related functions simultaneously, converting the input RAW images into high-resolution RGB color images with high sharpness, low noise, and distinct detail. Moreover, since the various detail-recovery-related operations are all realized by the deep learning network rather than by a serial processing order, the mutual interference among the operations is avoided, and the errors accumulated in converting a low-quality RAW image into a high-quality RGB color image are eliminated. Finally, because training feeds multiple low-quality frames as input and outputs one high-quality frame, the trained deep learning network also has multi-frame fusion capability; on this basis, when multiple low-quality RAW frames are input during image processing, the deep learning network can combine the useful information of the multiple frames to further improve the quality of the image output after processing by the deep learning network.
The loss function of the deep learning network is introduced next; the loss function is an important equation used to measure the difference between the predicted value and the target value. Because we want the output of the deep neural network to be as close as possible to the value it is really intended to predict, the current network's predicted value can be compared with the really desired target value, and the weight vectors of each layer of the neural network updated according to the difference between the two; for example, if the network's prediction is too high, the weight vectors are adjusted so that it predicts lower, adjusting continually until the neural network can predict the really desired target value. How the difference between the predicted value and the target value is compared is defined by the loss function or objective function; the higher the output value (loss) of the loss function, the greater the difference, and training the deep neural network then becomes the process of shrinking this loss as much as possible. The embodiments of this application may use the L1 Loss or L2 Loss between the network's output and the target image as the loss function; optionally, L1 Loss may be combined with structural similarity (SSIM) and adversarial Loss as the loss function, or L2 Loss may be combined with SSIM and adversarial Loss as the loss function.
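Below is a minimal PyTorch sketch of such a combined loss; it assumes a third-party SSIM implementation (for example the pytorch-msssim package) and a discriminator score for the adversarial term, and the weights w_ssim and w_adv are illustrative.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim  # third-party SSIM; an assumption, not mandated here

def combined_loss(pred, target, disc_score, w_ssim=0.2, w_adv=0.01):
    """L1 loss combined with SSIM and an adversarial term.

    `disc_score` is the discriminator's output on `pred` (values in (0, 1));
    the generator is pushed to make it close to 1.
    """
    l1 = F.l1_loss(pred, target)
    ssim_term = 1.0 - ssim(pred, target, data_range=1.0)   # 1 - SSIM as a loss
    adv = F.binary_cross_entropy(disc_score, torch.ones_like(disc_score))
    return l1 + w_ssim * ssim_term + w_adv * adv
```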
Regarding the training method of the deep learning network: the embodiments of this application may use the adaptive moment estimation (Adam) method to optimize the network parameters; when the loss has dropped to a fairly converged state, training can be considered complete.
The image processing framework and image processing method provided by the embodiments of this application are applicable to many application scenarios and many different shooting scenarios. Their application in different scenarios is described below:
Dark-light scenes: this scenario places high demands on the noise reduction effect, and multi-frame fusion technology is very important. Therefore, in a dark-light scene the number of input image frames can be increased; for example, if 4 frames are input in a brighter scene, 6, 8, or 9 frames can be input in a dark-light scene.
Zoom mode: for different zoom factors, the structure of the deep learning network also differs. Consider 4x zoom: unlike the 2x-zoom network structure, the 4x-zoom deep learning network needs 3 upsampling convolution blocks. When generating the training data, processing high-quality images into low-quality images requires 4x downsampling; it should be understood that 4x downsampling means the length and width of the downsampled image are respectively one quarter of the original image's length and width, that is, the area of the downsampled image is one sixteenth of the original image's area.
HDR scenes: the input uses multiple short-exposure frames to keep highlight regions from being overexposed as far as possible; the detail recovery network then recovers the image's detail, especially the detail in dark regions; further, the brightness enhancement module performs brightness enhancement on the image output by the detail recovery network, restoring the dynamic range of the whole image and thereby realizing the HDR function. In an HDR scene the input data is multiple frames of short-exposure RAW images, for example 6 or 8 frames. Correspondingly, when training the HDR deep learning network, some short-exposure training data must be added to the training data; the embodiments of this application provide a method for obtaining short-exposure training data:
Randomly select some reasonably exposed high-quality images and apply inverse gamma correction to them, obtaining inverse-gamma-corrected images whose brightness range matches the brightness range of the original RAW images acquired by the camera;
Divide every pixel value of the inverse-gamma-corrected image by a number that indicates how much the exposure of the reasonably exposed image is to be reduced; for example, dividing every pixel value by 2 means the simulated short-exposure image has 1/2 the exposure time of the original reasonably exposed image, dividing by 4 means 1/4 the exposure time, and so on. Optionally, the value of the number depends on the exposure-reduction ratios that may be chosen when actually capturing images; exemplarily, the value may be 2, 4, 8, or 16.
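A minimal sketch of this short-exposure simulation follows; the gamma value of 2.2 is an illustrative assumption, since the text does not fix the gamma curve.

```python
import numpy as np

def simulate_short_exposure(rgb: np.ndarray, ratio: int, gamma: float = 2.2):
    """Simulate a short-exposure frame from a reasonably exposed image.

    rgb: high-quality image with values in [0, 1].
    ratio: exposure reduction factor (e.g. 2, 4, 8, or 16).
    """
    linear = np.power(rgb, gamma)      # inverse gamma correction
    return linear / ratio              # divide every pixel value by `ratio`

short = simulate_short_exposure(np.random.rand(64, 64, 3), ratio=4)
```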
An embodiment of this application further provides a method for adaptively selecting a deep learning network, as shown in FIG. 16; the method includes:
1601. Select, based on first indication information, the target deep learning network corresponding to the first indication information from a deep learning network resource pool, the resource pool including deep learning networks with multiple different functions.
Exemplarily, the deep learning networks with multiple different functions include: deep learning networks for various zoom scenarios, a deep learning network for HDR scenarios, a deep learning network for dark-light scenarios, a deep learning network for night mode, a first detail recovery network with demosaicing, noise reduction, and SR reconstruction functions, a second detail recovery network with demosaicing, noise reduction, SR reconstruction, and sharpening functions, a third detail recovery network with demosaicing, noise reduction, SR reconstruction, and defective pixel correction functions, and so on. These deep learning networks with different functions are trained in advance and solidified or stored in the memory of the mobile terminal or the storage unit of the mobile terminal's processor. In an optional case, the deep learning networks may also be trained in real time and continuously updated. In an optional solution, the deep learning networks with different functions are implemented as software algorithms, and based on these software algorithms hardware computing resources in the NPU or GPU are invoked to realize the processing functions of the deep learning networks; it should be understood that the hardware resources may also be hardware resources other than the NPU or GPU. In an optional solution, the deep learning networks with different functions are solidified in different artificial intelligence AI engines, one deep learning network corresponding to one AI engine; an AI engine is a hardware module or a dedicated hardware circuit, and multiple AI engines may share the computing resources in a computing resource pool.
Exemplarily, the first indication information may be selected and sent by the user based on the user's own needs or the characteristics of the current scene; for example, the user selects an applicable or preferred application scenario by touching a mode selection button on the application APP interface and thereby sends the first indication information corresponding to that application scenario, and the first indication information is sent to the AI controller in the mobile terminal or processor; further, the AI controller gates or enables the corresponding AI engine or the corresponding deep learning network based on the first indication information, or the AI controller reads the corresponding deep learning network based on the first indication information and loads it into the processor.
In an optional case, the first indication information is obtained by analyzing the characteristics of the preview image acquired by the current camera; these characteristics are related to the current application scenario, which is to say that preview images acquired in different application scenarios have different characteristics, so the current application scenario can be determined by analyzing the characteristics of the preview image, yielding the first indication information used to indicate the current application scenario. Based on the first indication information, the AI controller selects from the deep learning network resource pool the deep learning network suited to the current application scenario. For example, if the characteristics of the current preview image match a dark-light scene, the AI controller selects the dark-light deep learning network as the target deep learning network and further controls the camera to capture multiple frames of reasonably exposed images as input; it should be understood that a dark-light scene must account for the noise reduction effect, so the number of input image frames is appropriately increased. If the characteristics of the current preview image match an HDR scene, the AI controller selects the HDR deep learning network as the target deep learning network and further controls the camera to capture multiple short-exposure frames as input; optionally, it may instead control the camera to acquire multiple frames with different exposure times, which may include several images with longer exposure times and several images with shorter exposure times.
In an optional case, the first indication information is carried by the input data; exemplarily, the first indication information is a zoom factor carried by the input data, and upon receiving the zoom factor carried by the input data, the AI controller gates or enables the deep learning network corresponding to that zoom factor.
1602. Process the input image data based on the target deep learning network to obtain a first output image.
Optionally, the first output image may be the finally output target high-quality image.
In an optional case, the method further includes:
1603. Perform brightness enhancement and color enhancement on the first output image to obtain a second output image.
In an optional case, the method further includes:
1604. Perform color gamut conversion or color format conversion on the second output image to obtain a target output image that can be displayed on the display screen.
In an optional case, before 1601, the method further includes:
acquiring N frames of RAW images;
preprocessing the acquired N frames of RAW images to obtain the input image data fed into the deep learning network.
Exemplarily, the preprocessing includes image registration, motion compensation, channel splitting, pixel rearrangement, and the like.
In an optional case, after 1603, the second output image may also be sharpened.
In the method for adaptively selecting a deep learning network provided by the embodiments of this application, the most suitable deep learning network can be selected or enabled from the deep learning network resource pool according to the user's needs, the characteristics of the input data, or parameters carried by the input data, satisfying the needs of different users and scenarios to the greatest extent; the optimal deep learning network and the best image processing effect can be provided in every scenario, optimizing the user experience, improving the image processing performance of mobile terminals or image processors, and enhancing competitiveness.
An embodiment of this application further provides an apparatus for adaptively selecting a deep learning network, as shown in FIG. 17. The apparatus includes a receiving interface, an artificial intelligence controller, and a deep learning network resource pool including deep learning networks with multiple functions.
The receiving interface is used to receive image data, indication information, or various control signals; for example, it may be used to receive the mode or scene indication information the user selects on the application APP interface on the display screen of the mobile terminal, or to receive the image data acquired by the camera, and so on.
The artificial intelligence AI controller is coupled with the deep learning network resource pool and, based on first indication information, selects the target deep learning network corresponding to the first indication information from the resource pool. Optionally, the first indication information may be indication information received from the user through the receiving interface, scene-related indication information obtained by the apparatus through characteristic analysis of the preview image acquired by the camera, or indication information carried by the input image data itself. Exemplarily, the artificial intelligence controller may be realized by a dedicated hardware circuit, by a general-purpose processor or CPU, or by a software module running on the processor. The deep learning network is realized by an AI engine, which is a hardware module or a dedicated hardware circuit, or the deep learning network is realized by software code or a software module; when the deep learning network is realized by software code or a software module, the deep learning network resource pool is stored in the memory.
Optionally, the apparatus further includes a processor, which may be, for example, a GPU, NPU, ISP, general-purpose AP, or another intelligent processor, and which processes the input image based on the target deep learning network to obtain the first output image. When the deep learning network is realized by software code or a software module, the deep learning network runs on the processor; exemplarily, the AI controller reads the target deep learning network out of the deep learning network resource pool and loads it into the processor, and the processor then runs the target deep learning network to realize the function corresponding to it. For example, the selected target deep learning network can be loaded into the detail recovery network shown in FIG. 17.
Optionally, the apparatus further includes hardware computing resources, including addition, subtraction, multiplication, division, exponential operations, logarithmic operations, size comparison, and the like; the hardware computing resources can be multiplexed by multiple deep learning networks. Specifically, when running the target deep learning network, the processor calls on the computing resources in the hardware computing resources to process the input image based on the instructions of the target deep learning network, thereby realizing the functions corresponding to the target deep learning network.
Optionally, the apparatus further includes a preprocessing module, which preprocesses the initially input RAW images before the deep learning network; the preprocessing may include the preprocessing described in step 302. Optionally, the preprocessing module may also analyze the characteristics of the preview image acquired by the camera and send a characteristic signal to the AI controller, which selects the corresponding deep learning network from the deep learning network resource pool based on the characteristic signal. Optionally, analyzing the characteristics of the original RAW image may also be realized by a dedicated image characteristic analysis module or by a general-purpose processor.
Optionally, the apparatus further includes a color enhancement module and a brightness enhancement module; the color enhancement module performs color enhancement on the first output image output by the deep learning network, and the brightness enhancement module performs brightness enhancement on the first output image output by the deep learning network. It should be understood that color enhancement and brightness enhancement may also be realized by the same module, and that color enhancement and brightness enhancement may be implemented by hardware modules, software modules, or software modules combined with hardware modules.
Optionally, the apparatus further includes a color format conversion module for converting the image into an image format supported by the display screen or a target format specified by the user.
It should be understood that the preprocessing module, the color enhancement and brightness enhancement modules, and the color format conversion module may all be implemented by the processor.
The apparatus for adaptively selecting a deep learning network provided by the embodiments of this application includes a deep learning network resource pool, and can select a suitable deep learning network according to the mode selected by the user, by adaptively analyzing the characteristics of the input image, or according to the characteristic parameters carried by the input image; in many application scenarios the image can be processed based on the optimal deep learning network, achieving the best image processing effect in each scenario, improving the user experience, raising the image processing performance of mobile terminals or image processors, and enhancing competitiveness.
An embodiment of this application further provides a computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute one or more steps of any of the above methods. If the component modules of the above signal processing apparatus are implemented in the form of software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium.
Based on this understanding, the embodiments of this application further provide a computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute any of the methods provided in the embodiments of this application. The technical solution of this application, in essence the part that contributes beyond the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device, or a processor therein, to execute all or part of the steps of the methods described in the various embodiments of this application.
The above embodiments are intended only to illustrate the technical solution of this application, not to limit it. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their technical features, and these modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of this application. For example, some specific operations in the apparatus embodiments may refer to the preceding method embodiments.
Claims (39)
- An image processing method, characterized in that the method includes: acquiring multiple frames of original RAW images; preprocessing the multiple frames of RAW images to obtain a first intermediate image, the preprocessing including channel splitting and pixel rearrangement, where the first intermediate image includes sub-images belonging to multiple channels and the sub-images of each channel contain only one color component; processing the first intermediate image based on a first deep learning network to obtain a first target image, the functions of the first deep learning network including demosaicing DM and noise reduction; and performing at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
- The method according to claim 1, characterized in that the functions of the first deep learning network further include super-resolution SR reconstruction, the RAW images have a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
- The method according to claim 1 or 2, characterized in that the functions of the first deep learning network further include at least one of defective pixel correction or phase pixel compensation.
- The method according to claim 1 or 2, characterized in that the preprocessing further includes at least one of defective pixel correction or phase pixel compensation.
- The method according to any one of claims 1 to 4, characterized in that the functions of the first deep learning network further include sharpening.
- The method according to any one of claims 1 to 4, characterized in that the method further includes: sharpening the second target image to obtain a third target image; and sending the third target image to a display screen or a memory.
- The method according to any one of claims 1 to 6, characterized in that the RAW image is an RYYB image or an image containing 4 different color components, and before performing brightness enhancement and color enhancement on the first target image to obtain the second target image, the method further includes: performing color conversion on the first target image to obtain an RGB color image; the performing of brightness enhancement and color enhancement on the first target image to obtain the second target image specifically includes: performing at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
- The method according to any one of claims 1 to 7, characterized in that the functions of the first deep learning network further include image alignment.
- The method according to any one of claims 1 to 7, characterized in that the preprocessing further includes image alignment.
- The method according to claim 9, characterized in that the preprocessing specifically includes: performing channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of the multiple frames of RAW images; and aligning the multiple frames of sub-images in each channel separately.
- The method according to claim 10, characterized in that aligning the multiple frames of sub-images in each channel separately specifically includes: aligning the multiple frames of sub-images in a first channel, the first channel being any one of the M channels; and aligning the other channels based on the alignment used for the first channel.
- The method according to any one of claims 1 to 11, characterized in that the brightness enhancement or color enhancement includes at least one of: black level correction BLC, auto white balance AWB, lens shading correction LSC, tone mapping, color mapping, contrast increase, or gamma correction.
- The method according to any one of claims 1 to 7, characterized in that the preprocessing specifically includes: performing at least one of black level correction BLC, auto white balance AWB, or lens shading correction LSC on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; performing channel splitting and pixel rearrangement on the multiple frames of first preprocessed RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of the multiple frames of RAW images; and aligning the multiple frames of sub-images in each channel.
- The method according to any one of claims 1 to 13, characterized in that the number of channels to which the sub-images of the first intermediate image belong equals the number of pixels in the minimum repeating unit of the RAW image.
- The method according to any one of claims 1 to 14, characterized in that the preprocessing further includes: estimating at least one of a noise intensity region distribution map or a sharpening intensity map of the image; the first deep learning network is specifically used to realize at least one of: controlling the degree of noise reduction in different regions of the first intermediate image based on the noise intensity region distribution map; or controlling the sharpening intensity of different regions of the first intermediate image based on the sharpening intensity map.
- The method according to any one of claims 1 to 15, characterized in that the first deep learning network includes: multiple residual network convolution modules, at least one upsampling convolution block, and a second feature fusion convolution module, where the output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
- The method according to claim 16, characterized in that the number of upsampling convolution blocks is related to the format of the RAW image, the size of the RAW image, and the size of the first target image.
- The method according to claim 16 or 17, characterized in that the first deep learning network further includes a feature extraction convolution module and a first feature fusion module, and the output of the multiple residual network convolution modules is the input of the first feature fusion module.
- The method according to any one of claims 1 to 18, characterized in that when the method is applied to an HDR scene, the multiple frames of RAW images are multiple frames of short-exposure RAW images, and the training data of the first deep learning network includes multiple frames of short-exposure training images obtained as follows: performing inverse gamma correction on a reasonably exposed high-quality image to obtain an inverse-gamma-corrected image; and dividing every pixel value of the inverse-gamma-corrected image by a number to obtain the short-exposure training image.
- The method according to any one of claims 1 to 18, characterized in that when the method is applied to a dark-light scene, the number of input frames of the RAW images is increased; when the method is applied to zoom mode, the number of upsampling convolution blocks in the first deep learning network is related to the zoom factor.
- The method according to any one of claims 1 to 19, characterized in that the first deep learning network is a target deep learning network selected from a deep learning network resource pool according to first indication information, where the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or the first indication information is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or the first indication information is magnification information carried by the input multiple frames of RAW images.
- An image processing apparatus, characterized in that the apparatus includes: a preprocessing module for preprocessing multiple frames of RAW images to obtain a first intermediate image, the preprocessing including channel splitting and pixel rearrangement, where the first intermediate image includes sub-images belonging to multiple channels and the sub-images of each channel contain only one color component; a first deep learning network for processing the first intermediate image to obtain a first target image, the functions of the first deep learning network including demosaicing DM and noise reduction; and an enhancement module for performing at least one of brightness enhancement or color enhancement on the first target image to obtain a second target image.
- The apparatus according to claim 22, characterized in that the functions of the first deep learning network further include super-resolution SR reconstruction, the RAW images have a first resolution, the first target image has a second resolution, and the second resolution is greater than the first resolution.
- The apparatus according to claim 22 or 23, characterized in that the functions of the first deep learning network further include at least one of defective pixel correction or phase pixel compensation; or the preprocessing further includes at least one of defective pixel correction or phase pixel compensation.
- The apparatus according to any one of claims 22 to 24, characterized in that the functions of the first deep learning network further include sharpening.
- The apparatus according to any one of claims 22 to 24, characterized in that the apparatus further includes: a sharpening module for sharpening the second target image to obtain a third target image; and a sending interface for sending the third target image to a display screen or a memory.
- The apparatus according to any one of claims 22 to 26, characterized in that the RAW image is an RYYB image or an image containing 4 different color components, and the apparatus further includes: a color conversion module for performing color conversion on the first target image to obtain an RGB color image; the enhancement module is specifically configured to perform at least one of brightness enhancement or color enhancement on the RGB color image to obtain the second target image.
- The apparatus according to any one of claims 22 to 27, characterized in that the functions of the first deep learning network further include image alignment, or the preprocessing further includes image alignment.
- The apparatus according to any one of claims 22 to 27, characterized in that the preprocessing further includes image alignment, and the preprocessing module is specifically configured to: perform channel splitting and pixel rearrangement on the multiple frames of RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of the multiple frames of RAW images; align the multiple frames of sub-images in a first channel, the first channel being any one of the M channels; and align the other channels based on the alignment used for the first channel.
- The apparatus according to any one of claims 22 to 29, characterized in that the enhancement module is specifically configured to realize at least one of: black level correction BLC, auto white balance AWB, lens shading correction LSC, tone mapping, color mapping, contrast increase, or gamma correction.
- The apparatus according to any one of claims 22 to 27, characterized in that the preprocessing module is specifically configured to: perform at least one of black level correction BLC, auto white balance AWB, or lens shading correction LSC on the multiple frames of RAW images to obtain multiple frames of first preprocessed RAW images; perform channel splitting and pixel rearrangement on the multiple frames of first preprocessed RAW images to obtain multiple frames of sub-images belonging to M channels, where the number of frames of sub-images in each channel equals the number of frames of the multiple frames of RAW images; and align the multiple frames of sub-images in each channel.
- The apparatus according to any one of claims 22 to 31, characterized in that the number of channels to which the sub-images of the first intermediate image belong equals the number of pixels in the minimum repeating unit of the RAW image.
- The apparatus according to any one of claims 22 to 32, characterized in that the preprocessing module is further configured to: estimate at least one of a noise intensity region distribution map or a sharpening intensity map of the image; the first deep learning network is specifically used to realize at least one of: controlling the degree of noise reduction in different regions of the first intermediate image based on the noise intensity region distribution map; or controlling the sharpening intensity of different regions of the first intermediate image based on the sharpening intensity map.
- The apparatus according to any one of claims 22 to 33, characterized in that the first deep learning network includes: multiple residual network convolution modules, at least one upsampling convolution block, and a second feature fusion convolution module, where the output of the second feature fusion convolution module is the output of the first deep learning network, and the number of feature channels of the second feature fusion convolution module is 3 or 4.
- The apparatus according to any one of claims 22 to 34, characterized in that when the apparatus is applied to an HDR scene, the multiple frames of RAW images are multiple frames of short-exposure RAW images; when the apparatus is applied to a dark-light scene, the number of input frames of the RAW images is increased; when the apparatus is applied to zoom mode, the number of upsampling convolution blocks in the first deep learning network is related to the zoom factor.
- The apparatus according to any one of claims 22 to 35, characterized in that the first deep learning network is a target deep learning network selected from a deep learning network resource pool according to first indication information, where the first indication information is indication information related to the application scenario selected by the user on the application APP interface; or the first indication information is indication information related to the application scenario obtained by analyzing the characteristics of the preview image acquired by the camera; or the first indication information is magnification information carried by the input multiple frames of RAW images.
- An image processing apparatus, characterized in that the apparatus includes: a receiving interface and a processor on which a first deep learning network runs, the functions of the first deep learning network including demosaicing DM and noise reduction; the receiving interface is used to receive multiple frames of RAW images acquired by a camera; the processor is used to call software code stored in a memory to execute the method according to any one of claims 1 to 21.
- A computer-readable storage medium storing instructions which, when run on a computer or processor, cause the computer or processor to execute the method according to any one of claims 1 to 21.
- A computer program product containing instructions which, when run on a computer or processor, cause the computer or processor to execute the method according to any one of claims 1 to 21.
Priority Applications (2)
- EP20866370.8A (priority 2019-09-18, filed 2020-07-22): Image processing method and apparatus
- US17/698,698 (priority 2019-09-18, filed 2022-03-18): Image Processing Method and Apparatus
Applications Claiming Priority (2)
- CN201910882529.3A (filed 2019-09-18): An image processing method and apparatus
- CN201910882529.3 (filed 2019-09-18)
Related Child Applications (1)
- US17/698,698 (continuation, filed 2022-03-18): Image Processing Method and Apparatus
Publications (1)
- WO2021051996A1 (published 2021-03-25)
Family ID: 74883690
Family Applications (1)
- PCT/CN2020/103377 (priority 2019-09-18, filed 2020-07-22): WO2021051996A1
Country Status (4)
- US: US20220207680A1
- EP: EP4024323A4
- CN: CN112529775A
- WO: WO2021051996A1
Also Published As
- EP4024323A1 (2022-07-06)
- CN112529775A (2021-03-19)
- US20220207680A1 (2022-06-30)
- EP4024323A4 (2023-01-25)
Legal Events
- 121: The EPO has been informed by WIPO that EP was designated in this application (ref document number 20866370, country EP, kind code A1)
- NENP: Non-entry into the national phase (ref country code DE)
- ENP: Entry into the national phase (ref document number 2020866370, country EP, effective date 2022-03-31)